Discovery and Ranking of Functional Dependencies

dc.contributor.author Wei, Z en
dc.contributor.author Link, S en
dc.date.accessioned 2020-01-10T01:36:46Z en
dc.date.available 2020-01-10T01:36:46Z en
dc.date.issued 2019 en
dc.identifier.citation CDMTCS Research Reports CDMTCS-531 (2019) en
dc.identifier.issn 1178-3540 en
dc.identifier.uri en
dc.description.abstract Computing the functional dependencies that hold on a given data set is one of the most important problems in data profiling. Our research advances the state of the art in various ways. Utilizing new data structures and original techniques for the dynamic computation of stripped partitions, we devise a new hybridization strategy that outperforms the best algorithms in terms of efficiency, column scalability, and row scalability. This is demonstrated on real-world benchmark data. We show that current outputs contain many redundant functional dependencies, but canonical covers greatly reduce output sizes. Smaller representations of outputs are easier to comprehend and use. We propose the number of redundant data values as a natural measure to rank the output of discovery algorithms. Our ranking assesses the relevance of functional dependencies for the given data set. en
dc.publisher Department of Computer Science, The University of Auckland, New Zealand en
dc.relation.ispartofseries CDMTCS Research Report Series en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. en
dc.rights.uri en
dc.source.uri en
dc.title Discovery and Ranking of Functional Dependencies en
dc.type Technical Report en
dc.subject.marsden Fields of Research en
dc.rights.holder Copyright: The author(s) en
dc.rights.accessrights en
