Discovery Algorithms for Embedded Uniqueness Constraints

ResearchSpace/Manakin Repository


dc.contributor.author Wei, Z en
dc.contributor.author Leck, U en
dc.contributor.author Link, S en
dc.date.accessioned 2019-01-09T02:23:05Z en
dc.date.available 2019-01-09T02:23:05Z en
dc.date.issued 2018 en
dc.identifier.citation CDMTCS Research Reports CDMTCS-524 (2018) en
dc.identifier.issn 1178-3540 en
dc.identifier.uri http://hdl.handle.net/2292/45070 en
dc.description.abstract Data profiling is an enabler for efficient data management and effective analytics. The discovery of data dependencies is at the core of data profiling. We conduct the first study on the discovery of embedded uniqueness constraints (eUCs), a recently introduced class of data dependencies that represent unique column combinations embedded in complete fragments of incomplete data. We show that the decision variant of finding a minimal eUC is NP-complete and W[2]-complete in the input size. We also characterize the maximum possible solution size, and show which families of eUCs attain that size. The size is much larger than for the special case of minimal SQL uniques. Despite these challenges, our column-efficient, row-efficient, and hybrid discovery algorithms perform effectively and fast on real-world benchmark and synthetic data. We also propose the computation of small semantic samples of given data sets as a new direction in data profiling. These samples satisfy the same eUCs as the given data set, and we showcase how discovery and sampling together provide a pathway towards effective data cleansing and business rule acquisition. en
dc.publisher Department of Computer Science, The University of Auckland, New Zealand en
dc.relation.ispartofseries CDMTCS Research Report Series en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. en
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm en
dc.source.uri https://www.cs.auckland.ac.nz/research/groups/CDMTCS/researchreports/index.php en
dc.title Discovery Algorithms for Embedded Uniqueness Constraints en
dc.type Technical Report en
dc.subject.marsden Fields of Research en
dc.rights.holder Copyright: The authors en
dc.rights.accessrights http://purl.org/eprint/accessRights/OpenAccess en
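The abstract describes an eUC as a unique column combination embedded in the complete fragment of incomplete data. The following is a minimal Python sketch of that reading, not code from the report. It assumes that rows are tuples, that columns are positional indexes, and that None marks a missing value. An eUC is written here as a pair (E, U) with U a subset of E, and it holds when no two rows that are complete on E agree on U. The names `fragment`, `holds`, `minimal_uniques` and `example_rows` are hypothetical. The discovery routine is a naive exhaustive baseline and not the column-efficient, row-efficient, or hybrid algorithms of the report. Its exponential subset enumeration is consistent with the NP-completeness result stated in the abstract.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

# Assumption: a row is a tuple of values; None models a missing (null) value.
Row = Sequence[Optional[object]]


def fragment(rows: List[Row], E: Sequence[int]) -> List[Row]:
    """Return the rows that are complete (no None) on every column in E."""
    return [r for r in rows if all(r[c] is not None for c in E)]


def holds(rows: List[Row], E: Sequence[int], U: Sequence[int]) -> bool:
    """Check the eUC (E, U): within the E-complete fragment,
    no two rows share the same values on the columns in U."""
    if not set(U) <= set(E):
        raise ValueError("U must be a subset of E")
    seen = set()
    for r in fragment(rows, E):
        key = tuple(r[c] for c in U)
        if key in seen:
            return False  # two fragment rows agree on U: constraint violated
        seen.add(key)
    return True


def minimal_uniques(rows: List[Row], E: Sequence[int]) -> List[Tuple[int, ...]]:
    """Naive baseline: every minimal U within E such that (E, U) holds.
    Subsets are visited in order of increasing size, so any holding subset
    that is not a superset of an earlier result is minimal."""
    cols = sorted(E)
    found: List[Tuple[int, ...]] = []
    for k in range(len(cols) + 1):
        for U in combinations(cols, k):
            if any(set(m) <= set(U) for m in found):
                continue  # superset of a known minimal unique, so not minimal
            if holds(rows, cols, U):
                found.append(U)
    return found


# Hypothetical example data with three columns (0, 1, 2).
example_rows: List[Row] = [
    (1, "a", None),
    (2, "a", "x"),
    (3, "b", "x"),
    (None, "b", "y"),
]

if __name__ == "__main__":
    print(minimal_uniques(example_rows, (1, 2)))  # [(1, 2)]
    print(minimal_uniques(example_rows, (0, 1)))  # [(0,)]
```

On the example data, column 0 alone is unique within the fragment complete on columns 0 and 1, because the only row missing a value in column 0 falls outside that fragment. The same column set E therefore decides which rows are compared at all, which is what distinguishes an eUC from a plain unique column combination.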

