Discovery and Ranking of Functional Dependencies

Wei, Z; Link, S

Identifier: http://hdl.handle.net/2292/49481

Issue Date: 2019

Reference: CDMTCS Research Reports CDMTCS-531 (2019)

Rights: Copyright: The author(s)

Rights (URI): https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm

Abstract:

Computing the functional dependencies that hold on a given data set is one of the most important problems in data profiling. Our research advances the state of the art in several ways. Utilizing new data structures and original techniques for the dynamic computation of stripped partitions, we devise a new hybridization strategy that outperforms the best existing algorithms in terms of efficiency, column scalability, and row scalability. This is demonstrated on real-world benchmark data. We show that current outputs contain many redundant functional dependencies, and that canonical covers greatly reduce output sizes. Smaller representations of outputs are easier to comprehend and use. We propose the number of redundant data values as a natural measure to rank the output of discovery algorithms. Our ranking assesses the relevance of functional dependencies for the given data set.
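The notions named in the abstract can be illustrated with a small sketch. It is not the algorithm from the report: it only shows, on a toy relation, what a stripped partition is, how it can be used to test a functional dependency, and one plausible reading of "number of redundant data values" (counting the right-hand-side values that sit in non-singleton left-hand-side classes). The function names, the sample rows, and that counting rule are assumptions made for illustration.

```python
from collections import defaultdict

def stripped_partition(rows, attrs):
    """Group row indices by their values on attrs; drop singleton classes."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return [g for g in groups.values() if len(g) > 1]

def holds(rows, lhs, rhs):
    """lhs -> rhs holds if rows agreeing on lhs also agree on rhs."""
    for cls in stripped_partition(rows, lhs):
        first = tuple(rows[cls[0]][a] for a in rhs)
        if any(tuple(rows[i][a] for a in rhs) != first for i in cls[1:]):
            return False
    return True

def redundant_values(rows, lhs, rhs):
    """Assumed measure: rhs values occurring in non-singleton lhs classes."""
    return len(rhs) * sum(len(c) for c in stripped_partition(rows, lhs))

# Hypothetical sample data, not taken from the report.
rows = [
    {"emp": "ann", "dept": "db", "site": "akl"},
    {"emp": "bob", "dept": "db", "site": "akl"},
    {"emp": "cat", "dept": "ai", "site": "wlg"},
    {"emp": "dan", "dept": "ai", "site": "wlg"},
    {"emp": "eve", "dept": "hr", "site": "akl"},
]

# Rank the dependencies that hold by the assumed redundancy count.
candidates = [(["emp"], ["dept"]), (["dept"], ["site"]), (["site"], ["dept"])]
ranking = sorted(
    ((redundant_values(rows, l, r), l, r) for l, r in candidates if holds(rows, l, r)),
    key=lambda t: t[0],
    reverse=True,
)
```

Under this reading, dept -> site ranks above emp -> dept: the former accounts for four repeated site values, while the latter holds only because emp is unique and therefore explains no redundancy in this data.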

Files in this item

Name: 531.pdf

Size: 1.485Mb

Format: PDF

Description: Published version

This item appears in the following Collection(s)

CDMTCS Research Reports (1995+) [574]
