Algorithms for the Discovery of Embedded Functional Dependencies

Wei, Z; Hartmann, S; Link, S

Algorithms for the Discovery of Embedded Functional Dependencies

Wei, Z ; Hartmann, S ; Link, S

Identifier: https://hdl.handle.net/2292/58005

Issue Date: 2020

Reference: CDMTCS Research Reports CDMTCS-542 (2020)

Rights: Copyright: The author(s)

Rights (URI): https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm

Abstract:

Embedded functional dependencies (eFDs) were recently introduced to tailor relational schema design to data completeness requirements of applications. They also facilitate data cleaning and data integration. A problem that is essential to unlocking these applications is the discovery of all eFDs that hold on a given data set. We show that the discovery problem of eFDs is NP-complete, W[2]-complete in the output, and has a minimum solution space that is larger than the maximum solution space for functional dependencies. Despite these computational challenges, we use novel data structures and search strategies to develop row-efficient, columnefficient, and hybrid algorithms that can efficiently solve the discovery problem for eFDs on large real-world benchmark data sets. Our experiments also demonstrate that the algorithms scale well in terms of their design targets, and that ranking the eFDs by the number of redundant data values they cause can provide useful guidance in identifying meaningful eFDs for applications. Finally, we demonstrate the benefits of introducing completeness requirements and ranking by the number of redundant data values for approximate and genuine functional dependencies.

Show full item record