Abstract:
We establish a robust schema design framework for data with missing values.
The framework is based on the new notion of an embedded functional dependency,
which is independent of the interpretation of missing values, able to express completeness and integrity requirements on application data, and capable of capturing
many redundant data value occurrences. We establish axiomatic, algorithmic, and
logical foundations for reasoning about embedded functional dependencies. These
foundations allow us to establish generalizations of Boyce-Codd and Third Normal
Forms that, respectively, do not permit any redundancy in any future application data, or minimize such redundancy across dependency-preserving decompositions.
We show how to transform any given schema into application schemata that meet
given completeness and integrity requirements and the conditions of the generalized
normal forms. Data over those application schemata are therefore fit for purpose by
design. Extensive experiments with benchmark schemata and data illustrate our
framework and the effectiveness and efficiency of our algorithms, and provide
quantitative insight into database schema design trade-offs.