Abstract:
SQL is the de facto industry standard for data management. Besides relational
data, SQL has been extended to also manage object-relational and Web-based data.
It is likely that many instances of big data will also be managed by extensions of the
current SQL standard. We introduce the classes of keys and functional dependencies over possibilistic databases with duplicate and missing information. Our main
contribution is to equip SQL with reasoning capabilities about the semantics of big
data that may feature the volume, variety, and veracity dimensions. These capabilities are fundamental to reason about entity integrity and essential for database
design as functional dependencies are sources of data redundancy, and keys prevent
data redundancy. Since SQL controls the occurrences of missing information with
NOT NULL constraints, we also include possibilistic extensions of this constraint in
our investigation. We illustrate applications and establish axiomatic, algorithmic,
and logical characterizations of the PTIME-complete implication problem associated with the combined class of these integrity constraints. Specifically, we show
that keys behave just like goal clauses and functional dependencies (FDs) just like definite clauses in Boolean
propositional Horn logic, and we can therefore apply linear resolution to reason
about them.
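
The Horn-clause correspondence stated above can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not part of the abstract: all data is total (no missing values), there are no possibility degrees, and duplicate rows are allowed. Under these assumptions an FD X -> Y is read as the definite clauses "X implies A" for each A in Y, and a key K as the goal clause "not all of K". The sketch uses forward chaining (attribute closure) rather than linear resolution; both decide the same Horn entailment. The function names closure, implies_key, and implies_fd, and the attributes emp, dept, and mgr, are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple

# An FD X -> Y is a pair (X, Y); a key is a set of attributes.
FD = Tuple[FrozenSet[str], FrozenSet[str]]
Key = FrozenSet[str]


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Forward-chain the definite clauses (FDs) starting from attrs."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the rule if its body holds and it adds something new.
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return frozenset(closed)


def hits_goal_clause(closed: FrozenSet[str], keys: List[Key]) -> bool:
    """True if the derived attributes contradict some goal clause (key)."""
    return any(key <= closed for key in keys)


def implies_key(fds: List[FD], keys: List[Key], candidate: Iterable[str]) -> bool:
    """A candidate key is implied iff assuming all its attributes
    leads to a contradiction with a given key (goal clause)."""
    return hits_goal_clause(closure(candidate, fds), keys)


def implies_fd(fds: List[FD], keys: List[Key],
               lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """An FD is implied iff its right-hand side is derivable from its
    left-hand side, or the left-hand side is itself an implied key."""
    closed = closure(lhs, fds)
    return set(rhs) <= closed or hits_goal_clause(closed, keys)


# Hypothetical constraint set: emp -> dept, dept -> mgr, key {emp, mgr}.
fds = [
    (frozenset({"emp"}), frozenset({"dept"})),
    (frozenset({"dept"}), frozenset({"mgr"})),
]
keys = [frozenset({"emp", "mgr"})]

print(implies_key(fds, keys, {"emp"}))           # True
print(implies_key(fds, keys, {"dept"}))          # False
print(implies_fd(fds, keys, {"emp"}, {"mgr"}))   # True
print(implies_fd(fds, keys, {"mgr"}, {"dept"}))  # False
```

In this reading, {emp} is an implied key because its closure {emp, dept, mgr} contains the declared key {emp, mgr}. Handling NOT NULL constraints and possibility degrees, as the abstract describes, would require extending this sketch.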