Abstract:
In standard SQL database management systems primary key columns are NOT
NULL by default. While NULL columns may be included in unique constraints,
such constraints only ensure uniqueness for tuples which do not feature any null
marker occurrences in the columns involved, and do not fulfil the same function
as primary keys. In this work we investigate the notions of possible and certain
keys, which are intuitive and differ only in their treatment of null markers. It
turns out that possible keys capture the unique constraint of SQL, while certain
keys extend primary keys to include NULL columns, and can be used for similar
purposes. In addition to basic characterization, axiomatization, and simple discovery
approaches for possible and certain keys, we investigate the existence and
construction of Armstrong tables, extremal set problems, and describe an indexing
scheme for enforcing certain keys. Our experiments show that certain keys with
NULLs do occur in real-world databases, and that related computational problems
can be solved efficiently. Certain keys are semantically well-founded, achieve the
goal of Codd’s entity integrity rule and offer more flexibility for data entry than
primary keys.