Abstract:
Keys play a fundamental role in all data models. They allow database systems to uniquely identify data items, and therefore promote efficient data processing in most applications. Because of this role, support for discovering keys is required. These include keys that are semantically meaningful for the application domain, as well as keys that are satisfied by a given database instance. Here, we study the discovery of keys from SQL tables. We investigate structural and computational properties of Armstrong tables for sets of SQL keys that are currently perceived as semantically meaningful. Inspecting Armstrong tables enables data engineers to consolidate their understanding of the semantics of the application domain, and to communicate this understanding to other stakeholders of the database, e.g., domain experts or managers. The stakeholders may want to change the tables or provide entirely different tables in order to communicate their expert views to the data engineers. For this purpose, we propose data mining algorithms that discover keys from a given SQL table. Finally, we define formal measures to assess the distance between sets of SQL keys. The measures can be applied to empirically validate the usefulness of Armstrong tables, and to automate marking and feedback for non-multiple-choice questions in database courses.