Abstract:
Relational database management systems have been developed for applications with certain data, such as inventory, payroll and airline reservation. New applications, such as financial risk assessment, information extraction, information integration, sensor data, and scientific data management, require database systems to handle uncertainty in data. A popular qualitative approach to uncertain data is based on possibility theory. In comparison to probability theory, which is the quantitative counterpart, possibilistic data management is expected to make uncertain data elicitation easier and uncertain data processing computationally more efficient. Recently, new classes of possibilistic database constraints have been proposed that make it possible to apply classical database schema design techniques to qualitatively uncertain data. These new classes of constraints apply to pure relations and are not compliant with the requirements of the industry-standard SQL. The first contribution of my research is a possibilistic SQL data model which combines the features of both the traditional SQL data model and the possibilistic data model mentioned above. As a second contribution, I propose a semantics for possibilistic SQL keys, possibilistic SQL functional dependencies and possibilistic NOT NULL constraints. This helps database designers enforce important domain semantics within industry-standard compliant database management systems that can accommodate uncertainty in data. In a third contribution, I establish axiomatic and linear-time algorithmic characterizations of the core reasoning problem for the combined class of these constraints, namely the implication problem. This contribution has important applications in database updates and query processing. As a fourth contribution, I establish structural and computational properties of possibilistic SQL tables that are Armstrong for any set of standard constraints from the combined class. Armstrong tables help database designers identify those constraints which are meaningful for the application domain at hand. The correct and complete identification of such constraints is of utmost importance in data management. As a fifth contribution, I present the graphical user interface of a prototype system that implements my algorithm for computing Armstrong tables. This contribution forms the foundation for transferring the theoretical aspects of my dissertation into database practice. As the final contribution, I report on detailed experiments with the algorithm for computing Armstrong tables as well as its optimizations. The results of the experiments provide insight into the best-, worst-, and average-case space and time complexity of the algorithms, the impact of the NOT NULL constraints and number of available possibilisty/certainty degrees, and the time savings that the optimizations achieve. The results of my research enable database users enforce important application domain semantics within database systems that support modern requirements of industry-standard compliance and uncertainty in data. When using such modern database systems, my results can help with the design of better database schemata, processing data more efficiently, making data-driven decisions more effectively, and saving costs for cleaning data in applications.