Robust data profiling and schema design for incomplete relational databases

Show simple item record

dc.contributor.advisor Link, S en
dc.contributor.advisor Khoussainov, B en
dc.contributor.author Wei, Ziheng en
dc.date.accessioned 2019-07-26T01:48:34Z en
dc.date.issued 2019 en
dc.identifier.uri http://hdl.handle.net/2292/47419 en
dc.description.abstract After more than 40 years in use, relational database systems provide a mature and sophisticated technology for effective and efficient data management. For example, SQL is the number one skill that data scientists possess. Relational database systems model a given domain of interest. One of the core problems that still remain is the correct handling of incomplete information. For example, we still do not know how database schemata for data with missing values should be designed. Many approaches to this problem exist, with most depending on a specific interpretation of missing values. Already in classical applications such as accounting, but even more so in modern day applications such as data integration, a reasonable approach should not be dependent on a specific interpretation of missing values. In this thesis, a new approach is established for the design of incomplete relational databases. The new approach is robust under different interpretations of missing values, and only depends on the complete fragments of an incomplete database. The thesis introduces the classes of embedded uniqueness constraints and embedded functional dependencies that enable users to declare completeness and integrity requirements of a given application within a single framework. Embedded functional dependencies are sources of redundant data value occurrences, while embedded uniqueness constraints avoid redundant data value occurrences. This makes them particularly interesting for the design of databases, as data redundancy largely determines how efficiently queries and updates can be processed. We study several computational problems related to the combined class, including axiomatic and algorithmic solutions to their implication problem, structural and computational aspects of Armstrong relations, their discovery problem, and the development of normal forms and decompositions that subsume the classical cases of Boyce-Codd and Third normal forms, respectively, as special cases. Many of our solutions are implemented as prototypes, and theoretical investigations of the computational complexity are complemented by empirical investigations in terms of performance studies on benchmark datasets. As a side result, we also develop a new algorithm that discovers all classical functional dependencies that hold on a given incomplete relation and show that it outperforms state-of-the-art for efficiency, row-, and column-scalability. Overall, the thesis offers a completeness-tailored approach to the design of relational databases. en
dc.publisher ResearchSpace@Auckland en
dc.relation.ispartof PhD Thesis - University of Auckland en
dc.relation.isreferencedby UoA99265159113302091 en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. en
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm en
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/nz/ en
dc.title Robust data profiling and schema design for incomplete relational databases en
dc.type Thesis en
thesis.degree.discipline Computer Science en
thesis.degree.grantor The University of Auckland en
thesis.degree.level Doctoral en
thesis.degree.name PhD en
dc.rights.holder Copyright: The author en
pubs.author-url http://hdl.handle.net/2292/47419 en
dc.rights.accessrights http://purl.org/eprint/accessRights/OpenAccess en
pubs.elements-id 777284 en
pubs.org-id Science en
pubs.org-id School of Computer Science en
pubs.record-created-at-source-date 2019-07-26 en
dc.identifier.wikidata Q112950811


Files in this item

Find Full text

This item appears in the following Collection(s)

Show simple item record

Share

Search ResearchSpace


Browse

Statistics