Discovering Functional Dependencies From Possibilistic Data

Show simple item record

dc.contributor.advisor Link, S en
dc.contributor.author He, Senyang en
dc.date.accessioned 2016-11-06T20:38:22Z en
dc.date.issued 2016 en
dc.identifier.uri http://hdl.handle.net/2292/30970 en
dc.description Full text is available to authenticated members of The University of Auckland only. en
dc.description.abstract Functional dependencies can express important relationships in data. They can therefore help enforce the semantics of an application domain within a database system. This has applications in many database tasks, including database design, data consistency, update and query processing. Traditional functional dependencies (FDs) were introduced for applications in which data occurs with full certainty. An important problem, that has been studied over the last three decades, concerns the discovery of those functional dependencies that hold on a given data set. In modern applications, such as information extractions, financial risk assessment, or sensor management, data is inherently uncertain. Recently, a new notion of possibilistic functional dependency (pFD) has been introduced. In this data model, each tuple is assigned one of finitely many available degrees of possibility with which it is perceived to occur in the data. Each FD is assigned one of finitely many available degrees of certainty that says to which tuples the FD applies. The new notion of pFD has been shown to have important applications in the schema design of databases. In this thesis, we study the problem of discovering those pFDs that hold on a given possibilistic data set. For this purpose, we review several FD discovery algorithms. We then describe how to extend these algorithms to the discovery problem of pFDs. The extended algorithms have been implemented and their performance in terms of time and memory usage have been tested on real-world benchmark data sets whose tuples have been augmented by randomly assigned degrees of possibility. In addition, we apply canonical cover algorithms to reduce the size of the output of the discovery algorithms, showing that little time is necessary to significantly reduce the size of the output's representation. Given particular features of the input, such as the number of rows, number of columns and number of available possibility degrees, we identify those algorithms that perform the fastest on data sets with these features. This provides guidelines for practitioners to choose the most suitable algorithm for a given data set at hand. Our results support and extend recent findings for the performance of algorithms that address the traditional FD discovery problem. en
dc.publisher ResearchSpace@Auckland en
dc.relation.ispartof Masters Thesis - University of Auckland en
dc.relation.isreferencedby UoA99264880713602091 en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. en
dc.rights Restricted Item. Available to authenticated members of The University of Auckland. en
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm en
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/nz/ en
dc.title Discovering Functional Dependencies From Possibilistic Data en
dc.type Thesis en
thesis.degree.discipline Computer Science en
thesis.degree.grantor The University of Auckland en
thesis.degree.level Masters en
dc.rights.holder Copyright: The author en
pubs.elements-id 544787 en
pubs.record-created-at-source-date 2016-11-07 en
dc.identifier.wikidata Q112924961


Files in this item

Find Full text

This item appears in the following Collection(s)

Show simple item record

Share

Search ResearchSpace


Browse

Statistics