dc.contributor.advisor |
Link, S |
en |
dc.contributor.author |
He, Senyang |
en |
dc.date.accessioned |
2016-11-06T20:38:22Z |
en |
dc.date.issued |
2016 |
en |
dc.identifier.uri |
http://hdl.handle.net/2292/30970 |
en |
dc.description |
Full text is available to authenticated members of The University of Auckland only. |
en |
dc.description.abstract |
Functional dependencies can express important relationships in data. They can therefore help enforce the semantics of an application domain within a database system. This has applications in many database tasks, including database design, data consistency, update and query processing. Traditional functional dependencies (FDs) were introduced for applications in which data occurs with full certainty. An important problem, that has been studied over the last three decades, concerns the discovery of those functional dependencies that hold on a given data set. In modern applications, such as information extractions, financial risk assessment, or sensor management, data is inherently uncertain. Recently, a new notion of possibilistic functional dependency (pFD) has been introduced. In this data model, each tuple is assigned one of finitely many available degrees of possibility with which it is perceived to occur in the data. Each FD is assigned one of finitely many available degrees of certainty that says to which tuples the FD applies. The new notion of pFD has been shown to have important applications in the schema design of databases. In this thesis, we study the problem of discovering those pFDs that hold on a given possibilistic data set. For this purpose, we review several FD discovery algorithms. We then describe how to extend these algorithms to the discovery problem of pFDs. The extended algorithms have been implemented and their performance in terms of time and memory usage have been tested on real-world benchmark data sets whose tuples have been augmented by randomly assigned degrees of possibility. In addition, we apply canonical cover algorithms to reduce the size of the output of the discovery algorithms, showing that little time is necessary to significantly reduce the size of the output's representation. Given particular features of the input, such as the number of rows, number of columns and number of available possibility degrees, we identify those algorithms that perform the fastest on data sets with these features. This provides guidelines for practitioners to choose the most suitable algorithm for a given data set at hand. Our results support and extend recent findings for the performance of algorithms that address the traditional FD discovery problem. |
en |
dc.publisher |
ResearchSpace@Auckland |
en |
dc.relation.ispartof |
Masters Thesis - University of Auckland |
en |
dc.relation.isreferencedby |
UoA99264880713602091 |
en |
dc.rights |
Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. |
en |
dc.rights |
Restricted Item. Available to authenticated members of The University of Auckland. |
en |
dc.rights.uri |
https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm |
en |
dc.rights.uri |
http://creativecommons.org/licenses/by-nc-sa/3.0/nz/ |
en |
dc.title |
Discovering Functional Dependencies From Possibilistic Data |
en |
dc.type |
Thesis |
en |
thesis.degree.discipline |
Computer Science |
en |
thesis.degree.grantor |
The University of Auckland |
en |
thesis.degree.level |
Masters |
en |
dc.rights.holder |
Copyright: The author |
en |
pubs.elements-id |
544787 |
en |
pubs.record-created-at-source-date |
2016-11-07 |
en |
dc.identifier.wikidata |
Q112924961 |
|