Possibilistic Data Cleaning


dc.contributor.author Kohler, Henning
dc.contributor.author Link, Sebastian
dc.date.accessioned 2022-10-27T00:51:56Z
dc.date.available 2022-10-27T00:51:56Z
dc.date.issued 2021-01-01
dc.identifier.citation (2021). IEEE Transactions on Knowledge and Data Engineering, PP(99), 1-1.
dc.identifier.issn 1041-4347
dc.identifier.uri https://hdl.handle.net/2292/61689
dc.description.abstract Classical data cleaning performs a minimal set of operations on the data to satisfy the given integrity constraints. Often, this minimization is equivalent to vertex cover, for example when tuples can be removed due to the violation of functional dependencies. Classically, the uncertainty of tuples and constraints is ignored. We propose to regard not the data as dirty, but the uncertainty information about the data. Since probabilities are often unavailable and their treatment is limited due to correlations in the data, we investigate a qualitative approach to uncertainty. Tuples are assigned degrees of possibility with which they occur, and constraints are assigned degrees of certainty that say to which tuples they apply. Our approach is non-invasive to the data, as we lower the possibility degrees of tuples as little as possible. The resulting qualitative version of vertex cover remains NP-hard. We establish an algorithm that is fixed-parameter tractable in the size of the qualitative vertex cover. Experiments with synthetic and real-world data show that our algorithm outperforms the classical algorithm in proportion to the available number of uncertainty degrees. Through a novel method for mining the certainty degrees with which constraints hold, our framework becomes applicable even when uncertainty information is unavailable.
dc.publisher Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartofseries IEEE Transactions on Knowledge and Data Engineering
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher.
dc.rights © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm
dc.rights.uri https://journals.ieeeauthorcenter.ieee.org/become-an-ieee-journal-author/publishing-ethics/guidelines-and-policies/post-publication-policies/#accepted
dc.subject 08 Information and Computing Sciences
dc.title Possibilistic Data Cleaning
dc.type Journal Article
dc.identifier.doi 10.1109/tkde.2021.3062318
pubs.issue 99
pubs.begin-page 1
pubs.volume PP
dc.date.updated 2022-09-28T19:41:07Z
dc.rights.holder Copyright: IEEE en
pubs.end-page 1
pubs.publication-status Published
dc.rights.accessrights http://purl.org/eprint/accessRights/OpenAccess en
pubs.subtype Journal Article
pubs.elements-id 845271
pubs.org-id Science
pubs.org-id School of Computer Science
dc.identifier.eissn 1558-2191
pubs.record-created-at-source-date 2022-09-29
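The abstract notes that, when tuples may be removed to repair violations of functional dependencies (FDs), classical minimal cleaning amounts to vertex cover. The Python below is an illustrative sketch of that classical baseline only, not the authors' possibilistic algorithm (which additionally works with possibility degrees of tuples and certainty degrees of constraints). It builds the conflict graph of one FD, where an edge joins two tuples that agree on the left-hand side but differ on the right-hand side, and finds a minimum vertex cover with the textbook bounded-search-tree method, which is fixed-parameter tractable in the cover size k. The function names and the sample employee table are hypothetical.

```python
from itertools import combinations


def fd_conflict_edges(rows, lhs, rhs):
    """Return index pairs (i, j) of rows that jointly violate the FD lhs -> rhs:
    they agree on every attribute in lhs but differ on some attribute in rhs."""
    edges = []
    for i, j in combinations(range(len(rows)), 2):
        a, b = rows[i], rows[j]
        if all(a[x] == b[x] for x in lhs) and any(a[y] != b[y] for y in rhs):
            edges.append((i, j))
    return edges


def vertex_cover_at_most_k(edges, k):
    """Bounded search tree: return a set of at most k vertices touching every
    edge, or None if no such set exists. Runs in O(2^k * |edges|) time."""
    if not edges:
        return set()
    if k == 0:
        return None
    u, v = edges[0]
    # Any cover must contain u or v, so branch on both choices.
    for w in (u, v):
        rest = [e for e in edges if w not in e]
        sub = vertex_cover_at_most_k(rest, k - 1)
        if sub is not None:
            return sub | {w}
    return None


def minimum_vertex_cover(edges):
    """Try k = 0, 1, 2, ... so the first cover found has minimum size."""
    k = 0
    while True:
        cover = vertex_cover_at_most_k(edges, k)
        if cover is not None:
            return cover
        k += 1


# Hypothetical table; the FD dept -> mgr is violated by e1/e2 and e2/e3.
rows = [
    {"emp": "e1", "dept": "D1", "mgr": "Ann"},
    {"emp": "e2", "dept": "D1", "mgr": "Bob"},
    {"emp": "e3", "dept": "D1", "mgr": "Ann"},
    {"emp": "e4", "dept": "D2", "mgr": "Cy"},
]
edges = fd_conflict_edges(rows, ["dept"], ["mgr"])
to_remove = minimum_vertex_cover(edges)
cleaned = [r for i, r in enumerate(rows) if i not in to_remove]
```

Here the conflict graph has edges (0, 1) and (1, 2), so removing the single tuple e2 restores the FD. In the classical setting every tuple counts equally; per the abstract, the possibilistic variant instead lowers possibility degrees as little as possible rather than deleting tuples outright.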

