Selection Bias Identification and Mitigation With No Ground Truth Information

Dost, Katharina

dc.contributor.advisor	Wicker, Jörg
dc.contributor.advisor	Riddle, Patricia
dc.contributor.author	Dost, Katharina
dc.date.accessioned	2023-10-16T18:56:20Z
dc.date.available	2023-10-16T18:56:20Z
dc.date.issued	2022	en
dc.identifier.uri	https://hdl.handle.net/2292/66291
dc.description.abstract	Machine learning should support decision-making by drawing purely logical conclusions from historical data. If this data is biased, however, the bias is transferred to the model and remains undetected because performance is validated on a test set drawn from the same biased distribution. Existing strategies for bias identification and mitigation generally rely on some knowledge of the bias or the ground truth. This reliance is problematic, particularly if the user is not aware of the bias, no ground-truth knowledge is available, or no concrete target task has been defined yet, e.g., during data gathering. We argue that some indication of future problems is present in the historical dataset itself. Extracting it as early as during data gathering can help correct the flaws on the fly or create awareness among researchers working with the dataset. In this thesis, we aim to identify selection biases from the historical data alone when no ground-truth information is available. Selection biases stem from a non-uniform sampling process. To mitigate them, we generate additional data points that bridge the gap between the sample and the ground-truth distribution. Pioneering this research topic, we propose three algorithms built on the assumption that the distribution of a sufficiently large and unbiased dataset should be smooth, without any sudden drops in density. Extensive experiments and discussions highlight the need for such data analysis tools and illustrate that each of our methods has its own merits. Overall, we contribute to a better understanding of the data we use and trust, and we challenge existing procedures in machine learning that accept flawed data as given and treat symptoms rather than causes.
dc.publisher	ResearchSpace@Auckland	en
dc.relation.ispartof	PhD Thesis - University of Auckland	en
dc.relation.isreferencedby	UoA	en
dc.rights	Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
dc.rights.uri	https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm	en
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/3.0/nz/
dc.title	Selection Bias Identification and Mitigation With No Ground Truth Information
dc.type	Thesis	en
thesis.degree.discipline	Computer Science
thesis.degree.grantor	The University of Auckland	en
thesis.degree.level	Doctoral	en
thesis.degree.name	PhD	en
dc.date.updated	2023-10-11T22:33:20Z
dc.rights.holder	Copyright: The author	en
dc.rights.accessrights	http://purl.org/eprint/accessRights/OpenAccess	en