Detecting and Quantifying Concept Drift for Data Stream

Zhao, Di

dc.contributor.advisor	Koh, Yun Sing
dc.contributor.advisor	Liu, Jiamou
dc.contributor.author	Zhao, Di
dc.date.accessioned	2021-05-24T02:43:11Z
dc.date.available	2021-05-24T02:43:11Z
dc.date.issued	2021	en
dc.identifier.uri	https://hdl.handle.net/2292/55134
dc.description	Full Text is available to authenticated members of The University of Auckland only.	en
dc.description.abstract	Concept drift describes changes in the underlying distribution of streaming data. Concept drift research involves the development of methodologies and techniques for drift detection, understanding, and adaptation. Data analysis shows that, if drift is not addressed, machine learning in a concept drift environment yields poor results. Most drift detection methods focus on supervised learning, but labels for streaming data can be expensive to obtain. Most drift understanding methods quantify drift from the data distribution, which requires a sufficient amount of data. This thesis investigates two research streams: (1) an unsupervised drift detection method that does not require prior knowledge of the data distribution, and (2) a framework that quantifies the severity of concept drift from a model perspective. In the first part, we focus on feature drift that shifts the boundaries of the model and present an unsupervised framework that detects feature drift without labels. The framework detects abrupt and gradual feature drift using two distance functions, the Wasserstein distance and the Energy distance, and describes feature changes in the data stream. Describing changes in the data stream is a less explored area; crucially, the ability to do so would enable a better understanding of how the relationships in the data change over time. In particular, we seek to answer the following question: do distribution changes in important features also cause concept drift? Experimental results show that the proposed framework detects and describes feature drift. In the second part, we propose a framework that quantifies the severity of concept drift from a model perspective. The framework is based on the most popular data stream mining algorithm, the Hoeffding tree. Our approach quantifies concept drift without accessing the data, which reduces the risk of data leaks. 
The severity of concept drift can serve as a guideline for choosing a drift adaptation strategy. Our framework maps Hoeffding trees into groups of vectors and measures the similarity and distance between vector groups: higher similarity or lower distance indicates that two trees are similar, while lower similarity or higher distance indicates that two trees are different.
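The abstract names the Wasserstein and Energy distances as the basis for unsupervised feature drift detection but does not say how windows are formed or how drift is declared. The Python sketch below assumes one common setup: a reference window is compared with the most recent window, feature by feature, and drift is flagged when either distance exceeds a threshold. The window layout, the helper names (`wasserstein_1d`, `energy_distance_1d`, `detect_feature_drift`) and the threshold values are hypothetical illustrations, not the thesis's framework. The distances are written out in pure Python for transparency; `scipy.stats.wasserstein_distance` and `scipy.stats.energy_distance` compute the same 1-D quantities.

```python
import bisect
import math
from statistics import mean


def _ecdf(sorted_vals, x):
    """Empirical CDF of a sorted sample evaluated at x (right-continuous)."""
    return bisect.bisect_right(sorted_vals, x) / len(sorted_vals)


def wasserstein_1d(u, v):
    """1-D Wasserstein (earth mover's) distance: integral of |F_u(x) - F_v(x)| dx."""
    su, sv = sorted(u), sorted(v)
    points = sorted(set(su) | set(sv))
    total = 0.0
    # Both ECDFs are constant on [a, b) between consecutive sample points.
    for a, b in zip(points, points[1:]):
        total += abs(_ecdf(su, a) - _ecdf(sv, a)) * (b - a)
    return total


def energy_distance_1d(u, v):
    """Energy distance sqrt(2E|X-Y| - E|X-X'| - E|Y-Y'|) from empirical samples."""
    xy = mean(abs(a - b) for a in u for b in v)
    xx = mean(abs(a - b) for a in u for b in u)
    yy = mean(abs(a - b) for a in v for b in v)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(2 * xy - xx - yy, 0.0))


def detect_feature_drift(reference, current, w_threshold=1.0, e_threshold=1.0):
    """Compare a reference window with a current window, feature by feature.

    reference, current: lists of rows, each row a sequence of numeric features.
    w_threshold, e_threshold: HYPOTHETICAL fixed thresholds, not from the thesis.
    Returns {feature_index: (wasserstein, energy, drift_flag)}.
    """
    report = {}
    for j in range(len(reference[0])):
        ref_col = [row[j] for row in reference]
        cur_col = [row[j] for row in current]
        w = wasserstein_1d(ref_col, cur_col)
        e = energy_distance_1d(ref_col, cur_col)
        report[j] = (w, e, w > w_threshold or e > e_threshold)
    return report


if __name__ == "__main__":
    # Feature 0 shifts by +5 between windows; feature 1 stays constant.
    reference = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
    current = [[5.0, 1.0], [6.0, 1.0], [7.0, 1.0], [8.0, 1.0]]
    for j, (w, e, drift) in detect_feature_drift(reference, current).items():
        print(f"feature {j}: W1={w:.3f} energy={e:.3f} drift={drift}")
```

Run as a script, the example reports W1 = 5.000 and drift=True for the shifted feature 0, and zero distances with drift=False for the constant feature 1. Reporting per feature, rather than a single stream-level flag, is one way to support the abstract's goal of describing which features changed and not only whether drift occurred.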
dc.publisher	ResearchSpace@Auckland	en
dc.relation.ispartof	Masters Thesis - University of Auckland	en
dc.relation.isreferencedby	UoA99265342313202091	en
dc.rights	Restricted Item. Full Text is available to authenticated members of The University of Auckland only.	en
dc.rights	Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
dc.rights.uri	https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm	en
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/3.0/nz/
dc.title	Detecting and Quantifying Concept Drift for Data Stream
dc.type	Thesis	en
thesis.degree.discipline	Computer Science
thesis.degree.grantor	The University of Auckland	en
thesis.degree.level	Masters	en
dc.date.updated	2021-05-14T01:31:22Z
dc.rights.holder	Copyright: the author	en
dc.identifier.wikidata	Q112957368