Detecting and Quantifying Concept Drift for Data Stream


dc.contributor.advisor Koh, Yun Sing
dc.contributor.advisor Liu, Jiamou
dc.contributor.author Zhao, Di
dc.date.accessioned 2021-05-24T02:43:11Z
dc.date.available 2021-05-24T02:43:11Z
dc.date.issued 2021 en
dc.identifier.uri https://hdl.handle.net/2292/55134
dc.description Full Text is available to authenticated members of The University of Auckland only. en
dc.description.abstract Concept drift describes changes in the underlying distribution of streaming data. Concept drift research involves the development of methodologies and techniques for drift detection, understanding, and adaptation. Data analysis shows that machine learning in a concept drift environment produces poor learning results if the drift is not addressed. Most drift detection methods focus on supervised learning, but labels for streaming data are sometimes expensive to obtain. Most drift understanding methods quantify drift through the data distribution, which requires a certain amount of data. This thesis investigates two research streams: (1) an unsupervised drift detection method, which does not require prior knowledge of the data distribution, and (2) a framework that quantifies the severity of concept drift from the model perspective. In the first part, we focus on feature drift that shifts boundaries of mode and present an unsupervised framework to detect feature drift without labels. The framework detects abrupt and gradual feature drift using two distance functions, Wasserstein distance and energy distance, and discusses feature changes in the data stream. Describing the changes in a data stream is a less explored area. Crucially, the ability to describe these changes would enable a better understanding of how the relationships in the data evolve over time. In particular, we seek to answer the following question: do distribution changes in important features also cause concept drift? Experimental results show that the proposed framework detects and describes feature drift. In the second part, we propose a framework to quantify the severity of concept drift from the model perspective. Our framework is based on the Hoeffding tree, the most popular data stream mining algorithm. Our approach quantifies concept drift without access to the data, which reduces the probability of data leaks. The severity of concept drift can be used as a guideline for choosing drift adaptation strategies. Our framework maps Hoeffding trees into groups of vectors and measures similarity and distance between vector groups. A larger similarity or smaller distance indicates that two trees are similar, and a smaller similarity or larger distance indicates that they are different.
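As an illustration of the two distance functions named in the abstract, the following minimal sketch compares a reference window and a current window of a single feature. It is not the thesis's framework, whose full text is restricted: the function names, the equal-sized windows, the shared threshold, and the synthetic Gaussian data are all assumptions made for this example.

```python
import math
import random


def wasserstein_1d(u, v):
    """Empirical 1-D Wasserstein-1 distance for two equal-sized samples."""
    if len(u) != len(v):
        raise ValueError("this sketch assumes equal-sized windows")
    # For equal-sized samples the optimal transport plan pairs sorted values.
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)


def mean_abs_diff(x, y):
    """Average absolute difference over all pairs (a, b) with a in x, b in y."""
    return sum(abs(a - b) for a in x for b in y) / (len(x) * len(y))


def energy_distance_1d(u, v):
    """Empirical energy distance: sqrt(2 E|X-Y| - E|X-X'| - E|Y-Y'|)."""
    sq = 2 * mean_abs_diff(u, v) - mean_abs_diff(u, u) - mean_abs_diff(v, v)
    return math.sqrt(max(sq, 0.0))  # guard against tiny negative rounding


def feature_drifted(reference, current, threshold):
    """Hypothetical rule: flag drift if either distance exceeds the threshold."""
    return (wasserstein_1d(reference, current) > threshold
            or energy_distance_1d(reference, current) > threshold)


# Synthetic windows (assumed data): one unchanged, one with a shifted mean.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(200)]
same = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(3.0, 1.0) for _ in range(200)]

print(feature_drifted(reference, same, threshold=1.0))
print(feature_drifted(reference, shifted, threshold=1.0))
```

No labels are used anywhere in the comparison, which is what makes this kind of check unsupervised. In practice the threshold would need to be calibrated per feature rather than fixed by hand.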
dc.publisher ResearchSpace@Auckland en
dc.relation.ispartof Masters Thesis - University of Auckland en
dc.relation.isreferencedby UoA99265342313202091 en
dc.rights Restricted Item. Full Text is available to authenticated members of The University of Auckland only. en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm en
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/nz/
dc.title Detecting and Quantifying Concept Drift for Data Stream
dc.type Thesis en
thesis.degree.discipline Computer Science
thesis.degree.grantor The University of Auckland en
thesis.degree.level Masters en
dc.date.updated 2021-05-14T01:31:22Z
dc.rights.holder Copyright: the author en
dc.identifier.wikidata Q112957368

