Koh, YS
Dobbie, G
Huang, DT
2015-12-14
2015
https://hdl.handle.net/2292/27746

In 2015, it is estimated that around 500 million Tweets are generated each day and more than 300 hours of video are uploaded to YouTube every minute. Characterized by large volume and fast speed of arrival, these data, arriving in the form of data streams, contain valuable knowledge that data scientists and businesses across the globe are desperately trying to access. Mining these data using traditional techniques designed for databases is no longer feasible, and new algorithms must be developed to overcome these constraints. Data streams are dynamic and fast changing, and adapting the learning models to react to the presence of change is essential. Currently, change mining only discovers when changes occur and does not consider further characteristics such as how frequently changes occur and how severe or drastic the changes are. This thesis first studies change mining in combination with supervised classification learning and discovers additional change characteristics to further improve how the learning models adapt to the changes in the data stream. Second, the thesis studies change mining in combination with unsupervised association rule mining to find changes in rare association rules. In the first part, we propose a novel change detector, SEED, that finds when changes occur 8 times faster than the current state-of-the-art technique. We then propose and find stream volatility, which characterizes how frequently changes occur, and also discover the magnitude and slope of the changes, which characterize how severe or drastic the changes are. Further, we show, both empirically and theoretically, that we can use these additional characteristics to establish a more effective change detection approach with more than 90% false positive reduction and to build a better learning model in the presence of changes in data streams.
Change mining is traditionally studied in combination with supervised classification learning. Currently, there is limited research that investigates when changes occur in data streams in combination with unsupervised learning techniques such as association rule mining. Due to the inherent differences between supervised and unsupervised learning, current change detection methods cannot be directly applied to discover changes in association rules. In the second part, we propose a tree-structured technique that finds rare association rules in data streams, and we further define the problem of finding changes in rare association rules. We propose a novel M measure that facilitates the discovery of changes in rare association rules when used in conjunction with SEED. We show experimentally that changes in rare patterns can be discovered with a high true positive rate and a low false positive rate. In answering the questions of when and how changes occur, we hope that we may be a step closer to figuring out the even more difficult question: exactly what has changed?

Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher.
https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm
https://creativecommons.org/licenses/by-nc-sa/3.0/nz/
Change Mining and Analysis for Data Streams
Thesis
Copyright: The Author
http://purl.org/eprint/accessRights/OpenAccess
Q112909252