Abstract:
The evolution of data, such as changes in the underlying model known as concept drift, presents many challenges for data stream research. Currently, most drift detection methods can locate the point of change but cannot provide meaningful information on the characteristics of the change or make use of historical trends. In this thesis, we investigate two streams of research: (1) the magnitude of change, which we refer to as drift severity, and (2) the rate of change, which we refer to as stream volatility [7]. In the first part, we propose a drift detector, MAGSEED, for tracking the drift severity of a stream. Monitoring drift severity provides crucial information to users, allowing them to formulate a more adaptive response. We show that our technique is capable of tracking drift severity with a high true positive rate and a low false positive rate, and we compare it to the state-of-the-art drift detectors ADWIN2 and DDM. In the second part, we explore ways to learn historical drift rate trends and develop a proactive drift detection system. The main motivation for our work comes from the volatility trends we observe when current drift detection methods are applied to real data streams. We observe that these patterns of change vary across different data streams. We use the term "volatility pattern" to describe change rates with a distinct distribution. We propose a novel drift prediction method, DPM, to predict the location of future drift points based on historical drift trends, which we model as transitions between stream volatility patterns. Our method uses a probabilistic network to learn drift trends and is independent of the drift detection technique. We demonstrate that our method is able to learn and predict drift trends in streams with reoccurring volatility patterns. This allows future changes to be anticipated, which enables users and detection methods to be more proactive.
We then apply our drift prediction algorithm by incorporating its drift estimates into a drift detector, PROSEED, to improve the detector's performance by decreasing its false positive rate.