Abstract:
Accurately detecting regular patterns in data streams is an important problem in data mining. The sheer volume of a data stream makes it challenging to extract reliable knowledge and can considerably affect the accuracy of the results. Patterns that exhibit temporal regularity are important to business and online applications such as supermarket retailing and market basket analysis. Information about regular patterns enables such applications to increase efficiency, reduce costs, and make more informed decisions. In this research, we use the RPS-Tree model to detect regular patterns in data streams and test its accuracy on synthetic datasets. The datasets are created by a synthetic dataset generator that allows the user to predefine the shape of the data. We run experiments on three types of datasets, each with one parameter altered, to observe how the RPS-Tree responds. The three parameters are the average pattern length, the total number of transactions, and the number of itemsets. The RPS-Tree maintained high accuracy under all three conditions and, in some cases, found additional regular patterns that had not been predefined in the synthetic dataset generator.