Naeem, M Asif
Dobbie, Gillian
Weber, Gerald
2011-11-09
2010-07
SE Software Engineering (UoA-SE-2010-2). Department of Computer Science, The University of Auckland. 1-22 Jul 2010.
http://hdl.handle.net/2292/8841
In the field of real-time data warehousing, updates occurring on the source systems need to be reflected in the data warehouse immediately. One important element in real-time data integration is the join of a continuous input data stream with a disk-based relation. For high-throughput streams, stream-based algorithms such as Mesh Join (MESHJOIN) can be used. However, MESHJOIN cannot deal with intermittent streams, because tuples could wait for an undetermined time, thus defeating the real-time character of the stream. The Index Nested Loop Join (INLJ) can be set up so that it processes stream input and can deal with intermittent gaps in the update stream, but it has low throughput. In this paper we introduce a robust stream-based join algorithm called Hybrid Join (HYBRIDJOIN), which combines the two approaches. As a theoretical result we show that HYBRIDJOIN is asymptotically as fast as the faster of the two algorithms. We present performance measurements of our implementation. We use synthetic data based on a Zipfian distribution, which is widely accepted as a plausible distribution for real-world identifier sets in many domains. In our experiments, HYBRIDJOIN performs significantly better for typical parameters of the Zipfian distribution, and in general performs in accordance with the theoretical model, while the other two algorithms are unacceptably slow under different settings. Hence HYBRIDJOIN is a robust algorithm that generally performs at an acceptable speed.
Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
Previously published items are made available in accordance with the copyright policy of the publisher.
https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm
HYBRIDJOIN for Near-real-time Data Warehousing
Report
Copyright: Software Engineering University of Auckland
http://purl.org/eprint/accessRights/RestrictedAccess