X-HYBRIDJOIN for near-real-time data warehousing


dc.contributor.author Naeem, MA en
dc.contributor.author Dobbie, Gillian en
dc.contributor.author Weber, Gerald en
dc.contributor.editor Fernandes, AAA en
dc.contributor.editor Gray, AJG en
dc.contributor.editor Belhajjame, K en
dc.coverage.spatial Manchester, UK en
dc.date.accessioned 2012-03-14T23:40:02Z en
dc.date.issued 2011 en
dc.identifier.citation Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7051:33-47 2011 en
dc.identifier.isbn 978-3-642-24576-3 en
dc.identifier.issn 0302-9743 en
dc.identifier.uri http://hdl.handle.net/2292/14402 en
dc.description.abstract In order to make timely and effective decisions, businesses need the latest information from data warehouse repositories. To keep these repositories up-to-date with respect to end-user updates, near-real-time data integration is required. An important phase in near-real-time data integration is data transformation, where the stream of updates is joined with disk-based master data. The stream-based algorithm Mesh Join (MESHJOIN) has been proposed to amortize disk access over a fast stream. MESHJOIN makes no assumptions about the data distribution. In real-world applications, however, skewed distributions can be found, e.g., certain products are sold more frequently than the remainder of the products. The question arises: how much performance does MESHJOIN lose by not adapting to data skew? In this paper we perform a rigorous experimental study analyzing the possible performance improvements while considering typical data distributions. For this purpose we design an algorithm, Extended Hybrid Join (X-HYBRIDJOIN), that is complementary to MESHJOIN in that it can adapt to data skew and stores parts of the master data in memory permanently, reducing the disk access overhead significantly. We compare the performance of X-HYBRIDJOIN against the performance of MESHJOIN. We take several precautions to make sure the comparison is adequate and focuses on the utilization of data skew. The experiments show that considering data skew offers substantial room for performance gains that cannot be used by non-adaptive approaches such as MESHJOIN. en
dc.publisher Springer Verlag en
dc.relation.ispartof 28th British National Conference on Databases en
dc.relation.ispartofseries Lecture Notes in Computer Science en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. Details obtained from: http://www.sherpa.ac.uk/romeo/issn/0302-9743/ en
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm en
dc.title X-HYBRIDJOIN for near-real-time data warehousing en
dc.type Conference Item en
dc.identifier.doi 10.1007/978-3-642-24577-0_5 en
pubs.begin-page 33 en
pubs.volume 7051 en
dc.rights.holder Copyright: Springer Verlag en
pubs.end-page 47 en
pubs.finish-date 2011-07-14 en
pubs.start-date 2011-07-12 en
dc.rights.accessrights http://purl.org/eprint/accessRights/RestrictedAccess en
pubs.subtype Proceedings en
pubs.elements-id 245466 en
dc.relation.isnodouble 12680 *
dc.relation.isnodouble 12681 *
pubs.org-id Science en
pubs.org-id School of Computer Science en
dc.identifier.eissn 1611-3349 en
pubs.record-created-at-source-date 2012-03-15 en
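
The abstract describes joining a stream of updates with disk-based master data while keeping the frequently accessed part of the master data in memory permanently. The following is only a minimal sketch of that general idea, not the X-HYBRIDJOIN or MESHJOIN algorithm from the paper. The function name `skew_aware_join`, the `hot_keys` parameter, and the use of a plain dict as a stand-in for the disk-based relation are all assumptions made for illustration.

```python
def skew_aware_join(stream, master_on_disk, hot_keys):
    """Join (key, payload) stream tuples with master data.

    Illustrative only: `master_on_disk` is a dict standing in for a
    disk-based relation, and `hot_keys` is a hypothetical, pre-chosen
    set of frequently joined keys that are pinned in memory.
    """
    # Load the hot part of the master data into memory once.
    hot_cache = {k: master_on_disk[k] for k in hot_keys if k in master_on_disk}

    disk_reads = 0  # counts simulated accesses to the disk-based relation
    output = []
    for key, payload in stream:
        if key in hot_cache:
            # Frequent key: served from memory, no disk access needed.
            row = hot_cache[key]
        elif key in master_on_disk:
            # Infrequent key: fall back to the (simulated) disk relation.
            row = master_on_disk[key]
            disk_reads += 1
        else:
            # No matching master row: the stream tuple produces no output.
            continue
        output.append((key, payload, row))
    return output, disk_reads


if __name__ == "__main__":
    master = {1: "widget", 2: "gadget", 3: "gizmo"}
    # Skewed stream: key 1 occurs far more often than the others.
    stream = [(1, "s1"), (1, "s2"), (2, "s3"), (1, "s4"), (3, "s5")]
    joined, reads = skew_aware_join(stream, master, hot_keys={1})
    print(len(joined), reads)
```

With a skewed stream, most tuples hit the in-memory part, so the number of simulated disk accesses stays well below the number of stream tuples. How the hot part is selected and how disk access is amortized in the actual algorithms is not covered by this sketch.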


Files in this item

There are no files associated with this item.
