Skewed distributions in semi-stream joins: How much can caching help?

dc.contributor.author Naeem, MA en
dc.contributor.author Dobbie, Gillian en
dc.contributor.author Lutteroth, Christof en
dc.contributor.author Weber, Gerald en
dc.date.accessioned 2017-06-18T22:38:34Z en
dc.date.issued 2017-03 en
dc.identifier.citation Information Systems 64:63-74 Mar 2017 en
dc.identifier.issn 0306-4379 en
dc.identifier.uri http://hdl.handle.net/2292/33577 en
dc.description.abstract Semi-stream join algorithms join a fast data stream with a disk-based relation. This is important, for example, in real-time data warehousing, where a stream of transactions is joined with master data before being loaded into a data warehouse. In many important scenarios, the stream input has a skewed distribution, which makes certain performance optimizations possible. We propose two such optimization techniques: (1) a caching technique for frequently used master data and (2) a technique for selective load shedding of stream tuples. The caching technique is fine-grained, operating at the tuple level. Furthermore, it is generic in the sense that it can be applied to different semi-stream join algorithms to deal with data skew. We analyze it by combining it with various well-known semi-stream joins, and show that it improves the service rate by more than 40% for typical data with skewed distributions. The load shedding technique sheds the fraction of the stream that is most expensive to join. In contrast to existing approaches, the service rate improves under load shedding. We present experimental data showing significant improvements compared with related approaches and perform a sensitivity analysis for various internal parameters. en
dc.publisher Elsevier Science & Technology en
dc.relation.ispartofseries Information Systems en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. Details obtained from http://www.sherpa.ac.uk/romeo/issn/0306-4379/ en
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm en
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/4.0/ en
dc.title Skewed distributions in semi-stream joins: How much can caching help? en
dc.type Journal Article en
dc.identifier.doi 10.1016/j.is.2016.09.007 en
pubs.begin-page 63 en
pubs.volume 64 en
dc.description.version AM - Accepted Manuscript en
dc.rights.holder Copyright: The Authors en
pubs.end-page 74 en
pubs.publication-status Published en
dc.rights.accessrights http://purl.org/eprint/accessRights/OpenAccess en
pubs.subtype Article en
pubs.elements-id 606242 en
pubs.org-id Science en
pubs.org-id School of Computer Science en
pubs.record-created-at-source-date 2017-06-19 en
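
The abstract describes a tuple-level cache for frequently used master data placed in front of a semi-stream join. The following is only a minimal illustrative sketch of that general idea, not the authors' algorithm: the class name `CachedSemiStreamJoin`, the `promote_after` threshold, the frequency-based eviction rule, and the Zipf-like key generator `skewed_keys` are all assumptions introduced here, and an in-memory dictionary stands in for the disk-based relation.

```python
from collections import Counter
import random


class CachedSemiStreamJoin:
    """Toy model (hypothetical): a bounded tuple-level cache in front of a
    slow master-data lookup. Not the method from the paper."""

    def __init__(self, master, cache_size, promote_after=2):
        self.master = master                # stands in for the disk-based relation
        self.cache_size = cache_size        # maximum number of cached master tuples
        self.promote_after = promote_after  # assumed admission threshold
        self.cache = {}                     # join key -> cached master tuple
        self.seen = Counter()               # how often each key appeared in the stream
        self.hits = 0
        self.disk_lookups = 0

    def join(self, stream_tuple):
        """Join one stream tuple with its master tuple; None if no match."""
        key = stream_tuple["key"]
        self.seen[key] += 1
        row = self.cache.get(key)
        if row is not None:
            self.hits += 1                  # served from memory
        else:
            self.disk_lookups += 1          # would be a disk access in a real join
            row = self.master.get(key)
            if row is None:
                return None
            if self.seen[key] >= self.promote_after:
                self._admit(key, row)
        return {**stream_tuple, **row}

    def _admit(self, key, row):
        """Cache a master tuple, evicting a strictly less frequent key if full."""
        if self.cache_size <= 0:
            return
        if len(self.cache) >= self.cache_size:
            victim = min(self.cache, key=lambda k: self.seen[k])
            if self.seen[victim] >= self.seen[key]:
                return                      # newcomer is not hotter than the coldest entry
            del self.cache[victim]
        self.cache[key] = row


def skewed_keys(n_keys, n_tuples, exponent=1.0, seed=0):
    """Draw join keys with Zipf-like weights 1 / rank**exponent (assumed skew model)."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_keys + 1)]
    return rng.choices(range(n_keys), weights=weights, k=n_tuples)


def run_demo(n_keys=1000, n_tuples=20000, cache_size=50):
    """Feed a skewed stream through the join and return (hits, disk_lookups, join)."""
    master = {k: {"name": f"item-{k}"} for k in range(n_keys)}
    join = CachedSemiStreamJoin(master, cache_size=cache_size)
    for k in skewed_keys(n_keys, n_tuples):
        join.join({"key": k, "qty": 1})
    return join.hits, join.disk_lookups, join
```

Calling `run_demo()` returns the number of cache hits and simulated disk lookups for a skewed stream. Varying `exponent` in `skewed_keys` or `cache_size` in `run_demo` gives a rough feel for how skew affects the share of tuples that a small cache can serve, although this toy model says nothing about the service rates reported in the paper.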

