LSHiForest: A Generic Framework for Fast Tree Isolation based Ensemble Anomaly Analysis

dc.contributor.author Zhang, Xuyun en
dc.contributor.author Dou, W en
dc.contributor.author He, Q en
dc.contributor.author Zhou, R en
dc.contributor.author Leckie, C en
dc.contributor.author Kotagiri, R en
dc.contributor.author Salcic, Zoran en
dc.coverage.spatial San Diego, USA en
dc.date.accessioned 2018-10-09T03:07:12Z en
dc.date.issued 2017-04-19 en
dc.identifier.isbn 978-1-5090-6543-1 en
dc.identifier.uri http://hdl.handle.net/2292/39794 en
dc.description.abstract Anomaly or outlier detection is a major challenge in big data analytics because anomaly patterns provide valuable insights for decision-making in a wide range of applications. Recently proposed anomaly detection methods based on the tree isolation mechanism are very fast due to their logarithmic time complexity, making them capable of handling big data sets efficiently. However, the underlying similarity or distance measures in these methods have not been well understood. Contrary to the claims that these methods never rely on any distance measure, we find that they have close relationships with certain distance measures. This implies that the current use of this fast isolation mechanism is only limited to these distance measures and fails to generalise to other commonly used measures. In this paper, we propose a generic framework named LSHiForest for fast tree isolation based ensemble anomaly analysis with the use of a Locality-Sensitive Hashing (LSH) forest. Being generic, the proposed framework can be instantiated with a diverse range of LSH families, and the fast isolation mechanism can be extended to any distance measures, data types and data spaces where an LSH family is defined. In particular, the instances of our framework with kernelised LSH families or learning based hashing schemes can detect complicated anomalies like local or surrounded anomalies. We also formally show that the existing tree isolation based detection methods are special cases of our framework with the corresponding distance measures. Extensive experiments on both synthetic and real-world benchmark data sets show that the framework can achieve both high time efficiency and anomaly detection quality. en
dc.relation.ispartof 2017 IEEE 33rd International Conference on Data Engineering (ICDE) en
dc.relation.ispartofseries Data Engineering (ICDE), 2017 IEEE 33rd International Conference on en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. en
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm en
dc.title LSHiForest: A Generic Framework for Fast Tree Isolation based Ensemble Anomaly Analysis en
dc.type Conference Item en
dc.identifier.doi 10.1109/ICDE.2017.145 en
pubs.begin-page 983 en
dc.rights.holder Copyright: The author en
pubs.end-page 994 en
pubs.finish-date 2017-04-22 en
pubs.start-date 2017-04-19 en
dc.rights.accessrights http://purl.org/eprint/accessRights/RestrictedAccess en
pubs.subtype Proceedings en
pubs.elements-id 634160 en
pubs.org-id Engineering en
pubs.org-id Department of Electrical, Computer and Software Engineering en
dc.identifier.eissn 2375-026X en
pubs.record-created-at-source-date 2017-07-03 en
pubs.online-publication-date 2017-05-18 en


Files in this item

There are no files associated with this item.
