The Deluge of Spurious Correlations in Big Data

dc.contributor.author Calude, CS en
dc.contributor.author Longo, G en
dc.date.accessioned 2016-01-04T21:55:58Z en
dc.date.available 2016-01-04T21:55:58Z en
dc.date.issued 2015 en
dc.identifier.citation CDMTCS Research Reports CDMTCS-488 (2015) en
dc.identifier.issn 1178-3540 en
dc.identifier.uri http://hdl.handle.net/2292/27857 en
dc.description.abstract Very large databases are a major opportunity for science and data analytics is a remarkable new field of investigation in computer science. The effectiveness of these tools is used to support a “philosophy” against the scientific method as developed throughout history. According to this view, computer-discovered correlations should replace understanding and guide prediction and action. Consequently, there will be no need to give scientific meaning to phenomena, by proposing, say, causal relations, since regularities in very large databases are enough: “with enough data, the numbers speak for themselves”. The “end of science” is proclaimed. Using classical results from ergodic theory, Ramsey theory and algorithmic information theory, we show that this “philosophy” is wrong. For example, we prove that very large databases have to contain arbitrary correlations. These correlations appear only due to the size, not the nature, of data. They can be found in “randomly” generated, large enough databases, which – as we will prove – implies that most correlations are spurious. Too much information tends to behave like very little information. The scientific method can be enriched by computer mining in immense databases, but not replaced by it. en
dc.publisher Department of Computer Science, The University of Auckland, New Zealand en
dc.relation.ispartofseries CDMTCS Research Report Series en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. en
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm en
dc.source.uri https://www.cs.auckland.ac.nz/research/groups/CDMTCS/researchreports/index.php en
dc.title The Deluge of Spurious Correlations in Big Data en
dc.type Technical Report en
dc.subject.marsden Fields of Research en
dc.rights.holder The author(s) en
dc.rights.accessrights http://purl.org/eprint/accessRights/OpenAccess en

