What is in our datasets? Describing a structure of datasets

dc.contributor.author Rosli, MM en
dc.contributor.author Tempero, Ewan en
dc.contributor.author Luxton-Reilly, Andrew en
dc.date.accessioned 2018-10-07T22:45:35Z en
dc.date.issued 2016-02-01 en
dc.identifier.isbn 9781450340427 en
dc.identifier.uri http://hdl.handle.net/2292/39268 en
dc.description.abstract © 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM. In order to facilitate research based on datasets in empirical software engineering, the meaning of data must be able to be interpreted correctly. Datasets contain measurements that are associated with metrics and entities. In some datasets, it is not always clear which entities have been measured and exactly which metrics have been used. This means that measurements could be misinterpreted. The goal of this study is to determine a useful way to understand what datasets are actually intended to represent. We construct precise definitions of datasets and their potential elements. We develop a metamodel to describe the structure and concepts in a dataset, and the relationships between each concept. We apply the metamodel to a number of existing datasets from the PROMISE repository. We found that of the 70 existing datasets we studied, 61 datasets contained insufficient information to ensure correct interpretation for metrics and entities. Our metamodel can be used to identify such datasets and can be used to evaluate new datasets. It will also form the foundation for a framework to evaluate the quality of datasets. en
dc.relation.ispartofseries ACM International Conference Proceeding Series en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. en
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm en
dc.title What is in our datasets? Describing a structure of datasets en
dc.type Conference Item en
dc.identifier.doi 10.1145/2843043.2843059 en
pubs.volume 01-05-February-2016 en
dc.rights.holder Copyright: The author en
pubs.publication-status Published en
dc.rights.accessrights http://purl.org/eprint/accessRights/RestrictedAccess en
pubs.elements-id 527255 en
pubs.org-id Science en
pubs.org-id School of Computer Science en

