Semantic indexing of wearable camera images: Kids’Cam concepts

Smeaton, A; McGuinness, K; Gurrin, C; Zhou, J; O'Connor, NE; Wang, P; Davis, B; Azevedo, L; Freitas, A; Signal, L; Smith, M; Stanley, J; Barr, M; Chambers, T; Ni Mhurchu, Cliona

dc.contributor.author	Smeaton, A	en
dc.contributor.author	McGuinness, K	en
dc.contributor.author	Gurrin, C	en
dc.contributor.author	Zhou, J	en
dc.contributor.author	O'Connor, NE	en
dc.contributor.author	Wang, P	en
dc.contributor.author	Davis, B	en
dc.contributor.author	Azevedo, L	en
dc.contributor.author	Freitas, A	en
dc.contributor.author	Signal, L	en
dc.contributor.author	Smith, M	en
dc.contributor.author	Stanley, J	en
dc.contributor.author	Barr, M	en
dc.contributor.author	Chambers, T	en
dc.contributor.author	Ni Mhurchu, Cliona	en
dc.date.accessioned	2017-11-16T01:16:50Z	en
dc.date.issued	2016	en
dc.identifier.citation	Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion 27-34 2016	en
dc.identifier.uri	http://hdl.handle.net/2292/36435	en
dc.description.abstract	To support content-based search on visual media, including images and video, content is typically accessed through manually or automatically assigned concepts or tags, or sometimes through image-to-image similarity, depending on the use case. While great progress has been made in recent years in automatic concept detection using machine learning, a mismatch remains between the semantics of the concepts we can automatically detect and the semantics of, for example, the words used in a user's query. In this paper we report on a large collection of images from wearable cameras gathered as part of the Kids'Cam project, which have been both manually annotated from a vocabulary of 83 concepts and automatically annotated from a vocabulary of 1,000 concepts. This collection allows us to explore how language, in the form of two distinct concept vocabularies or spaces, one of them manually assigned and thus forming a ground truth, is used to represent images, in our case images taken using wearable cameras. It also allows us to discuss, in general terms, issues around the mismatch of concepts in visual media, which derive from language mismatches. We report the data processing we have completed on this collection and some of our initial experimentation in mapping across the two vocabularies.	en
dc.relation.ispartofseries	Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion	en
dc.rights	Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher.	en
dc.rights.uri	https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm	en
dc.title	Semantic indexing of wearable camera images: Kids’Cam concepts	en
dc.type	Journal Article	en
dc.identifier.doi	10.1145/2983563.2983566	en
pubs.begin-page	27	en
dc.rights.holder	Copyright: The author	en
pubs.end-page	34	en
dc.rights.accessrights	http://purl.org/eprint/accessRights/RestrictedAccess	en
pubs.subtype	Article	en
pubs.elements-id	546020	en
pubs.org-id	Medical and Health Sciences	en
pubs.org-id	Population Health	en
pubs.org-id	Pacific Health	en
pubs.record-created-at-source-date	2016-11-15	en