dc.contributor.author | Smeaton, A | en
dc.contributor.author | McGuinness, K | en
dc.contributor.author | Gurrin, C | en
dc.contributor.author | Zhou, J | en
dc.contributor.author | O'Connor, NE | en
dc.contributor.author | Wang, P | en
dc.contributor.author | Davis, B | en
dc.contributor.author | Azevedo, L | en
dc.contributor.author | Freitas, A | en
dc.contributor.author | Signal, L | en
dc.contributor.author | Smith, M | en
dc.contributor.author | Stanley, J | en
dc.contributor.author | Barr, M | en
dc.contributor.author | Chambers, T | en
dc.contributor.author | Ni Mhurchu, Cliona | en
dc.date.accessioned | 2017-11-16T01:16:50Z | en
dc.date.issued | 2016 | en
dc.identifier.citation | Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion 27-34 2016 | en
dc.identifier.uri | http://hdl.handle.net/2292/36435 | en
dc.description.abstract | Content-based search over image media, including still images and video, typically relies on concepts or tags that are assigned manually or automatically, or sometimes on image-to-image similarity, depending on the use case. Although great progress has been made in recent years in automatic concept detection using machine learning, a mismatch remains between the semantics of the concepts we can detect automatically and the semantics of the words used in, for example, a user's query. In this paper we report on a large collection of images from wearable cameras, gathered as part of the Kids'Cam project, which have been both manually annotated from a vocabulary of 83 concepts and automatically annotated from a vocabulary of 1,000 concepts. This collection allows us to explore how language, in the form of two distinct concept vocabularies or spaces (one manually assigned and thus forming a ground truth), is used to represent images, in our case taken using wearable cameras. It also allows us to discuss, in general terms, mismatches of concepts in visual media that derive from language mismatches. We report the data processing we have completed on this collection and some of our initial experiments in mapping between the two vocabularies. | en
dc.relation.ispartofseries | Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion | en
dc.rights | Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. | en
dc.rights.uri | https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm | en
dc.title | Semantic indexing of wearable camera images: Kids’Cam concepts | en
dc.type | Journal Article | en
dc.identifier.doi | 10.1145/2983563.2983566 | en
pubs.begin-page | 27 | en
dc.rights.holder | Copyright: The author | en
pubs.end-page | 34 | en
dc.rights.accessrights | http://purl.org/eprint/accessRights/RestrictedAccess | en
pubs.subtype | Article | en
pubs.elements-id | 546020 | en
pubs.org-id | Medical and Health Sciences | en
pubs.org-id | Population Health | en
pubs.org-id | Pacific Health | en
pubs.record-created-at-source-date | 2016-11-15 | en