Speech Emotion Recognition ‘in the Wild’ Using an Autoencoder

dc.contributor.author Dissanayake, Vipula
dc.contributor.author Zhang, Haimo
dc.contributor.author Billinghurst, Mark
dc.contributor.author Nanayakkara, Suranga
dc.date.accessioned 2022-06-10T01:49:44Z
dc.date.available 2022-06-10T01:49:44Z
dc.date.issued 2020-10-25
dc.identifier.citation (2020). Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2020-October, 526-530.
dc.identifier.issn 2308-457X
dc.identifier.uri https://hdl.handle.net/2292/59694
dc.description.abstract Speech Emotion Recognition (SER) has been a challenging task on which researchers have been working for decades. Recently, Deep Learning (DL) based approaches have been shown to perform well in SER tasks; however, their superior performance has been found to be limited to the distribution of the data used to train the model. In this paper, we present an analysis of using autoencoders to improve the generalisability of DL-based SER solutions. We train a sparse autoencoder on a large speech corpus extracted from social media. The trained encoder part of the autoencoder is then reused as the input to a long short-term memory (LSTM) network, and the encoder-LSTM model is re-trained on an aggregation of five commonly used speech emotion corpora. Our evaluation uses a corpus that is unseen during the training and validation stages to simulate 'in the wild' conditions and analyse the generalisability of our solution. A performance comparison is carried out between the encoder-based model and a model trained without an encoder. Our results show that the autoencoder-based model improves the unweighted accuracy on the unseen corpus by 8%, indicating that autoencoder-based pre-training can improve the generalisability of DL-based SER solutions.
dc.publisher ISCA
dc.relation.ispartof Interspeech 2020
dc.relation.ispartofseries Interspeech 2020
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher.
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm
dc.rights.uri https://www.isca-speech.org/archive/index.html#about
dc.title Speech Emotion Recognition ‘in the Wild’ Using an Autoencoder
dc.type Conference Item
dc.identifier.doi 10.21437/interspeech.2020-1356
pubs.begin-page 526
pubs.volume 2020-October
dc.date.updated 2022-05-30T03:49:14Z
dc.rights.holder Copyright: ISCA
pubs.end-page 530
pubs.publication-status Published online
dc.rights.accessrights http://purl.org/eprint/accessRights/OpenAccess
pubs.elements-id 833854
pubs.org-id Bioengineering Institute
dc.identifier.eissn 1990-9772
pubs.record-created-at-source-date 2022-05-30
pubs.online-publication-date 2020-10-25
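
Note on the metric: the abstract reports its result as unweighted accuracy. In SER work this term usually means the mean of the per-class recalls, so each emotion class counts equally regardless of how many utterances it has. The sketch below illustrates that common definition only; it is not taken from the paper, and the function names and the toy labels are hypothetical.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls: every class contributes equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def weighted_accuracy(y_true, y_pred):
    """Overall fraction of correct predictions (dominated by large classes)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up labels: an imbalanced set where the model always says "neutral".
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 10

wa = weighted_accuracy(y_true, y_pred)    # 8 of 10 correct
ua = unweighted_accuracy(y_true, y_pred)  # recall 1.0 for neutral, 0.0 for angry
```

On this toy input the overall accuracy looks high while the unweighted accuracy exposes that one class is never recognised, which is why the metric is a common choice for imbalanced emotion corpora.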
