Speech Emotion Recognition ‘in the Wild’ Using an Autoencoder

Dissanayake, Vipula; Zhang, Haimo; Billinghurst, Mark; Nanayakkara, Suranga

dc.contributor.author	Dissanayake, Vipula
dc.contributor.author	Zhang, Haimo
dc.contributor.author	Billinghurst, Mark
dc.contributor.author	Nanayakkara, Suranga
dc.date.accessioned	2022-06-10T01:49:44Z
dc.date.available	2022-06-10T01:49:44Z
dc.date.issued	2020-10-25
dc.identifier.citation	(2020). Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2020-October, 526-530.
dc.identifier.issn	2308-457X
dc.identifier.uri	https://hdl.handle.net/2292/59694
dc.description.abstract	Speech Emotion Recognition (SER) has been a challenging task on which researchers have been working for decades. Recently, Deep Learning (DL) based approaches have been shown to perform well in SER tasks; however, it has been noticed that their superior performance is limited to the distribution of the data used to train the model. In this paper, we present an analysis of using autoencoders to improve the generalisability of DL-based SER solutions. We train a sparse autoencoder using a large speech corpus extracted from social media. The encoder part of the trained autoencoder is then reused as the input stage of a long short-term memory (LSTM) network, and the resulting encoder-LSTM model is re-trained on an aggregation of five commonly used speech emotion corpora. Our evaluation uses a corpus that is unseen during the training and validation stages to simulate 'in the wild' conditions and analyse the generalisability of our solution. A performance comparison is carried out between the encoder-based model and a model trained without an encoder. Our results show that the autoencoder-based model improves the unweighted accuracy on the unseen corpus by 8%, indicating that autoencoder-based pre-training can improve the generalisability of DL-based SER solutions.
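The abstract reports results as unweighted accuracy. In the SER literature this term is commonly used for the mean of per-class recall (also called unweighted average recall or balanced accuracy), which, unlike plain (weighted) accuracy, is not inflated by a model that favours the majority emotion class. The sketch below assumes that convention; the record itself does not define the metric, and the function names are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Plain accuracy: fraction of all utterances classified correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recall (assumed definition of 'unweighted accuracy').

    Every emotion class present in y_true contributes equally, regardless
    of how many utterances it has.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    totals = defaultdict(int)   # number of utterances per true class
    correct = defaultdict(int)  # number correctly predicted per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 8 'angry' and 2 'sad' utterances,
    # and a classifier that always predicts 'angry'.
    y_true = ["angry"] * 8 + ["sad"] * 2
    y_pred = ["angry"] * 10
    print(weighted_accuracy(y_true, y_pred))    # 0.8
    print(unweighted_accuracy(y_true, y_pred))  # 0.5
```

In this toy example the degenerate classifier scores 0.8 on plain accuracy but only 0.5 on unweighted accuracy, which is why the unweighted figure is the more informative one when emotion classes are imbalanced across corpora.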
dc.publisher	ISCA
dc.relation.ispartof	Interspeech 2020
dc.relation.ispartofseries	Interspeech 2020
dc.rights	Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher.
dc.rights.uri	https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm
dc.rights.uri	https://www.isca-speech.org/archive/index.html#about
dc.title	Speech Emotion Recognition ‘in the Wild’ Using an Autoencoder
dc.type	Conference Item
dc.identifier.doi	10.21437/interspeech.2020-1356
pubs.begin-page	526
pubs.volume	2020-October
dc.date.updated	2022-05-30T03:49:14Z
dc.rights.holder	Copyright: ISCA	en
pubs.end-page	530
pubs.publication-status	Published online
dc.rights.accessrights	http://purl.org/eprint/accessRights/OpenAccess	en
pubs.elements-id	833854
pubs.org-id	Bioengineering Institute
dc.identifier.eissn	1990-9772
pubs.record-created-at-source-date	2022-05-30
pubs.online-publication-date	2020-10-25