Does the inclusion of other modalities enhance the performance of speech emotion recognition systems?

Liu, Junchen; James, Jesin; Nathwani, Karan

Reference

Nineteenth Australasian International Conference on Speech Science and Technology, Melbourne, Australia, 03 Dec 2024 - 05 Dec 2024. Proceedings of the Nineteenth Australasian International Conference on Speech Science and Technology. Australasian Speech Science and Technology Association. 32-36. 01 Dec 2024

Abstract

The pursuit of natural human-computer interaction has driven the advancement of emotion recognition technology. Speech emotion recognition (SER) has gained widespread attention due to its high applicability. Recently, some researchers have been interested in developing multi-modal emotion recognition (MER) systems that integrate speech with text and video modalities to enhance robustness and accuracy. We analyse the performance of these systems using the IEMOCAP and RAVDESS datasets, highlighting the impact of different modality combinations on emotion recognition accuracy. This paper aims to guide future research in optimising MER by leveraging the complementary advantages of various modalities.