Abstract:
Healthcare robots that interact with humans are increasingly common. Many of these social robots can interact with humans using their voice, and the synthesised voice of a social robot can affect its acceptance by humans. The first research question of this thesis was to identify the type of voice needed for a healthcare robot interacting with humans; once that type of voice was identified, the second research question aimed to synthesise it.
To identify the type of voice needed for a healthcare robot, a perception test was conducted. The focus of this thesis is on empathetically speaking social robots, where empathy was conveyed only through prosody modelling of the speech signal. In the perception test, participants were asked questions about their preference for an empathetically speaking healthcare robot. From the participants' responses and a detailed analysis of a particular healthcare robot's dialogues, the emotions needed for an empathetic voice were identified. To address the second research question, prosody features of speech were modelled for the emotions identified in research question 1. Further perception tests were then conducted to evaluate the synthesised emotional speech.
The results obtained suggest that people prefer empathetically speaking healthcare robots. A major finding of this thesis is that an empathetic voice requires not only the primary emotions but also some secondary emotions: anxious, apologetic, confident, enthusiastic, and worried. An emotional speech corpus containing these secondary emotions was developed and acoustically analysed. Based on this acoustic analysis, the fundamental frequency contour was modelled parametrically, while the speech rate and mean intensity were modelled using rules. Ensemble regression was used to predict these three prosody features for each of the secondary emotions. Using these prediction models and a Hidden Markov Model-based speech synthesis approach, the secondary emotions were synthesised. Results of the perception test showed that participants could perceive the secondary emotions. Finally, tying the whole study together, participants could perceive high levels of empathy from a healthcare robot speaking with a synthesised voice containing the five secondary emotions.
In short, the emotions needed for an empathetically speaking healthcare robot were
identified. These were secondary emotions, which were synthesised by modelling prosody features using machine learning. Finally, participants agreed that
they could perceive empathy from the synthesised voice of a healthcare robot.