Abstract:
Healthcare robots that must communicate vocally with non-expert users need to generate intelligible and expressive speech. The Festival Speech Synthesis System is used as the framework for speech generation on our healthcare robot. Expression is added to speech by modifying the mean pitch and pitch range parameters of a statistical model distributed with Festival. Human judges evaluate US and UK English diphone voices alongside a newly built New Zealand English accented diphone voice. Results show that judges can discern the different accents and correctly identify the nationality of each voice.