Abstract:
Producing new synthetic voices for speech synthesis systems is expensive in time and requires expert knowledge. This has motivated research into voice conversion: automatically adapting a source speaker into a target speaker based on a small corpus of examples. In this thesis we consider a variant of voice conversion that focuses on the task of localising speech-enabled software: accent transformation. An accent transformation describes a single mapping that can be applied to speech from any voice in a source accent to produce similar speech in a target accent. We consider an approach to accent transformation suitable if it successfully changes the perceived accent of speech, maintains speech quality and naturalness, and requires minimal time and resources to develop. This thesis investigates potential approaches to the modification of pronunciation and intonation for accent transformation, and assesses their suitability. To avoid gross speech errors found to result from modifying formants by linear predictive pole rotation and by frequency warping, we describe an approach to pronunciation modification based on independent exponential functions. Modifications to the formants are determined from linguistic analyses of accents of English, and we detail empirically determined rule-based solutions for handling spectral slope, mapping phone labels to formant space, normalising vocal tract size, and modelling phone targets and transitions across time. We determine appropriate intonation modifications with an instance-based approach using limited data. We propose a representation of fundamental frequency contours, based on the discrete cosine transform (DCT), with many characteristics suitable for instance-based corpora. We compare matching based on phrase features against matching DCT coefficients. We show results of a perceptual study indicating that speech intensity is an important feature in accent transformation. 
Methods for including intensity in intonation transformation are outlined. The overall approach, and the individual pronunciation and intonation transformations, are evaluated in a number of perceptual tests. These show that the transformations are able to make significant changes in perceived accent from British RP toward a variety of target accents.