Abstract:
Speaker recognition focuses on creating systems that can recognise human speakers from their speech waveforms. At present, speaker recognition systems perform poorly compared with the natural ability of humans. In this research, a number of phoneme-based algorithms and methods have been explored to increase the performance of speaker recognition systems. Analysis has shown great variability in how much individual phonemes contribute to the recognition accuracy of the system. Several segments of speech have been identified that are either susceptible to corruption by noise or perform poorly in terms of recognition: silences, unvoiced speech, and transition regions between two consecutive phonemes. Short intra-speech silences and unvoiced speech do carry physiological information about the speaker's identity, but we have determined that removing them increases the robustness of the system to noise. Transition sections of speech are detrimental to recognition under all conditions. For each of these identified regions of poor recognition performance, algorithms have been developed to exclude them. The most successful method uses the cross-correlation function to segment speech signals and remove transition sections. Across a range of noise types, this method improved recognition by an average of 11.00% when tested on a database of 100 speakers. A second approach is also investigated in this research. Traditional speaker recognition systems choose the most likely speaker by maximum likelihood estimation (MLE). We developed a new decision-making algorithm that selects the speaker based on the ordinal rank of each speaker rather than the MLE score; we call this the Ordinal-Based Decision Algorithm (OBDA).
This method achieves a 10.58% average improvement over MLE-based decision making when tested across a range of noise types and SNRs.