Abstract:
This thesis investigates different aspects of the CDMA mobile phone network and quantifies their impact on the performance of Forensic Voice Comparison (FVC) analysis. The term FVC refers to the process of comparing suspect and offender voice samples in order to assess the strength of the speech evidence for a court of law. There exists an assumption among some forensic speech scientists that all mobile phone networks are similar with respect to their underlying technology, and therefore in their potential impact on the speech signal. However, network providers utilize a variety of technologies, such as the Global System for Mobile Communications (GSM) and Code Division Multiple Access (CDMA). These technologies are fundamentally different and incorporate different mechanisms for handling the speech signal. Any assumption that they impact similarly on the speech signal is therefore not correct. This thesis focuses on the CDMA network. As will be discussed, the only component in these networks that can directly impact upon the quality of transmitted speech is the speech codec. Other factors, such as poor wireless channel conditions, congestion related to the number of users, and channel noise, impact the speech signal only indirectly. In such cases, a set of instructions is sent from the mobile phone network to the speech codec, which in turn changes its mode of operation to mitigate the impact of these factors. With the above facts in mind, a new software platform has been developed as part of this research to simulate CDMA mobile phone speech. It makes use of the publicly available routines for the CDMA speech codec. Using this platform, speech data can be passed through the codec under various modes of operation, while taking into account the underlying rules under which these modes might be initiated.
This approach can encompass a large number of possible scenarios in the network and makes it relatively easy to transform any existing speech database into a CDMA-quality speech database. It also allows the impact of different aspects of the network to be studied in isolation from other factors. There are four key aspects of the CDMA mobile phone network which can directly impact the speech signal. These are: (i) Dynamic Rate Coding (DRC), (ii) handling of Frame Loss (FL), (iii) Background Noise (BN) at the transmitting end, and (iv) handling of Silence Frames (SF). The last of these has not been investigated in this study because in FVC only active speech is of interest. As far as possible, the impacts of the first three aspects have been investigated in isolation. As will be explained, though, this is not entirely possible because DRC is always occurring, although it can be constrained to some extent. With respect to the analysis technique used in this study to quantify the strength of speech evidence, this thesis presents a new approach called Principal Component Analysis Kernel Likelihood Ratio (PCAKLR). This is essentially an alternative to one of the commonly used approaches, namely Multivariate Kernel Density (MVKD). It is shown that PCAKLR exhibits similar FVC performance to MVKD for a small number of parameters. Most importantly, though, it is computationally robust irrespective of the number of speech parameters used, an important property given the speech parameter set used in this research. PCAKLR also has a feature which allows it to handle within-segment as well as between-segment correlations simultaneously. This provides an alternative way to fuse results from multiple speech segments instead of using the standard logistic regression. Among the various speech parameter sets commonly used in FVC, it is shown that Mel-Frequency Cepstral Coefficients (MFCCs) are one of the best performing sets when dealing with CDMA-quality speech.
This is because MFCCs are not a function of a particular component of the speech production model that could be removed during the CDMA coding process. Rather, they roughly estimate the energy in different frequency bands of the speech signal. As far as the impact of DRC on FVC analysis is concerned, DRC is, surprisingly, shown to improve the accuracy and reliability of FVC analysis results when compared to uncoded speech. It is argued that this improvement is linked to the quantisation process inherent in the speech codec, which reduces within-speaker variation. FL is shown to negatively impact both same- and different-speaker comparisons in an FVC analysis when low speech coding quality is used. In the case of higher-quality speech coding, FL mainly negatively impacts different-speaker comparisons. High levels of BN at the transmitting end of a CDMA mobile phone network, with Signal-to-Noise Ratios (SNRs) in the range of 9 to 15 dB, are shown to significantly worsen the accuracy of an FVC analysis. This is because distinguishing BN from speech becomes difficult for the Noise Suppression (NS) process inherent in the CDMA speech codec, which begins to remove part of the original speech along with the BN present. However, the results of this study suggest that if a call is made from a highly congested area (e.g., a city centre), the negative impact of BN on FVC is likely to be smaller. This is because, when a large number of users access a cell site simultaneously, low speech coding quality is used to minimize the co-user interference in the CDMA network. This low-quality coding uses a different set of coding algorithms from those used at the higher coding qualities. As will be explained, it specifically repeats information from previous frames, which can mask some of the impact caused by BN.
In order to examine more realistic scenarios in the CDMA network, all three aspects have been brought together and their combined impact on FVC assessed. It is shown that FVC accuracy degrades, and that this degradation can be even more significant under mismatched conditions between the suspect, offender and background data. It is also shown that improved accuracy can be obtained by passing the background data through the CDMA codec prior to FVC analysis. Though this goes a long way towards mitigating the impact of the CDMA mobile phone network, the resulting accuracy is still not as good as that of analysis under matched conditions using clean speech.