Single-Channel Statistical Bayesian Short-Time Fourier Transform Speech Enhancement with Deterministic A Priori Information
Abstract
Emergency service workers are constantly required to communicate in environments with very low acoustic signal-to-noise ratios (SNRs), where both the quality and the intelligibility of speech are of critical importance. Attempts to improve these aspects of speech have long been investigated under the umbrella of speech enhancement. Bayesian short-time Fourier transform (STFT) speech enhancement algorithms are a key candidate for real-time radio communications applications, as relatively good increases in speech quality can be achieved with relatively low computational complexity. In the context of empirical Bayesian statistics, a predictable or deterministic component of a speech STFT coefficient, due to information sources external to the data at a given time-frequency index, may be represented as a non-zero mean in the respective a priori speech pdf, about which there is some uncertainty (i.e., non-zero variance). Additionally, considering that public service workers often encounter few, but recurring, noise sources, non-zero mean a priori pdfs are also of interest in modelling noise, where they may exploit predictable characteristics of a known noise source. Such a unimodal non-zero mean a priori speech/noise pdf is novel to Bayesian STFT speech enhancement, and the research here establishes a framework for Bayesian STFT speech enhancement under this consideration. Here, this is restricted to non-zero means representing sinusoidal signal components in both speech and noise. These components are typically underexploited in Bayesian STFT speech enhancement and, in theory, the framework established here may also be extended to more arbitrary predictable signal components. Several novel methods for the estimation of the amplitude, phase and frequency of potentially non-stationary sinusoidal deterministic components in speech and noise are presented.
These estimated signal features may then specify a non-zero mean a priori pdf, allowing the development of several novel estimators for the clean speech STFT. The parameter estimation methods, and the clean speech estimators that are dependent upon them, are then combined to form a number of speech enhancement algorithms. The ideas developed in this research result in both improved recovery of speech information and improved removal of undesirable noise features, according to a range of quality/intelligibility measures under a range of conditions.
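To illustrate what a non-zero mean a priori pdf changes at a single time-frequency index, the sketch below works through the simplest textbook case. It assumes independent complex Gaussian priors for the speech and noise STFT coefficients, which is an assumption made here for illustration only and is not necessarily one of the estimators developed in this research. The function name and parameter names are hypothetical. Under that assumption, the minimum mean-square error estimate is the prior speech mean plus a Wiener-type gain applied to the part of the observation that the two prior means do not already explain.

```python
def nonzero_mean_wiener(y, mu_s, var_s, mu_n, var_n):
    """Posterior mean of S given Y = S + N at one time-frequency index.

    Illustrative assumption: S ~ CN(mu_s, var_s) and N ~ CN(mu_n, var_n)
    are independent complex Gaussians. All names here are hypothetical.
    """
    total_var = var_s + var_n
    if total_var <= 0.0:
        raise ValueError("var_s + var_n must be positive")
    gain = var_s / total_var  # Wiener-type gain, between 0 and 1
    # Filter only the residual left after removing both deterministic parts.
    return mu_s + gain * (y - mu_s - mu_n)


# Noisy STFT coefficient and a (hypothetical) predicted sinusoidal speech mean.
y = 3 + 1j
with_mean = nonzero_mean_wiener(y, mu_s=1 + 0j, var_s=1.0, mu_n=0j, var_n=1.0)
zero_mean = nonzero_mean_wiener(y, mu_s=0j, var_s=1.0, mu_n=0j, var_n=1.0)
print(with_mean, zero_mean)
```

With both means set to zero, the expression reduces to the familiar zero-mean Wiener estimate. As the speech variance shrinks towards zero, the estimate approaches the predicted mean, which reflects growing confidence in the deterministic component.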