dc.contributor.advisor |
Abdulla, WH |
en |
dc.contributor.author |
Bao, Feng |
en |
dc.date.accessioned |
2019-04-10T23:06:08Z |
en |
dc.date.issued |
2018 |
en |
dc.identifier.uri |
http://hdl.handle.net/2292/46383 |
en |
dc.description.abstract |
Computational auditory scene analysis (CASA) has shown great potential for speech enhancement compared to some statistical model-based methods. A challenge for CASA is how to estimate the binary mask or ratio mask effectively in each time-frequency (T-F) unit. In this thesis, four speech enhancement methods with binary mask or ratio mask estimation are proposed based on the spectral relationship among noisy speech, pure noise and clean speech. The common use of fixed thresholds in the conventional CASA method constrains segregation and T-F unit labeling, affecting de-noising performance. Thus, an adaptive factor is first derived from the power spectra of noisy speech and estimated noise to replace those fixed thresholds. As a result, noise reduction is achieved with improved pitch contour and T-F unit labeling. A new binary mask estimation method is proposed based on convex optimization to reduce the artifacts and temporal discontinuity caused by inaccurate binary mask estimation. Signal segregation and pitch estimation are not needed in this method; only speech power is considered as a key cue for labeling the binary mask. The cross-correlation between the noisy speech and estimated noise power spectra in each channel is employed to build the objective function. The T-F units of speech and noise are labeled with a decision factor derived from the powers of noisy speech, estimated speech, and pre-estimated noise, respectively. Erroneous local masks are refined by T-F unit smoothing. As a consequence, noise is effectively reduced and the perceptual quality of the enhanced speech is improved. A new ratio mask estimation method in terms of Wiener filtering is proposed to further increase the temporal continuity of reconstructed speech. In this method, the speech power of each T-F unit is obtained by a convex optimization method. 
The objective function also depends on the cross-correlation between the noisy speech and estimated noise power spectra. To improve the accuracy of estimation, the estimated ratio mask is further modified based on its adjacent T-F units and then smoothed by interpolating with the estimated binary masks. The results confirm that noise reduction, speech quality, and speech intelligibility are all improved. A novel ratio mask representation that exploits the inter-channel correlation (ICC) among the noisy speech, pure noise and clean speech spectra is proposed to further improve enhancement performance. In this way, the power ratio of speech and noise is reallocated adaptively during the construction of the ratio mask, so that more speech components are retained and more noise components are masked. In this method, the channel-weight contour based on the equal loudness hearing attribute is adopted to revise the ratio mask in each T-F unit. The developed ratio mask is used to train a five-layer deep neural network (DNN) with other features. Experiments show significant improvements in speech quality and intelligibility compared to DNN-based methods without ICC. |
en |
dc.publisher |
ResearchSpace@Auckland |
en |
dc.relation.ispartof |
PhD Thesis - University of Auckland |
en |
dc.relation.isreferencedby |
UoA99265150807902091 |
en |
dc.rights |
Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. |
en |
dc.rights.uri |
https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm |
en |
dc.title |
Speech Enhancement Methods Based on CASA Incorporating Spectral Correlation |
en |
dc.type |
Thesis |
en |
thesis.degree.discipline |
Electrical and Computer Engineering |
en |
thesis.degree.grantor |
The University of Auckland |
en |
thesis.degree.level |
Doctoral |
en |
thesis.degree.name |
PhD |
en |
dc.rights.holder |
Copyright: The author |
en |
dc.rights.accessrights |
http://purl.org/eprint/accessRights/OpenAccess |
en |
pubs.elements-id |
768532 |
en |
pubs.record-created-at-source-date |
2019-04-11 |
en |
dc.identifier.wikidata |
Q112935584 |
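
For readers unfamiliar with the two mask types named in the abstract, the following is a minimal sketch of the textbook definitions only: a binary mask obtained by thresholding the local SNR of each T-F unit, and a Wiener-type ratio mask computed from speech and noise power. It is not the thesis's estimation method (no convex optimization, adaptive factor, or ICC weighting is shown), and the function names, the `lc_db` threshold, and the toy power values are assumptions made for illustration.

```python
import numpy as np

EPS = 1e-12  # assumed small constant to avoid division by zero


def binary_mask(speech_pow, noise_pow, lc_db=0.0):
    """Label a T-F unit 1 when its local SNR (in dB) exceeds lc_db, else 0.

    lc_db is a hypothetical fixed threshold; the abstract describes replacing
    such fixed thresholds with an adaptive factor, which is not modeled here.
    """
    snr_db = 10.0 * np.log10((speech_pow + EPS) / (noise_pow + EPS))
    return (snr_db > lc_db).astype(float)


def wiener_ratio_mask(speech_pow, noise_pow):
    """Wiener-type ratio mask: speech power over total power, in [0, 1]."""
    return speech_pow / (speech_pow + noise_pow + EPS)


# Toy power values for a 2-channel x 2-frame grid of T-F units (made up).
speech_pow = np.array([[4.0, 0.1], [1.0, 9.0]])
noise_pow = np.array([[1.0, 1.0], [1.0, 1.0]])

bm = binary_mask(speech_pow, noise_pow)
rm = wiener_ratio_mask(speech_pow, noise_pow)
print(bm)
print(rm)
```

In practice the clean speech and noise powers are unknown and must be estimated from the noisy signal, which is the problem the thesis addresses; the sketch assumes they are given.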
|