Speech Enhancement Methods Based on CASA Incorporating Spectral Correlation

Bao, Feng

dc.contributor.advisor	Abdulla, WH	en
dc.contributor.author	Bao, Feng	en
dc.date.accessioned	2019-04-10T23:06:08Z	en
dc.date.issued	2018	en
dc.identifier.uri	http://hdl.handle.net/2292/46383	en
dc.description.abstract	Computational auditory scene analysis (CASA) has shown great potential for speech enhancement compared with some statistical model-based methods. A central challenge for CASA is estimating the binary mask or ratio mask effectively in each time-frequency (T-F) unit. In this thesis, four speech enhancement methods based on binary mask or ratio mask estimation are proposed, each exploiting the spectral relationship among noisy speech, pure noise and clean speech. First, the fixed thresholds commonly used in the conventional CASA method constrain segregation and T-F unit labeling, which degrades de-noising performance. An adaptive factor is therefore derived from the power spectra of noisy speech and estimated noise to replace those fixed thresholds. As a result, noise reduction is achieved with improved pitch contour and T-F unit labeling. Second, a new binary mask estimation method based on convex optimization is proposed to reduce the artifacts and temporal discontinuity caused by inaccurate binary mask estimation. Signal segregation and pitch estimation are not needed in this method; speech power alone is used as the key cue for labeling the binary mask. The cross-correlation between the noisy speech and estimated noise power spectra in each channel is employed to build the objective function. The T-F units of speech and noise are labeled with a decision factor derived from the powers of noisy speech, estimated speech, and pre-estimated noise. Erroneous local masks are refined by T-F unit smoothing. As a consequence, noise is effectively reduced and the perceptual quality of the enhanced speech is improved. Third, a new ratio mask estimation method formulated in terms of Wiener filtering is proposed to further increase the temporal continuity of the reconstructed speech. In this method, the speech power of each T-F unit is obtained by convex optimization.
The objective function again depends on the cross-correlation between the noisy speech and estimated noise power spectra. To improve estimation accuracy, the estimated ratio mask is further modified based on its adjacent T-F units and then smoothed by interpolating with the estimated binary masks. The results confirm that noise reduction, speech quality, and speech intelligibility are all improved. Fourth, a novel ratio mask representation that exploits the inter-channel correlation (ICC) among the noisy speech, pure noise and clean speech spectra is proposed to further improve enhancement performance. In this way, the power ratio of speech and noise is reallocated adaptively during the construction of the ratio mask, so that more speech components are retained and more noise components are masked. In this method, a channel-weight contour based on the equal-loudness hearing attribute is adopted to revise the ratio mask in each T-F unit. The developed ratio mask is used, together with other features, to train a five-layer deep neural network (DNN). Experiments show significant improvements in speech quality and intelligibility compared with DNN-based methods without ICC.	en
dc.publisher	ResearchSpace@Auckland	en
dc.relation.ispartof	PhD Thesis - University of Auckland	en
dc.relation.isreferencedby	UoA99265150807902091	en
dc.rights	Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher.	en
dc.rights.uri	https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm	en
dc.title	Speech Enhancement Methods Based on CASA Incorporating Spectral Correlation	en
dc.type	Thesis	en
thesis.degree.discipline	Electrical and Computer Engineering	en
thesis.degree.grantor	The University of Auckland	en
thesis.degree.level	Doctoral	en
thesis.degree.name	PhD	en
dc.rights.holder	Copyright: The author	en
dc.rights.accessrights	http://purl.org/eprint/accessRights/OpenAccess	en
pubs.elements-id	768532	en
pubs.record-created-at-source-date	2019-04-11	en
dc.identifier.wikidata	Q112935584
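
The abstract refers to binary masks and ratio masks over T-F units without defining them. As a rough illustration only, the Python sketch below computes the textbook "oracle" versions of both masks from speech and noise powers that are assumed to be known. It is not the method of the thesis, which estimates these quantities from noisy speech and estimated noise using an adaptive factor, convex optimization and ICC; none of that is reproduced here. The function names, the 0 dB default threshold and the small eps constant are assumptions made for this sketch.

```python
# Illustrative sketch only: oracle binary and ratio masks over a T-F grid.
# All names (binary_mask, ratio_mask, apply_mask) are hypothetical and do not
# come from the thesis, which estimates speech and noise power rather than
# assuming they are known.
# Each input is a list of rows (frequency channels) of per-frame power values.


def binary_mask(speech_power, noise_power, threshold_db=0.0):
    """Label a T-F unit 1 when its local SNR exceeds threshold_db, else 0."""
    factor = 10.0 ** (threshold_db / 10.0)  # dB threshold as a linear power ratio
    return [
        [1 if s > factor * n else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def ratio_mask(speech_power, noise_power, eps=1e-12):
    """Wiener-style soft mask: speech power over speech-plus-noise power."""
    return [
        [s / (s + n + eps) for s, n in zip(s_row, n_row)]  # eps avoids 0/0
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def apply_mask(noisy_power, mask):
    """Scale each T-F unit of the noisy power spectrum by its mask value."""
    return [
        [p * m for p, m in zip(p_row, m_row)]
        for p_row, m_row in zip(noisy_power, mask)
    ]


if __name__ == "__main__":
    # Toy 2-channel, 2-frame example with made-up power values.
    speech = [[4.0, 0.0], [1.0, 9.0]]
    noise = [[1.0, 1.0], [1.0, 1.0]]
    print(binary_mask(speech, noise))  # prints [[1, 0], [0, 1]]
    print(ratio_mask(speech, noise))
```

The binary mask makes a hard keep-or-discard decision per unit, whereas the ratio mask attenuates each unit in proportion to its estimated speech share, which is one reason soft masks tend to give smoother reconstructed speech.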

Files in this item

Name: whole.pdf

Size: 9.974Mb

Format: PDF

This item appears in the following Collection(s)

Doctoral Theses [6824]
