Speech Enhancement Methods Based on CASA Incorporating Spectral Correlation

dc.contributor.advisor Abdulla, WH en
dc.contributor.author Bao, Feng en
dc.date.accessioned 2019-04-10T23:06:08Z en
dc.date.issued 2018 en
dc.identifier.uri http://hdl.handle.net/2292/46383 en
dc.description.abstract Computational auditory scene analysis (CASA) has shown great potential for speech enhancement compared to some statistical model-based methods. A challenge for CASA is how to estimate the binary mask or ratio mask effectively in each time-frequency (T-F) unit. In this thesis, four speech enhancement methods with binary mask or ratio mask estimation are proposed based on the spectral relationship among noisy speech, pure noise and clean speech. The common use of fixed thresholds in the conventional CASA method constrains segregation and T-F unit labeling, which degrades denoising performance. Thus, an adaptive factor is first derived from the power spectra of noisy speech and estimated noise to replace those fixed thresholds. As a result, noise reduction is achieved with improved pitch contour and T-F unit labeling. A new binary mask estimation method is proposed based on convex optimization to reduce the artifacts and temporal discontinuity caused by inaccurate binary mask estimation. Signal segregation and pitch estimation are not needed in this method; only speech power is considered as a key cue for labeling the binary mask. The cross-correlation between the noisy speech and estimated noise power spectra in each channel is employed to build the objective function. The T-F units of speech and noise are labeled with a decision factor derived from the powers of noisy speech, estimated speech, and pre-estimated noise. Erroneous local masks are refined by T-F unit smoothing. As a consequence, noise is effectively reduced and the perceptual quality of the enhanced speech is improved. A new method for estimating the ratio mask in terms of Wiener filtering is proposed in order to further increase the temporal continuity of the reconstructed speech. In this method, the speech power of each T-F unit is obtained by a convex optimization method. The objective function also depends on the cross-correlation between the noisy speech and estimated noise power spectra. To improve the accuracy of estimation, the estimated ratio mask is further modified based on its adjacent T-F units and then smoothed by interpolating with the estimated binary masks. The results confirm that noise reduction, speech quality, and speech intelligibility are all improved. A novel ratio mask representation that exploits the inter-channel correlation (ICC) among the noisy speech, pure noise and clean speech spectra is proposed to further improve enhancement performance. In this way, the power ratio of speech and noise is reallocated adaptively during the construction of the ratio mask, so that more speech components are retained and more noise components are masked. In this method, the channel-weight contour based on the equal loudness hearing attribute is adopted to revise the ratio mask in each T-F unit. The developed ratio mask is used with other features to train a five-layer deep neural network (DNN). Experiments show significant improvements in speech quality and intelligibility compared to DNN-based methods without ICC. en
dc.publisher ResearchSpace@Auckland en
dc.relation.ispartof PhD Thesis - University of Auckland en
dc.relation.isreferencedby UoA99265150807902091 en
dc.rights Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated. Previously published items are made available in accordance with the copyright policy of the publisher. en
dc.rights.uri https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm en
dc.title Speech Enhancement Methods Based on CASA Incorporating Spectral Correlation en
dc.type Thesis en
thesis.degree.discipline Electrical and Computer Engineering en
thesis.degree.grantor The University of Auckland en
thesis.degree.level Doctoral en
thesis.degree.name PhD en
dc.rights.holder Copyright: The author en
dc.rights.accessrights http://purl.org/eprint/accessRights/OpenAccess en
pubs.elements-id 768532 en
pubs.record-created-at-source-date 2019-04-11 en
dc.identifier.wikidata Q112935584
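
For readers unfamiliar with the two mask types named in the abstract, the following is a minimal sketch of their generic textbook forms: a binary mask that labels each T-F unit by comparing speech power against noise power, and a Wiener-type ratio mask S / (S + N). It is not the thesis's estimation method (no convex optimization, adaptive factor, or ICC is modeled), and the function names, the fixed threshold, and the sample power values are all assumptions made for illustration.

```python
# Illustrative sketch only: generic mask definitions, not the thesis's estimators.
# All names and sample values below are hypothetical.

def ratio_mask(speech_power, noise_power, eps=1e-12):
    """Wiener-type ratio mask per T-F unit: S / (S + N), in [0, 1]."""
    return [
        [s / (s + n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]

def binary_mask(speech_power, noise_power, threshold=1.0):
    """Binary mask: 1 where speech power exceeds threshold * noise power, else 0."""
    return [
        [1 if s > threshold * n else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]

def apply_mask(noisy_power, mask):
    """Weight each T-F unit of the noisy power spectrum by its mask value."""
    return [
        [m * y for m, y in zip(m_row, y_row)]
        for m_row, y_row in zip(mask, noisy_power)
    ]

# Hypothetical power values: 2 frequency channels x 3 time frames.
speech = [[4.0, 0.0, 1.0], [0.5, 9.0, 2.0]]
noise = [[1.0, 1.0, 1.0], [2.0, 1.0, 2.0]]
noisy = [[s + n for s, n in zip(s_row, n_row)]
         for s_row, n_row in zip(speech, noise)]

rm = ratio_mask(speech, noise)
bm = binary_mask(speech, noise)
enhanced = apply_mask(noisy, rm)
```

In practice the clean speech and noise powers are unknown and must be estimated from the noisy signal, which is the problem the abstract's four methods address; the sketch assumes both are given.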

