Speech Masker Design Using Dynamic Fundamental Frequency Matching

Lo, Charlene

Identifier: https://hdl.handle.net/2292/61611

Issue Date: 2022

Degree Grantor: The University of Auckland

Rights: Copyright: the author

Rights (URI): https://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm

Abstract:

Speech privacy is paramount to providing confidential services such as medical appointments, and is a major contributor to comfort in offices. By masking speech with an interfering noise (a masker), speech privacy can be achieved without the need to rebuild spaces. This thesis explores new masker designs that improve the efficiency of speech masking while reducing the resulting annoyance. Previous studies have shown that matching the fundamental frequencies of the target speech and the masker improves masking effectiveness, but these studies apply a single fundamental frequency to the masker, namely the time-averaged fundamental frequency of the target speech. This thesis investigates the potential of a masker that adapts instantaneously to the target speech by matching its fundamental frequency to that of the target speech, with the aim of improving the masking effect while keeping the annoyance caused by the masker as low as possible. As such, there are two research questions:

• Does matching dynamic fundamental frequency improve masking effectiveness?
• Does matching dynamic fundamental frequency reduce resulting annoyance?

To answer these research questions, a new masker design was proposed. A pitch estimation algorithm was applied to estimate the fundamental frequency of the target speech signal every ten milliseconds. The estimated frequency was used to tune a comb filter, which was applied to a seed signal of the masker along with a bandpass filter. Two types of noise, pink noise and babble noise, were used as the seed signal of the designed masker. Alongside this proposed design, stationary maskers based on the overall average fundamental frequency of the target speech were created. The proposed, stationary and seed maskers were compared using an online subjective listening test. Twelve male and twelve female volunteers aged 18 to 60 were recruited as participants.
The test asked participants to rate annoyance on a scale from one to ten, one being the most annoying, and then to transcribe the sentence, or any words they were able to hear, to measure the intelligibility of the masked speech. This intelligibility test was used to assess the effectiveness of the masker. The results of the intelligibility test showed that the effectiveness of the seed maskers and the proposed design were similar, but the stationary maskers were much less effective. For annoyance, the unprocessed babble noise was least annoying, followed by the proposed design and then the stationary masker. There was no significant difference in annoyance between the three maskers derived from pink noise. These results did not support the hypothesis and may have been the result of allowing too many speech-dominant spectrotemporal regions (glimpses) to remain unmasked in the new masker design. Future work should further investigate the effect of dynamically matching fundamental frequency while controlling the proportion of glimpses in each test sentence.
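
The masker pipeline described in the abstract (pitch estimate every ten milliseconds, comb filter tuned to that estimate, bandpass filter, noise seed) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not name the pitch estimator, the comb filter structure, or the bandpass cutoffs. The sketch assumes an autocorrelation estimator, a feedback comb with a gain of 0.8, a crude 100 to 3400 Hz bandpass, and a white-noise seed in place of pink or babble noise. All function names are hypothetical.

```python
import math
import random


def estimate_f0(frame, fs, fmin=80.0, fmax=400.0):
    """Autocorrelation pitch estimate for one frame (assumed estimator)."""
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    best_lag, best_val = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag


def track_f0(speech, fs, hop_s=0.010, win_s=0.040):
    """One f0 estimate per 10 ms hop; speech must span at least one window."""
    hop, win = int(fs * hop_s), int(fs * win_s)
    return [estimate_f0(speech[s:s + win], fs)
            for s in range(0, len(speech) - win + 1, hop)]


def dynamic_comb(seed, f0_track, fs, hop_s=0.010, gain=0.8):
    """Feedback comb y[n] = x[n] + gain * y[n - D], with D = fs / f0 retuned each hop."""
    hop = int(fs * hop_s)
    out = [0.0] * len(seed)
    for n, x in enumerate(seed):
        f0 = f0_track[min(n // hop, len(f0_track) - 1)]
        delay = max(1, round(fs / f0))
        out[n] = x + (gain * out[n - delay] if n >= delay else 0.0)
    return out


def bandpass(x, fs, lo=100.0, hi=3400.0):
    """Crude bandpass from two one-pole filters (cutoffs are placeholders)."""
    a_lo = math.exp(-2 * math.pi * lo / fs)
    a_hi = math.exp(-2 * math.pi * hi / fs)
    out, below_lo, smoothed = [], 0.0, 0.0
    for s in x:
        below_lo = (1 - a_lo) * s + a_lo * below_lo  # content below `lo`
        high = s - below_lo                          # remove it: high-pass
        smoothed = (1 - a_hi) * high + a_hi * smoothed  # low-pass at `hi`
        out.append(smoothed)
    return out


def make_masker(seed, speech, fs):
    """Chain the steps: track f0, comb-filter the seed, then bandpass."""
    return bandpass(dynamic_comb(seed, track_f0(speech, fs), fs), fs)


if __name__ == "__main__":
    fs = 8000
    # Synthetic stand-in for target speech: a tone gliding from 120 Hz to 220 Hz.
    phase, speech = 0.0, []
    for n in range(fs // 2):
        phase += 2 * math.pi * (120 + 100 * n / (fs // 2)) / fs
        speech.append(math.sin(phase))
    random.seed(0)
    seed = [random.gauss(0.0, 1.0) for _ in speech]  # white noise stand-in
    masker = make_masker(seed, speech, fs)
    print(len(masker))
```

A stationary masker of the kind used for comparison would correspond to passing a single-element `f0_track` holding the average fundamental frequency, so the comb delay never changes.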
