Abstract:
Speech privacy is essential for confidential services such as medical
appointments and is a major contributor to comfort in offices. By masking
speech with an interfering noise (masker), speech privacy can be achieved without
the need to rebuild spaces. This thesis explores new masker designs that
improve the efficiency of speech masking while reducing resulting annoyance.
Previous studies have shown that matching the fundamental frequencies of
the target speech and the masker improves masking effectiveness, but these
studies assign the masker a single fundamental frequency, namely the
time-averaged fundamental frequency of the target speech. This thesis
investigates a masker that instead adapts moment by moment to the target
speech, matching the fundamental frequency of the masker to the instantaneous
fundamental frequency of the target speech, with the aim of improving the
masking effect while keeping the annoyance caused by the masker as low as
possible.
This leads to two research questions:
• Does dynamically matching the fundamental frequency improve masking effectiveness?
• Does dynamically matching the fundamental frequency reduce the resulting annoyance?
To answer these research questions, a new masker design was proposed. A
pitch estimation algorithm was applied to estimate the fundamental frequency
of the target speech signal every ten milliseconds. The estimated frequency was
used to tune a comb filter, which was applied to a seed signal of the masker
along with a bandpass filter. Two types of noise, pink noise and babble noise,
were used as the seed signal of the designed masker.
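
The comb-filter stage described above can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the feedforward comb structure, the gain value, the 16 kHz sampling rate, the white-noise seed (standing in for pink or babble noise), the hand-written pitch track, and the function name `comb_filter_tracking_f0` are all assumptions made for illustration, and the bandpass filter is omitted.

```python
import random


def comb_filter_tracking_f0(seed, f0_track, fs=16000, frame_ms=10, gain=0.9):
    """Feedforward comb filter whose delay follows a per-frame f0 estimate.

    y[n] = x[n] + gain * x[n - D], with D = round(fs / f0) for the frame
    containing sample n. This places spectral peaks near multiples of f0.
    Frames with f0 <= 0 (treated here as unvoiced) pass the seed unchanged.
    """
    frame_len = int(fs * frame_ms / 1000)  # samples per 10 ms frame
    out = []
    for n, x in enumerate(seed):
        # Index of the pitch estimate that covers this sample.
        frame = min(n // frame_len, len(f0_track) - 1)
        f0 = f0_track[frame]
        if f0 > 0:
            delay = max(1, round(fs / f0))  # one pitch period in samples
            delayed = seed[n - delay] if n >= delay else 0.0
            out.append(x + gain * delayed)
        else:
            out.append(x)
    return out


# Hypothetical inputs: 100 ms of white noise and a made-up pitch track
# with one value per 10 ms frame (0.0 marks an unvoiced frame).
rng = random.Random(0)
fs = 16000
seed = [rng.uniform(-1.0, 1.0) for _ in range(fs // 10)]
f0_track = [120.0] * 5 + [0.0] * 2 + [180.0] * 3

masker = comb_filter_tracking_f0(seed, f0_track, fs=fs)
```

A stationary masker of the kind used for comparison would correspond to passing the same f0 value for every frame.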
Alongside the proposed design, stationary maskers based on the overall
average fundamental frequency of the target speech were created. The proposed,
stationary, and seed maskers were compared in an online subjective
listening test. Twelve male and twelve female volunteers aged 18 to 60
were recruited as participants. The test asked participants to rate
annoyance on a scale from one to ten, with one being the most annoying, and
then to transcribe the sentence, or any words they were able to hear, to measure
the intelligibility of the masked speech. This intelligibility test was used to
assess the effectiveness of each masker.
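
One common way to turn such transcriptions into an intelligibility measure is the proportion of target words reported correctly. The sketch below illustrates that idea only; the scoring rule, the function name `word_score`, and the example sentence are assumptions, not the procedure used in the thesis.

```python
import re


def word_score(reference, transcript):
    """Fraction of reference words that also appear in the transcript.

    Matching ignores case, punctuation, and word order. Each transcribed
    word can be credited at most once.
    """
    ref_words = re.findall(r"[a-z']+", reference.lower())
    heard = re.findall(r"[a-z']+", transcript.lower())
    if not ref_words:
        return 0.0
    hits = 0
    for word in ref_words:
        if word in heard:
            heard.remove(word)  # consume the match so it is not reused
            hits += 1
    return hits / len(ref_words)


# Hypothetical target sentence and participant response.
target = "The birch canoe slid on the smooth planks."
response = "birch canoe the planks"
print(word_score(target, response))  # 0.5
```

Under this rule a lower score means less of the target speech was understood, that is, a more effective masker.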
The results of the intelligibility test showed that the seed maskers and
the proposed design were similarly effective, but the stationary maskers
were much less effective. For annoyance, the unprocessed babble noise was
least annoying, followed by the proposed design and then the stationary masker.
There was no significant difference in annoyance among the three maskers
derived from pink noise. These results did not support the hypothesis
and may have been the result of allowing too many speech-dominant spectrotemporal
regions (glimpses) to remain unmasked in the new masker design.
Future work should further investigate the effect of dynamically matching
the fundamental frequency while controlling the proportion of glimpses in
each test sentence.