Abstract:
Apart from recognition accuracy, decoding speed and vocabulary size, another consideration
when developing a practical ASR application is the adaptability of the system. An ASR
system is more useful if it can cope with changes introduced by users, for example, new
words and new grammar rules. In addition, the system can also automatically update its
underlying knowledge sources, such as language model probabilities, to improve recognition
accuracy. Since the knowledge sources need to be adaptable, combining them statically is
inflexible: once all the knowledge sources have been combined into one static search space,
on-line modification becomes difficult.
The second objective of the thesis is to develop an algorithm which allows dynamic
integration of knowledge sources during decoding. In this approach, each knowledge source
is represented by a weighted finite state transducer (WFST). The knowledge source that
is subject to adaptation is factored out of the entire search space. The adapted knowledge
source is then combined with the others during decoding. In this thesis, we propose a
generalized dynamic WFST composition algorithm, which avoids the creation of
non-coaccessible paths, performs weight look-ahead and does not impose any constraints on
the topology of the WFSTs. Experimental results on the Wall Street Journal (WSJ1) 20k-word
trigram task show that our proposed approach achieves better word accuracy versus
real-time factor characteristics than other dynamic composition approaches.
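The general idea of dynamic (on-the-fly) composition can be illustrated with the minimal Python sketch below. It is not the algorithm proposed in this thesis. It assumes epsilon-free transducers over the tropical semiring, where weights are costs added along a path. The names `Arc`, `Wfst`, `LazyCompose` and `shortest_path` are hypothetical. A composed state is a pair of component states, and it is expanded only when the search first reaches it. As a result, an adapted transducer can be swapped in without rebuilding a static search space. The sketch performs neither weight look-ahead nor removal of non-coaccessible paths.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

State = Tuple[int, int]  # a composed state is a pair (state in A, state in B)


@dataclass(frozen=True)
class Arc:
    ilabel: str
    olabel: str
    weight: float  # tropical semiring: a cost, added along a path
    nextstate: int


@dataclass
class Wfst:
    start: int
    finals: Dict[int, float]  # final state -> final weight
    arcs: Dict[int, List[Arc]] = field(default_factory=dict)

    def out(self, s: int) -> List[Arc]:
        return self.arcs.get(s, [])


class LazyCompose:
    """Hypothetical A o B built on demand: a pair state is expanded on first visit."""

    def __init__(self, a: Wfst, b: Wfst):
        self.a, self.b = a, b
        self.start: State = (a.start, b.start)
        self._cache: Dict[State, List[Tuple[str, str, float, State]]] = {}

    def arcs(self, state: State) -> List[Tuple[str, str, float, State]]:
        if state not in self._cache:
            sa, sb = state
            # Match output labels of A against input labels of B (no epsilon handling).
            self._cache[state] = [
                (x.ilabel, y.olabel, x.weight + y.weight, (x.nextstate, y.nextstate))
                for x in self.a.out(sa)
                for y in self.b.out(sb)
                if x.olabel == y.ilabel
            ]
        return self._cache[state]

    def final_weight(self, state: State) -> Optional[float]:
        sa, sb = state
        if sa in self.a.finals and sb in self.b.finals:
            return self.a.finals[sa] + self.b.finals[sb]
        return None

    @property
    def num_expanded(self) -> int:
        return len(self._cache)


def shortest_path(c: LazyCompose) -> Tuple[float, List[str]]:
    """Dijkstra search (non-negative costs) that expands composed states lazily."""
    tie = 0  # unique tie-breaker so heap entries never compare states
    heap = [(0.0, tie, c.start, ())]
    settled = set()
    while heap:
        cost, _, state, outs = heapq.heappop(heap)
        if state is None:  # virtual super-final state reached
            return cost, list(outs)
        if state in settled:
            continue
        settled.add(state)
        fw = c.final_weight(state)
        if fw is not None:
            tie += 1
            heapq.heappush(heap, (cost + fw, tie, None, outs))
        for _, olabel, w, nxt in c.arcs(state):
            if nxt not in settled:
                tie += 1
                heapq.heappush(heap, (cost + w, tie, nxt, outs + (olabel,)))
    return float("inf"), []


# Toy example: A maps "a" to "x" or "y"; B maps "x" to "X" and "y" to "Y".
a = Wfst(0, {1: 0.0}, {0: [Arc("a", "x", 1.0, 1), Arc("a", "y", 0.5, 1)]})
b = Wfst(0, {1: 0.0}, {0: [Arc("x", "X", 0.5, 1), Arc("y", "Y", 2.0, 1)]})
composed = LazyCompose(a, b)
cost, outputs = shortest_path(composed)
print(cost, outputs, composed.num_expanded)  # prints: 1.5 ['X'] 2
```

Adapting `b` in this sketch requires only constructing a new `LazyCompose`, because nothing is precompiled. A practical decoder would also need an epsilon filter and some form of weight look-ahead, which the abstract lists among the features of the proposed algorithm.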