Abstract:
A language model is used by a speech recogniser to guide a search process or to choose between alternative recognition hypotheses. It makes its choice on the basis of how likely each hypothesis is to have been generated by the model. Language models that capture longer-range grammatical effects, such as phrase structure grammars, tend to be computationally expensive and so are not commonly used. The hypotheses generated by the acoustic search component of a speech recogniser are typically represented as a simple list, called an N-best list, or as a word-graph. A word-graph stores the hypotheses compactly in a network structure, with a single branch for words or phrases common to multiple hypotheses. In this thesis a new phrase structure grammar parser capable of operating on word-graphs is described. Based on the Earley parser, this graph parser parses every hypothesis in a word-graph in a single traversal of the graph. This results in a significant computational saving and, because it allows more hypotheses to be considered, an increase in recogniser accuracy. Experiments have been performed with the graph parser on word-graphs generated from the common Resource Management benchmark. The graph parser achieved an 8.0% reduction in word error rate on the speaker-dependent task. At the same time, it required only 13% of the computation needed to process an equivalent N-best list. During the course of the research a new, flexible speech recognition architecture, called ARISTOTLE, was developed. ARISTOTLE is a speaker-independent, large-vocabulary, continuous speech system. With its built-in scripting language, modular structure, client-server communications, and implementations of the most commonly used algorithms, ARISTOTLE is a tool for research into new methods and techniques in speech recognition.
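
The contrast between an N-best list and a word-graph described above can be illustrated with a toy example. The following is a minimal sketch only, not the graph parser developed in the thesis: the graph, its node numbering, the words, and the function name `expand_to_nbest` are all hypothetical, chosen purely to show how shared words are stored once in a word-graph but repeated in every hypothesis of the equivalent N-best list.

```python
from collections import defaultdict

# Hypothetical toy word-graph: each edge is (start_node, end_node, word).
# Words shared by several hypotheses ("show", "in", "port") appear only once.
EDGES = [
    (0, 1, "show"),
    (1, 2, "the"),
    (1, 2, "all"),
    (2, 3, "ships"),
    (2, 3, "chips"),
    (3, 4, "in"),
    (4, 5, "port"),
]
START, END = 0, 5


def expand_to_nbest(edges, start, end):
    """Enumerate every start-to-end path of the graph as a list of words."""
    outgoing = defaultdict(list)
    for src, dst, word in edges:
        outgoing[src].append((dst, word))

    hypotheses = []

    def walk(node, words):
        if node == end:
            hypotheses.append(words)
            return
        for dst, word in outgoing[node]:
            walk(dst, words + [word])

    walk(start, [])
    return hypotheses


hypotheses = expand_to_nbest(EDGES, START, END)

# Number of words a processor must visit in each representation.
words_in_list = sum(len(h) for h in hypotheses)
words_in_graph = len(EDGES)

for h in hypotheses:
    print(" ".join(h))
print(f"{len(hypotheses)} hypotheses: {words_in_list} words as a list, "
      f"{words_in_graph} edges as a graph")
```

In this toy case four hypotheses of five words each occupy 20 word slots as a list but only 7 edges as a graph. The gap widens as the number of alternatives grows, which suggests why a parser that processes each edge once can be cheaper than one that parses every list entry separately.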