Spontaneous speech recognition using statistical dynamic models for the vocal-tract-resonance dynamics

dc.contributor.authorMa, Zongqiangen
dc.date.accessioned2006-07-28T19:25:20Z
dc.date.available2006-07-28T19:25:20Z
dc.date.issued2000en
dc.date.submitted2000en
dc.description.abstractThe conventional hidden Markov model (HMM) has achieved significant progress in the area of speech recognition. However, it is far from perfect. To overcome the well-known limitations of the HMM, a new statistical dynamic model is developed and investigated in this dissertation. The main novelty of this new model is the introduction of the vocal tract resonance (VTR) as the internal, structured hidden state for representing phonetic reduction and target undershoot in human production of spontaneous speech, and the incorporation of prior knowledge about the VTR dynamics into the model design, training, and likelihood computation processes. The earliest nonlinear version of the model, originally proposed in [33], is first evaluated and investigated on the Switchboard speech database. Compared with a baseline HMM system, it yields better performance. Based on the investigation of the nonlinear version and in consideration of the systematic variations in speech, two new versions are then developed. One is called a mixture linear dynamic model, and the other a mixture linear dynamic model with switching parameters on measurement equations. Both versions overcome the inefficiency in the parameter learning and likelihood computation processes of the nonlinear version. The latter version uses piecewise linear functions rather than linear functions to alleviate the inaccuracy of the former version in approximating the physically nonlinear relationship between the hidden state space and the acoustic space. The latter version is a more general case of the former version. Evaluation experiments demonstrate that both versions produce large improvements. Search, a challenging problem for the new dynamic model, is finally addressed.
Based on analyses of that problem, three decoding algorithms (a path-stack decoding algorithm, a second-order generalized pseudo-Bayesian decoding algorithm, and an interacting multiple model decoding algorithm) are designed. Experimental results show that all three are effective. Consistent improvements are observed when the most efficient one is used on different versions of the dynamic model.en
dc.formatapplication/pdfen
dc.format.extent5976708 bytes
dc.format.mimetypeapplication/pdf
dc.identifier.urihttp://hdl.handle.net/10012/591
dc.language.isoenen
dc.pendingfalseen
dc.publisherUniversity of Waterlooen
dc.rightsCopyright: 2000, Ma, Zongqiang. All rights reserved.en
dc.subjectHarvested from Collections Canadaen
dc.titleSpontaneous speech recognition using statistical dynamic models for the vocal-tract-resonance dynamicsen
dc.typeDoctoral Thesisen
uws-etd.degreePh.D.en
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen
uws.typeOfResourceTexten

Files

Original bundle

Name:
NQ53993.pdf
Size:
4.59 MB
Format:
Adobe Portable Document Format