Spontaneous speech recognition using statistical dynamic models for the vocal-tract-resonance dynamics

Ma, Zongqiang

dc.contributor.author	Ma, Zongqiang	en
dc.date.accessioned	2006-07-28T19:25:20Z
dc.date.available	2006-07-28T19:25:20Z
dc.date.issued	2000	en
dc.date.submitted	2000	en
dc.description.abstract	The conventional hidden Markov model (HMM) has brought significant progress to the area of speech recognition. However, it is far from perfect. To overcome the well-known limitations of the HMM, a new statistical dynamic model is developed and investigated in this dissertation. The main novelty of this model is twofold: it introduces the vocal tract resonance (VTR) as the internal, structured hidden state for representing phonetic reduction and target undershoot in human production of spontaneous speech, and it incorporates prior knowledge about the VTR dynamics into the model design, training, and likelihood computation processes. The earliest, nonlinear version of the model, originally proposed in [33], is first evaluated and investigated on the Switchboard speech database. It performs better than a baseline HMM system. Based on the investigation of the nonlinear version, and in consideration of the systematic variations in speech, two new versions are then developed. One is a mixture linear dynamic model; the other is a mixture linear dynamic model with switching parameters on the measurement equations. Both versions overcome the inefficiency of the nonlinear version in parameter learning and likelihood computation. The latter version uses piecewise linear functions rather than linear functions to alleviate the inaccuracy of the former version in approximating the physically nonlinear relationship between the hidden state space and the acoustic space, and it is a more general case of the former version. Evaluation experiments demonstrate that both versions produce large improvements. Finally, search, a challenging problem for the new dynamic model, is addressed. Based on analyses of that problem, three decoding algorithms (a path-stack decoding algorithm, a second-order generalized pseudo-Bayesian decoding algorithm, and an interacting multiple model decoding algorithm) are designed. Experimental results show that all three are effective. Consistent improvements are observed when the most efficient one is used on different versions of the dynamic model.	en
dc.format	application/pdf	en
dc.format.extent	5976708 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/10012/591
dc.language.iso	en	en
dc.pending	false	en
dc.publisher	University of Waterloo	en
dc.rights	Copyright: 2000, Ma, Zongqiang. All rights reserved.	en
dc.subject	Harvested from Collections Canada	en
dc.title	Spontaneous speech recognition using statistical dynamic models for the vocal-tract-resonance dynamics	en
dc.type	Doctoral Thesis	en
uws-etd.degree	Ph.D.	en
uws.peerReviewStatus	Unreviewed	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: NQ53993.pdf
Size: 4.59 MB
Format: Adobe Portable Document Format

Collections

Digitized University of Waterloo Theses