
dc.contributor.author: Romdhani, Sihem
dc.description.abstract: Gaussian Mixture Model-Hidden Markov Models (GMM-HMMs) are the state of the art for acoustic modeling in speech recognition. HMMs model the sequential structure and temporal variability of speech signals, while GMMs model the local spectral variability of the sound wave at each HMM state. For many years, attempts to replace GMMs with Artificial Neural Networks (ANNs) in HMM-based acoustic models led to dismal results. ANNs could not significantly outperform GMMs because of their shallow architectures, and it was difficult to train networks with many hidden layers on large amounts of data using the back-propagation learning algorithm. In recent years, with the establishment of deep learning techniques, ANNs with many hidden layers have been reintroduced as an alternative to GMMs in acoustic modeling and have shown successful results. Deep learning here consists of a two-phase procedure. First, the ANN is generatively pre-trained using an unsupervised learning algorithm. Then, it is discriminatively fine-tuned using the back-propagation learning algorithm. The generative pre-training is intended to initialize the weights of the network for better generalization performance during the discriminative phase. Combining Deep Neural Networks (DNNs) and HMMs within a single hybrid architecture for acoustic modeling has shown promising results in many speech recognition tasks. This thesis aims to empirically confirm the capability of DNNs to outperform GMMs in acoustic modeling. It also provides a systematic procedure for implementing DNN-HMM acoustic models for phoneme recognition, including the implementation of a GMM-HMM baseline system. The thesis starts with a thorough overview of the fundamentals and background of speech recognition, then discusses DNN architecture and learning techniques. In addition, the problems of GMMs and the advantages of DNNs in acoustic modeling are discussed. Finally, DNN-HMM hybrid acoustic models for phoneme recognition are implemented. The deployed DNN is generatively pre-trained and fine-tuned to produce a posterior distribution over the states of monophone HMMs. The developed DNN-HMM phoneme recognition system outperforms the GMM-HMM baseline on the TIMIT core test set. An in-depth investigation into the major factors behind the success of DNNs is carried out. (en)
dc.publisher: University of Waterloo (en)
dc.subject: Deep Neural Network (en)
dc.subject: Acoustic Model (en)
dc.subject: Automatic Speech Recognition (en)
dc.subject: Phoneme Recognition (en)
dc.title: Implementation of DNN-HMM Acoustic Models for Phoneme Recognition (en)
dc.type: Master Thesis (en)
dc.subject.program: Electrical and Computer Engineering (en)
uws-etd.degree: Master of Applied Science (en)

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
