Speech enhancement using voice source models

Yasmin, Anisa
2006-07-28
1999
http://hdl.handle.net/10012/422

Autoregressive (AR) models have been shown to be effective models of the speech signal. However, although it is the most common model of speech used for speech enhancement, an AR process excited by white noise fails to capture the effects of source excitation, especially the quasi-periodic nature of voiced speech. Speech synthesis researchers have long recognized this problem and have developed a variety of sophisticated excitation models. Such models have yet to make an impact in speech enhancement. We have concentrated our research on modifying the conventional white-noise-excited AR model for various speech classes and on establishing performance benchmarks by studying speech enhancement with the proposed models in detail for individual phonemes under arbitrarily well-characterized circumstances. We have proposed three different types of impulsive excitation models for an AR model for various phoneme classes, based on the type of excitation with which each class is associated. For voiced speech, the effect of the glottal excitation is simulated by a train of impulses spaced according to pitch periods. For unvoiced stops and unvoiced affricates, the excitation source is modeled by a single impulse marking the instant of the onset of the burst, plus a white noise term. For voiced stops and voiced affricates, a mixed excitation consisting of the plosive driving term and a quasi-periodic train of impulses is used. For voiced fricatives, a mixed excitation consisting of white noise and a quasi-periodic train of impulses separated by pitch periods is used. In each case, the impulsive AR models outperformed their white-noise-driven counterparts. The success of these tentative impulsive excitation models motivated us to apply a more sophisticated excitation model.
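The excitation types described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the AR coefficients, the constant pitch period (a simplification of the quasi-periodic case), the noise levels, and the reading of the "plosive driving term" as a single impulse plus white noise are all assumptions made for illustration.

```python
import random


def impulse_train(n_samples, pitch_period, gain=1.0):
    """Impulses every `pitch_period` samples (assumed constant pitch)."""
    return [gain if n % pitch_period == 0 else 0.0 for n in range(n_samples)]


def single_impulse(n_samples, onset, gain=1.0):
    """One impulse marking the burst onset of a stop or affricate."""
    e = [0.0] * n_samples
    e[onset] = gain
    return e


def white_noise(n_samples, std=1.0, seed=0):
    """Zero-mean Gaussian white noise with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(n_samples)]


def mix(*signals):
    """Sample-wise sum of several excitation signals of equal length."""
    return [sum(vals) for vals in zip(*signals)]


def ar_synthesize(ar_coeffs, excitation):
    """All-pole AR model: s[n] = sum_k a_k * s[n-k] + e[n]."""
    s = []
    for n, e_n in enumerate(excitation):
        acc = e_n
        for k, a_k in enumerate(ar_coeffs, start=1):
            if n - k >= 0:
                acc += a_k * s[n - k]
        s.append(acc)
    return s


N = 400                # number of samples (hypothetical)
PITCH = 80             # pitch period in samples (hypothetical)
ONSET = 50             # burst onset sample (hypothetical)
AR = [1.3, -0.8]       # hypothetical stable second-order AR coefficients

# Assumed "plosive driving term": burst impulse plus a white noise term.
plosive = mix(single_impulse(N, ONSET), white_noise(N, std=0.05))

excitations = {
    "voiced": impulse_train(N, PITCH),
    "unvoiced stop/affricate": plosive,
    "voiced stop/affricate": mix(plosive, impulse_train(N, PITCH)),
    "voiced fricative": mix(white_noise(N, std=0.05), impulse_train(N, PITCH)),
    "white noise (conventional)": white_noise(N),
}

signals = {name: ar_synthesize(AR, e) for name, e in excitations.items()}
```

Each entry in `signals` is the output of the same AR filter driven by a different excitation, so the signals differ only in the source term. In an enhancement setting the excitation and AR parameters would have to be estimated from noisy speech, which this sketch does not attempt.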
We have chosen one of the most common excitation source models, the four-parameter model of Fant, Liljencrants and Lin [1], also known as the LF model, and applied it to the enhancement of individual voiced phonemes. We have proposed a novel two-step optimization algorithm for estimating the parameters of the LF model. Among AR models with three different types of excitation (a conventional white-noise excitation, an impulsive excitation and an LF model), the LF excitation model yields the best speech enhancement performance in terms of output signal-to-noise ratio (SNR).

application/pdf
6180813 bytes
en
Copyright: 1999, Yasmin, Anisa. All rights reserved.
Harvested from Collections Canada
Doctoral Thesis