Biologically inspired methods in speech recognition and synthesis: closing the loop

Bekolay, Trevor

Biologically inspired methods in speech recognition and synthesis: closing the loop

dc.contributor.advisor	Eliasmith, Chris
dc.contributor.author	Bekolay, Trevor
dc.date.accessioned	2016-02-18T14:17:17Z
dc.date.available	2016-02-18T14:17:17Z
dc.date.issued	2016-02-18
dc.date.submitted	2016-02-16
dc.description.abstract	Current state-of-the-art approaches to computational speech recognition and synthesis are based on statistical analyses of extremely large data sets. It is currently unknown how these methods relate to the methods that the human brain uses to perceive and produce speech. In this thesis, I present a conceptual model, Sermo, which describes some of the computations that the human brain uses to perceive and produce speech. I then implement three large-scale brain models that accomplish tasks theorized to be required by Sermo, drawing upon techniques in automatic speech recognition, articulatory speech synthesis, and computational neuroscience. The first model extracts features from an audio signal by performing a frequency decomposition with an auditory periphery model, then decorrelating the information in that power spectrum with methods commonly used in audio and image compression. I show that the features produced by this model implemented with biologically plausible spiking neurons can be used to classify phones in pre-segmented speech with significantly better accuracy than the features typically used in automatic speech recognition systems. Additionally, I show that this model can be used to compare auditory periphery models in terms of their ability to support phone classification of pre-segmented speech. The second model uses a symbol-like neural representation of a sequence of syllables to generate a trajectory of premotor commands that can be used to control an articulatory synthesizer. I show that the model can produce trajectories up to several seconds in length from a static syllable sequence representation that result in intelligible synthesized speech. The trajectories reflect the high temporal variability of human speech, and smoothly transition between successive syllables, even in rapid utterances. The third model classifies syllables from a trajectory of premotor commands. I show that the model is able to classify syllables online despite high temporal variability, and can produce the same syllable representations used by the second model. These two models can be connected in future work in order to implement a closed-loop sensorimotor speech system. Unlike current computational approaches, all three of these models are implemented with biologically plausible spiking neurons, which can be simulated with neuromorphic hardware, and can interface naturally with artificial cochleas. All models are shown to scale to the level of adult human vocabularies in terms of the neural resources required, though limitations on their performance as a result of scaling will be discussed.	en
dc.identifier.uri	http://hdl.handle.net/10012/10269
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	speech	en
dc.subject	spiking neural networks	en
dc.subject	computational neuroscience	en
dc.title	Biologically inspired methods in speech recognition and synthesis: closing the loop	en
dc.type	Doctoral Thesis	en
uws-etd.degree	Doctor of Philosophy	en
uws-etd.degree.department	David R. Cheriton School of Computer Science	en
uws-etd.degree.discipline	Computer Science	en
uws-etd.degree.grantor	University of Waterloo	en
uws.contributor.advisor	Eliasmith, Chris
uws.contributor.affiliation1	Faculty of Mathematics	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Bekolay_Trevor.pdf
Size:: 10.54 MB
Format:: Adobe Portable Document Format
Description:

Download

License bundle

Now showing 1 - 1 of 1

Name:: license.txt
Size:: 6.17 KB
Format:: Item-specific license agreed upon to submission
Description:

Download

Collections

Theses
Computer Science