Show simple item record

dc.contributor.author: Dib, Mohammad
dc.date.accessioned: 2021-10-25 17:18:38 (GMT)
dc.date.available: 2022-10-26 04:50:06 (GMT)
dc.date.issued: 2021-10-25
dc.date.submitted: 2021-10-08
dc.identifier.uri: http://hdl.handle.net/10012/17666
dc.description.abstract: Speaker diarization is the process of identifying who spoke when in an audio stream, and it is applied in many fields, such as information retrieval and psychotherapy. Speaker embeddings extraction is a crucial step in any diarization system, where the goal is to extract highly discriminative speaker embeddings (d-vectors). Most existing methods are based on deep neural networks (DNNs) and rely on engineered features, which may not guarantee optimal performance in all cases. This led to the development of the SincNet model, which can effectively and efficiently process raw input audio signals. The SincNet model was successfully used to perform embeddings extraction in a speaker diarization system, where it achieved high diarization performance. Its successor, the AM-SincNet model, which combines SincNet with an improved loss function, outperformed the standard SincNet on the speaker diarization task. This shows the importance of enhancing the loss function of SincNet to achieve better diarization performance. Thus, the goal of this thesis is to improve the ability of the SincNet model to extract discriminative embeddings, and thereby its diarization performance, by experimenting with different architectures and state-of-the-art loss functions. In this thesis, 16 different SincNet-based models are proposed: four models that combine the SincNet architecture with four different loss functions, six models that combine the Res-SincNet architecture (a recently proposed architecture) with six different loss functions, and six models that combine the Res-SincNet-FC architecture (proposed in this thesis) with six different loss functions. The results show that the proposed MV-AM-SincNet model gives the best diarization performance of all the models discussed in this thesis. This shows the high capability of the MV-Softmax loss at extracting highly discriminative embeddings compared to the other losses. Additionally, the speaker recognition performance is reported, since all the models were trained for speaker recognition before being applied to speaker diarization. The proposed Res-SincNet-FC architecture resulted in the lowest frame error rate (FER) when combined with the different loss functions, with the D-Res-SincNet-FC and Arc-Res-SincNet-FC models achieving the lowest FER. Visualization of the extracted embeddings and the diarization output of the MV-AM-SincNet model showed its ability to extract highly discriminative embeddings. However, the visualization also showed that a large number of overlapping segments and/or short speaker segments negatively impacts diarization performance. In this thesis, significant improvements to the SincNet model were made, which assist in achieving higher speaker recognition and diarization performance while processing raw audio signals efficiently and effectively, without the need for feature engineering. [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.subject: speaker diarization [en]
dc.subject: speaker recognition [en]
dc.subject: SincNet [en]
dc.subject: speaker embeddings extraction [en]
dc.title: Speaker Diarization Using Improved SincNet Models to Extract Speaker Embeddings [en]
dc.type: Master Thesis [en]
dc.pending: false
uws-etd.degree.department: Electrical and Computer Engineering [en]
uws-etd.degree.discipline: Electrical and Computer Engineering [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.degree: Master of Applied Science [en]
uws-etd.embargo.terms: 1 year [en]
uws.contributor.advisor: Basir, Otman
uws.contributor.affiliation1: Faculty of Engineering [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]





All items in UWSpace are protected by copyright, with all rights reserved.
