Show simple item record

dc.contributor.author: Xu, Ruizhou
dc.date.accessioned: 2022-08-19 17:39:57 (GMT)
dc.date.available: 2022-08-19 17:39:57 (GMT)
dc.date.issued: 2022-08-19
dc.date.submitted: 2022-08-10
dc.identifier.uri: http://hdl.handle.net/10012/18586
dc.description.abstract: In recent years, neural networks have become powerful tools for decision making and for solving complex problems. In natural language processing, BERT and its variants significantly outperform other network architectures: they learn general linguistic knowledge from a large corpus during pre-training and apply it to downstream tasks. However, such models are enormous and heavily overparameterized, which makes deployment on small edge devices less scalable and flexible. In this thesis, we study how to compress the BERT model through structured pruning. We propose the neural slimming technique to assess the importance of each neuron, and we design a cost function and a pruning strategy to remove neurons that contribute little or nothing to the prediction. After fine-tuning on a downstream task, the model learns a more compact structure, which we name SlimBERT. On 7 GLUE tasks, our method recovers 94% of the original performance with only 10% of the original parameters, while also reducing run-time memory and increasing inference speed. Compared to knowledge distillation and other structured pruning methods, the proposed approach achieves better performance under different metrics at the same compression ratio. Moreover, our method improves the interpretability of BERT: by analyzing the neurons that make significant contributions, we observe that BERT relies on different components and subnetworks depending on the task.
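The abstract describes neural slimming only at a high level. Below is a minimal, hypothetical sketch of one plausible realization — per-neuron learnable gates trained with an L1 sparsity term and then thresholded away — written in PyTorch. The class and names (SlimmedFFN, gates, l1_penalty, prune, threshold) are illustrative assumptions, not the thesis's actual implementation, and the gated block stands in for a single BERT feed-forward layer.

# Hypothetical sketch of neuron-level slimming with learnable gates (not the
# thesis's exact method): each hidden neuron gets a scalar gate, an L1 penalty
# on the gates is added to the task loss, and near-zero gates are pruned.
import torch
import torch.nn as nn

class SlimmedFFN(nn.Module):
    """Feed-forward block whose hidden neurons carry learnable slimming gates."""
    def __init__(self, d_model=768, d_hidden=3072):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_model)
        # One gate per hidden neuron, initialized to 1 (neuron fully active).
        self.gates = nn.Parameter(torch.ones(d_hidden))

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        return self.fc2(h * self.gates)  # gate scales each neuron's contribution

    def l1_penalty(self):
        # Sparsity term added to the task loss during fine-tuning,
        # e.g. loss = task_loss + lambda_sparsity * block.l1_penalty()
        return self.gates.abs().sum()

    @torch.no_grad()
    def prune(self, threshold=1e-2):
        # Remove hidden neurons whose gates are near zero, i.e. those that
        # contribute little or nothing to the prediction.
        keep = (self.gates.abs() > threshold).nonzero(as_tuple=True)[0]
        fc1 = nn.Linear(self.fc1.in_features, keep.numel())
        fc1.weight.copy_(self.fc1.weight[keep])
        fc1.bias.copy_(self.fc1.bias[keep])
        fc2 = nn.Linear(keep.numel(), self.fc2.out_features)
        fc2.weight.copy_(self.fc2.weight[:, keep])
        fc2.bias.copy_(self.fc2.bias)
        self.fc1, self.fc2, self.gates = fc1, fc2, nn.Parameter(self.gates[keep].clone())

After fine-tuning with the sparsity penalty, calling prune() on each block yields the compact model; in the thesis this compact fine-tuned network is what is referred to as SlimBERT.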
dc.language.iso: en
dc.publisher: University of Waterloo
dc.relation.uri: https://gluebenchmark.com/
dc.subject: model compression
dc.subject: NLP
dc.subject: pre-trained language model
dc.subject: structured pruning
dc.subject: BERT
dc.title: Compression and Analysis of Pre-trained Language Model using Neural Slimming
dc.type: Master Thesis
dc.pending: false
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Master of Applied Science
uws-etd.embargo.terms: 0
uws.contributor.advisor: Karray, Fakhri
uws.contributor.affiliation1: Faculty of Engineering
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.typeOfResource: Text
uws.peerReviewStatus: Unreviewed
uws.scholarLevel: Graduate


