Show simple item record

dc.contributor.authorChaudhry, Zaid
dc.date.accessioned2022-04-29 13:36:15 (GMT)
dc.date.available2022-04-29 13:36:15 (GMT)
dc.date.issued2022-04-29
dc.date.submitted2022-04-18
dc.identifier.urihttp://hdl.handle.net/10012/18197
dc.description.abstractTransformers are the state-of-the-art for machine translation and grammar error correction. One of the most important components of transformers are the attention layers, but they require significant computational power. We suggest a new way of looking at the “mixing” mechanisms of tokens by doing a multi-resolution implementation of attention, which maintains inference results while also improving training and inference speed, thus getting the best of both worlds. This approximation can be applied in symmtrical and asymmetrical manner within and across attention layers. We also suggest an interesting alternative for the softmax layer in attention. We also analyzed some other hyperparameters in detail. For example, our experiments indicate that we can have asymmetry among the attention layers w.r.t. number of heads, while still achieving similar results. In many cases, reducing the number of heads improves inference results. We also explored the role of weighting matrices for query, key, and value vectors; and show that in case of self-attention, absence of these matrices results in the collapse of the attention layers to an identity matrix.en
dc.language.isoenen
dc.publisherUniversity of Waterlooen
dc.relation.uricLang-8en
dc.subjectapproximationen
dc.subjectAttentionen
dc.subjectMachine Translationen
dc.titleMulti-Resolution and Asymmetric Implementation of Attention in Transformersen
dc.typeMaster Thesisen
dc.pendingfalse
uws-etd.degree.departmentDavid R. Cheriton School of Computer Scienceen
uws-etd.degree.disciplineComputer Scienceen
uws-etd.degree.grantorUniversity of Waterlooen
uws-etd.degreeMaster of Mathematicsen
uws-etd.embargo.terms0en
uws.contributor.advisorPoupart, Pascal
uws.contributor.affiliation1Faculty of Mathematicsen
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.typeOfResourceTexten
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages