
Multi-Resolution and Asymmetric Implementation of Attention in Transformers

dc.contributor.advisor: Poupart, Pascal
dc.contributor.author: Chaudhry, Zaid
dc.date.accessioned: 2022-04-29T13:36:15Z
dc.date.available: 2022-04-29T13:36:15Z
dc.date.issued: 2022-04-29
dc.date.submitted: 2022-04-18
dc.description.abstract: Transformers are the state of the art for machine translation and grammatical error correction. Among their most important components are the attention layers, which require significant computational power. We suggest a new way of looking at the token "mixing" mechanism through a multi-resolution implementation of attention, which preserves inference quality while improving training and inference speed, thus getting the best of both worlds. This approximation can be applied symmetrically or asymmetrically within and across attention layers. We also suggest an interesting alternative to the softmax layer in attention, and we analyze several other hyperparameters in detail. For example, our experiments indicate that the attention layers can be asymmetric with respect to the number of heads while still achieving similar results; in many cases, reducing the number of heads improves inference results. We also explore the role of the weighting matrices for the query, key, and value vectors, and show that in the case of self-attention, the absence of these matrices causes the attention layers to collapse to an identity matrix.
dc.identifier.uri: http://hdl.handle.net/10012/18197
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.relation.uri: cLang-8
dc.subject: approximation
dc.subject: Attention
dc.subject: Machine Translation
dc.title: Multi-Resolution and Asymmetric Implementation of Attention in Transformers
dc.type: Master Thesis
uws-etd.degree: Master of Mathematics
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Poupart, Pascal
uws.contributor.affiliation1: Faculty of Mathematics
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text
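
The abstract above describes a multi-resolution approximation of attention. Below is a minimal, illustrative sketch in PyTorch of one way such an approximation could look: keys and values are average-pooled along the sequence axis before the standard scaled dot-product attention, shrinking the softmax from seq_len x seq_len to seq_len x (seq_len / pool_factor). The pooling choice, the function name low_res_attention, and the pool_factor parameter are assumptions made for illustration only; the exact mechanism in the thesis may differ.

import torch
import torch.nn.functional as F

def low_res_attention(q, k, v, pool_factor=4):
    # q, k, v: tensors of shape (batch, seq_len, d_model).
    # Average-pool keys and values along the sequence axis so attention scores
    # are computed against seq_len / pool_factor positions instead of seq_len.
    k_low = F.avg_pool1d(k.transpose(1, 2), kernel_size=pool_factor).transpose(1, 2)
    v_low = F.avg_pool1d(v.transpose(1, 2), kernel_size=pool_factor).transpose(1, 2)
    # Standard scaled dot-product attention against the pooled keys and values.
    scores = q @ k_low.transpose(1, 2) / (q.size(-1) ** 0.5)
    return torch.softmax(scores, dim=-1) @ v_low

An asymmetric configuration, as the abstract suggests, would mix such a low-resolution variant with full-resolution attention across different layers or heads, or vary the number of heads from layer to layer.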

Files

Original bundle

Name: Chaudhry_Zaid.pdf
Size: 1 MB
Format: Adobe Portable Document Format
Description: Masters Thesis

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission
Description: