
Multi-Resolution and Asymmetric Implementation of Attention in Transformers

dc.contributor.advisor: Poupart, Pascal
dc.contributor.author: Chaudhry, Zaid
dc.date.accessioned: 2022-04-29T13:36:15Z
dc.date.available: 2022-04-29T13:36:15Z
dc.date.issued: 2022-04-29
dc.date.submitted: 2022-04-18
dc.description.abstract: Transformers are the state of the art for machine translation and grammatical error correction. Among their most important components are the attention layers, which require significant computational power. We suggest a new way of looking at the token "mixing" mechanism through a multi-resolution implementation of attention, which preserves inference quality while improving training and inference speed, thus getting the best of both worlds. This approximation can be applied symmetrically or asymmetrically within and across attention layers. We also suggest an interesting alternative to the softmax layer in attention, and we analyze several other hyperparameters in detail. For example, our experiments indicate that the attention layers can be asymmetric with respect to the number of heads while still achieving similar results; in many cases, reducing the number of heads improves inference results. We also explore the role of the weighting matrices for the query, key, and value vectors, and show that in the case of self-attention, the absence of these matrices causes the attention layers to collapse to an identity matrix.
dc.identifier.uri: http://hdl.handle.net/10012/18197
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.relation.uri: cLang-8
dc.subject: approximation
dc.subject: Attention
dc.subject: Machine Translation
dc.title: Multi-Resolution and Asymmetric Implementation of Attention in Transformers
dc.type: Master Thesis
uws-etd.degree: Master of Mathematics
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Poupart, Pascal
uws.contributor.affiliation1: Faculty of Mathematics
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text
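
The abstract above describes a multi-resolution approximation of attention. Below is a minimal, illustrative sketch in PyTorch of one way such an approximation could look: keys and values are average-pooled along the sequence axis before the standard scaled dot-product attention, shrinking the softmax from seq_len x seq_len to seq_len x (seq_len / pool_factor). The pooling choice, the function name low_res_attention, and the pool_factor parameter are assumptions made for illustration only; the exact mechanism in the thesis may differ.

import torch
import torch.nn.functional as F

def low_res_attention(q, k, v, pool_factor=4):
    # q, k, v: tensors of shape (batch, seq_len, d_model).
    # Average-pool keys and values along the sequence axis so attention scores
    # are computed against seq_len / pool_factor positions instead of seq_len.
    k_low = F.avg_pool1d(k.transpose(1, 2), kernel_size=pool_factor).transpose(1, 2)
    v_low = F.avg_pool1d(v.transpose(1, 2), kernel_size=pool_factor).transpose(1, 2)
    # Standard scaled dot-product attention against the pooled keys and values.
    scores = q @ k_low.transpose(1, 2) / (q.size(-1) ** 0.5)
    return torch.softmax(scores, dim=-1) @ v_low

An asymmetric configuration, as the abstract suggests, would mix such a low-resolution variant with full-resolution attention across different layers or heads, or vary the number of heads from layer to layer.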

Files

Original bundle

Name: Chaudhry_Zaid.pdf
Size: 1 MB
Format: Adobe Portable Document Format
Description: Masters Thesis

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission
Description: