Studying Transformer Behavior Under Markovian Input

dc.contributor.author: Soltan Mohammadi, Negar
dc.date.accessioned: 2026-01-20T14:06:52Z
dc.date.available: 2026-01-20T14:06:52Z
dc.date.issued: 2026-01-20
dc.date.submitted: 2026-01-13
dc.description.abstract: Transformers have achieved remarkable success in modeling sequential data, yet a principled theoretical understanding of their behavior remains limited. A recent framework analyzed transformers through the lens of first-order Markov chains, providing four theorems that characterize the loss landscape and the conditions under which global minima and bad local minima arise. That analysis was restricted to first-order processes, leaving open how transformers behave on higher-order Markovian data. This thesis extends the theoretical framework to second-order Markov chains. Specifically, all four theorems originally established for first-order chains are formally proven for second-order chains, broadening the mathematical foundation for analyzing transformers on sequential data. We also show that these theorems hold for a second architecture, attention-only transformers, given first- and second-order Markovian input data. Furthermore, experimental evaluations demonstrate that the empirical learning dynamics for second-order chains align closely with the simplified second-order model proposed in prior work, confirming that the theoretical predictions hold in practice. By closing the gap between first-order theory and second-order behavior, this study contributes to a deeper understanding of transformers’ sequential modeling capabilities. The findings identify the conditions under which transformers correctly capture second-order dependencies and provide new insights into their limitations and potential extensions to higher-order processes.
dc.identifier.uri: https://hdl.handle.net/10012/22851
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo (en)
dc.title: Studying Transformer Behavior Under Markovian Input
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Management Sciences
uws-etd.degree.discipline: Management Sciences
uws-etd.degree.grantor: University of Waterloo (en)
uws-etd.embargo.terms: 0
uws.contributor.advisor: Ghadimi, Saeed
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed (en)
uws.published.city: Waterloo (en)
uws.published.country: Canada (en)
uws.published.province: Ontario (en)
uws.scholarLevel: Graduate (en)
uws.typeOfResource: Text (en)

Files

Original bundle

Name: Soltan_Mohammadi_Negar.pdf
Size: 828.67 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission