Language Model Inference on FPGA with Integer-only Operations

dc.contributor.author: Bekmyrza, Marat
dc.date.accessioned: 2025-04-17T12:34:04Z
dc.date.available: 2025-04-17T12:34:04Z
dc.date.issued: 2025-04-17
dc.date.submitted: 2025-04-14
dc.description.abstract: Large Language Models (LLMs) currently dominate the field of Artificial Intelligence (AI) applications, but their deployment on edge devices remains limited by computational complexity and power consumption. This thesis addresses this challenge by investigating integer-only acceleration of transformer models on FPGAs, focusing on the BERT architecture. We demonstrate that removing floating-point operations from the inference pipeline, particularly from the non-linear functions GELU, Softmax, and Layer Normalization, improves performance without sacrificing accuracy. Our pipelined, batched architecture processes multiple sequences in parallel to make efficient use of FPGA resources. We achieve a 2.6x throughput improvement over single-sequence inference and at least a 10x speedup over offloading to a CPU. Experimental results show that, with INT8 quantization, our implementation achieves accuracy comparable to floating-point models on the GLUE benchmark tasks. These findings demonstrate that integer-only transformer inference on FPGAs is a feasible way to run complex language models on resource-constrained edge devices, enabling new privacy-conscious, low-latency AI applications.
dc.identifier.uri: https://hdl.handle.net/10012/21600
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: Large Language Models
dc.subject: Artificial Intelligence
dc.subject: FPGA
dc.subject: Inference Acceleration
dc.title: Language Model Inference on FPGA with Integer-only Operations
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 4 months
uws.contributor.advisor: Kapre, Nachiket
uws.contributor.advisor: Patel, Hiren
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text
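
Editor's note: to make the abstract's "integer-only non-linear functions" idea concrete, below is a minimal Python sketch of an I-BERT-style second-order polynomial approximation of GELU in which the datapath uses only integer arithmetic. The coefficients A, B, C and the function names int_erf / int_gelu follow the published I-BERT technique and are illustrative assumptions; they are not taken from this thesis, whose exact implementation may differ.

import numpy as np

# Coefficients of the I-BERT second-order fit of erf (illustrative;
# the thesis's own coefficients may differ).
A, B, C = -0.2888, -1.769, 1.0

def int_erf(q, scale):
    # Integer-only polynomial approximation of erf(q * scale).
    # Returns (q_out, scale_out) with q_out * scale_out ~ erf(q * scale).
    b_int = int(np.floor(B / scale))
    c_int = int(np.floor(C / (A * scale ** 2)))
    sign = np.sign(q)
    q_abs = np.minimum(np.abs(q), -b_int)   # clip |x| at -B, where erf saturates
    q_poly = (q_abs + b_int) ** 2 + c_int   # (|x| + B)^2 + C/A in integer units
    return sign * q_poly, A * scale ** 2

def int_gelu(q, scale):
    # GELU(x) = 0.5 * x * (1 + erf(x / sqrt(2))), integer datapath only;
    # the float 'scale' values are compile-time bookkeeping, not hardware math.
    q_erf, s_erf = int_erf(q, scale / np.sqrt(2.0))
    one_int = int(np.floor(1.0 / s_erf))    # the constant 1.0 in erf's scale
    return q * (q_erf + one_int), scale * s_erf / 2

# Example: quantized activations with scale 0.05, i.e. x = [-2, -0.5, 0, 0.5, 2]
q = np.array([-40, -10, 0, 10, 40], dtype=np.int64)
q_out, s_out = int_gelu(q, 0.05)
print(q_out * s_out)  # ~ GELU([-2.0, -0.5, 0.0, 0.5, 2.0])

All runtime operations here are integer multiply, add, clip, and sign; the floating-point scales are folded into requantization constants before deployment. On an FPGA, such a datapath maps onto DSP slices and LUTs without instantiating floating-point units, which is the premise of the thesis's integer-only design.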

Files

Original bundle

Name: Bekmyrza_Marat.pdf
Size: 629.92 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission