WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis

dc.contributor.advisor: Lam, Patrick
dc.contributor.author: Robati Shirzad, Mohammad
dc.date.accessioned: 2023-05-05T18:34:17Z
dc.date.available: 2023-05-05T18:34:17Z
dc.date.issued: 2023-05-05
dc.date.submitted: 2023-05-05
dc.description.abstract: WebAssembly, or Wasm, is a low-level binary language that enables near-native-performance code to execute in web browsers. Wasm has proven useful in applications including gaming, audio and video processing, and cloud computing, providing a high-performance, low-overhead alternative to JavaScript in web development. The rapid, widespread adoption of WebAssembly by all major browsers has created an opportunity for analysis tools that support this new technology. In this study, we performed an empirical analysis of the root-to-leaf paths of the abstract syntax trees in the WebAssembly Text format (WAT) of a large dataset of WebAssembly binary files compiled from over 4,000 source packages in the Ubuntu 18.04 repositories. After refining the collected paths, we reduced the initial set of over 800,000 paths to only 3,352 unique paths across all of the binary files. With this insight, we propose two novel code representations for WebAssembly binaries. These representations can be used both to generate fixed-size code embeddings and to supply additional information to sequence-to-sequence models. Ultimately, our approach seeks to help program analysis models uncover new properties of Wasm binaries, expanding our understanding of their potential. We evaluated our new code representations on two applications: (i) method name prediction and (ii) recovering precise return types. Our results show that our technique outperforms previous methods. More specifically, compared to the previous state-of-the-art technique, SnowWhite, our new method yields a 5.36% (11.31%) improvement in Top-1 (Top-5) accuracy for method name prediction and an 8.02% (7.92%) improvement in recovering precise return types. [en]
dc.identifier.uri: http://hdl.handle.net/10012/19423
dc.language.iso: en [en]
dc.pending: false
dc.publisher: University of Waterloo [en]
dc.relation.uri: https://zenodo.org/record/7763463 [en]
dc.title: WasmWalker: Path-based Code Representations for Improved WebAssembly Program Analysis [en]
dc.type: Master Thesis [en]
uws-etd.degree: Master of Applied Science [en]
uws-etd.degree.department: Electrical and Computer Engineering [en]
uws-etd.degree.discipline: Electrical and Computer Engineering [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.embargo.terms: 0 [en]
uws.contributor.advisor: Lam, Patrick
uws.contributor.affiliation1: Faculty of Engineering [en]
uws.peerReviewStatus: Unreviewed [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.scholarLevel: Graduate [en]
uws.typeOfResource: Text [en]
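As a rough illustration of the path analysis described in the abstract, the sketch below parses a tiny WebAssembly Text (WAT) module, enumerates its root-to-leaf AST paths, and counts the distinct ones. It is a minimal sketch, not the thesis's tooling: it assumes a path is the sequence of S-expression head keywords from the root down to a leaf token, and the `abstract_leaf` step (collapsing `$`-identifiers and decimal literals) is a hypothetical stand-in for the refinement that reduced over 800,000 paths to 3,352, whose exact rules are not given here. All function names are illustrative.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple, Union

SExpr = Union[str, list]   # a leaf token, or a list whose first item is a keyword
Path = Tuple[str, ...]


def tokenize(text: str) -> Deque[str]:
    # Naive tokenizer: pads parentheses with spaces and splits on whitespace.
    # It does not handle WAT comments or string literals containing spaces/parens.
    return deque(text.replace("(", " ( ").replace(")", " ) ").split())


def parse(tokens: Deque[str]) -> SExpr:
    """Parse one S-expression from the token queue into nested lists."""
    token = tokens.popleft()
    if token != "(":
        return token
    node: list = []
    while tokens[0] != ")":
        node.append(parse(tokens))
    tokens.popleft()  # consume the matching ")"
    return node


def abstract_leaf(token: str) -> str:
    # Hypothetical refinement step (an assumption, not the thesis's rule):
    # collapse symbolic identifiers and plain decimal literals.
    if token.startswith("$"):
        return "$id"
    if token.lstrip("-").replace(".", "", 1).isdigit():
        return "num"
    return token


def root_to_leaf_paths(node: SExpr, prefix: Path = ()) -> List[Path]:
    """Enumerate every root-to-leaf path; inner nodes are labelled by their head keyword."""
    if isinstance(node, str):
        return [prefix + (abstract_leaf(node),)]
    head, children = node[0], node[1:]  # assumes each list starts with a keyword, as in WAT
    if not children:
        return [prefix + (head,)]
    paths: List[Path] = []
    for child in children:
        paths.extend(root_to_leaf_paths(child, prefix + (head,)))
    return paths


wat = """
(module
  (func $add (param $a i32) (param $b i32) (result i32)
    (i32.add (local.get $a) (local.get $b))))
"""

paths = root_to_leaf_paths(parse(tokenize(wat)))
unique_paths = Counter(paths)
for path, count in unique_paths.items():
    print(count, " -> ".join(path))
```

On this sample module, the eight raw paths collapse to five distinct ones, because both parameters and both `local.get` operands yield identical paths once identifiers are abstracted. Aggregating such counts over a whole corpus of binaries is one plausible way a small vocabulary of recurring paths could emerge.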

Files

Original bundle

Name: RobatiShirzad_Mohammad.pdf
Size: 430.65 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission