Computer Science
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9930
This is the collection for the University of Waterloo's Cheriton School of Computer Science.
Research outputs are organized by type (e.g., Master Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Recent Submissions
Item ReSlide: Towards Effective Presentation Authoring for Evolving Narratives and Contextual Constraints (University of Waterloo, 2025-09-29) Zeng, Linxiu
Authoring slides for a public presentation requires speakers to navigate multiple contextual constraints that directly shape how content is structured and delivered. We first conducted a formative study with ten experienced presenters to understand which constraints are prioritized and how they influence slide deck creation. We identified three key constraints (time, audience, and communicative intent) and challenges in integrating them into slide authoring, both for a single presentation and for long-term needs across diverse narratives. We designed ReSlide, a presentation authoring tool that helps presenters create and reuse slides by bridging contextual constraints with evolving narratives. We evaluated ReSlide in a within-subjects study with 12 participants against a baseline tool, and in an exploratory study with eight professional presenters using only ReSlide. Results indicate that ReSlide's novel features of constraint awareness and multi-granular slide reuse helped presenters effectively craft presentations in both single and multiple authoring cycles.

Item Oblivious Multi-Way Band Joins: An Efficient Algorithm for Secure Range Queries (University of Waterloo, 2025-09-22) Wei, Ruidi
This thesis introduces the first efficient oblivious algorithm for acyclic multi-way joins with band conditions, extending the classical Yannakakis algorithm to support inequality predicates (>, <, ≥, ≤) without leaking sensitive information through memory access patterns. Band joins, which match tuples over value ranges rather than exact keys, are widely used in temporal, spatial, and proximity-based analytics but present challenges in oblivious computation. Our approach employs a dual-entry technique that transforms range matching into cumulative sum computations, enabling multiplicity computation in an oblivious manner. The algorithm achieves O(N log N + k · OUT log OUT) complexity, where k is the number of tables in the join query, N is the input size, and OUT is the output size, matching state-of-the-art oblivious equality joins up to a factor of k while supporting full band constraints. We implement the method using Intel SGX with batch processing and evaluate it on the TPC-H benchmark dataset, demonstrating practical performance and strong obliviousness guarantees under an honest-but-curious adversary model.

Item Fine-Grained Visual Entity Linking through Promptable Segmentation: Applications in Medical Imaging (University of Waterloo, 2025-09-19) Carbone, Kathryn
Image analysis in domains that produce large amounts of complex visual data, like medicine, is challenging due to time and labour constraints on domain experts. Visual entity linking (VEL) is a preliminary image processing task that links regions of interest (RoIs) to known entities in structured knowledge bases (KBs), thereby using knowledge to scaffold image understanding. We study a targeted VEL problem in which a specific user-highlighted RoI within the image is used to query a textual KB for information about the RoI, which can support downstream tasks such as similar case retrieval and question answering. For example, a doctor reviewing an MRI scan may wish to obtain images with similar presentations of a medically relevant RoI, such as a brain tumor, for comparison.
By linking this RoI to its corresponding KB document, an imaging database can be searched in a knowledge-aware manner using VEL-guided, automatically generated tags, based on exact or semantically similar entity tag matching. Cross-modal embedding models like CLIP offer straightforward solutions through the dual encoding of KB entries and either whole images or cropped RoIs, which can then be matched by a vector similarity search between these respective learned representations. However, using the whole image as the query may retrieve KB entries related to other aspects of the image besides the RoI; at the same time, using the RoI alone as the query ignores context, which is critical for recognizing and linking complex entities such as those found in medical images. To address these shortcomings, this thesis proposes VELCRO (visual entity linking with contrastive RoI alignment), which adapts an image segmentation model to VEL by using contrastive learning to align the contextual embeddings produced by its decoder with the KB. This strategy preserves the information contained in the surrounding image while focusing KB alignment specifically on the RoI. To accomplish this, VELCRO performs segmentation and contrastive alignment in one end-to-end model via a novel loss function that combines the two objectives. Experimental results on medical VEL show that VELCRO achieves an overall linking accuracy of 95.2%, compared to 83.9% for baseline approaches.

Item Precise and Scalable Constraint-Based Type Inference for Incomplete Java Code Snippets in the Age of Large Language Models (University of Waterloo, 2025-09-08) Dong, Yiwen
Online code snippets are prevalent and useful for developers. These snippets are commonly shared on websites such as Stack Overflow to illustrate programming concepts. However, they are frequently incomplete. In Java code snippets, type references are typically expressed using simple names, which can be ambiguous. Identifying the exact types used requires fully qualified names, typically provided in import statements. Despite their importance, such import statements are available in only 6.88% of Java code snippets on Stack Overflow. To address this challenge, this thesis explores constraint-based type inference to recover missing type information. It also proposes a dataset for evaluating the performance of type inference techniques on Java code snippets, particularly large language models (LLMs). In addition, the scalability of the initial inference technique is improved to enhance applicability in real-world scenarios. The first study introduces SnR, a constraint-based type inference technique that automatically infers the exact types used in a code snippet, along with the libraries containing those types, so that the snippet can be compiled and therefore reused. SnR first builds a knowledge base of APIs, i.e., various facts about the available APIs, from a corpus of Java libraries. Given a code snippet with missing import statements, SnR automatically extracts typing constraints from the snippet, solves the constraints against the knowledge base, and returns a set of APIs that satisfies the constraints to be imported into the snippet. When evaluated on the StatType-SO benchmark suite, which includes 267 Stack Overflow code snippets, SnR significantly outperforms the state-of-the-art tool Coster: SnR correctly infers 91.0% of the import statements, which makes 73.8% of the snippets compilable, compared to Coster's 36.0% and 9.0%, respectively.
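
As a simplified illustration of the constraint-solving step described above (this is not SnR's actual algorithm or knowledge-base format; the rule and data shown are a toy example), each ambiguous simple name can be resolved by keeping only the fully qualified candidates that declare every member the snippet uses on that name:

    # Toy sketch of constraint-based import inference (not SnR's implementation).
    # Knowledge base: simple name -> {fully qualified name: declared members}.
    KB = {
        "List": {
            "java.util.List": {"add", "get", "size"},
            "java.awt.List": {"add", "getItem", "getItemCount"},
        },
        "StringUtils": {
            "org.apache.commons.lang3.StringUtils": {"isBlank", "join"},
        },
    }

    def solve(constraints: dict) -> dict:
        """constraints: simple name -> set of members used on it in the snippet."""
        imports = {}
        for simple_name, used_members in constraints.items():
            candidates = [
                fqn
                for fqn, members in KB.get(simple_name, {}).items()
                if used_members <= members  # candidate must declare all used members
            ]
            if len(candidates) == 1:  # the constraints uniquely identify the type
                imports[simple_name] = candidates[0]
        return imports

    # A snippet calling List.get(...) and StringUtils.isBlank(...) resolves to:
    print(solve({"List": {"add", "get"}, "StringUtils": {"isBlank"}}))
    # {'List': 'java.util.List', 'StringUtils': 'org.apache.commons.lang3.StringUtils'}

SnR additionally infers which libraries provide the chosen types; that step and the actual constraint language are beyond this toy example.
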
The second study evaluates type inference techniques, particularly LLMs. Although LLMs demonstrate strong performance on the StatType-SO benchmark, the dataset has been publicly available on GitHub since 2017. If LLMs were trained on StatType-SO, their performance may not reflect how they would perform on novel, real-world code, but may instead result from recalling examples seen during training. To address this, this thesis introduces ThaliaType, a new, previously unreleased dataset containing 300 Java code snippets. Results reveal that LLMs exhibit a significant drop in performance when generalizing to unseen code snippets, with up to a 59% decrease in precision and up to a 72% decrease in recall. To further investigate the limitations of LLMs in understanding the execution semantics of code, semantic-preserving code transformations were developed. Analysis showed that LLMs performed significantly worse on code snippets that are syntactically different but semantically equivalent. These experiments suggest that the strong performance of LLMs in prior evaluations was likely influenced by data leakage in the benchmarks rather than a genuine understanding of the semantics of code snippets. The third study enhances the scalability of constraint-based type inference by introducing Scitix. Constraint solving with a large knowledge base becomes computationally expensive in the presence of unknown types (e.g., user-defined types) in code snippets. To improve scalability, Scitix represents certain unknown types as Any, ignoring them during constraint solving. An iterative constraint-solving approach then saves computation by skipping constraints involving unknown types. Extensive evaluations show that these insights improve both performance and scalability compared to SnR. Specifically, Scitix achieves F1-scores of 96.6% and 88.7% on StatType-SO and ThaliaType, respectively, using a large knowledge base of over 3,000 jars. In contrast, SnR consistently times out, yielding F1-scores close to 0%. Even with the smallest knowledge base, where SnR does not time out, Scitix reduces the number of errors by 79% and 37% compared to SnR. Furthermore, even with the largest knowledge base, Scitix reduces error rates by 20% and 78% compared to state-of-the-art LLMs. This thesis demonstrates the use of constraint-based type inference for Java code snippets. The proposed approach is evaluated through a comprehensive analysis that contextualizes its performance in the current landscape dominated by LLMs. The resulting system, Scitix, is both precise and scalable, enhancing the reusability of Java code snippets.

Item Predicting Cardiovascular Events or Death in People with Dysglycemia Using Machine Learning Methods (University of Waterloo, 2025-08-15) Zhang, Yuanhong
Cox regression is commonly used to analyze time-to-event outcomes for patients in tabular medical data. Hazard ratios can then be calculated to show how much more likely an event is to occur for patients in one group versus another. Nonetheless, hazard ratios and Cox regression rely on the proportional hazards assumption, which is not guaranteed to hold. In this thesis, we investigate the use of machine learning models to predict patients' outcomes and identify key factors that may influence those outcomes. We focused on the ORIGIN Trial dataset because it has undergone extensive analysis using Cox regression, allowing us to compare the machine learning model results with previous findings.
Three outcomes were analyzed in this thesis: major adverse cardiovascular events (MACE), the expanded composite outcome (COPRIM2), and all-cause death (ALLDTH). The machine learning models we used are Neural Networks (NN), Random Forests (RF), and Gradient Boosted Trees (GBT), which were trained with nested cross-validation to tune their hyperparameters. When testing the trained models for all three outcomes, we found that the machine learning models had higher Area-Under-the-Curve scores (AUCs) than Cox regression (0.91-0.95 vs. 0.63-0.65), and Random Forest and Gradient Boosted Trees had excellent recall scores (0.80-0.88). Subsequently, we used SHAP values, mean decrease in AUC, and partial dependency plots (PDPs) to further examine variable importance for RF and GBT. For MACE and COPRIM2, prior cardiovascular events (priorcv), cancer, and blood lipid measures are the most important variables, while for ALLDTH, cancer and kidney function-related measures are the most important. The PDPs are harder to interpret than hazard ratios because they make no assumptions and have fewer restrictions, but they are useful for estimating the non-linear relationship between an explanatory variable and the average probability of the outcome occurring for patients in the dataset.

Item Data Structures of Nature: Fermionic Encodings (University of Waterloo, 2025-08-15) Dyrenkova, Emiliia
A compelling application of quantum computers with thousands of qubits is quantum simulation. Simulating fermionic systems is both a problem with clear real-world applications and a computationally challenging task. In order to simulate a system of fermions on a quantum computer, one first has to map the fermionic Hamiltonian to a qubit Hamiltonian. The most popular such mapping is the Jordan-Wigner encoding, which suffers from inefficiencies caused by the high weight of some encoded operators. As a result, alternative local encodings have been proposed that solve this problem at the expense of a constant-factor increase in the number of qubits required. Some such encodings possess local stabilizers, i.e., Pauli operators that act as the logical identity on the encoded fermionic modes. A natural error mitigation approach in these cases is to measure the stabilizers and discard any run where a measurement returns a -1 outcome. Using a high-performance stabilizer simulator, we classically simulate the performance of a local encoding known as the Derby-Klassen encoding and compare it with the Jordan-Wigner encoding and the ternary tree encoding. Our simulations use more complex error models and significantly larger system sizes (up to 18x18) than previous work. We find that the high sampling requirements of postselection methods with the Derby-Klassen encoding pose a limitation to its applicability in near-term devices and call for more encoding-specific circuit optimizations.

Item Exploring How AI-Suggested Politeness Strategies Influence Email Writing and Social Perception Among Native and Non-Native Speakers (University of Waterloo, 2025-08-07) Zhang, Zibo
As AI writing assistants are increasingly used for interpersonal communication, they may have profound impacts on interpersonal relationships. Politeness is one important aspect of social communication that is grounded in people's perception of relational dynamics and significantly shapes social interactions. We investigate how politeness strategies in AI-generated suggestions affect people's email writing and alter their perception of the social situation.
Through a within-subjects online experiment (N = 52), we found that human writers tend to mirror the type of politeness strategies used in the AI suggestions when writing their own messages. Non-native English speakers are more affected by AI than native speakers, and this greater susceptibility to AI influence is partly mediated by higher reliance on AI tools. In addition, writers' social perception is also influenced by AI: when writers are exposed to more deferential politeness strategy suggestions, they tend to perceive the social relationship as more distant. These findings highlight the need for better design of AI writing assistants that account for social contexts and individual differences.

Item NP-hardness of testing equivalence to sparse polynomials and to constant-support polynomials (University of Waterloo, 2025-07-22) Baraskar, Omkar Bhalchandra
Given the list of monomials of an n-variate polynomial f over a field F and an integer s, decide whether there exist an invertible transform A and a vector b such that f(Ax + b) has fewer than s monomials. This problem is called equivalence testing to sparse polynomials (ETsparse). It was studied over Q in [GrigorievK93], which gives an algorithm for the problem running in time exponential in n^4. The lack of progress on the complexity of the problem over the last three decades raises a question: is ETsparse hard? In this thesis, we give an affirmative answer by showing that ETsparse is NP-hard over any field. The sparse orbit complexity of a polynomial f is the smallest integer s_0 such that there exists an invertible transform A for which f(Ax) has s_0 monomials. Since ETsparse is NP-hard, computing the sparse orbit complexity is also NP-hard. We also show that approximating the sparse orbit complexity up to a factor of s_f^(1/3 - ε) for any ε ∈ (0, 1/3) is NP-hard, where s_f is the number of monomials in f. Interestingly, this approximation result is shown without invoking the celebrated PCP theorem. [ChillaraGS23] studies a variant of the problem focusing on shift equivalence: given f over some ring R (with the same input representation as in ETsparse) and an integer s, does there exist a b such that f(x + b) has fewer than s monomials? This is called the SETsparse problem. [ChillaraGS23] showed that SETsparse is NP-hard when R is an integral domain that is not a field; we extend their result to the case where R is a field. Finally, we also study the problem of testing equivalence to constant-support polynomials: given a polynomial f as before with support σ, does there exist an invertible transform A such that f(Ax) has support σ - 1? We call this problem ETsupport. We show that ETsupport is NP-hard for σ ≥ 5 over any field.

Item Algorithmic Tools for Network Analysis (University of Waterloo, 2025-07-08) Chen, Jingbang
Network analysis is a crucial technique used in fields such as computer science, telecommunications, transportation, the social sciences, and biology. Its applications include optimizing network performance, understanding social and organizational structures, and detecting fraud or misinformation. In this thesis, we present algorithmic results on several aspects of network analysis. The Abelian sandpile model is recognized as the first dynamical system discovered to exhibit self-organized criticality.
We present algorithms that compute the final state of a sandpile instance on various classes of graphs, solving the sandpile prediction problem on (1) general graphs, with further analyses for regular graphs, expander graphs, and hypercubes, and (2) trees and paths, surpassing previous methods in time complexity. To analyze the structure and dynamics of networks, counting motifs is one of the most popular methods, as motifs are considered the basic building blocks of a network. In this thesis, we introduce several tools developed for counting motifs on bipartite networks. Despite its importance, counting (p,q)-bicliques is very challenging because the count grows exponentially with respect to p and q. We present a new sampling-based method that produces a high-accuracy approximate count of (p,q)-bicliques, with provable error guarantees and unbiasedness. In another line of work, we consider temporal bipartite graphs, whose edges carry timestamps. To capture the dynamic nature of relationships, we consider counting butterflies ((2,2)-bicliques) in temporal bipartite graphs within specified time windows, called the historical butterfly counting problem. We present a hardness trade-off between memory usage and query time for this problem, along with a new index algorithm that surpasses this hardness when applied to power-law graphs and shows outstanding empirical performance. Lastly, we discuss tools that find polarized communities in networks. A classical model for dealing with polarization in networks is the signed graph, which has positive and negative edges between vertices. A signed graph is balanced if its vertices can be partitioned into two disjoint sets such that positive edges connect vertices in the same set while negative edges connect vertices from different sets. This notion of balance is strict in that no edge may disobey the condition, which seldom holds in reality. To address this, we propose a new model for identifying balanced subgraphs with tolerance in signed graphs, along with a new heuristic algorithm that computes maximal balanced subgraphs under the new tolerance model.

Item Using a Capability Sensitive Design Approach to Support Newcomers Well-being (University of Waterloo, 2025-07-03) Bin Hannan, Nabil
Newcomers transitioning to a new country face many challenges, and their well-being is affected by the unfamiliarity of navigating a new environment on their own. This thesis explores how Capability Sensitive Design (CSD) can be operationalized to guide the end-to-end design and evaluation of technologies that support the well-being of newcomers during life transitions. While the CSD framework has recently been investigated in Human-Computer Interaction (HCI) for its ethical focus on supporting what individuals have reason to value, there remains a gap in how it can be translated into concrete, scalable technology design processes. To address this, we present a multi-stage methodology that includes formative interviews, co-design sessions, prototype development, and a longitudinal field study to evaluate the application prototype. We begin by mapping the lived experiences of newcomers using a capability-oriented interview protocol and a capability board to surface valued goals and challenges. This informed a co-design process using modified capability cards, in which both newcomers and organizational stakeholders ideated design features aligned with the ten central capabilities.
Drawing on these insights, we developed the Newcomer App, a multilingual mobile platform offering four core features: goal-oriented planning, capability-aligned suggestions, resource search and browsing, and reflective tracking. We evaluated this platform in an eight-week field study that included in-app activity logging and post-study interviews. Our findings show that newcomers were able to identify capability-aligned goals that they found helpful, translate them into intentional plans, and reflect on both their achievements and the conversion factors that influenced outcomes. Importantly, we observed how CSD-informed features supported self-discovery, increased agency, and facilitated social contribution, particularly in the capabilities of social connection, emotional well-being, and community participation. The study also highlighted the importance of contextual and social barriers in determining whether users could turn suggestions into meaningful actions. This thesis contributes an operational model for applying CSD across the full design lifecycle, offering insights for researchers and practitioners. By translating ethical commitments into deployable technologies, our work extends prior research in HCI and design for social justice, demonstrating how technologies can support equitable pathways toward well-being for marginalized groups, such as newcomers navigating complex transitions.

Item Efficient Algorithms for RDV graphs (University of Waterloo, 2025-05-22) Gokhale, Prashant Abhijit
In this thesis, we study the maximum matching and minimum dominating set problems in RDV graphs, i.e., graphs that are vertex-intersection graphs of downward paths in a rooted tree. A straightforward implementation of these algorithms would require O(n+m) time. We improve their efficiency by transforming the question about the neighborhood of a vertex v into a type of range query among a set of horizontal and vertical line segments. Our algorithms run in O(n log n) time, assuming an O(n)-sized intersection representation of the graph is given. In addition, our techniques can also be used to obtain faster algorithms for maximum independent set and perfect k-clique packing in RDV graphs.

Item Puck Possession and Net Traffic Metrics in Ice Hockey (University of Waterloo, 2025-05-22) Pitassi, Miles
This thesis investigates two elements of hockey widely believed to be critical to success: puck possession and traffic (skaters who are in or near the triangular area formed between the puck and the posts during a shot attempt). Our analysis draws on puck and player tracking (PPT) data from the 2023–2024 and 2024–2025 NHL regular seasons. We determine average team puck possession percentage, defined as the average proportion of total game time that each team has possession of the puck. We find that this metric has only a moderate correlation with average goal differential (r=0.56). To further explore how different aspects of possession relate to team success, we compute additional metrics, including Average Offensive Zone Possession Time Differential (Avg. OZPTD). This captures the difference between the time a team spends with possession in the offensive zone and the time their opponents spend with possession in their offensive zone. We find a strong correlation (r=0.77) between Avg. OZPTD and average goal differential.
Further analysis shows that Avg. OZPTD is stable across games, effectively distinguishes between teams, and, despite being correlated with existing metrics like Shot Attempt Percentage (SAT%), offers additional predictive value for goal differential. SAT% (also known as Corsi) refers to the percentage of total shot attempts that each team takes. We also study the relationship between the number of skaters creating traffic during shot attempts and shot outcomes. Our findings show that increased traffic significantly increases the percentage of shot attempts that are blocked and reduces the chance of a shot attempt resulting in a shot on goal. Overall, we find that 29% of all shot attempts are blocked and that the highest goal rates occur from the center of the ice on short-to-mid-range attempts with no traffic present. For long-distance shot attempts that reach the goaltender, scoring probability increases with traffic. We also show that defensive skaters primarily reduce shot-on-goal rates but can inadvertently increase goal likelihood on mid-range shot attempts (23-45 feet), presumably by screening their own goaltender. Together, the findings in this thesis offer valuable insights into how puck possession contributes to team success and how traffic influences shot outcomes. In addition to these empirical results, we contribute a suite of methodological techniques that can support future analysis of possession and traffic. We present a comprehensive pipeline for cleaning, filtering, and processing individual possession data sourced from the NHL's puck and player tracking system, which is an essential foundation for our findings and a resource for future research. We also describe how we assemble a set of shot events by aligning official NHL API shot data with shot attempts in the PPT data. This involves adapting the NHL's own inference algorithms to identify undetected shot attempts and applying custom techniques to improve timestamp accuracy.

Item What Slows Down FMware Development? An Empirical Study of Developer Challenges and Resolution Times (University of Waterloo, 2025-05-16) Wang, Zitao
Foundation Models (FMs), such as GPT-4, have revolutionized software engineering by enabling the development of FMware: applications and infrastructures built around these powerful models. Despite their transformative potential, FMware solutions face significant challenges in their development, deployment, and maintenance, particularly across cloud-based and on-premise platforms; this is because many of the goals, processes, tools, and technical assets of FMware development differ from those of traditional software systems. This study presents an empirical investigation of the current FMware ecosystem, focusing on three key questions: (1) what topics are most prevalent in issue discussions of FMware systems, (2) what specific challenges are commonly faced by FMware developers, and (3) what kinds of issues in FMware development have the longest resolution times? Our analysis uses data extracted both from GitHub repositories of FMware systems and from systems hosted on popular FMware platforms such as HuggingFace, GPTStore, Ora, and Poe. Our findings reveal a strong emphasis on education, content creation, and business strategy, alongside critical technical challenges such as memory errors, dependency management, and tokenizer configurations.
We further identify bug reports and core functionality issues as the most common problem types on GitHub, and show that topics concerning code review, similarity search, and prompt template design take the longest to resolve. By uncovering insights into developer practices and pain points, this research highlights opportunities for improving FMware development tools, workflows, and community support. These insights contribute to a deeper understanding of the current FMware landscape and provide actionable recommendations for practitioners and researchers.

Item Embedded System Anomaly Detection via Boot Power Trace Analysis (University of Waterloo, 2025-05-14) Qiao, Sky
Embedded systems play a crucial role in safety-critical domains, and it is essential to maintain their integrity. This thesis presents a robust framework for detecting hardware and firmware anomalies in embedded systems through boot-phase power consumption analysis. The proposed Sliding Window Anomaly Detection (SWAD) method establishes a nominal boot power profile and compares new boot traces against this baseline using sliding windows. By analyzing localized power dynamics, SWAD detects deviations caused by hardware or firmware modifications while accommodating natural variations in power behaviour. Experimental validation on single-board computers and flight controllers demonstrates the method's effectiveness in identifying diverse hardware and firmware attacks, achieving overall F1 scores of 98%, 96%, and 85% across the three systems used in the case studies. These results highlight the promising role of power side-channel analysis in enhancing security in complex embedded systems.

Item LiDAR-based 3D Perception from Multi-frame Point Clouds for Autonomous Driving (University of Waterloo, 2025-05-13) Huang, Chengjie
3D perception is a critical component of autonomous driving systems, where accurately detecting objects and understanding the surrounding environment is essential for safety. Recent advances in Light Detection and Ranging (LiDAR) technology and deep neural network architectures have enabled state-of-the-art (SOTA) methods to achieve high performance in 3D object detection and segmentation tasks. Many approaches leverage the sequential nature of LiDAR data by aggregating multiple consecutive scans to generate dense multi-frame point clouds. However, the challenges and applications of multi-frame point clouds have not been fully explored. This thesis makes three key contributions to advance the understanding and application of multi-frame point clouds in 3D perception tasks. First, we address the limitations of multi-frame point clouds in 3D object detection. Specifically, we observe that increasing the number of aggregated frames yields diminishing returns and can even degrade performance, because objects respond differently to the number of aggregated frames. To overcome this performance trade-off, we propose an efficient adaptive method termed Variable Aggregation Detection (VADet). Instead of aggregating the entire scene using a fixed number of frames, VADet performs aggregation per object, with the number of frames determined by an object's observed properties, such as speed and point density. This adaptive approach reduces the inherent trade-offs of fixed aggregation, improving detection accuracy. Next, we tackle the challenge of applying multi-frame point clouds to 3D semantic segmentation.
Point-wise prediction on dense multi-frame point clouds can be computationally expensive, especially for SOTA transformer-based architectures. To address this issue, we propose MFSeg, an efficient multi-frame 3D semantic segmentation framework. MFSeg aggregates point cloud sequences at the feature level and regularizes the feature extraction and aggregation process to reduce computational overhead without compromising accuracy. Additionally, by employing a lightweight MLP-based point decoder, MFSeg eliminates the need to upsample redundant points from past frames, further improving efficiency. Finally, we explore the use of multi-frame point clouds for cross-sensor domain adaptation. Based on the observation that multi-frame point clouds can weaken the distinct LiDAR scan patterns of stationary objects, we propose Stationary Object Aggregation Pseudo-labelling (SOAP) to generate high-quality pseudo-labels for 3D object detection in a target domain. In contrast to the current SOTA in-domain practice of aggregating a few input frames, SOAP utilizes entire sequences of point clouds to effectively reduce the sensor domain gap.

Item Finding Behavioural Biometrics Scripts on the Web Using Dynamic Taint Analysis (University of Waterloo, 2025-05-13) Bara, Alexandru
In an era of escalating cyber threats, behavioural biometrics have emerged as a transformative security mechanism, leveraging user interaction patterns like keystrokes and mouse movements for continuous authentication on the web. However, detecting these scripts at scale remains challenging due to obfuscation, dynamic execution, and overlap with analytics tools. This thesis addresses these challenges through three interconnected contributions: (1) enhancing FoxHound, a dynamic taint analysis tool, to achieve 97% effectiveness in tracking behavioural biometric data flows; (2) developing the first open-source checkout crawler to navigate e-commerce workflows with upwards of 78% accuracy; and (3) creating a machine learning classifier to distinguish behavioural biometric scripts from other tracking scripts. Large-scale analyses reveal that behavioural biometric scripts are deployed on 0.3% of top websites, with significantly higher adoption on sensitive pages (4.55% of banking logins). The work concludes with ethical recommendations to balance security benefits with privacy risks, advocating for transparency, deobfuscation, and regulatory oversight.

Item Program Reduction: Versatility, Insights, and Efficacy (University of Waterloo, 2025-05-08) Zhang, Mengxiao
Given a program P and a property ψ it preserves, the goal of program reduction is to minimize this program while ensuring that the minimized version Pmin still preserves ψ. Program reduction is a widely used technique to facilitate compiler debugging, as most compiler bugs are triggered by program inputs, which often contain a significant amount of code unrelated to the bug. Although program reduction automates the elimination of such bug-irrelevant components from a program, it can take hours to complete due to its trial-and-error strategy. Furthermore, even if the reduction process produces a smaller program after a long wait, the resulting program is not guaranteed to be optimal, as program reduction is an NP-complete problem. As a result, compiler developers consistently seek program reduction approaches that offer faster reduction speeds or produce smaller outputs. It is critical to design superior program reduction approaches to further explore the potential of this field.
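
To make the trial-and-error strategy described above concrete, here is a generic, simplified ddmin-style reduction loop (a textbook-style sketch with an invented property check, not the reduction algorithm of this thesis): it repeatedly tries deleting chunks of the input and keeps any smaller candidate that still preserves the property.

    # Simplified ddmin-style reduction loop (generic sketch, not this thesis's code).
    # `check(candidate)` returns True iff the candidate still preserves the property,
    # e.g., still triggers the compiler bug.
    def ddmin(elements, check):
        granularity = 2
        while len(elements) >= 2:
            chunk = max(1, len(elements) // granularity)
            reduced = False
            for start in range(0, len(elements), chunk):
                candidate = elements[:start] + elements[start + chunk:]  # drop one chunk
                if candidate and check(candidate):
                    elements = candidate                  # keep the smaller failing input
                    granularity = max(granularity - 1, 2)
                    reduced = True
                    break
            if not reduced:
                if granularity >= len(elements):          # finest granularity reached
                    break
                granularity = min(granularity * 2, len(elements))
        return elements

    # Invented property for illustration: the "bug" needs elements 3 and 7 together.
    print(ddmin(list(range(10)), lambda cand: 3 in cand and 7 in cand))  # [3, 7]

The ProbDD and CDD approaches discussed below refine this style of trial-and-error search by modelling how likely each element is to be relevant to ψ.
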
This thesis aims to enhance program reduction approaches in the following three ways. The first study targets the versatility of program reduction. While existing techniques, such as C-Reduce and Perses, can effectively reduce a bug-triggering program as a whole, they fail to consider the varied degrees of relevance each remaining element has to the bug. To address this limitation, this study proposes PPR, a new program reduction technique designed to minimize a pair of programs w.r.t. certain properties. Given a seed program Ps and its variant Pv with differing properties (e.g., Pv crashes a compiler while Ps does not), PPR produces a pair of minimized programs that preserve these properties separately, with the reduced differences highlighting bug-related elements. Evaluation results demonstrate that PPR effectively reduces pairs of programs, producing outputs of comparably small size to those generated by Perses and C-Reduce. In addition, PPR significantly outperforms the Delta Debugging algorithm in isolating bug-related differences. The second study concentrates on understanding ProbDD, a state-of-the-art program reduction approach. ProbDD, a variant of ddmin, employs a probabilistic model to estimate the likelihood of each element being relevant to ψ, achieving superior performance compared to the Delta Debugging algorithm. However, the theoretical probabilistic model underlying ProbDD is intricate and not fully understood. To address this, the study conducts the first in-depth theoretical analysis of ProbDD, demystifying this state-of-the-art approach from multiple perspectives. Building upon these insights, the study further proposes CDD, a simplified counter-based model that retains the core strengths of ProbDD. The evaluation demonstrates that CDD can achieve the same performance as ProbDD despite being much simpler. The third study integrates large language models (LLMs) to improve the efficacy of program reduction. Existing program reduction techniques typically fall into two categories: generic approaches applicable to a wide range of languages but lacking domain-specific knowledge, and language-specific approaches that exploit detailed language knowledge but fail to generalize across multiple languages. However, effectively combining language generality with language-specific expertise in program reduction has yet to be explored. To this end, this study proposes LPR, the first LLM-aided technique leveraging LLMs to perform language-specific program reduction for multiple languages. The key insight is to synergize the language generality of existing program reducers, such as Perses, with the language-specific semantics learned by LLMs. The evaluation shows that LPR surpasses Vulcan by producing 24.93%, 4.47%, and 11.71% smaller programs, while reducing the reduction time by 10.77%, 34.88%, and 36.96%, on benchmarks in C, Rust, and JavaScript, respectively. Collectively, these studies advance the field of program reduction by enhancing its versatility, insights, and efficacy. By making reduction more effective, faster, and easier to understand, this thesis facilitates efficient debugging and robust compiler development.

Item Paidian Playful Interaction in Non-game User Interfaces (University of Waterloo, 2025-05-07) Lakier, Matthew
I investigate "paidian" play in the context of non-game software user interfaces. Paidian play refers to activities that are open-ended, exploratory, and free-form.
In an initial investigation, I characterize 16 types of playful experiences, 7 characteristics of play in software, and guidelines for the role of play in user interfaces, based on a qualitative analysis of a series of surveys and a brainstorming session with experts. As a case study of inspiring design with paidian play, a second investigation focuses on Easter eggs (hidden features in software applications), an interface feature associated with one of the playful experience types. I analyze source code repositories of open-source software containing Easter eggs, scrape online Easter egg databases, and interview developers of well-known software containing Easter eggs, to characterize 14 different Easter egg purposes. Results show that Easter eggs provide significant value to developers and users, for example, by enabling recruitment of new developers and teaching users transferable knowledge and skills. I also propose implications for how Easter eggs could be applied in new ways, such as providing educational value. In a third investigation, as a design case study for paidian play, I define and implement "digital knick-knacks" as a form of playful digital possession (e.g., a virtual pet). I deploy three exemplar designs in a diary study, in which participants customize and install a digital knick-knack on a personal device. The investigation reveals implications for how playful digital knick-knacks can bring joy and even support mental health. Taken together, the investigations show how user interfaces can be designed to provide social and emotional value to users through paidian play, including in workplace contexts.

Item Optimizing ORAM Datastores for Scalability, Fault Tolerance, and Performance (University of Waterloo, 2025-05-01) Setayesh, Amin
Oblivious RAM (ORAM) mitigates access pattern attacks, in which adversaries infer sensitive data by observing access patterns. These attacks can compromise privacy even when data is encrypted. While ORAM ensures privacy by obfuscating these patterns, its adoption in cloud environments faces significant challenges, particularly related to scalability, fault tolerance, and performance. This thesis presents Treebeard, an ORAM-based datastore that addresses these challenges through a novel multi-layer architecture. Unlike traditional ORAM systems that rely on a centralized proxy to manage data access and security, this design separates responsibilities across specialized layers that are independently scalable. Each layer handles distinct functionalities and efficiently batches and processes requests. Treebeard facilitates horizontal scaling and adds fault tolerance by eliminating single points of failure. Experiments show that Treebeard is scalable, highly performant, and fault-tolerant, outperforming existing ORAM systems in throughput while simultaneously addressing scalability and fault tolerance in its design.

Item Latra: A Template-Based Language-Agnostic Transformation Framework for Program Reduction (University of Waterloo, 2025-04-29) Wang, Yiran
Program reduction tools, essential for debugging compilers and interpreters, face a fundamental trade-off. Language-specific reducers, such as C-Reduce and ddSMT, offer highly effective reductions but require substantial engineering effort for each target language. Conversely, language-agnostic reducers, like Vulcan, sacrifice effectiveness for broad applicability.
To bridge this gap, we present Latra, a novel template-based framework that balances both aspects, enabling general, effective, targeted program reduction. Latra combines language-agnostic reduction with user-defined, language-specific transformations. It facilitates user-defined transformations through a user-friendly domain-specific language based on simple matching and rewriting templates, minimizing the need for deep formal-grammar knowledge. Latra thus empowers users to tailor reductions to specific languages with reduced implementation overhead. Evaluation shows that Latra significantly outperforms Vulcan: it reduces 33.77% more tokens in C and 9.17% more tokens in SMT-LIB, with 32.27% faster execution in SMT-LIB. Notably, Latra closely matches the effectiveness of the language-specific reducers C-Reduce and ddSMT (89 vs. 85 and 103 vs. 109 tokens) while significantly reducing engineering effort (167 vs. 5,508 and 62 vs. 118 lines of code). We strongly believe that Latra provides a practical and cost-efficient approach to program reduction, effectively balancing language-specific effectiveness with language-agnostic generality.
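
Since the abstract does not show Latra's actual DSL syntax, the following hypothetical sketch only illustrates the general idea of matching and rewriting templates over a token stream; the rule format, placeholder syntax, and example rule are invented for illustration.

    # Hypothetical sketch of a match-and-rewrite template (Latra's real DSL syntax
    # is not shown in the abstract above; this rule format is invented).
    def match(pattern, window):
        """Match a pattern against a token window; '$X' tokens are placeholders."""
        bindings = {}
        for p, t in zip(pattern, window):
            if p.startswith("$"):
                if bindings.setdefault(p, t) != t:   # a placeholder must bind consistently
                    return None
            elif p != t:                             # literal tokens must match exactly
                return None
        return bindings

    def rewrite_once(pattern, replacement, tokens):
        """Replace the first window matching `pattern` with `replacement`."""
        n = len(pattern)
        for i in range(len(tokens) - n + 1):
            bindings = match(pattern, tokens[i:i + n])
            if bindings is not None:
                body = [bindings.get(r, r) for r in replacement]
                return tokens[:i] + body + tokens[i + n:]
        return tokens

    # Invented rule: collapse a parenthesized expression "( $E )" to "$E".
    tokens = ["x", "=", "(", "y", ")", ";"]
    print(rewrite_once(["(", "$E", ")"], ["$E"], tokens))  # ['x', '=', 'y', ';']

In Latra, user-defined rules of this general kind complement language-agnostic reduction; the exact rule syntax and application strategy are described in the thesis itself.
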