Systems Design Engineering
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9914
This is the collection for the University of Waterloo's Department of Systems Design Engineering.
Research outputs are organized by type (e.g., Master Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Recent Submissions
Item: Learning to Reach Goals from Suboptimal Demonstrations via World Models (University of Waterloo, 2026-01-14). Ali, Qasim.
A central challenge for training autonomous agents is the scarcity of high-quality and long-horizon demonstrations. Unlike fields such as natural language or computer vision, where abundant internet data exists, many robotics and decision-making domains lack large, diverse, and high-quality datasets. One underutilized resource is suboptimal demonstrations, which are easier to collect and potentially more abundant. This limitation is particularly pronounced in goal-conditioned reinforcement learning (GCRL), where agents must learn to reach diverse goal states from limited demonstrations. While methods such as contrastive reinforcement learning (CRL) show promising scaling behavior when given access to abundant and high-quality training demonstrations, they struggle when demonstrations are suboptimal. In particular, when training demonstrations are short or exploratory, CRL struggles to generalize beyond the training demonstrations, and the resulting policy exhibits lower success rates. To overcome this, we explore the use of self-supervised representation learning to extract general-purpose representations from demonstrations. The intuition is that if an agent can first learn robust representations of environment dynamics, without relying on demonstration optimality, it can then use these representations to guide reinforcement learning more effectively. Such representations can serve as a bridge between noisy demonstrations and goal-directed control, allowing policies to learn faster.
In this thesis, we propose World Model Contrastive Reinforcement Learning (WM-CRL), which augments CRL with representations from a world model (WM). The world model is trained to anticipate future state embeddings from past state-action pairs, thereby encoding the dynamics of the environment. Because the world model aims only to learn environment dynamics, it can leverage both high- and low-quality demonstrations. Integrating these world model embeddings into CRL's framework helps CRL more easily capture the environment dynamics and select actions that more effectively achieve its goals. We evaluate WM-CRL on tasks from the OGBench benchmark, exploring performance on multiple locomotion and manipulation environments and multiple datasets varying in quality. Our results show that WM-CRL can substantially improve performance over CRL in suboptimal-data settings, such as stitching short trajectories or learning from exploratory behavior. However, we find our method offers limited benefit when abundant expert demonstrations are available. Ablation studies further reveal that success depends critically on the stability of world model training and on how its embeddings are integrated into the agent's architecture.
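The abstract describes the WM-CRL ingredients only at a high level; the sketch below is one plausible reading, not the thesis implementation. It pairs a world model that predicts the next state's embedding from a state-action pair with an InfoNCE-style contrastive goal critic. All module names, layer sizes, and the exact objectives are assumptions.

```python
# Minimal sketch (not the thesis implementation): a world model that predicts
# future state embeddings from (state, action), and a contrastive goal critic
# that could consume those embeddings. Names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WorldModel(nn.Module):
    def __init__(self, state_dim, action_dim, embed_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                     nn.Linear(128, embed_dim))
        self.dynamics = nn.Sequential(nn.Linear(embed_dim + action_dim, 128), nn.ReLU(),
                                      nn.Linear(128, embed_dim))

    def forward(self, state, action):
        z = self.encoder(state)
        z_next_pred = self.dynamics(torch.cat([z, action], dim=-1))
        return z, z_next_pred

def world_model_loss(wm, state, action, next_state):
    # Train the dynamics head to anticipate the next state's embedding;
    # whether the demonstration was optimal is irrelevant to this objective.
    _, z_next_pred = wm(state, action)
    with torch.no_grad():
        z_next = wm.encoder(next_state)
    return F.mse_loss(z_next_pred, z_next)

def contrastive_critic_loss(sa_embed, goal_embed):
    # InfoNCE-style objective: each state-action embedding should score highest
    # against the goal actually reached in the same trajectory (the diagonal).
    logits = sa_embed @ goal_embed.t()                 # (B, B) similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```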
Item: Label-free optical microscopy: Photon Absorption Remote Sensing (PARS) and other methods for label-free histopathological imaging of tissues (University of Waterloo, 2026-01-08). Ecclestone, Benjamin.
Emerging label-free microscopy methods offer promising new avenues to view cells and tissues in their native environment, minimizing external influences. These label-free techniques are an exciting departure from gold standard methods for visualizing microscopic cellular and tissue structures, which rely on centuries-old chemical staining processes. In current practice, chemical labelling can unavoidably interfere with specimens' physical and biochemical integrity. As a result, samples are effectively consumed by staining, with only a single stain set normally applied to each sample. This limitation is especially impactful in applications such as clinical oncology and medical histopathology. In these settings, irreversible staining processes can severely limit the diagnostic utility of samples, especially when there is limited sample volume (e.g., brain tumor biopsies). As an alternative, label-free imaging techniques offer a potential avenue to visualize subcellular tissue anatomy while preserving samples in their entirety. Consequently, label-free microscopy methods have significant potential to greatly increase the diagnostic utility of each specimen, thereby enhancing patient outcomes.
This thesis focuses on developing new methods for label-free microscopy, specifically emphasizing techniques for label-free histopathology. As a starting point, the targeted objective is to develop a label-free analog to chemical hematoxylin and eosin (H&E) staining. This objective is chosen as H&E represents the gold standard contrast applied in effectively every clinical diagnostic case. Subsequent developments in this thesis can be broken into three major sections, which focus on (1) developing label-free microscopy methods for H&E-like imaging, (2) exploring the biomolecular specificity of the developed methods to validate the label-free H&E-like contrast, and (3) producing a label-free microscopy architecture capable of meeting the imaging requirements necessary for clinical adoption.
The first collection of works explores the development of a range of label-free microscopy methods. These studies establish new variations and combinations of optical absorption and scattering microscopes to visualize microscopic tissue anatomy label-free. These efforts ultimately resulted in the development of a new optical absorption microscopy modality, Photon Absorption Remote Sensing (PARS). This comprehensive technique provides biomolecule-specific visualizations characterizing the dominant photophysical effects caused when photons are absorbed by a biomolecule. As a direct result, novel PARS-specific contrasts are developed: the total absorption (TA) and the quantum efficiency ratio (QER). These PARS measurements may provide unique views into biomolecules' excited state dynamics, accessing characteristics related to the quantum yield. By specifically probing specimens' response to the absorption of deep ultraviolet light, PARS is shown to provide label-free contrast directly reminiscent of gold standard chemical H&E staining methods. As a proof of concept, the initial PARS architecture is applied to capture submicron resolution images of key H&E-like diagnostic markers across a variety of human and animal tissue specimens.
The second section of this thesis expands the basis for PARS histopathology by validating PARS's capacity to produce H&E-like visualizations. Two main avenues of exploration are pursued in this effort. The first endeavor explores the underlying biomolecular contrast of the PARS measurements. Established statistical methods are applied to develop characteristic PARS profiles for biomolecules. These PARS signatures are then applied to map the abundance of molecules label-free inside complex specimens.
As a proof of concept, key diagnostic features including nuclei, red blood cells, and connective tissues are directly characterized and unmixed label-free. The resulting statistical abundance mappings are directly validated against chemically stained ground truth counterparts. The second endeavor introduces an end-to-end pipeline which uses deep learning-based image-to-image transforms to emulate chemical H&E visualizations from label-free PARS data. The resulting PARS-emulated H&E-like visualizations are validated against chemical H&E staining through a clinical concordance study. In this diagnostic validation study, statistical analysis is applied to determine if pathologists produce the same diagnoses on both PARS and chemical H&E images. In this preliminary test, the PARS-based virtual staining method achieves > 90% concordance with very high statistical confidence (Kappa > 0.7) across all measured diagnostic tests.
The final thesis section develops a new PARS architecture which achieves pragmatic imaging performance, nearing the requirements for clinical diagnostic settings. The presented system features a hybrid opto-mechanical scanning architecture which allows for high-speed MHz-rate imaging. This results in imaging speeds more than an order of magnitude faster than earlier PARS embodiments developed in the PhotoMedicine Labs (at the University of Waterloo). This work simultaneously develops an end-to-end control system and imaging workflow which enables fully automated PARS imaging of whole specimens. Deep learning methods are applied to the resulting PARS images to produce virtual H&E-like visualizations. Qualitative and quantitative methods are applied to validate the imaging performance across a range of human and animal tissue samples. Results indicate the PARS virtual H&E images are largely indistinguishable from chemically H&E-stained ground truth images. Notably, the presented system forms the basis for a commercially available, clinically ready prototype for label-free PARS histopathology imaging.
In total, the findings presented across this thesis encompass the development of a new microscopy technique (PARS). This method provides unique views into the absorption and scattering characteristics of specimens, opening a new avenue of label-free contrast. For the presented histopathology application, PARS can provide powerful H&E-like images which may circumvent key challenges of chemical staining. In clinical histopathology, this method could enhance the diagnostic utility of tissue specimens, directly improving patient outcomes. Beyond histopathology, the principles of PARS may be directly applicable to a wide range of imaging applications spanning material science, biological research, and clinical diagnostics. Overall, the methods developed in this thesis lay the groundwork for new label-free optical absorption microscopy techniques, which are already achieving real-world commercial and clinical success in histopathology applications.
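The total absorption (TA) and quantum efficiency ratio (QER) contrasts are described above only qualitatively. The following is a minimal per-pixel sketch under my own assumed definitions (TA as the sum of the non-radiative and radiative signal amplitudes, QER as the radiative fraction), which may differ from the thesis's exact formulation.

```python
# Illustrative only: one plausible way to form TA and QER images from
# co-registered radiative (fluorescence) and non-radiative (photothermal /
# photoacoustic) PARS signal maps. Not the thesis's exact definition.
import numpy as np

def pars_contrasts(nonradiative: np.ndarray, radiative: np.ndarray, eps: float = 1e-12):
    """Both inputs: 2D arrays of per-pixel signal amplitudes from the same scan."""
    total_absorption = nonradiative + radiative        # TA: all observed absorbed energy
    qer = radiative / (total_absorption + eps)         # QER: fraction relaxed radiatively
    return total_absorption, qer
```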
Item: Advancing Semi-Supervised Domain Adaptive Semantic Segmentation Through Effective Source Integration Strategies (University of Waterloo, 2026-01-08). Kurien, Joshua.
Semantic segmentation is a highly valuable visual recognition task with applications across fields such as medical imaging, remote sensing, and manufacturing. However, training segmentation models is challenging because it requires large-scale, densely labeled data specific to the target domain. Semi-supervised learning (SSL) addresses this challenge by leveraging unlabeled data alongside limited labeled data, reducing the reliance on fully labeled datasets. Semi-supervised domain adaptation (SSDA) further mitigates this issue by incorporating labeled data from a source domain alongside minimally labeled target data. While existing SSDA methods often underperform compared to fully supervised approaches, recent SSL methods that utilize foundation models achieve near fully supervised performance. Given the strength of current SSL methods using foundation models, this thesis investigates effective strategies for integrating source-domain data from a different distribution into existing pipelines to improve segmentation performance.
First, we explore a simple source transfer mechanism that merges target and source data into a single unified labeled set for SSL pipelines. Our analysis demonstrates the accuracy benefits of this setup while also highlighting some downsides, particularly in terms of training efficiency. We also examine the use of ensembling SSL and SSDA models to enhance target-domain performance. This ensemble combines a model trained solely on target data with a source-transferred SSDA model. We find that ensembling can improve performance in certain cases but is less effective in others, and training efficiency remains suboptimal due to the need to train two models. Given the training inefficiencies of simple source transfer and ensembling, we propose a dual-curriculum source integration strategy to address these limitations. This approach consists of two complementary learning strategies: curriculum retrieval, which progressively samples source examples from easy to hard, and curriculum pasting, which increases the diversity of target-labeled data. Across our experiments, we compare against and outperform state-of-the-art SSL and SSDA methods on a variety of benchmarks, including synthetic-to-real and real-to-real scenarios. Our findings highlight the benefits of effective source data integration into modern SSL pipelines for boosting segmentation performance, opening a new avenue for label-efficient semantic segmentation.
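Curriculum retrieval is described above only as "progressively samples source examples from easy to hard." The sketch below shows one generic way such a schedule could look; the difficulty score and the linear pacing are my assumptions, not the thesis's criteria.

```python
# Illustrative sketch of a curriculum-retrieval schedule: source samples are
# admitted from easy to hard as training progresses. Difficulty scoring and the
# pacing function are assumptions, not the thesis's exact design.
import numpy as np

def curriculum_retrieval(source_difficulty: np.ndarray, step: int, total_steps: int):
    """Return indices of source samples admitted at this training step.

    source_difficulty: one score per source sample (e.g., feature distance to the
    target-domain centroid); lower means easier / more target-like.
    """
    frac = min(1.0, (step + 1) / total_steps)       # linear pacing schedule
    n_admit = max(1, int(frac * len(source_difficulty)))
    order = np.argsort(source_difficulty)           # easiest first
    return order[:n_admit]
```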
Item: REMind: A Robot Role-Playing Game To Promote Bystander Intervention (University of Waterloo, 2026-01-07). Sanoubari, Elaheh.
Peer bullying is a pervasive social problem, with bystanders' inaction being a critical challenge despite widespread disapproval of bullying. Effective intervention strategies must move beyond explanation-based instruction to facilitate embodied, situated learning. This dissertation explores how social robots can serve as mediators for applied drama to foster prosocial bystander intervention in the context of peer bullying. It introduces Robot-Mediated Applied Drama (RMAD): an innovative framework that integrates drama-based pedagogy with social robotics to create safe, reflective, and embodied learning experiences. Using a Research through Design (RtD) methodology, this work advances through an iterative sequence of design studies that culminate in the development and evaluation of REMind (short for Robots Empowering Minds): a mixed-reality role-playing game where children engage in dramatized bullying scenarios performed by social robots. In REMind, three robots enact a conflict involving a bully, a victim, and a passive bystander. Players are invited to assume control of a robotic avatar, reflect on the unfolding narrative, and improvise an intervention by using the robot as a proxy in order to change the story's outcome. Through this structure, children rehearse bystander intervention strategies within a psychologically safe, yet emotionally engaging environment.
The iterative design process of REMind unfolded across complementary empirical inquiries. A crowdsourced feasibility study first established that observers perceive aggression toward robots as morally wrong, validating the viability of using robots in the intervention. A narrative co-design study with children revealed storytelling patterns such as preferences for emotionally expressive and customizable robot characters. Interviews with teachers grounded the design in classroom realities, identifying gaps in existing programs. A game design focus group study further examined what makes educational robot role-play games pleasurable for children, leading to the identification of concrete design elements that informed REMind's interactive components, such as core mechanics, use of tangible props, world aesthetics, and narrative structure.
This dissertation presents the resulting artifact, REMind, as a system consisting of five interconnected components: Learning Goals, Mechanics, Narrative, Technology, and Aesthetics. The learning goals were defined through consultation with subject-matter experts to ensure grounding in evidence-based best practices. By deliberately aligning the game pleasures identified in prior studies with the learning objectives, REMind introduces a suite of game mechanics that scaffold socio-emotional skills (such as robot-mediated spect-actorship or "puppet mode" for moral intervention, interpretation of immersive affective displays for empathy training and perspective taking, and custom-made logic-gate puzzles for moral reasoning). Narrative design is scaffolded by borrowing a five-step cognitive model of bystander intervention from social psychology. The technical implementation is realized through StorySync, a novel spreadsheet-based scripting toolkit developed to synchronize multimodal cues (including multiple robots, graphical interfaces, ambient lighting, and sound) and manage narrative branching for live interactive robot drama. Finally, the aesthetic elements leverage emotional design, ambient cues, and digital scenography to create an emotionally resonant learning experience. This concrete high-fidelity prototype serves as a proof of concept for RMAD. This research contributes a theoretical and practical foundation for designing robot-mediated experiential learning systems, offering RMAD as a new direction for social robotics and educational technology. It further illustrates how embodied storytelling and interactive systems design might cultivate reflective, prosocial action in a complex domain of social-emotional learning.
More broadly, it advocates for a shift in Human-Robot Interaction (HRI) research toward systems thinking, positioning game design as a powerful systems lens for creating and analyzing holistic user experiences.

Item: Robust and Hierarchy-Aware Classification (University of Waterloo, 2025-12-18). Pellegrino, Nicholas.
The BIOSCAN project, led by the International Barcode of Life (iBOL) Consortium, is an international, multi-year, and multidisciplinary effort seeking to catalogue all multicellular life on Earth by 2045 to enable the global-scale study of changes in biodiversity, species interactions, and species dynamics. Access to this information has the potential to inform strategies to mitigate the damaging ecological effects of climate change. In the near term, the goal is to catalogue all insects. Each sample is imaged, genetically barcoded, and taxonomically classified by domain experts, a time- and resource-intensive process that is becoming increasingly impractical as collection rates surpass five million samples annually. Addressing such needs is among the foundational motivations for the research of this thesis.
This thesis presents several contributions motivated by the challenges of the BIOSCAN project. Over five million insect samples were organized into a machine-learning-ready dataset, and a deep neural network classifier was developed to establish a baseline for image-to-taxonomy classification performance. To mitigate the harmful impacts of mislabelled samples in training data, a study of neural network architecture robustness was conducted alongside the development of two novel loss functions: Blurry and Piecewise-zero loss. Blurry loss de-weights and reverses the gradient of samples likely to be mislabelled, while Piecewise-zero loss disregards these samples. These improvements strengthen model robustness and enhance label error detection, enabling the referral of suspicious samples for expert review and correction. Additional work investigates the hierarchical structure of biological data and its integration into classification models, specifically through Hyperbolic neural networks, and measures the benefits of doing so in comparison to using conventional architectures. Finally, this thesis explores aligning image, genetic, and taxonomic representations in a hierarchy-aware manner to improve retrieval across modalities. The contributions of this thesis advance the application of machine learning to facilitate the ongoing global-scale cataloguing of insect life. As challenges such as label errors, hierarchical structures in data, and incomplete annotations are present across many domains, the contributions are valuable to both the machine learning community and the global network of BIOSCAN collaborators.
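The abstract characterizes the two losses only in one sentence each; the sketch below is one plausible reading of that description, not the thesis's exact formulation. The per-sample "suspicion" score is an assumed input estimating how likely each label is wrong (for example, derived from per-sample loss statistics).

```python
# Illustrative reading of "Blurry" and "Piecewise-zero" loss as described above.
# `suspicion` is an assumed per-sample score in [0, 1]; higher = more likely mislabelled.
import torch
import torch.nn.functional as F

def blurry_loss(logits, labels, suspicion):
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    # De-weight suspicious samples and reverse their gradient contribution:
    # weight runs from +1 (trusted) through 0 to negative (likely mislabelled).
    weight = 1.0 - 2.0 * suspicion
    return (weight * per_sample).mean()

def piecewise_zero_loss(logits, labels, suspicion, threshold=0.5):
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    # Simply disregard samples deemed likely to be mislabelled.
    keep = (suspicion < threshold).float()
    return (keep * per_sample).sum() / keep.sum().clamp(min=1.0)
```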
Item: Charting Skills in Uncharted Domains: Evaluating How Video Game Competence is Viewed Outside Competitive Desktop Gaming Environments (University of Waterloo, 2025-12-16). Senthil Nathan, Kaushall.
Player competence heavily shapes multiplayer gameplay experiences, from team success to avoiding frustration, yet existing research focuses predominantly on competitive esports contexts on PC platforms. This lack of research leaves players in understudied domains without a clear understanding of competence. Therefore, I examined the contexts of casual, cooperative games and VR multiplayer games to uncover how competence is conceptualized within them. In study 1, I conducted a mixed-methods experiment with 23 participants playing Overcooked 2 with a competent or incompetent teammate, to examine competence, frustration, and cooperative behaviour. The results of study 1 showed that players evaluated teammate performance comparatively rather than through absolute metrics, and that current frustration and cooperation measures were insufficient in capturing the nuances of player experience. In study 2, I surveyed 111 VR multiplayer gamers to identify novel skill clusters, how skills are adapted from PC to VR, and whether player rank affects the importance of these skills. Findings revealed five new VR-specific skills, highlighted the body's central role in skill adaptation, and found no significant rank-based rating differences. The overarching contribution is demonstrating that an evaluation of competence drawn from competitive esports is insufficient for describing competence in these domains. Casual, cooperative players judge competence primarily in relation to their teammates, while VR multiplayer gamers regard physicality and embodied interaction as essential to displaying relevant skills. My thesis puts forward new definitions of competence in casual, cooperative games and VR multiplayer games, as the first step to chart skills in uncharted domains.

Item: Integrating Cognitive Work Analysis into an ACT-R Model for Cybersecurity Applications (University of Waterloo, 2025-12-16). He, Fan.
Cybersecurity is a growing concern with the rapid development of many systems. While humans are often considered vulnerable targets, research on human factors remains limited compared to the extensive technical focus on defense and mitigation strategies. Human-focused cognitive research in this domain faces two primary challenges: the evolving and complex nature of the cybersecurity landscape, and the domain-specific characteristics of the systems under attack. These challenges point to the need for modeling human performance in identifying vulnerabilities, with both precise dynamic measurement and domain-specific fidelity. Accordingly, we proposed a solution that integrates Cognitive Work Analysis (CWA) into ACT-R models. A detailed elaboration was presented on CWA and ACT-R's structural compatibility across dimensions, their fundamental strengths as complements, and the functional competencies enabled by their integration. This conceptual exploration demonstrated the feasibility of integrating CWA and ACT-R, leading to improvements in model construction efficiency and domain-specific validity. We explored CWA and ACT-R for modeling humans in vehicle cybersecurity. While we were able to demonstrate a model, a follow-up study with human participants showed that drivers may not actively identify vulnerabilities and mitigate cyber threats. We then practically implemented and applied the integrated model, from model construction preparation to detailed rule development, guided by CWA's Work Domain Analysis, Control Task Analysis, and Strategies Analysis, to simulate SOC analysts' cybersecurity alert triage performance. The model construction process demonstrated better efficiency with a systematic approach, and the resulting model showed an improving trend in quantitative accuracy, domain-specific validity, and the interpretability of human adaptability and flexibility. However, the model is limited in capturing human exploratory behavior, prompting a brief test of using Generative AI (GAI) models to address this gap.
This thesis is the first exploration and implementation of integrating CWA-guided domain-specific analysis with ACT-R's computational capabilities to develop an integrated cognitive model for humans in complex work domains. The effort advances the development of cognitive modeling by providing theoretical grounding and practical insights for applying and extending cognitive models. Finally, we discuss whether GAI models might enhance cognitive modeling, as GAI capabilities become more available.

Item: Advancing Freezing of Gait Heterogeneity Modeling through Subtype-aware Detection, Generative Augmentation, and Adaptive Prediction (University of Waterloo, 2025-12-16). Yu, Xinyue.
Freezing of Gait (FOG) is a disabling symptom of Parkinson's Disease (PD) that varies in manifestations and motion contexts. Its heterogeneity motivates subtype categorization, such as manifestation-specific subtypes (akinesia, trembling, or shuffling) and motion-specific subtypes (gait-initiation, walking, or turning), with the occurrence and frequency of subtypes varying across patients. FOG detection and prediction have attracted significant research interest for their applications in daily monitoring, automated FOG dataset labeling, and on-demand activation of intervention devices. With respect to FOG detection, despite numerous promising Deep Learning (DL) FOG detection studies, few consider FOG heterogeneity. It remains unclear whether different subtypes require distinct detection strategies, and whether tailoring subtype-specific models could enhance detection generalizability across subtypes. Additionally, training a DL detection model with robustness and generalization across subtypes is limited by data scarcity and imbalances between FOG/non-FOG classes and among subtypes, while FOG generative augmentation is considered a promising solution. However, subtype-conditioned FOG generative augmentation has not been developed, and its effectiveness and advantages compared to simpler, cheaper classical augmentation methods on detection model performance remain unknown. Regarding FOG prediction, one gap lies in the limited adaptability and complexity of available labeling approaches for pre-FOG (the transition state leading to FOG), which exhibits heterogeneity across subjects and FOG episodes. Analyzing pre-FOG heterogeneity with respect to FOG subtypes may help better interpret it, but is currently underexplored. Another gap with respect to existing prediction model design is the lack of a multi-horizon prediction function, which could specify FOG onset while simultaneously enabling both short- and long-term alarms. These gaps are addressed in this thesis through three projects, each detailed in a methodology chapter, focusing on subtype-aware FOG detection, subtype-conditioned FOG generative augmentation, and multi-horizon FOG prediction incorporating a soft, data-driven, adaptive pre-FOG labeling.
The FOG detection chapter first categorizes FOG data into manifestation- or motion-specific subtypes via classifier or clustering methods and then derives their corresponding detection strategies as interpretable feature masks. This chapter then proposes a feature-mask-based Convolutional Neural Network (CNN) that explicitly embeds the identified strategies. Using waist-mounted 3D accelerometer data, a general CNN and subtype-specific CNNs are trained.
The results show that, according to feature-mask analysis, motion-specific subtypes share a common detection strategy, whereas manifestation-specific subtypes require distinct strategies. Manifestation models exhibit enhanced generalizability across subtypes compared to the general model, boosting the overall average FOG detection sensitivity by 10.95% ± 9.24% and specificity by 32.08% ± 9.01%. Conversely, motion models reduce the overall FOG sensitivity by 1.89% ± 8.74% and specificity by 5.17% ± 10.76%. Consequently, the detection strategy is mainly driven by the manifestation composition of the data. The general model favors the dominant manifestation-specific subtype group(s), a bias corrected by tailored manifestation-specific strategies. No comparable benefit arises from motion models due to their similar manifestation compositions. This chapter reveals the detection strategies required by different FOG subtypes and demonstrates the effectiveness of subtype-specific tailoring in improving FOG detection generalizability.
The FOG augmentation chapter proposes a subtype-aware FOG augmentation technique enabling training of DL models to perform consistently across subtypes. Specifically, it introduces Hi-CF cGAN, a two-stage model that generates subtype-conditioned FOG-like ankle accelerations that are realistic and diverse, as verified through visualization, UMAPs, and MMD comparison against real signals. This chapter evaluates Hi-CF cGAN's effectiveness by training CNNs for FOG detection with both general (subtype-stratified) and personalized (subtype-variant, based on patient-specific subtype composition) augmentation via Hi-CF cGAN, benchmarking against classical augmentations and a baseline (no augmentation). Compared to the baseline, general augmentation with Hi-CF cGAN effectively improves average detection rates of FOG, trembling FOG, and especially the previously overlooked minor subtypes, shuffling FOG (from 66.8% to 81.6%) and akinesia FOG (from 58.7% to 77.9%). These improvements exceed those of classical augmentations, demonstrating the superior realism, richness, and adaptability of Hi-CF cGAN-generated data in addressing FOG/non-FOG and subtype imbalances. Personalized augmentation further enhances accuracy on targeted subtype(s) compared to general augmentation, highlighting its potential for tailored model optimization.
The FOG prediction chapter first proposes a soft, data-driven, and adaptive pre-FOG labeling approach that identifies potential pre-FOG windows using statistical signal properties, including Shannon entropy and auto mutual information, and data-driven features via a CNN-predicted FOG probability. This adaptive labeling effectively captures intensifying pre-FOG characteristics while approaching a FOG episode and generalizes effectively across subjects. The labeling results reveal that for motion-specific subtypes, turning shows the strongest and most statistically reliable pre-FOG trends, while gait-initiation lacks a clear pre-FOG pattern. For manifestation-specific subtypes, trembling exhibits the most statistically consistent pre-FOG trend, while shuffling has the weakest trend. Some subjects display strong general pre-FOG trends, while others only show a strong pre-FOG trend with specific subtype(s), highlighting the value of subtype-specific pre-FOG labeling and the interpretability of pre-FOG heterogeneity via subtypes.
Additionally, this chapter proposes a sequence-to-sequence, multi-horizon CNN-transformer that predicts the FOG state for each of the next six seconds. Combined with the proposed adaptive labeling, the model predicts both a discrete FOG state and a soft FOG Score representing FOG probability. It achieves a low mean error of 11.4% ± 4.1% and above-benchmark prediction horizons of 3.19 ± 0.34 s. Comparisons across labeling methods show that the adaptive labeling improves both window- and sequence-wise prediction accuracy and stability relative to fixed labeling, confirming its higher clarity and flexibility in pre-FOG identification. However, compared to no-pre-FOG labeling, the adaptive labeling demonstrates improved prediction horizons and prediction success rate on transition sequences but reduced accuracy on non-transition sequences due to increased false alarms, which is a trade-off to consider in practical application. Collectively, these three chapters demonstrate the necessity and benefits of tailoring with respect to manifestation-specific subtypes for cross-subtype detection generalization, manifestation-conditioned FOG augmentation for data imbalance correction, and episode-adaptive pre-FOG labeling for reliable prediction, while also proposing innovative deep learning solutions for each specific FOG modeling problem.
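The adaptive pre-FOG labeling is described above as combining statistical signal properties (Shannon entropy, auto mutual information) with a CNN-predicted FOG probability. The sketch below shows one generic combination rule under my own assumed thresholds; it is not the thesis's labeling criterion.

```python
# Illustrative sketch of an adaptive pre-FOG labeling rule: mark a window as
# pre-FOG when signal irregularity (Shannon entropy) and a model-predicted FOG
# probability both rise while approaching an episode. Thresholds are assumptions.
import numpy as np

def shannon_entropy(window: np.ndarray, bins: int = 32) -> float:
    hist, _ = np.histogram(window, bins=bins)
    p = hist / (hist.sum() + 1e-12)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def label_pre_fog(windows, fog_probs, entropy_thresh=3.0, prob_thresh=0.4):
    """windows: 1D acceleration windows preceding a FOG episode;
    fog_probs: matching CNN-predicted FOG probabilities, one per window."""
    labels = []
    for w, p in zip(windows, fog_probs):
        is_pre_fog = shannon_entropy(w) > entropy_thresh and p > prob_thresh
        labels.append(1 if is_pre_fog else 0)
    return np.array(labels)
```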
Item: Adaptation Pathways for Direct Air Capture Deployment in Canada (University of Waterloo, 2025-12-15). Motlaghzadeh, Kasra.
This study responds to the insufficient understanding of the uncertainties surrounding the demand for, and the techno-economic and socio-political feasibility of, deploying Direct Air Capture (DAC), a Carbon Dioxide Removal (CDR) technology that may be essential for Canada's net-zero target and potential post-net-zero obligations. Limited tools are available to analyze such uncertainties and their interactions to support adaptive decision-making under deep uncertainty. These gaps are addressed through three interconnected, systems-based approaches: Integrated Assessment Modeling (IAM) for quantitative scenario modeling, Cross-Impact Balance (CIB) analysis for qualitative scenario discovery, and Adaptation Pathways (AP) for decision support under deep uncertainty. First, existing IAM studies were systematically reviewed and a national-scale IAM analysis with the Global Change Analysis Model (GCAM) was conducted to quantify key uncertainties shaping DAC deployment in Canada. The following factors are found to strongly influence Canada's potential reliance on DAC: socio-economic pathways, fossil-fuel dependence, international CDR obligations grounded in burden-sharing principles, and DAC cost trajectories. Second, the CIB scenario discovery method is employed to examine how these uncertainties, and additional socio-political factors not represented in quantitative models, interact, based on expert elicitation with DAC specialists. The CIB analysis produces 15 internally consistent futures and identifies public acceptance and policy coherence as critical bottlenecks if they evolve unfavourably. Third, CIB scenarios are used to parameterize GCAM to quantify DAC demand under four internally consistent, CIB-informed futures. This integrated approach shows that Canada's DAC requirements could range from 0 to 300 MtCO₂/year by 2075, with narrative explanations linking each scenario's structural components to its resulting DAC trajectory. Finally, the AP framework is applied to DAC policy, mapping flexible and dynamic strategies across three key dimensions: economic feasibility, electricity supply, and CO₂ transport and storage. APs identify low-regret near-term actions (e.g., electricity grid expansion), reveal thresholds where strategies fail, and indicate when strategic shifts are needed. Methodologically, this thesis demonstrates (1) how semi-quantitative tools such as CIB can validate and enrich IAM scenarios with socio-political dynamics, and (2) how APs can translate uncertainty-rich futures into robust, actionable policy pathways for DAC deployment in Canada. Together, these methods provide a comprehensive decision-support framework for navigating the deep uncertainties surrounding DAC deployment in Canada.

Item: An Investigation of Overt Visual Attention and Gaze Behaviour in Social Human-Robot Interaction and Human-Computer Interaction Contexts (University of Waterloo, 2025-12-01). Shaghaghi, Sahand.
In human-human and human-robot interaction, gaze has a consequential role as a type of non-verbal communication behaviour, affecting the social interaction depending on the gaze behaviour's characteristics. As such, gaze behaviour has been a topic of major research in recent years, since a better understanding of gaze behaviour could lead to the design of robot behaviour for social interactions. In human-human interaction (HHI) and human-robot interaction (HRI) studies, gaze behaviour has seldom been investigated while taking into consideration all social interaction elements, including interaction partners' personalities and social roles in addition to the social context. A number of separate studies investigate conversational roles and personality matching in relation to gaze behaviour in HRI. However, works which investigate gaze behaviour in tandem with these social interaction elements are needed, since such a study would contextualize gaze behaviour in relation to variations in these social elements (e.g., gaze behaviour characteristics based on introverted and extroverted personalities) while taking into consideration the compounded effects of these social elements in combination. What this thesis accomplishes is the incorporation of all these social elements in tandem with gaze, all under the umbrella of one body of research. This integrative approach was inspired by recent HRI literature encouraging the investigation of verbal and non-verbal social interaction elements together. This thesis investigates gaze behaviour in the context of HRI while taking into account social role and designed personalities in robotic platforms. As the social context, this thesis explores dyadic human-robot interactions involving objects of discourse from a gaze-centric point of view, considering both the robot's and the participant's gaze-centric perspectives. Four major studies are conducted in the context of this thesis to fulfill this exploration.
Tools for recording overt visual behaviour are vital in conducting human-computer interaction (HCI) research. However, specific tools enabling the recording of these metrics in online settings facilitating video viewing were not available; therefore, Study 1 created the FocalVid platform. This platform collects cursor-location attentional data for participants in online settings such as Amazon Mechanical Turk.
The cursor metrics gathered through this platform were then compared to eye tracking data and to our rendition of another relevant platform (BubbleView). It was determined that human gaze and cursor movements are distinct but have similarities in relation to velocities and dwell timing. This platform allowed for large-scale data collection for HCI and HRI studies, which is not possible in the context of in-person studies.
Personality and social role are major elements of social interactions; however, the perception of designed introverted/extroverted personalities for the humanoid iCub robot had not previously been examined, and these two elements had not been explored simultaneously in the previous literature involving the iCub robot. In the second study, I explore participants' perception of a robot in interactions between a robot and a human actor, utilizing recorded online scenarios. In this study, the robot takes on different social roles while embodying different personalities: it is either a teacher, a student, or a collaborator, and is either introverted or extroverted. To conduct this study, the Amazon Mechanical Turk platform and HRI video recordings were used. I discovered perceiver effects in participants' assessment of the robot's Ten-Item Personality Inventory (TIPI) dimensions versus their self-reported TIPI dimensions, where participants' self-assessment of their personality correlated with their assessment of the robot's personality. The TIPI questionnaire is a measure used to assess personality dimensions. It was also determined that the designed robot personality was perceived accordingly by the participants. These findings indicated that even though participants' self-assessment of their personality dimensions affects their perception of the robot, they could still perceive the robot's designed personality as intended.
Observation and analysis of people's overt visual attention dynamics in HRI could allow for a better understanding of these interactions; however, such overt attention, considered alongside social interaction elements, has not been previously explored in detail. The third study investigated participants' overt visual attention in the context of dyadic social settings using the FocalVid platform. In this study, I was also interested in the efficacy of the FocalVid platform for collecting attention metrics relating to such social settings. This study, taking advantage of the HRI scenarios designed in Study 2 and using the FocalVid platform, recorded the cursor attentional data for participants while the robot was enabled with different social roles and personalities. It was determined that the robot's social role and personality significantly affected the participants' overt visual attention. It was also determined that the presence of the FocalVid platform did not adversely affect the perception of the robot.
Gaze studies in human-robot interaction should investigate both the human partner's and the robot's gaze behaviour and their effects on the social interaction. A limited number of studies have explored the effects of gaze-architecture-enabled robots' behaviour on social interaction.
In the fourth study, after the design of gaze-based interaction architectures based on the Social Gaze Space taxonomy in dyadic interactions involving objects of discourse, the effects of using these gaze interaction architectures for robot gaze control were evaluated utilizing eye tracking data and Human-Robot Interaction questionnaires. Through this study, it was determined that the SGS-IA architecture led to higher visual engagement by the participants towards the robot's face and eye region compared to the TutorSpotter architecture, which was used for comparison purposes. One of the main contributions of this thesis is the design and evaluation of these gaze-based interaction architectures for anthropomorphic humanoid robots involved in human-robot interactions.
All four of these studies were geared towards gaining a better understanding of gaze behaviour in HRI and HCI. Studies 1 and 2 had a preparatory role to this end. Study 1 allowed us to design the FocalVid platform and to investigate the attention metrics gathered through this platform against gaze metrics in this human-computer interaction platform. Study 2 allowed us to design the human-robot interaction scenarios needed for Studies 3 and 4. Study 3 investigated the gaze behaviour of the human interaction partner involved in human-robot interaction using the FocalVid platform, and in Study 4 we designed and evaluated a gaze interaction architecture for the iCub robot through an in-person human-robot interaction study. These studies allowed for a better understanding of the role of gaze behaviour in social HRI settings. These studies also enabled us to design gaze-specific interaction architectures for the iCub robot.
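The studies above compare cursor-based attention metrics against eye-tracking metrics such as dwell timing. As a generic illustration (not FocalVid's implementation, and with hypothetical area-of-interest names), dwell time per area of interest can be accumulated from timestamped cursor samples as follows.

```python
# Illustrative sketch: accumulate dwell time per area of interest (AOI) from
# timestamped cursor samples, the kind of metric compared against eye-tracking
# dwell times. Not the FocalVid code; AOI names are hypothetical.
def dwell_times(samples, aois):
    """samples: list of (t_seconds, x, y) cursor records, time-ordered;
    aois: dict mapping AOI name -> (x0, y0, x1, y1) bounding box."""
    totals = {name: 0.0 for name in aois}
    for (t0, x, y), (t1, _, _) in zip(samples, samples[1:]):
        dt = t1 - t0                      # time until the next cursor sample
        for name, (x0, y0, x1, y1) in aois.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                totals[name] += dt
                break
    return totals

# Example: dwell_times(recorded_samples, {"robot_face": (100, 50, 220, 170)})
```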
Item: Safety and Security of Reinforcement Learning for Autonomous Driving (University of Waterloo, 2025-11-27). Lohrasbi, Saeedeh.
In the context of autonomous driving, reinforcement learning (RL) presents a powerful paradigm: agents capable of learning to drive efficiently in unseen situations through experience. However, this promise is shadowed by a fundamental concern: how can we entrust decision-making to agents that rely on trial-and-error learning in safety-critical environments where errors may carry severe consequences? This thesis advances a step toward resolving this dilemma by integrating three foundational pillars: adversarial robustness, simulation realism, and model-based safety. We begin with a comprehensive survey of adversarial attacks and corresponding defences within the domains of deep learning (DL) and deep reinforcement learning (DRL) for autonomous vehicles. This survey reveals the porous boundary between safety and security: both natural disturbances and adversarial perturbations can destabilize learned policies. Motivated by this insight, we introduce the Optimism Induction Attack (OIA), a novel adversarial technique that manipulates an RL agent's perception of safety, causing it to act with unwarranted confidence in hazardous situations. Evaluated in the context of an Adaptive Cruise Control (ACC) task, the OIA significantly impairs policy performance, exposing critical vulnerabilities in state-of-the-art RL algorithms.
To counter the demonstrated threats, we present a systematic defence architecture. We develop REVEAL, a high-fidelity simulation framework designed to support the training and evaluation of safe RL agents under realistic vehicle dynamics, traffic scenarios, and adversarial conditions. By narrowing the gap between abstract simulation and real-world complexity, REVEAL facilitates rigorous and nuanced testing, which is essential for safety-critical applications. To enhance learning efficiency within this environment, we employ a transfer learning (TL) strategy: policies initially trained in simplified simulators (e.g., SUMO) are adapted and fine-tuned in REVEAL, leading to faster convergence and improved safety performance during both training and deployment. Central to our approach is the development of a Multi-Output Control Barrier Function (MO-CBF), which simultaneously supervises throttle and brake commands to enforce safety constraints in real time. Rather than relying on hard overrides, the MO-CBF operates cooperatively with the learning agent, gently adjusting unsafe actions and introducing corresponding penalties during training. This enables the agent not only to learn safe behaviour but also to internalize safety principles and anticipate potentially unsafe scenarios. Our empirical evaluation demonstrates the effectiveness of the proposed framework across a spectrum of disturbances, adversarial inputs, and realistic high-risk maneuvers. The results consistently show improved safety and robustness, highlighting the framework's capacity to transform RL agents from vulnerable learners into trustworthy autonomous systems. In summary, this thesis presents a comprehensive methodology for safe and secure RL in autonomous driving. By grounding agent training in high-fidelity simulation, leveraging adversarial awareness, and embedding real-time model-based safety mechanisms, we provide a cohesive and scalable pathway toward deploying RL in the real world with confidence.
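To make the "cooperative adjustment plus training penalty" idea concrete, here is a minimal sketch of a generic single-constraint control-barrier-function filter for a longitudinal, ACC-style task. This is not the thesis's MO-CBF formulation; the time-headway barrier, class-K gain, and actuation limits are all assumptions.

```python
# Minimal sketch of a CBF-style safety filter for longitudinal (ACC-like) control,
# illustrating cooperative adjustment of an RL action rather than a hard override.
# Generic time-headway barrier, not the thesis's MO-CBF; parameters are assumptions.
def cbf_filter_accel(a_rl, gap, v_ego, v_lead, headway=1.5, alpha=0.5,
                     a_min=-6.0, a_max=2.0):
    """Return (safe acceleration close to the RL proposal, training penalty).

    Barrier: h = gap - headway * v_ego  (stay at least `headway` seconds behind).
    Enforce dh/dt + alpha * h >= 0, i.e. (v_lead - v_ego) - headway * a + alpha * h >= 0,
    which yields an upper bound on the commanded acceleration.
    """
    h = gap - headway * v_ego
    a_upper = ((v_lead - v_ego) + alpha * h) / headway
    a_safe = min(a_rl, a_upper)              # intervene only when the RL action is unsafe
    penalty = max(0.0, a_rl - a_upper)       # could be fed back as a shaping penalty
    return max(a_min, min(a_max, a_safe)), penalty
```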
Item: Heterogeneous Decomposition of Convolutional Neural Networks Using Tucker Decomposition (University of Waterloo, 2025-11-26). Mokadem, Frank.
Convolutional Neural Networks (CNNs) remain the architecture of choice for computer vision tasks on compute-constrained platforms such as edge and personal devices, delivering both close to state-of-the-art performance metrics and linear inference complexity with respect to input resolution and number of channels. However, the deployment of larger and more complex CNN architectures is limited by the restricted memory offered by such platforms. This brings about a need to compress pretrained CNNs into models with fewer parameters while controlling for degradation in performance. This thesis tackles CNN compression using low-rank approximation of convolution layers via Tucker Decomposition (TD). We introduce a new heuristics-based Neural Architecture Search procedure to select low-rank configurations for the convolution tensors, which we call Heterogeneous Tucker Decomposition (HTD). Standard low-rank approximation using TD factorizes and approximates convolution layers using uniform ranks for all convolution tensors, then applies a few fine-tuning epochs to recover the degradation in performance, an approach we show to be suboptimal compared with a heterogeneous selection of ranks for each convolution layer followed by the same number of fine-tuning epochs. Our primary contribution is the development and evaluation of HTD, which applies a layer-specific compression rate (low rank divided by full rank) inferred from a Neural Architecture Search (NAS) process. Furthermore, we introduce a sampling heuristic to efficiently explore the search space of layer-specific compression rates, thus preserving performance while significantly reducing search time. We present a mathematical formulation for the HTD optimization problem and an NAS algorithm to find admissible solutions. We test our approach on multiple varieties of CNN architectures, AlexNet, VGG16, and ResNet18, adapted for the MNIST classification task. Our findings confirm that HTD performs better than TD on all models tested. For the same compression rate, HTD recovers higher accuracy after fine-tuning, with gains ranging from 1.2% to 5.8%. For equivalent accuracy targets, HTD delivers 15-30% higher compression rates than TD. This thesis advances Neural Architecture Search by highlighting the efficacy of heterogeneous tensor decomposition approaches. It provides a robust framework for their implementation and evaluation, with significant implications for deploying convolutional deep learning models in resource-limited settings. Future work will explore incorporating low-rank constraints as a regularization objective during training, potentially enabling end-to-end compression-aware optimization.
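For readers unfamiliar with Tucker-based compression, the structural idea is that one convolution layer is replaced by a 1x1 input projection, a small spatial "core" convolution, and a 1x1 output projection; the heterogeneous part is simply that each layer gets its own ranks. The sketch below shows only this structural replacement in PyTorch; the factor weights themselves would come from a Tucker decomposition (e.g., HOSVD) of the pretrained kernel, which is omitted, and the helper name is mine.

```python
# Structural sketch of Tucker-2 compression of one convolution layer. The ranks
# (r_in, r_out) are what a heterogeneous scheme selects per layer; weights of the
# three sub-convolutions would be initialized from a Tucker decomposition of the
# pretrained kernel (not shown).
import torch.nn as nn

def tucker2_conv(conv: nn.Conv2d, r_in: int, r_out: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(conv.in_channels, r_in, kernel_size=1, bias=False),      # input projection
        nn.Conv2d(r_in, r_out, kernel_size=conv.kernel_size,               # small spatial core
                  stride=conv.stride, padding=conv.padding, bias=False),
        nn.Conv2d(r_out, conv.out_channels, kernel_size=1,                 # output projection
                  bias=conv.bias is not None),
    )

# Heterogeneous compression assigns a different (r_in, r_out) pair to each layer,
# e.g. chosen by a NAS procedure, instead of one uniform rank for the whole network.
```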
Item: Designing for Trust: A Multi-Factor Investigation of Optometrists' Perspectives on AI-Based Glaucoma Screening Systems (University of Waterloo, 2025-11-07). Karim, Ali.
Although glaucoma screening AI models show strong performance, their integration into clinical practice remains limited. Clinicians often face barriers rooted in technological acceptance, with trust emerging as a key determinant of adoption. Prior research has emphasized explainability, but a broader exploration of factors affecting trust is needed. This study investigates multiple factors shaping trust in AI and translates them into design requirements for next-generation glaucoma screening clinical decision support systems (CDSS). In a previous study, two real-world glaucoma patient cases, each comprising three visits at different times, were presented under both unimodal conditions (fundus images only) and multimodal conditions (fundus images, optical coherence tomography, visual fields, and medical history) through a mock interface simulating an AI-based glaucoma screening support system. During these simulated visits, nineteen licensed optometrists interacted with the system and participated in follow-up interviews, where they were asked whether they trusted the system and to explain their reasoning. The objective of this thesis is to identify the factors influencing optometrists' trust in an AI-powered glaucoma screening tool and to propose design recommendations that can enhance trust in future iterations. The interview data were analyzed using Braun and Clarke's thematic analysis approach. The emerging themes indicate that trust in the AI system is shaped by multiple factors: (1) alignment with clinicians' expectations of the AI's role (flagging tool vs. consultant); (2) completeness of information; (3) communication of performance metrics (accuracy, sensitivity, confidence scores, perceived consistency, and perceived quality of training data); (4) clinical relevance of outputs (trends, actionable recommendations, differential diagnosis); (5) transparency in risk factor weighting, exclusions, and considered variables; (6) decision alignment between optometrists and the AI, assessed across decision inputs, identified risk factors, their relative importance, recommended actions, and the gradient of concordance in final decisions; (7) optimization of the AI for cautious screening to capture all potential cases; (8) interface usability supporting timely decisions; (9) users' self-perceived expertise, occasionally leading to overreliance; (10) onboarding and training that highlighted the system's features and limitations; and (11) increasing familiarity over time, which helped calibrate trust. Based on these findings, 17 design principles were proposed to guide the development of the next iteration of a trust-supportive interface for glaucoma screening decision support systems.

Item: Manifold-Aware Regularization for Self-Supervised Representation Learning (University of Waterloo, 2025-11-04). Sepanj, Mohammad Hadi.
Self-supervised learning (SSL) has emerged as a dominant paradigm for representation learning, yet much of its recent progress has been guided by empirical heuristics rather than unifying theoretical principles. This thesis advances the understanding of SSL by framing representation learning as a problem of geometry preservation on the data manifold, where the objective is to shape embedding spaces that respect intrinsic structure while remaining discriminative for downstream tasks. We develop a suite of methods, ranging from optimal transport-regularized contrastive learning (SinSim) to kernelized variance-invariance-covariance regularization (Kernel VICReg), that systematically move beyond the Euclidean metric paradigm toward geometry-adaptive distances and statistical dependency measures, such as maximum mean discrepancy (MMD) and the Hilbert-Schmidt independence criterion (HSIC). Our contributions span both theory and practice. Theoretically, we unify contrastive and non-contrastive SSL objectives under a manifold-aware regularization framework, revealing deep connections between dependency reduction, spectral geometry, and invariance principles. We also challenge the pervasive assumption that Euclidean distance is the canonical measure for alignment, showing that embedding metrics are themselves learnable design choices whose compatibility with the manifold geometry critically affects representation quality. Practically, we validate our framework across diverse domains, including natural images and structured scientific data, demonstrating improvements in downstream generalization, robustness to distribution shift, and stability under limited augmentations. By integrating geometric priors, kernel methods, and distributional alignment into SSL, this work reframes representation learning as a principled interaction between statistical dependence control and manifold geometry.
The thesis concludes by identifying open theoretical questions at the intersection of Riemannian geometry, kernel theory, and self-supervised objectives, outlining a research agenda for the next generation of geometry-aware foundation models.
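Of the measures named above, the maximum mean discrepancy is the most self-contained to illustrate. The sketch below is a standard biased Gaussian-kernel estimator of the squared MMD between two embedding batches, given here only as background for the abstract's terminology; how the thesis actually uses MMD within its objectives is not specified here.

```python
# Standard (biased) estimator of the squared maximum mean discrepancy between
# two batches of embeddings with a Gaussian kernel. Bandwidth `sigma` is free.
import torch

def gaussian_kernel(a, b, sigma=1.0):
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """x: (N, D) embeddings, y: (M, D) embeddings."""
    return (gaussian_kernel(x, x, sigma).mean()
            - 2 * gaussian_kernel(x, y, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean())
```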
Total absorption spectra, which combine the non-radiative and radiative absorption data, were also generated, and it was determined that the absorption profiles of certain samples, such as NADH, are best studied using this approach. The final step of this work was to use the collected total absorption and fluorescence emission data from the PARS spectroscopy system to identify the composition of different mixtures of craft red and blue ink samples. Traditional linear and generalized bilinear models were employed to perform this unmixing, and the results indicate that the combination of the absorption and fluorescence data collected on this system allows for a more accurate identification of a mixture’s components than either data source individually. This suggests that the PARS spectroscopy system provides an increased level of detail in sample characterization compared to single-modality spectroscopy systems. Ultimately, this research lays the groundwork for the development of a PARS spectroscopy system capable of being deployed in clinical settings to study samples and help inform diagnoses. This work demonstrates the feasibility of leveraging PARS for optical spectroscopy and presents a system design and framework that can be further iterated upon to enhance performance and enable a robust characterization of relevant and complex biological samples.

Item type: Item , Analysis of Limitations of AI Tools for Pediatric Speech Language Pathology Documentation and Mitigation Strategies (University of Waterloo, 2025-10-17) Tuinstra, Tia

Speech Language Pathology (SLP) is a therapy discipline offered by KidsAbility, a pediatric rehabilitation clinic in Southern Ontario. Documentation is a key part of SLP and other therapy practice guidelines and can take up significant portions of a therapist’s time. AI-based clinical documentation aids have been developed to help reduce this burden, and one such tool - MutuoHealth’s AutoScribe - has been piloted by KidsAbility. Though this AI tool has been beneficial to some therapy disciplines, SLP clinicians face unique challenges when using it: the model seemed unable to recognize speech therapy strategies or to parse the play-based script of pediatric appointments. This thesis seeks to explore the issues SLPs encounter with AI documentation tools and propose potential approaches to mitigate these issues. The AI documentation process was divided into the transcription pipeline, where an audio file input produced a corresponding transcript output, and the generation pipeline, where an input transcript produced a draft SOAP note. The SLPs who had participated in the AutoScribe pilot test were interviewed about their experiences with the tool and its integration into their workflows. The issues reported by the therapists were sorted into those more closely related to the transcript and those more closely related to the drafted SOAP note. A set of sample SLP appointments from KidsAbility was gathered from an extended AutoScribe pilot, with 10 selected as examples of appointment data (audio, transcripts, drafted and final SOAP notes) to test the transcription and generation pipelines. An augmented automatic speech recognition (ASR) pipeline based on a Whisper model was used to test improvements to the transcript. However, the generated transcripts were not significantly improved over those from the pilot test. Instead, ground truth transcriptions were manually created from the audio files for testing the generation pipeline.
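As context for the transcription pipeline just described, the snippet below sketches how a transcript could be produced with the open-source openai-whisper package. The model size, language setting, and file name are assumptions; this is not the augmented ASR pipeline evaluated in the thesis.

```python
# pip install openai-whisper  (assumes ffmpeg is available on the system)
import whisper

# Model size is an assumption; larger checkpoints trade speed for accuracy.
model = whisper.load_model("small")

# "session_audio.wav" is a hypothetical appointment recording.
result = model.transcribe("session_audio.wav", language="en")

print(result["text"])                        # full transcript
for seg in result["segments"]:               # timestamped segments
    print(f'{seg["start"]:7.2f}s  {seg["text"]}')
```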
For SOAP note generation, the addition of discipline-specific context tailored to appointment type was tested. This context was curated in collaboration with SLPs from KidsAbility to include SOAP templates, definitions of key concepts, and information about speech data. A Llama 3.3 70B model was used for SOAP note generation with ground truth transcriptions and SLP-specific RAG-adjacent information as context. The input context was optimized over several iterations based on clinicians’ evaluations of generated SOAP note quality. KidsAbility’s SLPs had flagged sessions targeting speech practice as having particular difficulties with AutoScribe: the model seemed unable to make inferences about the child’s speech quality from the transcript alone. Methods of quantitatively assessing speech based on session audio were explored as ways to provide additional context on speech quality to the SOAP generation model. A sample appointment was selected for testing, and child speech samples of the targeted sound were sliced from the audio and assigned quality categories. These samples were then compared against correct productions using the cosine distance between their mel-spectrograms. The samples were also passed through a phoneme-based ASR model to obtain its layer activations. The cosine distances and layer outputs were then tested as predictive measures of articulation accuracy, with layer outputs yielding the best results. The resulting speech accuracy scores were then passed into the generation model as additional context, with the output containing correct statements about the nature of the child’s articulations. Though clinicians’ availability limited the extensiveness of the generated SOAP note evaluations, the SOAP notes generated with SLP-specific context showed improvement over the basic model generation. The model also tended to repeat information from previous SOAP notes if examples were provided. Quantitative speech analysis does appear feasible using phoneme model layer activations and the cosine distances between mel-spectrograms of attempted and correct articulations. Based on these findings, further optimization of the generation pipeline and work on making effective AI tools for KidsAbility’s EY SLPs will continue.

Item type: Item , Using eye tracking to study the takeover process in conditionally automated driving and piloting systems (University of Waterloo, 2025-10-08) Ding, Wen

In a conditionally automated environment, human operators are often required to resume manual control when the autonomous system reaches its operational limits — a process referred to as takeover. This takeover process can be challenging for human operators, as they must quickly perceive and comprehend critical system information and successfully resume manual control within a limited amount of time. Following a period of autonomous control, human operators’ Situation Awareness (SA) may be compromised, potentially impairing their takeover performance. Consequently, investigating approaches to enhance the safety and efficiency of the takeover process is essential. The eyes are central to information gathering, and eye tracking techniques have been extensively applied in previous takeover studies. The current study aims to enhance the takeover process by utilizing operators’ eye tracking data.
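Returning to the speech-quality comparison described in the preceding KidsAbility item, the following is a minimal sketch of one way to compare a child's production against a correct production using the cosine distance between time-averaged mel-spectrograms. The librosa settings, the time-averaging step, and the file names are assumptions rather than the thesis's exact procedure.

```python
import librosa
import numpy as np
from scipy.spatial.distance import cosine

def mean_log_mel(path, sr=16000, n_mels=64):
    """Load an audio clip and return its time-averaged log-mel spectrum."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)
    return log_mel.mean(axis=1)      # average over frames -> (n_mels,) vector

# Hypothetical clips: a reference (correct) production and a child's attempt.
ref_vec = mean_log_mel("correct_production.wav")
child_vec = mean_log_mel("child_attempt.wav")

# Smaller cosine distance suggests the attempt is spectrally closer to the target.
print("cosine distance:", cosine(ref_vec, child_vec))
```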
The data analysis methods include machine learning techniques and statistical approaches, applied to the driving and piloting domains. Simulation experiments were conducted in two domains: a level-3 semi-autonomous vehicle in the driving domain and an autopilot-assisted aircraft landing scenario in the piloting domain. In both domains, operators’ eye tracking data and simulator-derived operational data were recorded during the experiments. The eye tracking data went through two categories of feature extraction: eye movement features linked predominantly to fixations and saccades, and Area-of-Interest (AOI) features indicating which AOI the gaze was located in. Eye tracking features were analyzed using both traditional statistical techniques and machine learning models. Key eye tracking features included fixation-based metrics and AOI features, such as dwell time, entry count, and gaze entropy. Operators’ SA and takeover performance were measured by a series of domain-specific metrics, including the Situation Awareness Global Assessment Technique (SAGAT) score, Hazard Perception Time (HPT), Takeover Time (TOT), and resulting acceleration. Three research topics were addressed in the thesis, and each topic included one driving study and one piloting study. In topic 1, significant differences in eye movement patterns were found between operators with higher versus lower SA, as well as between those with better and worse takeover performance. Beyond the notable differences across various AOIs in the three pre-defined Time Windows (TWs), in the driving domain, drivers with better SA and better takeover performance showed inconsistent eye movement patterns after the Takeover Request (TOR) and before they perceived hazards. In the piloting domain, pilots with shorter TOT showed more distributed and complex eye movement patterns before the malfunction alert and after resuming control. During the intervening period, their eye movements were more focused and predictable, indicating fast identification of necessary controls with minimal visual search. In topic 2, significant differences in eye movement patterns were observed between younger and older drivers, as well as between learner and expert pilots. In the driving domain, older drivers exhibited more extensive visual scanning, indicating difficulty in effectively prioritizing information sources under time pressure. In the piloting domain, expert pilots not only allocated more attention to critical instrument areas but also dynamically adjusted their scanning behavior based on the current task. In topic 3, machine learning models trained on eye tracking features successfully performed binary classification for both SA-related and takeover-performance-related metrics. Model performance was evaluated using standard classification metrics, including accuracy, precision, recall, F1-score, and Area Under the ROC Curve (AUC). Finally, comparisons were made across topics 1 and 2, as well as between the driving and piloting domains. The results suggest that better operators can flexibly adapt their gaze strategies to meet task demands, shifting between broad visual scanning and focused searching when appropriate. This shift in patterns underscores the importance of accounting for the specific Time Window (TW) when interpreting operators’ eye movements. Overall, this thesis advances the understanding of different eye movement patterns during the takeover process by exploring a range of eye tracking features.
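As a small illustration of the AOI-based features listed above, the sketch below computes stationary gaze entropy (Shannon entropy of the distribution of fixations over AOIs) and a simple entry count from a hypothetical AOI fixation sequence. The feature definitions follow common usage in the eye tracking literature and are not taken from the thesis's code.

```python
import numpy as np
from collections import Counter

def stationary_gaze_entropy(aoi_sequence):
    """Shannon entropy (bits) of the distribution of fixations over AOIs."""
    counts = np.array(list(Counter(aoi_sequence).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def entry_count(aoi_sequence, target):
    """Number of times gaze enters the target AOI from a different AOI."""
    entries, prev = 0, None
    for aoi in aoi_sequence:
        if aoi == target and prev != target:
            entries += 1
        prev = aoi
    return entries

# Hypothetical fixation-by-fixation AOI labels around a takeover request.
seq = ["road", "mirror", "road", "dashboard", "road", "road", "mirror", "dashboard"]
print("stationary gaze entropy:", stationary_gaze_entropy(seq))
print("entries into 'mirror':", entry_count(seq, "mirror"))
```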
The findings support the development of operator training programs and the design of customized interfaces to enhance the safety and efficiency of takeover performance.

Item type: Item , Long-distance Travel in Canada: Multimodal Modeling with a National Network (University of Waterloo, 2025-09-24) Hajimoradi, Moloud

Long-distance (LD) travel comprises a disproportionately large share of total passenger-kilometers despite representing a small fraction of trip counts. Yet LD travel remains underexamined in Canada’s vast geographic context. This thesis develops and applies a comprehensive modeling framework to analyze LD trip generation and mode choice for Canadian residents, leveraging data from Statistics Canada’s National Travel Survey (NTS) (January 2018–February 2020) and a new national multimodal transportation network constructed for this thesis. The network integrates geospatial centroids for Census Subdivisions with travel-time estimates for automobile, air, intercity rail, and bus modes. Trip generation was examined through both disaggregate (person-level hurdle and zero-inflated count models) and aggregate (origin-destination zone-pair hurdle models) approaches, incorporating socioeconomic variables (age, income, gender), trip attributes (distance, season), and accessibility measures. Results indicate that accessibility, rather than traditional demographics, may be an important variable in predicting whether an LD trip occurs, with lower local accessibility and greater distance to airports increasing the likelihood of at least one trip in a given month. However, once the trip “hurdle” is crossed, trip counts are less sensitive to accessibility, underscoring behavioral impacts. Even with the very large dataset, the models are very weak, suggesting that travel surveys are a limited method for understanding LD travel. Mode choice was analyzed using a Multinomial Logit (MNL) model alongside Machine Learning (ML) classifiers (Decision Trees, Random Forests, Support Vector Machines, Neural Networks). While MNL yields interpretable elasticities, with intercepts confirming preference for the driving mode and positive income effects for air travel, ML methods achieve superior predictive power. Feature importance from Random Forests highlights travel time (especially driving) as the dominant determinant, followed by accessibility, with sociodemographic and seasonal factors playing secondary roles. Mode choice models with alternative-specific travel times are viable with publicly available data, and these results support the need to seriously consider the use of ML in LD mode choice modeling, even though understanding the influence of individual behavioral factors becomes more limited. Long-distance passenger travel demand models are not typically available in Canada despite their utility for infrastructure, service, and environmental planning. This thesis demonstrates that such models are viable with existing publicly available data.

Item type: Item , Teach a robot to assemble a bolt to a nut with a handful of demonstrations (University of Waterloo, 2025-09-23) Yao, Xueyang

This thesis investigates data-efficient methods for learning and executing complex, multistep robotic manipulation tasks in unstructured environments. A two-level hierarchical framework is first proposed, in which high-level symbolic action planning is performed using Vector Symbolic Architectures (VSA), and low-level 6D gripper trajectories are modeled using Task-parameterized Probabilistic Movement Primitives (TP-ProMPs).
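To illustrate the Vector Symbolic Architecture component named above, here is a minimal sketch of holographic-reduced-representation-style binding and unbinding via circular convolution. The dimensionality and the role/filler symbols are illustrative assumptions; this is not the planner implemented in the thesis.

```python
import numpy as np

D = 1024
rng = np.random.default_rng(0)

def random_hv():
    """Random unit-norm hypervector."""
    v = rng.normal(size=D)
    return v / np.linalg.norm(v)

def bind(a, b):
    """Circular convolution binding (HRR style)."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, a):
    """Approximate inverse: correlate c with a to recover the bound filler."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.conj(np.fft.fft(a))))

# Hypothetical symbols for one assembly step.
ACTION, GRASP = random_hv(), random_hv()
OBJECT, BOLT = random_hv(), random_hv()

# Compose a symbolic step as a superposition of bound role/filler pairs.
step = bind(ACTION, GRASP) + bind(OBJECT, BOLT)

# Query: which filler is bound to OBJECT? Compare against a small vocabulary.
recovered = unbind(step, OBJECT)
vocab = {"GRASP": GRASP, "BOLT": BOLT}
best = max(vocab, key=lambda k: np.dot(recovered, vocab[k]))
print("OBJECT slot decodes to:", best)   # expected: BOLT
```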
This approach enables both interpretable planning and motion generalization from limited human demonstrations. Building on this foundation, the thesis introduces the Task-parameterized Transformer (TP-TF), a unified model that jointly predicts gripper pose trajectories, gripper states, and subtask labels conditioned on object-centric task parameters. Inspired by the parameterization strategy of Task-parameterized Gaussian Mixture Models (TP-GMMs), the TP-TF retains the data efficiency of classical Programming by Demonstration (PbD) methods while leveraging the expressiveness and flexibility of transformer-based architectures. The model is evaluated on a real-world bolt–nut assembly task and achieves a 70% success rate with only 20 demonstrations when combined with visual servoing for precision-critical phases. The results highlight the potential of combining structured representations with deep sequence modeling to bridge symbolic reasoning and continuous control. This work contributes a step toward scalable, more interpretable, and data-efficient learning frameworks for autonomous robotic manipulation.

Item type: Item , Experimental Study on the Vibration Response of a Jackleg Hammer Drill (University of Waterloo, 2025-09-22) Kuppa, Srividya

This thesis presents an experimental investigation of the mechanical vibration response of jackleg hammer drills during underground rock drilling operations. While previous studies have primarily focused on vibration exposure at the handle or operator interface, this work analyzes vibration transmission through the full structure of the drill to better understand internal component behavior under realistic working conditions. Vibration data were collected using uniaxial accelerometers mounted on four key components (the fronthead, main cylinder, backhead, and handle), with measurements recorded along three spatial axes. Testing was conducted in operational environments, capturing variations across distinct drilling phases, including collaring, sustained drilling, and retraction. The acquired data were processed using time- and frequency-domain methods, including Fast Fourier Transform (FFT) and Root Mean Square (RMS) analysis. Results revealed significant directional dependence of vibration, with the axial (X-axis) component exhibiting the highest amplitudes during drilling. During collaring, when the drill bit lacks a guiding groove, vibration increased across all axes. A resonance condition was observed at approximately 142 Hz in the handle assembly, suggesting localized amplification potentially due to dynamic interaction between structural components. By characterizing dominant frequencies, directional behavior, and phase-specific amplification trends, this study provides a system-level understanding of vibration response in jackleg drills. The findings establish a foundation for future research aimed at developing targeted design improvements and vibration mitigation strategies to enhance operator safety and tool performance.
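As a small illustration of the time- and frequency-domain processing described above, the sketch below computes the RMS amplitude and dominant frequency of a synthetic accelerometer trace with an FFT. The sampling rate, signal model, and the 142 Hz component are placeholders, not the measured drill data.

```python
import numpy as np

fs = 5000.0                          # sampling rate in Hz (assumed)
t = np.arange(0, 2.0, 1.0 / fs)

# Synthetic axial acceleration: a dominant component near 142 Hz plus noise.
signal = 3.0 * np.sin(2 * np.pi * 142 * t) \
         + 0.5 * np.random.default_rng(0).normal(size=t.size)

# Root Mean Square amplitude (time domain).
rms = np.sqrt(np.mean(signal ** 2))

# Single-sided amplitude spectrum (frequency domain).
spectrum = np.abs(np.fft.rfft(signal)) / t.size
freqs = np.fft.rfftfreq(t.size, d=1.0 / fs)
dominant = freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC bin

print(f"RMS acceleration: {rms:.2f} (arbitrary units)")
print(f"dominant frequency: {dominant:.1f} Hz")
```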