Systems Design Engineering
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9914
This is the collection for the University of Waterloo's Department of Systems Design Engineering.
Research outputs are organized by type (e.g., Master's Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Recent Submissions
Item Switching User Perspectives: Using Virtual Reality to Understand Users for User Experience Research and Design (University of Waterloo, 2025-09-22) Lee, Jieun
Virtual reality (VR) technology can be used as a tool in user experience (UX) research and design to understand and empathize with users. Several works show that the perspective-taking ability of VR in a simulated, immersive environment helps foster empathy and understanding, a crucial first stage of the UX design thinking process. However, it remains unclear how VR could be used to understand multiple users, reflecting common UX research projects, and what kinds of perspective-taking interaction better facilitate empathy for and understanding of those users. This thesis introduces a Switching Perspectives interaction in VR for understanding different user problems, reflecting the nature of UX research. We conducted a mixed-methods, between-participants study comparing Switching Perspectives against two single user perspectives to investigate how different types of perspective-taking influence researchers’ and designers’ understanding of and empathy for user problems and needs. The findings show that both affective and cognitive empathy are fostered across the different perspectives, as measured by the avatar embodiment questionnaire and the interpersonal reactivity index. The qualitative data indicate that Switching Perspectives leads participants to see the bigger picture of the user problem, prompting them to ideate solutions that impact both user groups, compared to Single-Perspective. With these findings, this thesis explains how and why understanding and empathy are facilitated toward specific user groups.
I conclude this thesis with suggestions on using Switching Perspectives and Single-Perspectives in UX research, and with recommendations on the general use of perspective-taking in VR to better empathize with and understand user problems.

Item From Far-Field Dynamics to Close-Up Confidence: Action Recognition Across Varying Camera Distances (University of Waterloo, 2025-09-22) Buzko, Kseniia
Human action recognition (HAR) refers to the task of identifying and classifying human actions within videos or sequences of images. The field has gained significant importance due to its diverse applicability across domains such as sports analytics, human-computer interaction, surveillance, and interpersonal communication. Accurate action recognition becomes especially difficult when the camera distance changes, because the cues that matter shift with scale. For instance, a close-up hinges on facial emotion (such as smiles and eye gaze), whereas a medium shot relies on hand gestures or objects being manipulated. In the context of HAR, we distinguish two primary scenarios that illustrate this challenge. The first is the far-field setting, characterized by subjects positioned at a distance and often exhibiting rapid movement, which leads to frequent occlusions. This scenario is commonly observed in sports broadcasts, where capturing the game’s dynamics is essential. In contrast, the near-field setting involves subjects that are nearby and tend to remain relatively static. This setting enables the capture of subtle yet informative gestures, similar to those observed in presenter-focused videos. Although most studies treat these regimes separately, modern media (films, replays, vlogs) cut or zoom fluidly between them. An effective recognizer must therefore decide dynamically which cues to prioritize: facial emotion in tight close-ups, hand or torso motion in medium shots, and full-body dynamics in wide views.
Despite substantial progress, current HAR pipelines rarely adapt across that zoom continuum. This thesis therefore asks: What scale-specific hurdles confront human action recognition in far-field, near-field, and zoom-mixed scenarios, and how can insights from separate case studies keep recognition robust when the camera sweeps from full-body scenes to tight close-ups and back again? To answer, we contribute three scale-aware systems:

1. Hockey Action Identification and Keypose Understanding (HAIKYU) (far-field). For hockey broadcasts, we introduce temporal bounding-box normalization, which removes camera-induced scale jitter, and a 15-keypoint skeleton that adds stick endpoints. Combined with normalization, this improves Top-1 accuracy from 31% to 64%, showing that stick cues are indispensable for ice-hockey actions.

2. Confidence Fostering Identity-preserving Dynamic Transformer (CONFIDANT) (near-field). We curate a 38-class micro-gesture dataset and train an upper-body action recognizer that flags unconfident cues, such as folding arms, crossing fingers, and clasping hands. A diffusion-based video editor then rewrites these segments into confident counterparts, serving as a downstream demonstration of fine-grained recognition.

3. Scale-aware routing framework for mixed-zoom action recognition (Zoom-Gate) (zoom-mixed). A lightweight zoom score derived from the bounding-box area and the density of detected keypoints routes each tracklet to the specialist model best suited to that scale. Experiments confirm that this scale-aware routing, combined with context-specific skeletons, delivers robust performance across mixed-zoom datasets.

Collectively, these contributions demonstrate that coupling scale-aware preprocessing with context-specific skeletons can maintain pose-centric HAR reliability across the zoom spectrum.
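As an illustration of the Zoom-Gate idea in contribution 3, the sketch below computes a zoom score from bounding-box area and detected-keypoint density and routes a tracklet to a scale specialist. The score weights, thresholds, and model labels here are our assumptions, not the thesis's actual implementation.

```python
# Illustrative sketch of scale-aware routing in the spirit of Zoom-Gate.
# The 50/50 weighting, 0.3/0.7 thresholds, and model labels are assumed
# for demonstration, not taken from the thesis.

def zoom_score(bbox_area: float, frame_area: float, num_keypoints: int,
               max_keypoints: int = 15) -> float:
    """Combine relative subject size and keypoint density into [0, 1].

    A large bounding box with many visible keypoints suggests a close-up;
    a small box with few detected keypoints suggests a far-field view.
    """
    size_term = min(bbox_area / frame_area, 1.0)
    density_term = num_keypoints / max_keypoints
    return 0.5 * size_term + 0.5 * density_term

def route(score: float) -> str:
    """Send the tracklet to the specialist best suited to its scale."""
    if score < 0.3:
        return "far-field model (full-body skeleton)"
    if score < 0.7:
        return "medium-shot model (torso and hands)"
    return "near-field model (upper-body micro-gestures)"

# A distant hockey player: small box, few keypoints detected.
far = zoom_score(bbox_area=2_000, frame_area=1920 * 1080, num_keypoints=6)
# A presenter filling the frame with most keypoints visible.
near = zoom_score(bbox_area=1_200_000, frame_area=1920 * 1080, num_keypoints=14)
```

In this sketch the distant player scores about 0.2 and is routed far-field, while the presenter scores above 0.7 and is routed near-field.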
The resulting frameworks open avenues for real-time segmentation, multi-view fusion, and ultimately a unified, scale-invariant action understanding pipeline.

Item Understanding AI’s Impact on Clinical Decision-Making: A Comparative Study of Simple and Complex Primary Care Scenarios (University of Waterloo, 2025-09-19) Mehri, Sormeh
Clinical decision-making is a complex cognitive process shaped by multiple factors, including cognitive biases, clinical context, and the integration of healthcare technologies. This thesis investigates how the introduction of artificial intelligence (AI)-enabled decision support tools influences clinical reasoning in primary care settings. Using Cognitive Work Analysis (CWA), Decision Ladder (DL) frameworks, and content analysis methods, this study qualitatively examines clinician decision-making behaviors across traditional electronic medical record (EMR) environments and AI-supported scenarios. Fourteen clinicians from Ontario, Canada, participated in scenario-driven sessions involving routine (uncomplicated urinary tract infection) and complex (mental health distress) cases. Analysis revealed distinct cognitive shortcuts, shifts, and reliance patterns influenced by AI. Specifically, AI systems reinforced heuristic-driven decisions in routine cases but introduced additional cognitive demands in complex scenarios due to information integration requirements. Visual emphasis in the DLs highlighted AI-driven cognitive shortcuts and behavior modifications. Limitations include scenario-driven constraints and a small, region-specific sample with similar EMR and AI experience. Future research should explore mid-complexity scenarios, incorporate diverse clinician populations, and evaluate the long-term effects of AI integration on clinical reasoning.
This work contributes to understanding the nuanced interplay between cognitive processes and AI technology, informing user-centered design strategies for healthcare decision support systems.

Item Studying the Biomechanics of a Wheelchair Basketball Free Throw using Pose Estimation (University of Waterloo, 2025-09-16) Mohammad, Hisham
Wheelchair basketball is a popular Paralympic sport where athletes with varying disabilities compete under a point-based classification system. Lower-class athletes (1.0–2.5), with higher levels of disability, often struggle to engage their trunk and core muscles, while higher-class athletes (3.0–4.5) have greater functional ability and utilize their trunk extensively. Coaches must consider these functional disparities when formulating strategies and designing individualized training regimens. Consistent free-throw shooting is critical in wheelchair basketball, as it offers an uncontested scoring opportunity. Higher-class athletes, who incorporate trunk motion, rely less on their arms for force generation, resulting in distinct shooting mechanics. Given the biomechanical variability arising from these physical differences, understanding individual shooting techniques is vital for optimizing performance. Motion capture technologies are widely employed to analyze and improve athletic movements. However, traditional systems, such as wearable sensors and marker-based motion tracking, are often costly, time-intensive, and restrictive to mobility. Markerless motion capture systems address these limitations using computer vision techniques such as pose estimation. Convolutional neural networks (CNNs) trained on large human image datasets can accurately detect joints and limbs, enabling real-time analysis.
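The joint angles at the heart of such analyses are derived directly from detected keypoints. A minimal sketch follows; the keypoint values are illustrative, and the actual pipeline described in this work also derives torques and runs on-device.

```python
# Minimal sketch of computing a joint angle (e.g. the elbow) from three
# 2D pose-estimation keypoints. Coordinates below are illustrative.
import math

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))  # clamp for float safety
    return math.degrees(math.acos(cos_t))

# Shoulder, elbow, wrist keypoints (pixels) near the top of a free throw:
shoulder, elbow, wrist = (100, 200), (140, 150), (180, 200)
angle = joint_angle(shoulder, elbow, wrist)
```

Tracking such an angle frame by frame gives the kinematic time series from which release timing and, with segment masses, joint torques can be estimated.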
Commercial systems typically require multiple cameras, but deploying pose estimation CNNs on mobile devices allows motion analysis using only a built-in camera, enhancing portability and accessibility for sports training and biomechanical research. This research focuses on designing and deploying pose estimation models within a mobile application to analyze the shooting arm's motion during a basketball free throw, with specific considerations for wheelchair basketball players. The pose estimation models, trained on the COCO-WholeBody dataset to detect fingertip positions, were deployed on an iPhone and tested for accuracy and computational performance, particularly for real-time motion analysis. The derived joint positions are used to calculate kinematic and dynamic metrics, including joint angles and torques. The system's joint angle calculations were compared against the Vicon motion capture system. While upper arm and elbow angle errors had a root mean squared error (RMSE) within an acceptable range (less than 20°), wrist angle errors exceeded 65° due to limitations in pose estimation accuracy and the iPhone camera's frame rate. To demonstrate the system's utility, two shooting studies were conducted: (1) a comparison of biomechanics between one-motion and two-motion shooting techniques and (2) a biomechanical analysis of the shooting arm contrasting a national-level class 1 wheelchair basketball athlete with class 4.5 able-bodied participants shooting from a basketball wheelchair.

Item Studying Immersive Deception: Manifestations and User Perceptions of Deceptive Design in Commercial Virtual Reality (University of Waterloo, 2025-07-03) Hadan, Hilda
Deceptive Design (formerly “dark patterns”) refers to design practices that distort or impair users’ ability to make informed decisions, regardless of intent.
As immersive technologies, such as Virtual Reality (VR) and Augmented Reality (AR), transform people’s daily experiences, their immersive virtual environments deliver highly engaging experiences while enabling new opportunities for deceptive design strategies that users cannot easily recognize or resist. Consequently, ethical and privacy concerns are expanding into these environments. While previous research has examined deceptive design issues in websites, mobile apps, games, and gamification, the extent of these problems in immersive environments remains largely unexplored. This thesis investigates deceptive design in immersive environments, with specific attention to VR. It identifies deceptive designs that appear in VR and emerge from VR technology’s unique properties, and examines their impacts from users’ perspectives. We first conducted a systematic literature review to synthesize the state-of-the-art research on deceptive design. This review revealed potential deceptive strategies that can be employed in immersive environments, including those enabled by the large amount of user data collected by these technologies. However, most of the existing literature focused on hypothetical scenarios rather than examining deceptive design as it appears in commercially available applications. Informed by the findings from this review, we surveyed experienced users about their awareness of data practices in immersive technologies, examined deceptive design in commercially available VR applications, and compared these findings with those from traditional computer platforms. To ensure consistent and comparable deceptive design analyses across these platforms, we developed a Deceptive Design Assessment Guide grounded in foundational deceptive design literature. This Assessment Guide was applied and validated in two studies that examined how deceptive practices manifest and influence user experience in exemplary computer and VR applications.
Our findings show that deceptive design in VR applications relies heavily on 2D interfaces, such as dialogue windows and checkboxes, rather than fully exploiting VR-specific properties. Hypothesized scenarios from the literature, such as perception hacking and emotion-based manipulation, were not observed in our selected VR applications. Certain VR properties (e.g., realistic simulation, the virtual-physical barrier) amplified the impact of deceptive design on users’ decision-making but did not directly enable it. While users could not point out the specific design elements that used deceptive practices, they still expressed a general discomfort and a feeling of manipulation. Moreover, many users felt powerless to protect themselves or assert their autonomy, and perceived deceptive design as a standardized industry practice with no possible escape. Our research has implications for future research and for immersive technology design, development, and regulation, toward better industry design standards and stronger user protections. For future researchers, the findings provide guidance on fostering user awareness through effective educational strategies, expanding theoretical approaches for understanding deceptive design in immersive environments, and refining user-centered empirical approaches for identifying and evaluating deceptive practices. For designers and developers, this thesis offers a structured Assessment Guide and actionable recommendations to support the creation of ethical, user-centered immersive applications that respect privacy and autonomy. For immersive technology regulation, this thesis identifies the limitations of current regulations and provides practical advice for expanding and strengthening regulatory frameworks, enforcing transparent privacy communication and ethical industry design standards tailored to immersive technologies.
In conclusion, this thesis advances the understanding of deceptive design in commercially available VR applications, delivers actionable strategies for identifying and mitigating deceptive practices, and establishes a foundation for cross-disciplinary collaborations to protect user well-being in immersive environments.

Item Addressing Domain Shifts for Computer Vision Applications via Language (University of Waterloo, 2025-05-23) Liu, Chang
Semantic segmentation is used in safety-critical applications such as autonomous driving and cancer diagnosis, where accurately identifying small and rare objects is essential. However, pixel-level annotations are expensive and time-consuming, and distribution shifts between datasets (e.g., daytime to snowy weather in self-driving, color variations between tumor scans across hospitals) further degrade model generalization. Unsupervised domain adaptation for semantic segmentation (DASS) addresses this challenge by training models on labeled source distributions and adapting them to unlabeled target domains. Existing DASS methods rely on either vision-only approaches or language-based techniques. Vision-only frameworks, such as masking and multi-resolution crops, implicitly learn spatial relationships between image patches but often suffer from noisy pseudo-labels biased toward the source domain. To mitigate noisy predictions, language-based DASS methods leverage generalized representations from large-scale language pre-training. However, those approaches use generic class-level prompts (e.g., "a photo of a {class}") and fail to capture complex spatial relationships between objects, which are key for dense prediction tasks like semantic segmentation. To address these limitations, we propose LangDA, a language-guided DASS framework that enhances spatial context-awareness by leveraging vision-language models (VLMs).
LangDA generates scene-level descriptions (e.g., "a pedestrian is on the sidewalk, and the street is lined with buildings") to encode object relationships. At the image level, LangDA aligns an image's feature representation with the corresponding scene-level text embedding, improving the model’s ability to generalize across domains. LangDA eliminates the need for cumbersome manual prompt tuning and expensive human feedback, ensuring consistency and reproducibility. LangDA achieves state-of-the-art performance on three self-driving DASS benchmarks: SYNTHIA to Cityscapes, Cityscapes to ACDC, and Cityscapes to DarkZurich, surpassing existing methods by 2.6%, 1.4%, and 3.9%, respectively. Ablation studies confirm the effectiveness of context-aware image-level alignment over pixel-level alignment. These results demonstrate LangDA’s capability to leverage spatial relationships encoded in language to accurately segment objects under domain shift.

Item New Attack Detection Methods for Connected and Automated Vehicles (University of Waterloo, 2025-05-08) Bian, Shuhao
Ensuring the security of Connected and Automated Vehicles (CAVs) against adversarial threats remains a critical challenge in cyber-physical systems. This thesis investigates attack detection methodologies and presents novel dual-perspective detection frameworks to enhance CAV resilience. We first propose a vehicle dynamics-based attack detector that integrates the Unscented Kalman Filter (UKF) with machine learning techniques. This approach monitors physical system behaviour and identifies anomalies when sensor readings deviate from predicted states. Our enhanced model captures nonlinear vehicle dynamics while maintaining real-time performance, enabling the detection of sophisticated attacks that traditional linear models would miss. To address the limitations of purely physics-based detection, we develop a complementary trajectory-based detection framework that analyzes the rationality of driving behaviour.
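The core check behind the first, dynamics-based detector can be illustrated as a residual test: compare each sensor reading against the state a filter such as the UKF predicts, and flag readings whose deviation is too large. The threshold and data below are our illustrative choices, not the thesis's.

```python
# Sketch of a residual (innovation) test for a dynamics-based attack
# detector. In practice the predictions come from a UKF over a nonlinear
# vehicle model; here they are stand-in values, and the 3-sigma threshold
# is an assumption for illustration.

def flag_anomalies(predicted, measured, sigma, threshold=3.0):
    """Return indices where |measurement - prediction| exceeds
    `threshold` standard deviations of the expected sensor noise."""
    flagged = []
    for i, (p, m) in enumerate(zip(predicted, measured)):
        if abs(m - p) / sigma > threshold:
            flagged.append(i)
    return flagged

# Predicted longitudinal speeds (m/s) vs. readings with one injected value:
pred = [20.0, 20.1, 20.2, 20.3, 20.4]
meas = [20.05, 20.12, 23.9, 20.28, 20.43]  # index 2 looks spoofed
alerts = flag_anomalies(pred, meas, sigma=0.1)
```

A purely physical test like this misses attacks that stay within plausible dynamics, which is what motivates the trajectory-rationality view described next.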
This system evaluates vehicle trajectories within their environmental context, incorporating road conditions, traffic signals, and surrounding vehicle data. By leveraging neural networks for trajectory prediction and evaluation, our approach can identify malicious interventions even when attackers manipulate vehicle behaviour within physically plausible limits. Integrating these two detection perspectives, one based on vehicle dynamics modelling and the other on trajectory rationality analysis, provides a comprehensive security framework that significantly improves detection accuracy while reducing false positives. Experimental results demonstrate our system’s effectiveness against various attack vectors, including false data injection, adversarial control perturbations, and sensor spoofing attacks. Our research contributes to autonomous vehicle security by developing a holistic detection approach that considers both immediate physical anomalies and broader behavioural inconsistencies, enhancing system resilience against increasingly sophisticated cyber-physical threats.

Item Towards Urban Digital Twins With Gaussian Splatting, Large-Language-Models, and Cloud Mapping Services (University of Waterloo, 2025-05-08) Gao, Kyle
Keywords: Computer Vision; Remote Sensing; Gaussian Splatting; Point Cloud; 3D Modelling; Urban Digital Twin; GIS; Large Language Models

Item Towards the development of an all-optical, non-contact, photon absorption remote sensing (PARS) endomicroscope for blood vasculature imaging (University of Waterloo, 2025-05-06) Warren, Alkris
The need for high-resolution, label-free imaging techniques has spurred the development of advanced endoscopic technologies for real-time tissue characterization. This thesis presents the design, development, and validation of the first forward-viewing, non-contact, all-optical Photon Absorption Remote Sensing (PARS) endomicroscope for in vivo vascular imaging.
The proposed system is designed to leverage the endogenous optical absorption of hemoglobin to achieve high-resolution contrast without exogenous labels or acoustic coupling, addressing longstanding limitations of conventional absorption-based and scattering-based imaging modalities. Two prototype designs were developed using image guide fiber (IGF) technology and achromatic graded-index (GRIN) lenses, with systematic de-risking experiments guiding their evolution. The first prototype (P1) achieved a resolution of ~1 µm and a signal-to-noise ratio (SNR) of 22 dB, demonstrating the feasibility of high-fidelity PARS imaging within a 1.6 mm outer diameter (OD) device footprint. A second design (P2) was introduced to address constraints in working distance and imaging depth for in vivo use, trading resolution for improved accessibility in biological tissues. This work establishes a novel platform for PARS miniaturization and integration with widefield endoscopy, positioning the technology for future applications, including real-time, in situ virtual biopsies, blood oxygenation measurement, and surgical guidance within internal bodily cavities. The results represent a foundational advancement in the translation of PARS microscopy to clinical settings and lay the groundwork for real-time, high-resolution endoscopic diagnostics.

Item Encoding FHIR Medical Data for Transformers (University of Waterloo, 2025-04-29) Yu, Trevor
The open source Fast Healthcare Interoperability Resources (FHIR) data standard is increasingly adopted as a format for representing and communicating medical data. FHIR represents various types of medical data as resources, which have a standardized JSON structure. FHIR offers the advantage of interoperability and can be used for electronic medical record storage and, more recently, machine learning analytics.
A recent trend in machine learning is the development of large foundation models trained on large volumes of unstructured data. Transformers are a deep neural network architecture for sequence modelling and have been used to build foundation models for natural language processing. Text is input to transformers as a sequence of tokens; tokenization algorithms break text into these discrete chunks. Using language tokenizers on FHIR JSON data is inefficient, producing several hundred text tokens per resource. Patient records may contain several thousand resources, which overall exceeds the total number of tokens most text transformers can handle. Additionally, discrete encoding of numeric and time data may not be appropriate for these continuous quantities. In this thesis, I design a tokenization method that operates on data using the open source Health Level 7 FHIR standard. This method takes JSON returned from a FHIR server query and assigns tokens to chunks of JSON based on FHIR data structures. The FHIR tokens can be used to train transformer models, and the methodology to train FHIR transformer models on sequence classification and masked language modelling tasks is presented. The performance of this method is validated on the open source MIMIC-IV FHIR dataset for length-of-stay (LOS) prediction and mortality prediction (MP) tasks. In addition, I explore methods for encoding numerical and time-delta values using continuous vector encodings rather than assigning discrete tokens to values, as well as compression methods to reduce the long sequence lengths. Previous works using MIMIC-IV have reported their performance on the LOS and MP tasks using XGBoost models, which use bespoke feature encodings. The results show that the FHIR transformer performs better than an XGBoost model on the LOS task but worse on the MP task.
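The structure-aware tokenization described above, assigning tokens to chunks of FHIR JSON rather than feeding raw text through a language tokenizer, might be sketched as follows. The token scheme and vocabulary handling are our illustrative assumptions, not the thesis's method; the actual work also encodes numeric and time-delta values continuously.

```python
# Illustrative sketch: one token per meaningful FHIR field instead of
# hundreds of subword tokens per resource. The vocabulary scheme below is
# invented for demonstration.
import json

VOCAB = {}  # token string -> integer id, grown on first sight

def tok(s: str) -> int:
    return VOCAB.setdefault(s, len(VOCAB))

def encode_resource(resource: dict) -> list:
    """Flatten one FHIR resource into a short token sequence."""
    tokens = [tok(f"[{resource['resourceType']}]")]
    for key, value in resource.items():
        if key == "resourceType":
            continue
        if isinstance(value, (str, int, float)):
            tokens.append(tok(f"{key}={value}"))
        else:  # nested structure: one token for the whole field
            tokens.append(tok(f"{key}:{json.dumps(value, sort_keys=True)}"))
    return tokens

obs = {"resourceType": "Observation", "status": "final",
       "code": {"text": "heart-rate"}, "valueQuantity": {"value": 72}}
ids = encode_resource(obs)
```

Here a whole Observation becomes four tokens, so even a record with thousands of resources can fit a transformer's context window.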
None of the continuous encoding methods perform significantly better than discrete encoding methods, but none perform worse either. Compression methods improve performance on long sequences in both accuracy and inference speed. Since performance is task dependent, future research should validate this method on other datasets and tasks. MIMIC-IV is too small to show the benefits of pre-training, but if a larger dataset can be obtained, the methodology developed in this work could be applied toward creating large FHIR foundation models.

Item Electrostatic MEMS Sensors: From Mechanism Discovery to Deployment in Liquid Media (University of Waterloo, 2025-04-28) Shama, Yasser
This thesis presents a methodical investigation into the fundamental sensing mechanism of electrostatic MEMS sensors in gas and liquid media. It provides new insights into electrostatic MEMS sensing mechanisms that can improve the sensor design process by combining mass sorption and permittivity change to enhance the sensitivity of gas and liquid sensors. First, it compares the responsivities of a set of MEMS isopropanol sensors. I found that functionalized static-mode sensors do not exhibit a measurable change in response due to added mass, whereas bare sensors showed a clear change in response to isopropanol vapor. Functionalized dynamic-mode sensors showed a measurable frequency shift due to the added mass of isopropanol vapor, and the frequency shift increased threefold in the presence of strong electrostatic fields. These results show that the sensing mechanism combines a weaker added-mass effect and a stronger permittivity effect, and that electrostatic MEMS gas sensors are independent of the direction of the gravitational field and thus robust to changes in alignment; it is erroneous to refer to them as 'gravimetric' sensors. I also investigated the repeatability of electrostatic MEMS sensors over prolonged excitations.
The sensors were subjected to two test conditions: continuous frequency sweeps and long-term residence on a resonant branch beyond the cyclic-fold bifurcation. I found that prolonged high-amplitude oscillations undermine repeatability and cause significant shifts in the bifurcation location toward lower frequencies by building up plastic deformations that reduce the capacitive gap. Biased excitation waveforms were also found to lead to charge buildup within dielectrics, exacerbating the drift in frequency of the bifurcation point. In comparison, stiffer in-plane sensors with no metallization operating under unbiased waveforms showed dramatic improvement in repeatability. With a view to deployment of electrostatic MEMS sensors in liquid media, I studied the use of motion-induced current to detect their high frequency vibrations. While current and ground truth (optical) measurements aligned well at lower frequency resonances, current measurements showed valleys rather than peaks at high frequency resonances. The root cause was found to be current behavior switching from capacitive to inductive as the frequency crossed a resonance in the measurement circuit. It was also found that output current diminishes with increasing mode number. Finally, I found a measurable change beyond 10 MHz in the output current of a bare chip carrier when the analyte (mercury acetate) was introduced at the concentration of 100 ppm into deionized water, suggesting a potential for interference with inertial sensing. In the final phase of this work, the fundamental vibration mode of electrostatic MEMS sensors was used to detect 100 ppm of mercury acetate in deionized water. The sensors measured a consistent shift in the frequency and amplitude of the resonant peak. 
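The added-mass side of such frequency shifts follows from the spring-mass resonator relation f = sqrt(k/m) / (2π). A back-of-envelope sketch with illustrative numbers (not values from this work) shows why adsorbed mass lowers the resonant frequency only slightly, consistent with the finding that the permittivity effect dominates.

```python
# Back-of-envelope sketch of the added-mass contribution to a resonant
# frequency shift, f = sqrt(k/m) / (2*pi). All numbers are illustrative
# placeholders, not measured device parameters.
import math

def resonant_freq(k: float, m: float) -> float:
    """Natural frequency (Hz) of an ideal spring-mass resonator."""
    return math.sqrt(k / m) / (2 * math.pi)

k = 50.0        # effective stiffness (N/m), illustrative
m0 = 1e-9       # effective resonator mass (kg), illustrative
dm = 1e-12      # adsorbed analyte mass (kg), illustrative

f0 = resonant_freq(k, m0)
shift = f0 - resonant_freq(k, m0 + dm)   # downward shift from added mass
```

With a 0.1% mass change the shift is only about 0.05% of f0, which is why a purely "gravimetric" reading of these sensors understates the stronger permittivity effect.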
This demonstrates the viability of electrostatic MEMS sensors for underwater applications and the need for further work to improve their detection mechanisms.

Item Enhancing Space Situational Awareness with AI and Optimization Techniques (University of Waterloo, 2025-04-24) Kazemi, Sajjad
As space becomes increasingly congested and contested, ensuring the safe operation of satellites has emerged as a critical concern for both public and private sector stakeholders. The growing number of active satellites and space debris significantly increases the risk of collisions, making Space Situational Awareness (SSA) an essential capability for modern space operations. SSA aims to provide timely and accurate assessments of space objects’ trajectories to prevent collisions and maintain the long-term sustainability of space activities. Currently, SSA processes rely heavily on human operators, who must analyze large volumes of data from multiple sources, identify high-priority risks, interpret and validate information, and ultimately make decisions regarding collision risks. While computational tools assist in these processes, the dependence on human judgment introduces limitations, including delays in decision-making and potential errors in critical assessments. Given the increasing complexity of the space environment, there is a pressing need for automated and data-driven approaches to enhance SSA capabilities. A fundamental challenge within SSA is orbit prediction: the ability to accurately forecast the future trajectories of space objects. However, precise trajectory estimation alone is not sufficient, as some scenarios require active collision avoidance maneuvers. In such cases, decision support systems must generate reliable and efficient maneuver plans to ensure satellites can safely adjust their orbits without unnecessary fuel expenditure or operational disruptions.
This thesis addresses both orbit prediction and collision avoidance through a combination of machine learning and optimization techniques. First, a transformer-based deep learning model is trained on publicly available data to predict space object trajectories with high accuracy and computational efficiency. This approach leverages advances in sequence modeling to improve predictive performance in dynamic orbital environments. Next, Reinforcement Learning (RL) techniques are employed to develop an autonomous decision-making framework that generates optimized collision avoidance maneuvers for satellites. By learning from simulated interactions, the RL-based approach aims to provide adaptive and fuel-efficient avoidance strategies. Finally, a Sequential Convex Optimization (SCvx) approach is explored to solve the collision avoidance problem from a purely optimization-driven perspective, without relying on data-driven models. This method ensures mathematically rigorous maneuver planning based on physical constraints and operational requirements. This work contributes to the advancement of SSA by enhancing the accuracy of orbit prediction and the reliability of collision avoidance strategies. In addition, it has the potential to improve automation in space traffic management, reducing reliance on human operators and increasing the resilience of satellite operations.

Item Towards Decision Support and Automation for Safety Critical Ultrasonic Nondestructive Evaluation Data Analysis (University of Waterloo, 2025-04-16) Torenvliet, Nicholas
A set of machine learning techniques is proposed to provide decision support and automation for the analysis of data taken during ultrasonic non-destructive evaluation of Canada Deuterium Uranium reactor pressure tubes. Data analysis is carried out primarily to identify and characterize the geometry of flaws or defects on the pressure tube inner diameter surface.
A baseline approach utilizing a variational auto-encoder ranks data by likelihood and performs analysis using Nominal Profiling (NPROF), a novel technique that characterizes the most likely nominal component of the dataset and measures variance from it. While effective, the baseline method exhibits limitations, including sensitivity to outliers, limited explainability, and the absence of a strong fault diagnosis and error remediation mechanism. To address these shortcomings, Diffusion Partition Consensus (DiffPaC), a novel method integrating Conditional Score-Based Diffusion with Savitzky-Golay filters, is proposed. The approach includes a mechanism for outlier removal during training that reliably improves model performance. It also features strong explainability and, with a human in the loop, mechanisms for fault diagnosis and error correction. These features advance applicability in safety-critical contexts such as nuclear nondestructive evaluation. Methods are integrated and scaled to provide: (a) a principled probabilistic performance model, (b) enhanced explainability through interpretable outputs, (c) fault diagnosis and error correction with a human in the loop, (d) independence from dataset curation and out-of-distribution generalization, and (e) strong preliminary results that meet accuracy requirements on dimensional estimates as specified by the regulator in \cite{cog2008inspection}. Though not directly comparable, the integrated set of methods makes many qualitative improvements upon prior work, which is largely based on discriminative methods or heuristics whose results rely on data annotation, pre-processing, parameter selection, and out-of-distribution generalization. In these respects, the integrated set of fully learned, data-driven methods may be considered state of the art for applications in this niche context.
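The Savitzky-Golay filtering used within DiffPaC can be illustrated in a few lines. The sketch below is a minimal, NumPy-only version of the standard filter, not the thesis's implementation: each smoothed sample is the centre value of a low-order polynomial fit over a sliding window. The window length, polynomial order, and edge-padding strategy are illustrative choices.

```python
import numpy as np

def savgol_coeffs(window, polyorder):
    # Least-squares fit of a degree-`polyorder` polynomial over a centred
    # window; the smoothed value is that polynomial evaluated at x = 0.
    half = window // 2
    x = np.arange(-half, half + 1)
    A = np.vander(x, polyorder + 1, increasing=True)  # columns: x^0, x^1, ...
    # Row 0 of the pseudo-inverse gives the weights producing the fitted
    # constant term, i.e. the smoothed centre value of each window.
    return np.linalg.pinv(A)[0]

def savgol_smooth(y, window=7, polyorder=2):
    c = savgol_coeffs(window, polyorder)
    half = window // 2
    padded = np.pad(y, half, mode="edge")  # replicate edges so output matches input length
    return np.convolve(padded, c[::-1], mode="valid")
```

A property worth noting: because each window is fit exactly by a polynomial of the filter's order, signals of that order (or lower) pass through unchanged away from the edges. In practice `scipy.signal.savgol_filter` provides an equivalent, more featureful implementation.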
The probabilistic model, and corroborating results, imply a principled basis underlying model behaviors and provide a means to interface with regulatory bodies seeking justification for the use of novel methods in safety-critical contexts. The process is largely autonomous, but may include a human in the loop for fail-safe analysis. The integrated methods represent a significant step forward in applying machine learning in this safety-critical context, and provide a state-of-the-art proof of concept, or minimum viable product, upon which a new and fully refactored process for utility owner-operators may be developed.

Item: Dynamics of Golf Discs using Trajectory Experiments for Parameter Identification and Model Validation (University of Waterloo, 2025-04-15) Turner, Adam

The trajectories of flying discs are heavily affected by their aerodynamics and can vary greatly. The growing sport of disc golf takes advantage of these variations, offering seemingly endless disc designs to use in a round. Despite the increasing popularity of disc golf, most manufacturers lack a scientific approach to disc design and instead use subjective assessments and inconsistent disc rating systems to characterize disc performance. This leads to more guesswork for players. This thesis addresses this issue by presenting a physics-based disc trajectory model optimized using experimental trajectory data, and by exploring the possibility of a standardized disc rating system. A novel stereo-camera-based methodology was developed to capture three-dimensional initial conditions and trajectories of disc golf throws. This data was used to identify the aerodynamic coefficients of physics-based models. These models included six aerodynamic coefficients that depended on five independent variables. Disc wobble was included as a variable affecting the aerodynamic coefficients for the first time. Its effect on model performance was compared against simpler models, which excluded it.
The models used various coefficient estimation methods for parameter identification, including polynomial functions and a recently proposed deep-learning approach. The deep-learning approach modelled some relationships with a neural network, which had the benefit of allowing the model to form the most appropriate relationships without relying on functional approximations. Polynomial functions were also used to augment a model that used coefficients previously determined from computational fluid dynamics. These approaches were validated using experimental trajectory data. The model using a mix of computational fluid dynamics data and polynomial functions showed significant improvement over the baseline computational fluid dynamics model. The complete polynomial approaches resulted in the best-performing models and showed good agreement with the validation data. The neural network approaches mostly performed well, but did not outperform the pure polynomial approaches. The incorporation of disc wobble as a variable affecting the aerodynamic coefficients showed negligible improvement over the models that disregarded it. Further model improvement is unlikely without first addressing measurement errors in data collection, particularly those pertaining to disc attitude, the disc plane's orientation relative to the global coordinate system. The possibility of a trajectory-based test standard for discs was also explored, highlighting the need to carefully choose standardized initial conditions to evaluate disc trajectories with a wide range of flight characteristics. Possible approaches for quantifying flight numbers were discussed, and considerations for disc mass, initial spin ratio, and air density were highlighted, as these factors were shown to affect disc flight and have implications for a testing standard.
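The polynomial coefficient-estimation idea can be shown with a toy example. The actual models identify six aerodynamic coefficients over five independent variables from trajectory data; the single coefficient, linear "true" model, degree, and noise level below are invented purely for illustration.

```python
import numpy as np

# Hypothetical measurements: a lift-like coefficient sampled at several
# angles of attack (radians). The assumed underlying model is linear.
alpha = np.linspace(-0.2, 0.4, 13)
cl_true = 0.15 + 2.0 * alpha                     # invented ground truth
rng = np.random.default_rng(0)
cl_meas = cl_true + rng.normal(0.0, 0.01, alpha.size)  # measurement noise

# Identify the polynomial coefficients by least squares (degree 1 here;
# the thesis's polynomial functions are of course richer than this).
coeffs = np.polyfit(alpha, cl_meas, deg=1)       # [slope, intercept]
cl_fit = np.polyval(coeffs, alpha)
```

The same `polyfit`/`polyval` pattern extends to higher degrees and, with a design matrix, to multivariate dependence on all five flight variables.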
This research contributes to the growing body of work surrounding disc golf by proposing a capture method for three-dimensional disc golf trajectories and validated physics-based disc trajectory models, and by exploring a standardized disc rating system. This work contributes to the understanding of disc behaviour for manufacturers and players alike, and propels disc golf towards a more scientifically informed future.

Item: Toward Enhanced Sea Ice Parameter Estimation: Fusing Ice Surface Temperature with the AI4Arctic Dataset using Convolutional Neural Networks (University of Waterloo, 2025-04-14) de Loe, Lily

Arctic sea ice mapping is essential for supporting several key applications. These include facilitating safe marine navigation, providing accurate data for climate monitoring, and assisting efforts by remote northern communities to adapt to variable ice conditions. Automated mapping approaches can leverage an abundance of freely accessible satellite data, with the potential to supplement navigational ice charts, improve operational forecasting, and produce high-resolution estimates of sea ice parameters. However, current approaches rely on synthetic aperture radar (SAR) and passive microwave (PM) data, which can struggle to distinguish ice features due to ambiguous textures, atmospheric effects, and sensor limitations. This thesis explores the potential for thermal-infrared data to improve estimates of sea ice concentration, stage of development, and floe size produced by multi-task deep learning architectures. The work builds on the recent AI4Arctic dataset, which combines Sentinel-1 SAR, AMSR2 brightness temperature, ERA-5 reanalysis data, and ice charts to enhance deep learning-based mapping approaches. VIIRS ice surface temperature (IST) is investigated for its potential to improve predictions in regions where SAR and PM measurements are challenging to interpret.
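Input-level fusion of the kind investigated here, stacking an IST channel alongside the existing AI4Arctic input channels before the first convolution, can be sketched as follows. The array shapes, channel counts, and random data are illustrative assumptions, not the thesis's configuration.

```python
import numpy as np

# Illustrative input stack: two Sentinel-1 SAR channels (e.g. HH, HV)
# plus one VIIRS ice surface temperature (IST) channel, all resampled
# to a common grid, in (channels, height, width) layout.
sar = np.random.rand(2, 256, 256)
ist = np.random.rand(1, 256, 256)

# Input-level fusion: concatenate along the channel axis so the first
# U-Net convolution sees IST jointly with the SAR channels.
fused = np.concatenate([sar, ist], axis=0)
```

Feature-level fusion, as in the DEU-Net-V design that learns IST features separately, would instead pass `ist` through its own encoder branch and merge the resulting feature maps deeper in the network.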
A VIIRS-AI4Arctic dataset consisting of 84 scenes is developed, demonstrating overlap between VIIRS, Sentinel-1, and AMSR2 products. Three variations on the U-Net architecture are introduced, which incorporate IST features at the input and feature levels. These models are evaluated against the winning AI4EO AutoICE Challenge architecture, which acts as an AI4Arctic baseline. A SIC accuracy metric is introduced to provide an additional assessment of model performance. Results demonstrate that models incorporating IST consistently reduce classification errors across all three tasks, particularly when identifying open water under conditions with low incidence angle (SAR), high atmospheric moisture (PM), and wind roughening (SAR and PM). A single, shared decoder improves contextual awareness, although multi-decoder architectures effectively reconstruct task-specific features. The DEU-Net-V architecture, which learns IST features separately from AI4Arctic channels, is most effective at mitigating ambiguity introduced by SAR and PM data. Finally, estimation of aleatoric uncertainty yields heightened variance in marginal ice zones, highlighting potential discrepancies between ice chart labels and pixel-level conditions, and demonstrating the value of quantifying uncertainty from observation noise. IST ultimately enhances sea ice classification, but is limited by cloud contamination and the resolution of current products. These findings support the continued development of deep learning approaches incorporating IST, and highlight the potential for next-generation thermal-infrared instruments to further improve automated sea ice mapping.

Item: Advancing Photometric Odometry to Dense Volumetric Simultaneous Localization and Mapping (University of Waterloo, 2025-03-25) Hu, Yan Song; Zelek, John

Navigating complex environments remains a fundamental challenge in robotics.
At the core of this challenge is Simultaneous Localization and Mapping (SLAM), the process of creating a map of the environment while simultaneously using that map for navigation. SLAM is essential for mobile robotics because effective navigation is a prerequisite for nearly all real-world robotic applications. Visual SLAM, which relies solely on RGB camera input, is important because cameras are widely accessible, making it an ideal solution for widespread robotic deployment. Recent advances in graphics have driven innovation in the visual SLAM domain. Techniques like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) enable the rapid generation of dense volumetric scenes from RGB images. Researchers have integrated these radiance field techniques into SLAM to address a key limitation of traditional systems: although traditional SLAM excels at localization, the generated maps are often unsuitable for broader robotics applications. By incorporating radiance fields, SLAM systems can potentially create volumetric metric-semantic maps in real time, offering substantial benefits for robotics. However, current radiance field-based SLAM approaches face challenges, particularly in processing speed and map reconstruction quality. This work introduces a solution that addresses limitations in current radiance field SLAM systems. Direct SLAM, a traditional SLAM technique, shares key operational similarities with radiance field approaches that suggest potential synergies between the two systems. Both methods rely on photometric loss optimization, where pixel differences between images guide the optimization process. This work demonstrates that the benefits of combining these complementary techniques extend beyond theory.
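The shared photometric objective can be made concrete with a minimal sketch. The L1 norm, function name, and lack of masking are illustrative simplifications; real direct SLAM and radiance-field pipelines typically add robust weighting, exposure compensation, and per-pixel validity masks.

```python
import numpy as np

def photometric_loss(rendered, observed):
    # Mean absolute per-pixel intensity difference between an image
    # rendered from the current pose/scene estimate and the observed
    # camera frame. Minimizing this residual is what drives both direct
    # SLAM pose refinement and radiance-field (NeRF/3DGS) optimization.
    diff = rendered.astype(np.float64) - observed.astype(np.float64)
    return float(np.mean(np.abs(diff)))
```

In both families of methods, gradients of this loss with respect to the pose (direct SLAM) or the scene parameters (radiance fields) are followed until the rendered and observed images agree.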
This work demonstrates the synergy between radiance field techniques and direct SLAM through a novel system that combines 3DGS with direct SLAM, achieving a superior combination of quality, memory efficiency, and speed compared to existing approaches. The system, named MGSO, addresses a central challenge in current 3DGS SLAM systems: initializing 3D Gaussians while performing SLAM simultaneously. The proposed approach leverages direct SLAM to produce dense, structured point clouds for 3DGS initialization. This results in faster optimization, memory compactness, and higher-quality maps, even on mobile hardware. These results demonstrate that traditional direct SLAM techniques can be effectively integrated with radiance field representations, opening avenues for future research.

Item: Transformer-based Point Cloud Processing and Analysis for LiDAR Remote Sensing (University of Waterloo, 2025-03-24) Lu, Dening; Li, Jonathan; Xu, Linlin

The processing and analysis of Light Detection and Ranging (LiDAR) point cloud data, a fundamental task in Three-Dimensional (3D) computer vision, is essential for a wide range of remote sensing applications. However, the disorder, sparsity, and uneven spatial distribution of LiDAR point clouds pose significant challenges to effective and efficient processing. In recent years, Transformers have demonstrated notable advantages over traditional deep learning methods in computer vision, yet designing Transformer-based frameworks tailored to point clouds remains an underexplored topic. This thesis investigates the potential of Transformer models for accurate and efficient LiDAR point cloud processing. Firstly, a 3D Global-Local (GLocal) Transformer Network (3DGTN) is introduced to capture both local and global context, thereby enhancing model accuracy for LiDAR data. This design not only ensures a comprehensive understanding of point cloud characteristics but also establishes a foundation for subsequent efficient Transformer frameworks.
Secondly, a fast point Transformer network with Dynamic Token Aggregation (DTA-Former) is proposed to improve model speed. By optimizing point sampling, grouping, and reconstruction, DTA-Former substantially reduces the time complexity of 3DGTN while retaining its strong accuracy. Finally, to further reduce time and space complexity, a 3D Learnable Supertoken Transformer (3DLST) is presented. Building on DTA-Former, 3DLST employs a novel supertoken clustering strategy that lowers computational overhead and memory consumption, achieving state-of-the-art performance across multi-source LiDAR point cloud tasks in terms of both accuracy and efficiency. These Transformer-based frameworks contribute to more robust and scalable LiDAR point cloud processing solutions, supporting diverse remote sensing applications such as urban planning, environmental monitoring, and autonomous navigation. By enabling efficient yet high-accuracy analysis of large-scale 3D data, this work fosters further research and innovation in LiDAR remote sensing.

Item: Multi-Object Tracking using Mamba and an Investigation into Data Association Strategies (University of Waterloo, 2025-03-19) Khanna, Dheraj; Zelek, John

Multi-Object Tracking (MOT) is a critical component of computer vision, with applications spanning autonomous driving, video surveillance, sports analytics, and more. Despite significant advancements in tracking algorithms and computational power, challenges such as maintaining long-term identity associations, handling dynamic object counts, managing irregular movements, and mitigating occlusions persist, particularly in complex and dynamic environments. This research addresses these challenges by proposing a learning-based motion model that leverages past trajectories to improve motion prediction and object re-identification, and by investigating how data association can maximize tracker performance.
Inspired by recent advancements in state-space models (SSMs), particularly Mamba, we propose a novel learning-based architecture for motion prediction that combines the strengths of Mamba and self-attention layers to effectively capture non-linear motion patterns within the Tracking-By-Detection (TBD) paradigm. Mamba's input-dependent sequence modeling capabilities enable efficient and robust handling of long-range temporal dependencies, making it well suited for complex motion prediction tasks. Building on this foundation, we explore hybrid data association strategies to improve object tracking robustness, particularly in scenarios with occlusions and identity switches. By integrating stronger cues such as Intersection over Union (IoU) for spatial consistency and Re-Identification (Re-ID) for appearance-based matching, we enhance the reliability of object associations across frames, reducing errors in long-term tracking. Fast motion and partial overlaps often lead to identity mismatches in object tracking. Traditionally, spatial association relies on IoU, which can struggle in such scenarios. To address this, we enhance the cost matrix by incorporating Height-based IoU to handle partial overlaps more effectively, and we extend the original bounding boxes with a buffer to account for fast motion, thereby improving the robustness and accuracy of the spatial association process. We also study the impact of dynamically updating the feature bank for Re-ID during the matching stage, culminating in a refined weighted cost matrix. To further address challenges in identity switching and trajectory consistency, we introduce the concept of virtual detections in overlapping scenarios and explore its effectiveness in mitigating ID switches. Developing a robust and accurate MOT tracker demands a critical interplay between accurate motion modeling and a sophisticated combination of stronger and weaker cues in data association.
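The spatial-association cues described above can be sketched in plain Python. The exact buffered and height-based formulations used in the thesis may differ; the box convention `(x1, y1, x2, y2)` and the expansion factor below are illustrative assumptions.

```python
def iou(a, b):
    # Standard Intersection over Union for boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def height_iou(a, b):
    # Overlap of the vertical extents only: tolerant of partial lateral
    # occlusion that shrinks the full-box IoU.
    inter = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = max(a[3], b[3]) - min(a[1], b[1])
    return inter / union

def buffered_iou(a, b, buf=0.3):
    # Expand both boxes by a fraction of their width/height before IoU,
    # so that fast inter-frame motion still yields a nonzero overlap.
    def expand(box):
        w, h = box[2] - box[0], box[3] - box[1]
        return (box[0] - buf * w / 2, box[1] - buf * h / 2,
                box[2] + buf * w / 2, box[3] + buf * h / 2)
    return iou(expand(a), expand(b))
```

In a cost matrix these scores would be combined (e.g. as weighted negative similarities) with Re-ID appearance distances before running the assignment step.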
Through extensive experimental evaluations on challenging benchmarks such as DanceTrack and SportsMOT, the proposed approaches achieve significant performance gains, with HOTA scores of 63.16% and 77.26% respectively, surpassing multiple existing state-of-the-art methods. Notably, our approach outperforms DiffMOT by 0.9% on DanceTrack and 0.06% on SportsMOT, while achieving 3-7% improvements over other learning-based motion models. This work contributes to advancing MOT systems capable of achieving high performance across diverse and demanding scenarios.

Item: Deployment of Piezoelectric Disks in Sensing Applications (University of Waterloo, 2025-02-12) Abdelrahman, Mohamed; Abdel-rahman, Eihab; Yavuz, Mustafa

Micro-electromechanical Systems (MEMS) have revolutionized the way we approach sensing and actuation, offering benefits like low power usage, high sensitivity, and cost efficiency. These systems rely on various sensing mechanisms such as electrostatic, piezoresistive, thermal, electromagnetic, and piezoelectric principles. This thesis focuses on piezoelectric sensors, which stand out due to their ability to generate electrical signals without needing an external power source. Their compact size and remarkable sensitivity make them highly attractive. However, they are not without challenges: their performance can be affected by temperature changes, and they cannot measure static forces. These limitations call for advanced signal processing and compensation techniques. Piezoelectric sensors, which operate based on the direct and inverse piezoelectric effects, find use in a wide range of applications, from measuring force and acceleration to detecting gases. This research focuses on two key applications of piezoelectric sensors: force sensing and gas detection. For force sensing, the study develops smart shims that measure forces between mechanical components, helping to prevent structural failures.
The experimental setup includes an electrodynamic shaker, a controller, and custom components such as a glass-wafer read-out circuit and a 3D-printed shim holder. During tests, the system underwent a frequency sweep from 10 Hz to 500 Hz, and a resonance was detected at about 360 Hz, matching the structural resonance. Some inconsistencies in the sensor's output were traced back to uneven machining of the shim's holes and variations in circuit attachment. To address these issues, the study suggests improving the machining process and redesigning the shim holder for better circuit alignment. Future work will include testing for bending moments and shear forces, and introducing a universal joint in the design to study moment applications more effectively. On the gas sensing side, the research examines a piezoelectric disk with a Silver-Palladium electrode for detecting methane. Using the inverse piezoelectric effect, the sensor's natural frequency was found to be around 445 kHz. When coated with a sensitive material, PANI doped with ZnO, the disk exhibited a frequency shift of 2.538 kHz, indicating successful methane detection. The setup for this experiment included a gas chamber with precise control over gas flow and displacement measurements. Interestingly, after methane was replaced with nitrogen, the natural frequency returned to its original value, demonstrating the sensor's reversible detection capability. Future research will expand to test other gases and sensitive materials, broadening the scope of applications. In summary, this thesis pushes the boundaries of piezoelectric MEMS sensors by tackling key design and performance challenges.
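The gas-detection readout, locating the disk's resonance peak before and after exposure and reporting the shift, can be sketched numerically. The Lorentzian response shape, the linewidth, and the 2.5 kHz downshift below are synthetic stand-ins chosen to mimic the reported ~445 kHz resonance and 2.538 kHz shift; they are not measured data.

```python
import numpy as np

def lorentzian(f, f0, gamma=500.0):
    # Idealized resonance response centred at f0 with half-width gamma (Hz).
    return 1.0 / (1.0 + ((f - f0) / gamma) ** 2)

def resonance_peak(freqs, response):
    # Estimate the resonance as the frequency of maximum amplitude.
    return freqs[np.argmax(response)]

f = np.linspace(430e3, 460e3, 3001)  # sweep with 10 Hz resolution

f_before = resonance_peak(f, lorentzian(f, 445.0e3))           # bare disk
f_after = resonance_peak(f, lorentzian(f, 445.0e3 - 2.5e3))    # after adsorption
shift = f_before - f_after  # positive shift indicates added mass loading
```

With a real sensor, `response` would come from the swept displacement (or impedance) measurement, and the same peak comparison applied before and after gas exposure yields the detection signal.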
Through detailed experimental methods, results, and suggested improvements, it lays a solid foundation for further research aimed at enhancing the reliability and versatility of piezoelectric sensors in real-world applications.

Item: Examining Computer-Generated Aeronautical English Accent Testing and Training (University of Waterloo, 2025-02-04) Seong, Hyun Su; Cao, Shi; Kearns, Suzanne

Objective: This thesis focused on the persisting problem of language-related issues in pilot-air traffic controller (ATC) communication, particularly in relation to foreign accents interfering with pilots' understanding. It examined the effect of foreign accents embedded in human and computer voices (HV, CV), as well as demographic background, on participants' level of understanding. Background: Studies focusing on the impacts of foreign accents in Aviation English (AE) are scant. Accents have been identified as one of the main contributors to miscommunication in pilot-ATC radiotelephony communication in the air, thereby endangering flight safety. It is necessary to examine how to train ab initio and returning pilots to extract accurate meanings from accented instructions coming from ATCs. This thesis introduces a Text-to-Speech (TTS) system supported by artificial intelligence for such training. Method: A total of six studies were conducted: two literature reviews and four empirical studies. For the empirical studies, 50 participants from the University of Waterloo who had flight experience or experience listening to pilot-ATC communications were recruited. They were assigned to two Voice Groups (HV and CV), one of which played only human voices and the other TTS. They completed two rounds (Rounds 1 and 2) of listening tests that contained both Aviation Scripts (AS; aviation-related scripts read in foreign accents) and Neutral Scripts (NS; non-aviation scripts read in foreign accents with no contextual background).
Along with native-accented English, the foreign accents used in the listening tests were drawn from three of ICAO's main languages: Arabic, Spanish, and French. Scores were analyzed according to Script Type (NS, AS), Accent (Arabic, Spanish, French), Round (1 and 2), and Demographic Profile (age, gender, years of speaking English, flight hours, flight ratings, language background, and familiarity with Arabic, Spanish, French, and Aviation English). Results: For the empirical studies, in the HV group, participants improved their scores from Round 1 to 2 in the AS portion of the tests. In the CV group, participants improved their scores in NS. Examination of demographic information showed that non-native English speakers (NNES) tended to perform more poorly on average than native English speakers (NES). Familiarity with Aviation English was beneficial for completing the listening tests, as was holding a higher flight rating. Having more years of speaking English was only partially advantageous. Post-survey results were analyzed, and it was found that participants in the CV group found the speech mostly unnatural. Those in the HV group also expressed difficulty in understanding due to accents, but mentioned that the speech was clear and the scripts were representative of real-life pilot-ATC communication. Participants reported that foreign accents interfered with their process of logical deduction when choosing answers on the tests. Participants, regardless of whether they belonged to the HV or CV group, found NS difficult and challenging due to the lack of context when answering questions on the tests. For AS, participants were able to piece together information using contextual knowledge related to aviation. Conclusion: Accents do interfere with pilots' understanding in radiotelephony communication by making extracting content challenging, which in turn makes interpreting messages or instructions difficult.
This is an important finding, as accents will affect situational awareness to a certain extent when decisions are made in flight. Pilots have to multi-task whenever possible to keep passengers safe and to find the best route to a destination that maximizes fuel efficiency while minimizing passenger wait times. Communication plays a large role in deciding the fate of an aircraft's journey. By this logic, accents can be said to be at the core of this overarching issue with language in the context of aviation. Therefore, training with a new technology such as TTS, along with other educational resources, could provide valuable experience and exposure for pilots who are either beginning or re-starting their language training.