Systems Design Engineering

Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9914

This is the collection for the University of Waterloo's Department of Systems Design Engineering.

Research outputs are organized by type (e.g., Master Thesis, Article, Conference Paper).

Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.

Recent Submissions

Now showing 1 - 20 of 776
  • Item
    Advancing Photometric Odometry to Dense Volumetric Simultaneous Localization and Mapping
    (University of Waterloo, 2025-03-25) Hu, Yan Song; Zelek, John
    Navigating complex environments remains a fundamental challenge in robotics. At the core of this challenge is Simultaneous Localization and Mapping (SLAM), the process of creating a map of the environment while simultaneously using that map for navigation. SLAM is essential for mobile robotics because effective navigation is a prerequisite for nearly all real-world robotic applications. Visual SLAM, which relies solely on input from RGB cameras, is important because cameras are widely accessible, making it well suited to widespread robotic deployment. Recent advances in graphics have driven innovation in the visual SLAM domain. Techniques like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) enable the rapid generation of dense volumetric scenes from RGB images. Researchers have integrated these radiance field techniques into SLAM to address a key limitation of traditional systems: although traditional SLAM excels at localization, the maps it generates are often unsuitable for broader robotics applications. By incorporating radiance fields, SLAM systems have the potential to create volumetric metric-semantic maps in real time, offering substantial benefits for robotics. However, current radiance field-based SLAM approaches face challenges, particularly in processing speed and map reconstruction quality. This work introduces a solution that addresses these limitations in current radiance field SLAM systems. Direct SLAM, a traditional SLAM technique, shares key operational similarities with radiance field approaches that suggest potential synergies between the two. Both methods rely on photometric loss optimization, in which pixel differences between images guide the optimization process. This work demonstrates that the benefits of combining these complementary techniques extend beyond theory.
The synergy between radiance field techniques and direct SLAM is shown through a novel system that combines 3DGS with direct SLAM, achieving a better combination of quality, memory efficiency, and speed than existing approaches. The system, named MGSO, addresses a key challenge in current 3DGS SLAM systems: initializing 3D Gaussians while performing SLAM simultaneously. The proposed approach leverages direct SLAM to produce dense, structured point clouds for 3DGS initialization. This yields faster optimization, more compact memory use, and higher-quality maps, even on mobile hardware. These results demonstrate that traditional direct SLAM techniques can be effectively integrated with radiance field representations, opening avenues for future research.
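Both direct SLAM and radiance field methods minimize a photometric loss, as noted in the abstract. The toy sketch below shows how pixel differences alone can drive an estimate. It assumes a 1-D intensity row and a single integer pixel shift as the only "pose" parameter, a deliberate simplification rather than MGSO's actual formulation. The shift with the lowest mean absolute intensity error is selected.

```python
def photometric_error(ref, cur, shift):
    """Mean absolute intensity difference between ref[i] and cur[i + shift].

    Only pixels that still overlap after shifting are compared;
    returns inf if no pixels overlap.
    """
    diffs = [abs(ref[i] - cur[i + shift])
             for i in range(len(ref))
             if 0 <= i + shift < len(cur)]
    return sum(diffs) / len(diffs) if diffs else float("inf")


def best_shift(ref, cur, max_shift):
    """Brute-force search for the integer shift with the lowest photometric error."""
    return min(range(-max_shift, max_shift + 1),
               key=lambda s: photometric_error(ref, cur, s))


# Hypothetical intensity rows: the bright feature moves two pixels to the right.
ref = [0, 0, 1, 5, 1, 0, 0, 0]
cur = [0, 0, 0, 0, 1, 5, 1, 0]
print(best_shift(ref, cur, max_shift=3))  # 2
```

Real direct SLAM systems typically replace the brute-force search with iterative optimization over a full camera pose, and 3DGS additionally optimizes the Gaussians themselves. The objective, however, has the same shape: minimize intensity differences between rendered or warped pixels and observed ones.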
  • Item
    Transformer-based Point Cloud Processing and Analysis for LiDAR Remote Sensing
    (University of Waterloo, 2025-03-24) Lu, Dening; Li, Jonathan; Xu, Linlin
    The processing and analysis of Light Detection and Ranging (LiDAR) point cloud data, a fundamental task in Three-Dimensional (3D) computer vision, is essential for a wide range of remote sensing applications. However, the disorder, sparsity, and uneven spatial distribution of LiDAR point clouds pose significant challenges to effective and efficient processing. In recent years, Transformers have demonstrated notable advantages over traditional deep learning methods in computer vision, yet designing Transformer-based frameworks tailored to point clouds remains an underexplored topic. This thesis investigates the potential of Transformer models for accurate and efficient LiDAR point cloud processing. Firstly, a 3D Global-Local (GLocal) Transformer Network (3DGTN) is introduced to capture both local and global context, thereby enhancing model accuracy for LiDAR data. This design not only ensures a comprehensive understanding of point cloud characteristics but also establishes a foundation for subsequent efficient Transformer frameworks. Secondly, a fast point Transformer network with Dynamic Token Aggregation (DTA-Former) is proposed to improve model speed. By optimizing point sampling, grouping, and reconstruction, DTA-Former substantially reduces the time complexity of 3DGTN while retaining its strong accuracy. Finally, to further reduce time and space complexity, a 3D Learnable Supertoken Transformer (3DLST) is presented. Building on DTA-Former, 3DLST employs a novel supertoken clustering strategy that lowers computational overhead and memory consumption, achieving state-of-the-art performance across multi-source LiDAR point cloud tasks in terms of both accuracy and efficiency. These Transformer-based frameworks contribute to more robust and scalable LiDAR point cloud processing solutions, supporting diverse remote sensing applications such as urban planning, environmental monitoring, and autonomous navigation. 
By enabling efficient yet high-accuracy analysis of large-scale 3D data, this work fosters further research and innovation in LiDAR remote sensing.
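DTA-Former is described as optimizing point sampling, grouping, and reconstruction, but the abstract does not say which sampler it starts from. A common baseline in point cloud networks is greedy farthest point sampling (FPS), sketched here in plain Python as an assumption for illustration only. FPS repeatedly picks the point farthest from everything already chosen, so samples spread evenly over sparse, unevenly distributed LiDAR returns.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: return k indices that are spread out over the point cloud.

    points: sequence of (x, y, z) tuples; start: index of the first sample.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # nearest[i] = distance from point i to its closest already-selected point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # The next sample is the point farthest from the current selection.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return selected


# Hypothetical cloud: two nearly coincident points near the origin plus three far corners.
cloud = [(0, 0, 0), (0.1, 0, 0), (10, 0, 0), (0, 10, 0), (10, 10, 0)]
print(farthest_point_sampling(cloud, 3))  # [0, 4, 2]
```

The near-duplicate point at index 1 is never chosen before the distant ones. Avoiding that redundancy is what makes FPS a reasonable default for uneven LiDAR density.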
  • Item
    Multi-Object Tracking using Mamba and an Investigation into Data Association Strategies
    (University of Waterloo, 2025-03-19) Khanna, Dheraj; Zelek, John
    Multi-Object Tracking (MOT) is a critical component of computer vision, with applications spanning autonomous driving, video surveillance, sports analytics, and more. Despite significant advancements in tracking algorithms and computational power, challenges such as maintaining long-term identity associations, handling dynamic object counts, managing irregular movements, and mitigating occlusions persist, particularly in complex and dynamic environments. This research addresses these challenges by proposing a learning-based motion model that leverages past trajectories to improve motion prediction and object re-identification, and by investigating how data association can maximize tracker performance. Inspired by recent advancements in state-space models (SSMs), particularly Mamba, we propose a novel learning-based architecture for motion prediction that combines the strengths of Mamba and self-attention layers to effectively capture non-linear motion patterns within the Tracking-By-Detection (TBD) paradigm. Mamba's input-dependent sequence modeling enables efficient and robust handling of long-range temporal dependencies, making it well suited to complex motion prediction tasks. Building on this foundation, we explore hybrid data association strategies to improve tracking robustness, particularly in scenarios with occlusions and identity switches. By integrating stronger cues such as Intersection over Union (IoU) for spatial consistency and Re-Identification (Re-ID) for appearance-based matching, we enhance the reliability of object associations across frames and reduce errors in long-term tracking. Fast motion and partial overlaps often lead to identity mismatches in object tracking. Spatial association traditionally relies on IoU, which can struggle in such scenarios. To address this, we enhance the cost matrix by incorporating Height-based IoU to handle partial overlaps more effectively.
Additionally, we extend the original bounding boxes with a buffer to account for fast motion, improving the robustness and accuracy of the spatial association process. We also study the impact of dynamically updating the Re-ID feature bank during the matching stage, culminating in a refined weighted cost matrix. To further address identity switching and trajectory consistency, we introduce the concept of virtual detections in overlapping scenarios and explore their effectiveness in mitigating ID switches. Developing a robust and accurate MOT tracker demands a careful interplay between accurate motion modeling and a sophisticated combination of stronger and weaker cues in data association. Through extensive experimental evaluations on challenging benchmarks such as DanceTrack and SportsMOT, the proposed approaches achieve significant performance gains, with HOTA scores of 63.16% and 77.26%, respectively, surpassing multiple existing state-of-the-art methods. Notably, our approach outperforms DiffMOT by 0.9% on DanceTrack and 0.06% on SportsMOT, while achieving 3-7% improvements over other learning-based motion models. This work contributes to advancing MOT systems capable of high performance across diverse and demanding scenarios.
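The spatial cues in this abstract can be illustrated with plain IoU, a height-only IoU, and IoU on buffered boxes. The abstract does not give the thesis's exact formulas or weights, so this sketch makes several assumptions:

- Boxes are (x1, y1, x2, y2).
- Height IoU is the 1-D IoU of the vertical extents.
- The buffer grows each side of a box by a fixed fraction of its size.
- The final cost is a simple linear blend of the spatial term with an appearance similarity.

`alpha` and `buffer_ratio` are hypothetical parameters.

```python
def iou(a, b):
    """Standard IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def height_iou(a, b):
    """1-D IoU of the vertical extents; ignores horizontal offset."""
    inter = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = max(a[3], b[3]) - min(a[1], b[1])
    return inter / union if union > 0 else 0.0


def buffered(box, buffer_ratio):
    """Enlarge a box on every side by buffer_ratio of its width/height."""
    w, h = box[2] - box[0], box[3] - box[1]
    return (box[0] - buffer_ratio * w, box[1] - buffer_ratio * h,
            box[2] + buffer_ratio * w, box[3] + buffer_ratio * h)


def association_cost(track_box, det_box, appearance_sim, alpha=0.5, buffer_ratio=0.2):
    """Hypothetical weighted cost blending spatial and appearance cues; lower is better."""
    spatial = height_iou(track_box, det_box) * iou(buffered(track_box, buffer_ratio),
                                                    buffered(det_box, buffer_ratio))
    return 1.0 - (alpha * spatial + (1.0 - alpha) * appearance_sim)


# A fast-moving player: the new detection no longer overlaps the predicted box.
track, det = (0, 0, 10, 10), (12, 0, 22, 10)
print(iou(track, det))                                          # 0.0
print(height_iou(track, det))                                   # 1.0
print(round(iou(buffered(track, 0.2), buffered(det, 0.2)), 4))  # 0.0769
```

Without the buffer, plain IoU gives no spatial evidence for this pair. After buffering it does, and the height term stays at 1.0 because the vertical extents coincide. In a full tracker, costs like these would fill a matrix that is passed to an assignment solver such as the Hungarian algorithm.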
  • Item
    Deployment of Piezoelectric Disks in Sensing Applications
    (University of Waterloo, 2025-02-12) Abdelrahman, Mohamed; Abdel-rahman, Eihab; Yavuz, Mustafa
    Micro-electromechanical Systems (MEMS) have revolutionized the way we approach sensing and actuation, offering benefits like low power usage, high sensitivity, and cost efficiency. These systems rely on various sensing mechanisms such as electrostatic, piezoresistive, thermal, electromagnetic, and piezoelectric principles. This thesis focuses on piezoelectric sensors, which stand out due to their ability to generate electrical signals without needing an external power source. Their compact size and remarkable sensitivity make them highly attractive. However, they are not without challenges: their performance can be affected by temperature changes, and they cannot measure static forces. These limitations call for advanced signal processing and compensation techniques. Piezoelectric sensors, which operate based on the direct and inverse piezoelectric effects, find use in a wide range of applications, from measuring force and acceleration to detecting gases. This research concentrates on two key applications of piezoelectric sensors: force sensing and gas detection. For force sensing, the study develops smart shims that measure forces between mechanical components, which helps prevent structural failures. The experimental setup includes an electrodynamic shaker, a controller, and custom components such as a glass wafer read-out circuit and a 3D-printed shim holder. During tests, the system underwent a frequency sweep from 10 Hz to 500 Hz, and a resonance was detected at about 360 Hz, matching the structural resonance. Some inconsistencies in the sensor’s output were traced back to uneven machining of the shim’s holes and variations in circuit attachment. To address these issues, the study suggests improving the machining process and redesigning the shim holder for better circuit alignment. Future work will include testing for bending moments and shear forces, and introducing a universal joint in the design to study moment applications more effectively.
On the gas sensing side, the research examines a piezoelectric disk with a silver-palladium electrode for detecting methane. Using the inverse piezoelectric effect, the sensor’s natural frequency was found to be around 445 kHz. When coated with a sensitive material (PANI doped with ZnO), the disk exhibited a frequency shift of 2.538 kHz, indicating successful methane detection. The experimental setup included a gas chamber with precise control over gas flow and displacement measurements. After methane was replaced with nitrogen, the natural frequency returned to its original value, demonstrating the sensor’s reversible detection capability. Future research will test other gases and sensitive materials, broadening the scope of applications. In summary, this thesis advances piezoelectric MEMS sensors by tackling key design and performance challenges. Through detailed experimental methods, results, and suggested improvements, it lays a solid foundation for further research aimed at enhancing the reliability and versatility of piezoelectric sensors in real-world applications.
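Both measurements in this abstract reduce to simple arithmetic on a frequency response. One is locating the resonance peak in a sweep; the other is expressing a frequency shift relative to the natural frequency. The sketch assumes the response is available as sampled (frequency, amplitude) pairs. The sweep values are synthetic placeholders, while 445 kHz and 2.538 kHz are the figures reported in the abstract.

```python
def resonance_frequency(freqs_hz, amplitudes):
    """Return the swept frequency with the largest response amplitude."""
    if len(freqs_hz) != len(amplitudes) or not freqs_hz:
        raise ValueError("need equally long, non-empty frequency and amplitude lists")
    peak = max(range(len(amplitudes)), key=amplitudes.__getitem__)
    return freqs_hz[peak]


def relative_shift_percent(f0_hz, delta_f_hz):
    """Frequency shift expressed as a percentage of the natural frequency."""
    return 100.0 * delta_f_hz / f0_hz


# Synthetic sweep samples (placeholders, not measured data).
sweep_f = [340, 350, 360, 370, 380]
sweep_a = [0.21, 0.48, 0.95, 0.44, 0.19]
print(resonance_frequency(sweep_f, sweep_a))  # 360

# Reported figures: ~445 kHz natural frequency, 2.538 kHz shift.
print(f"{relative_shift_percent(445e3, 2.538e3):.2f} %")  # 0.57 %
```

The natural frequency is given only approximately, so the roughly 0.57% relative shift is approximate as well.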
  • Item
    Examining Computer-Generated Aeronautical English Accent Testing and Training
    (University of Waterloo, 2025-02-04) Seong, Hyun Su; Cao, Shi; Kearns, Suzanne
    Objective: This thesis focused on the persistent problem of language-related issues in pilot-air traffic controller (ATC) communication, particularly foreign accents that interfere with pilots’ understanding. It examined the effect of foreign accents embedded in human and computer voices (HV, CV), as well as participants’ demographic backgrounds, on their level of understanding. Background: Studies focusing on the impact of foreign accents in Aviation English (AE) are scarce. Accents have been identified as one of the main contributors to miscommunication in pilot-ATC radiotelephony communication, thereby endangering flight safety. It is therefore necessary to examine how to train ab initio and returning pilots to extract accurate meaning from accented ATC instructions. This thesis introduces artificial intelligence-supported Text-to-Speech (TTS) for such training. Method: Six studies were conducted: two literature reviews and four empirical studies. For the empirical studies, 50 participants from the University of Waterloo who had flight experience or experience listening to pilot-ATC communications were recruited. They were assigned to two Voice Groups (HV and CV), one of which heard only human voices and the other only TTS. They completed two rounds (Round 1 and 2) of listening tests that contained both Aviation Scripts (AS; aviation-related scripts read in foreign accents) and Neutral Scripts (NS; non-aviation scripts read in foreign accents with no contextual background). Alongside native-accented English, the listening tests used accents from three of ICAO’s main languages: Arabic, Spanish, and French.
Scores were analyzed according to Script Type (NS, AS), Accent (Arabic, Spanish, French), Round (1 and 2), and Demographic Profile (Age, Gender, Years of Speaking English, Flight Hours, Flight Ratings, Language Background, Familiarity with Arabic, Spanish, French, and Aviation English). Results: In the empirical studies, participants in the HV group improved their scores from Round 1 to Round 2 in the AS portion of the tests, while participants in the CV group improved their scores in NS. Demographic analysis showed that non-native English speakers (NNES) tended to perform more poorly on average than native English speakers (NES). Familiarity with Aviation English and a higher flight rating were both beneficial for completing the listening tests, whereas more years of speaking English was only partially advantageous. Post-survey results showed that participants in the CV group found the speech mostly unnatural. Those in the HV group also reported difficulty understanding the accents but noted that the speech was clear and the scripts were representative of real-life pilot-ATC communication. Participants reported that foreign accents interfered with their logical deduction when choosing answers on the tests. Participants in both the HV and CV groups found NS challenging because the scripts lacked context, whereas for AS they were able to piece together information using contextual knowledge of aviation. Conclusion: Accents do interfere with pilots’ understanding in radiotelephony communication by making content difficult to extract, which in turn makes interpreting messages and instructions difficult. This is an important finding, as it can affect situational awareness when decisions must be made on the fly.
Pilots must often multitask to keep passengers safe and to find the best route to a destination, one that maximizes fuel efficiency and minimizes passenger wait times. Communication plays a large role in determining the outcome of an aircraft’s journey, and in this sense, accents sit at the core of the broader issue of language in aviation. Therefore, training with a new technology such as TTS, along with other educational resources, could give valuable experience and exposure to pilots who are either beginning or restarting their language training.
  • Item
    Toward Adaptive and User-Centered Intelligent Vehicles: AI Models with Granular Classifications for Risk Detection, Cognitive Workload, and User Preferences
    (University of Waterloo, 2025-01-29) Lee, Hyowon; Samuel, Siby
    As artificial intelligence (AI) is increasingly integrated into transportation systems, intelligent vehicles have emerged as an active research topic. Many advancements aim to enhance both the safety and comfort of drivers and the reliability of intelligent vehicles. The main focus of my research is addressing and responding to the varying states and needs of drivers, which is essential for improving driver-vehicle interactions through user-centered design. To contribute to this evolving field, this thesis explores the use of physiological signals and eye-tracking data to decode user states, perceptions, and intentions. Existing studies mostly rely on binary classification models, which are limited in capturing the full spectrum of user states and needs. Addressing this gap, my research focuses on developing AI-driven models with more granular classifications for cognitive workload, risk severity levels, and user preferences for self-driving behaviours. This thesis is structured into three core domains: collision risk detection, cognitive workload estimation, and perception of user preferences for self-driving behaviours. By integrating AI techniques with multi-modal physiological data, my studies develop machine learning (ML) models for each of these domains that achieve high performance. Feature analysis techniques are employed to enhance model interpretability, giving a better understanding of the features, and to improve model performance. These findings pave the way for a new paradigm of intelligent vehicles that are not only more adaptive but also more aligned with user needs and preferences. This research lays the groundwork for the future development of user-centered intelligent companion systems in vehicles, where adaptive, perceptive, and interactive vehicles can better meet the complex demands of their users.
  • Item
    Evaluating the Potential Environmental and Human Toxicity of Solvents Proposed for use in Post-Combustion Carbon Capture
    (University of Waterloo, 2025-01-28) Ghiasi, Fatima; Elkamel, Ali
    Carbon dioxide emitted by industrial activities is a growing concern due to its effects on the global climate. For this reason, firms are being urged to lower their carbon footprint. Post-combustion carbon capture is being explored as a method for the power and materials industries to decarbonize. The most mature carbon capture technique is amine absorption, and different amines are being explored for potential use in post-combustion carbon capture units. Many biological molecules are amines, and amines that resemble them can disrupt biological processes, harming organisms. In addition, if an amine is soluble in lipids, it can persist in the food chain and cause long-term toxic effects that are not immediately visible. A total of 151 solvents were compared based on four properties: volatility, lipophilicity, mutagenicity, and neuroactivity. Machine learning models were trained to predict these properties. Due to their hydrophilicity, amino acids were determined to have the lowest potential for causing environmental toxicity.
  • Item
    Investigating Technology Implementation in a Canadian Community Hospital
    (University of Waterloo, 2025-01-27) Allana, Sana; Burns, Catherine
    The integration of technology into healthcare has witnessed significant advancements. However, the widespread adoption of such technologies may not be uniformly positive. While the highest levels of adoption are typically found in densely populated urban areas, community healthcare facilities face challenges due to insufficient resources such as infrastructure, funding, and specialized staff, challenges exacerbated by their remote locations. This is a cause for concern, as community hospitals account for 90% of all hospitals in Canada. This reveals a major opportunity to improve technology adoption and implementation at community hospitals, to aid their existing challenges, increase equity in healthcare, and improve generalizability of healthcare technologies. This research aims to uncover the perceptions, expectations, cultural nuances, and barriers to technology adoption at a community-level hospital in Ontario, Canada. The study began with a contextual inquiry approach, incorporating semi-structured interviews and surveys. Data was collected from nine clinical and managerial staff members whose workflows were impacted by three pilot technology projects. The interviews aimed to explore staff expectations and experiences with how these pilot projects impacted their workflows, patient care, and the overall technology implementation process. The survey included demographic questions and items based on the Unified Theory of Acceptance and Use of Technology (UTAUT) model, designed to predict factors influencing technology acceptance. The pilot technologies included a discharge planning tool, a portable X-ray scanner, and a digital pathology tool. A thematic analysis of the qualitative data was conducted, followed by affinity mapping to identify overarching themes. The Functional Resonance Analysis Method (FRAM) was also used to understand and model the impact of integrating the pilot technologies into preexisting, variable workflows.
Finally, survey results were analyzed using frequency distributions to identify trends and triangulate findings. Overall, most staff reported a high level of technology use in both their work and daily lives. They also acknowledged that technology breakdowns at the workplace were inevitable, often resulting in time-consuming, manual workarounds. As well, for all pilot projects, staff felt overburdened by the additional workload required to manage the pilots alongside their regular duties. However, despite these challenges, all staff expressed an appreciation for innovation and a strong willingness to try new tools to improve their work. The discharge planning and X-ray scanner tools did not integrate well into existing workflows or provide additional value. Both tools performed inconsistently and failed to meet expectations for streamlining processes, leading to reluctance and distrust among staff. Additionally, change management planning was insufficient for both tools, with staff experiencing abrupt workflow changes, limited training, and a lack of clarity on project timelines or statuses. As a result, neither tool was requested for purchase following pilot testing. Conversely, staff decided to purchase the digital pathology tool, despite the disruptions to existing workflows, as the perceived benefits to both staff and patient care outweighed these challenges. Staff were excited about the tool’s potential and engaged in close collaboration with the manufacturer and project team. Furthermore, change management was carefully planned, with a phased implementation approach. The pilot was also driven by strong advocacy from a pathologist, which ensured alignment with clinical needs. Based on these findings, several recommendations were uncovered to improve the technology implementation process. First, the challenges with change management highlight the need for better resource allocation. 
This includes providing sufficient time for introducing new tools, clearly explaining the reasons for their selection, offering personalized training that covers tool usage, troubleshooting, and its impact on existing processes, and ensuring staff have the necessary bandwidth to manage change without disrupting daily operations. Second, communication channels should be improved. Startup companies should collaborate closely with the hospital during the development and testing phases to better understand staff needs and workflows, while also providing tailored support throughout the implementation process. Additionally, communication with hospital leadership must be strengthened to secure strong support, allocate resources effectively, and incorporate feedback on the challenges staff encounter, fostering a more collaborative environment that is better equipped to drive innovation. Finally, it is crucial to define and share specific success metrics for pilot projects. These metrics will help staff assess the technology's impact, make informed decisions about its use, evaluate the implementation process, identify lessons learned, and pinpoint areas for improvement, all of which can refine future technology adoption strategies. Overall, technology implementation and adoption are influenced by a variety of factors, which are further compounded by the high workload, staffing shortages, and unpredictable environments commonly found in community hospitals. By addressing these recommendations, health organizations can enhance the adoption and effectiveness of new technologies, ultimately improving staff workflows and patient care.
  • Item
    Multi-Wavelength in vivo Photon Absorption Remote Sensing: Towards Non-Contact Label-Free Functional Vascular Imaging
    (University of Waterloo, 2025-01-23) Werezak, Sarah; Haji Reza, Parsin
    Blood oxygen saturation (SO2) is an important functional metric in the diagnosis and monitoring of blinding eye diseases and cancer. Additionally, SO2 imaging is highly valuable for illustrating changes in blood oxygenation within a vascular network, particularly when those changes are shown in the context of surrounding biological structures. This could provide researchers and clinicians with valuable information on the mechanisms of disease progression and the efficacy of treatment. Various techniques have been explored for SO2 imaging; however, measurement inaccuracy, the requirement for contact with the tissue, and the reliance on exogenous labels have prevented the clinical adoption of these approaches. Photon absorption remote sensing (PARS) is a novel imaging technique that is label-free, non-contact, and absorption-based. When a photon is absorbed by a biomolecule, energy can be released through radiative or non-radiative relaxation. Most imaging modalities are limited to capturing one form of relaxation contrast, whereas PARS can capture both simultaneously. This unique approach makes PARS a promising SO2 imaging modality. This thesis advances efforts towards accurate, non-contact, label-free SO2 imaging using PARS. First, system developments are implemented to demonstrate the first multi-wavelength in-vivo PARS system. Independent excitation paths, power compensation, and improved secondary excitation generation enable reliable and consistent in-vivo multi-wavelength PARS imaging of chicken embryo vasculature. Power compensation of the incident excitation pulses is critical for quantitative SO2 measurements, since it ensures that the measured SO2 is not affected by power variations in the excitation source. This is followed by the development of techniques for in-vitro phantom studies.
A blood oxygenation and deoxygenation protocol is developed and tested, enabling time-efficient, low-cost preparation of blood samples at various oxygenation levels. Additionally, a flow phantom with a 50-micrometer channel is developed, which successfully enables PARS signals to be captured from blood in vitro. However, this experimental setup was unable to demonstrate a change in PARS signal across blood samples at differing oxygenation levels. Simulation is used to show that the blood preparation and samples are not the cause of this unsuccessful result, which is instead attributed to the flow phantom design. The knowledge gained through the iterative design process provides valuable insight to guide future flow phantom developments. Finally, in-vivo experiments with the multi-wavelength PARS system successfully demonstrated the variation in blood oxygenation during hypoxia and recovery of a chicken embryo. A hypoxia holder was designed to modulate the ambient oxygen inside the holder and induce states of hypoxia and recovery. This highlights the success of the multi-wavelength PARS system in demonstrating a relative change in SO2 in vivo. The presented work furthers efforts towards accurate, non-contact, label-free PARS SO2 imaging through the development of the first multi-wavelength in-vivo PARS system, in-vitro blood and flow phantom developments, and the in-vivo demonstration of a relative change in SO2 measured using PARS.
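Quantitative multi-wavelength SO2 estimation is commonly framed as linear spectral unmixing, and the power-compensation step described in this abstract fits naturally into that framing. The sketch rests on two assumptions:

1. The PARS signal at each wavelength is proportional to the local absorption coefficient times the excitation pulse energy, with the same proportionality constant at both wavelengths.
2. Only oxy- and deoxy-hemoglobin absorb.

The extinction coefficients in the example are made-up placeholders, not tabulated values, and the thesis's own processing may differ.

```python
def estimate_so2(signals, pulse_energies, eps_hbo2, eps_hb):
    """Two-wavelength SO2 estimate by linear unmixing (illustrative sketch).

    Each argument is a 2-tuple with one value per excitation wavelength.
    Model assumed: signal / energy = k * (eps_hbo2 * C_HbO2 + eps_hb * C_Hb).
    The unknown k cancels in the final ratio.
    """
    # Power compensation: remove pulse-to-pulse energy variation.
    m1, m2 = (s / e for s, e in zip(signals, pulse_energies))
    a, c = eps_hbo2  # HbO2 extinction at wavelengths 1 and 2
    b, d = eps_hb    # Hb extinction at wavelengths 1 and 2
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("wavelength pair gives a singular mixing matrix")
    # Cramer's rule for the 2x2 system [[a, b], [c, d]] @ [C_HbO2, C_Hb] = [m1, m2].
    c_hbo2 = (m1 * d - b * m2) / det
    c_hb = (a * m2 - c * m1) / det
    return c_hbo2 / (c_hbo2 + c_hb)


# Placeholder coefficients and a synthetic sample with SO2 = 0.7;
# the second pulse is 4x weaker, which power compensation must undo.
eps_hbo2, eps_hb = (1.0, 2.0), (3.0, 1.0)
signals = (3.2, 0.85)   # = energy * (eps_hbo2 * 0.7 + eps_hb * 0.3)
energies = (2.0, 0.5)
print(round(estimate_so2(signals, energies, eps_hbo2, eps_hb), 3))  # 0.7
```

Without the division by pulse energy, the same sample would yield a different SO2 whenever the laser output drifts. Power compensation guards against exactly that effect. Because the constant k is unknown, a setup like this recovers SO2 only under the stated model assumptions, which is consistent with the abstract's emphasis on relative changes.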
  • Item
    Towards Humanoids Operating Mobility Devices Designed for Humans
    (University of Waterloo, 2025-01-22) Rajendran, Vidyasagar; Mombaur, Katja
    Humanoid robotics is advancing rapidly, with significant potential to address challenges in disaster recovery, manufacturing, and healthcare. Despite progress, current humanoid capabilities remain limited, particularly in terms of efficient mobility over long distances. Integrating humanoid robots with personal transporters (PTs), such as Segways, offers a promising solution, enabling them to operate more efficiently in human-centric environments such as factories, malls, and airports. This approach not only preserves the humanoid's ability to navigate complex, uneven terrain with its legs but also enhances versatility, allowing for faster, more energy-efficient movement on flat surfaces. This thesis explores methods for enabling bipedal humanoids to operate PTs, focusing on the REEM-C humanoid riding a Segway x2 SE. The research begins by analyzing human interactions with Segways to reverse-engineer their internal controllers, leading to a high-fidelity simulation model. This model informs the development of control algorithms for the REEM-C, enabling successful simulation-based demonstrations of humanoid-driven Segway motions, including translational, rotational, and mixed maneuvers. Building on this, balance stabilization strategies are devised for actuated balance boards, addressing both frontal and sagittal plane control through an integration of admittance control strategies. A comprehensive analysis of bimanual manipulation is also conducted, emphasizing manipulability and stability within a constrained workspace. Using a combined manipulability-stability metric, collision-free bimanual trajectories are generated, demonstrating improved stability during dynamic tasks such as manipulating objects of varying shapes and masses. This analysis underpins the implementation of bimanual manipulation strategies needed for operating the Segway’s LeanSteer handlebar.
The final contribution consolidates all findings, presenting a whole-body control strategy that enables the REEM-C to ride a Segway safely and effectively. A stack-of-tasks quadratic program is utilized to ensure stability, balance, and bimanual control in dynamic conditions. Experimental validation demonstrates the feasibility of this approach, showcasing the REEM-C’s ability to operate a Segway under real-world conditions. This research provides a step towards more versatile and adaptable humanoid mobility solutions for everyday human environments.
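The bimanual analysis in this abstract relies on a manipulability measure. The abstract does not define its combined manipulability-stability metric. This sketch therefore assumes the widely used Yoshikawa index, w = sqrt(det(J J^T)), and shows only the manipulability half. It uses a hypothetical two-link planar arm rather than the REEM-C's arms. The index falls to zero at singular configurations, where the arm loses the ability to move its end effector in some direction.

```python
import math


def planar_2link_jacobian(q1, q2, l1, l2):
    """Position Jacobian of a planar 2-link arm (joint angles in radians)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def manipulability(jac):
    """Yoshikawa index sqrt(det(J J^T)) for a Jacobian with two rows."""
    # Entries of the symmetric 2x2 product J J^T.
    a = sum(x * x for x in jac[0])
    b = sum(x * y for x, y in zip(jac[0], jac[1]))
    d = sum(y * y for y in jac[1])
    # Clamp tiny negative values caused by floating-point round-off.
    return math.sqrt(max(0.0, a * d - b * b))


# Elbow bent 90 degrees versus a fully stretched (singular) arm, unit link lengths.
print(round(manipulability(planar_2link_jacobian(0.3, math.pi / 2, 1.0, 1.0)), 6))  # 1.0
print(round(manipulability(planar_2link_jacobian(0.3, 0.0, 1.0, 1.0)), 6))          # 0.0
```

For this arm the index reduces to |l1 * l2 * sin(q2)|, so it depends only on the elbow angle. A trajectory planner that rewards high manipulability therefore keeps the elbow away from full extension.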
  • Item
    Broadcast is all you need: Robust Multiplayer Tracking in Ice Hockey using Monocular Videos
    (University of Waterloo, 2025-01-22) Prakash, Harish; Clausi, David; Zelek, John
    Multi-object tracking (MOT) in ice hockey combines detecting players and associating them across a sequence to maintain their identities. Tracking players in monocular broadcast videos is an important computer vision problem that enables several downstream analytics and enhances the viewing experience. However, existing tracking approaches struggle with occlusions, motion blur, camera pan-tilt-zoom effects, and the dynamic player movements prevalent in telecast feeds. These challenges are exacerbated in fast-paced sports such as ice hockey, where existing trackers have difficulty maintaining identity consistency because of players' sudden, non-linear motion. In this thesis, acknowledging the fundamental role of quality datasets, we first present two hockey tracking datasets: our previously developed HTD-1 and a newly curated, open-source dataset called HTD-2, annotated from broadcast NHL games. Using this new dataset, we establish a reference benchmark by evaluating six state-of-the-art (SOTA) tracking methods to enable performance comparisons in hockey MOT. A detailed study of each algorithm examines its merits and drawbacks for tracking players. Next, to address these limitations, we propose a novel tracking model that formulates MOT as a bipartite graph matching problem cued with homography inputs. Specifically, we disambiguate the positions of occluded players in broadcast footage by warping them onto a view-invariant overhead rink template and encoding their transformations into the graph message-passing network. This provides reliable spatial context for identity-preserving track prediction. Experimental results show that our model achieves a tenfold reduction in identity switches (IDsw) and a 32.45% improvement in IDF1 score over the existing baseline on HTD-1, establishing a new SOTA. 
The proposed model also generalizes well, achieving 92.8% IDF1 with only 60 IDsw during cross-validation on HTD-2. Finally, ablation studies validate the model's performance and substantiate the approach.
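The thesis learns association with a graph message-passing network; the sketch below only illustrates the two geometric ingredients the abstract names. It warps image points onto an overhead rink plane with a 3 × 3 homography, then solves the resulting bipartite matching on rink coordinates. As an assumption, it swaps in a brute-force minimum-cost assignment with a hypothetical distance gate `max_dist`; brute force is adequate only for a handful of players, and a Hungarian solver such as SciPy's `linear_sum_assignment` is the practical choice at scale. All names and values are illustrative.

```python
import itertools
import math

Point = tuple[float, float]
Homography = list[list[float]]  # 3 x 3, maps image pixels to rink-template coordinates


def warp_to_rink(h: Homography, p: Point) -> Point:
    """Apply a planar homography to an image point (e.g., a player's foot position)."""
    x, y = p
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def match_tracks(tracks: list[Point], detections: list[Point], max_dist: float) -> list[tuple[int, int]]:
    """Minimum-cost one-to-one matching of tracks to detections (brute force, small sets only)."""
    if len(tracks) > len(detections):
        # Solve the transposed problem, then swap the index pairs back
        return [(t, d) for d, t in match_tracks(detections, tracks, max_dist)]
    best_cost, best_perm = math.inf, ()
    for perm in itertools.permutations(range(len(detections)), len(tracks)):
        # Gated cost: distances beyond max_dist are capped so those pairs behave as "unmatched"
        cost = sum(min(math.dist(tracks[i], detections[j]), max_dist) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [(i, j) for i, j in enumerate(best_perm) if math.dist(tracks[i], detections[j]) <= max_dist]
```

Matching in rink coordinates rather than pixels is the point of the warp: two players who overlap in the broadcast view can still be far apart on the ice, so their distances stay distinguishable.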
  • Item
    A Real-Time Autonomous Path Planning Framework for Space Satellites Using Improved Interfered Fluid Dynamic System (IFDS)
    (University of Waterloo, 2025-01-03) Patel, Aditya Hetalkumar; Lashgarian Azad, Nasser; Scott, Andrea
    In the vast expanse of space, a critical challenge threatens the sustainability of satellite operations and future exploration: space debris. The accumulation of inactive satellites and small debris has elevated the risk of cascading collisions, known as the Kessler Syndrome, which could render critical orbital paths unusable. This scenario would significantly impact our ability to deploy and maintain satellites essential for global communication, weather monitoring, navigation, and scientific research. Addressing the urgent need for advanced space traffic management solutions, this research proposes an autonomous satellite navigation system designed to optimize collision avoidance maneuvers and minimize fuel consumption, contributing to more sustainable space operations. Our system integrates the Interfered Fluid Dynamic System (IFDS) with Machine Learning (ML) models, leveraging real-time predictive capabilities to enhance satellite safety and reduce human intervention. The Nutcracker Optimization Algorithm (NOA) generates optimal parameters for training the predictive model, enabling efficient dataset generation. XGBoost, trained on this dataset, is then employed within the IFDS framework to predict optimal collision-avoidance parameters in real time. This two-step approach enables satellites to autonomously adjust their trajectories, maintaining safe distances from debris with minimal fuel consumption. XGBoost achieved a 92% success rate in predicting the optimal reaction parameter of the IFDS algorithm, avoiding collisions with a minimum separation of 2000 m, demonstrating its effectiveness in dynamic orbital environments. We also compare NOA with Particle Swarm Optimization (PSO) for tuning IFDS parameters. Our results show NOA’s superior convergence rate and computational efficiency, reducing processing time by approximately 47% compared to PSO. This efficiency accelerates dataset generation and model training. 
Simulations were conducted using the Orekit library to assess the system’s operational effectiveness. Guided by XGBoost-predicted parameters, the IFDS algorithm executes preemptive collision-avoidance maneuvers up to one hour before a potential collision, minimizing fuel consumption while ensuring safe separation from debris. In conclusion, this research introduces a framework for autonomous satellite collision avoidance that enhances the safety and efficiency of space operations. By reducing reliance on ground intervention, conserving fuel, and enabling safe, independent navigation, this system supports more effective and scalable space traffic management, paving the way for future advancements in satellite operations.
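The abstract does not spell out the IFDS equations. As a hedged sketch of the commonly cited basic form, the original velocity u is multiplied by a modulation matrix M = I - n nᵀ / (Γ^(1/ρ) nᵀ n), where Γ is the obstacle boundary function, n = ∇Γ, and ρ is the reaction coefficient (the kind of "reaction parameter" the thesis predicts with XGBoost). The sketch assumes a single spherical keep-out zone of radius R around the debris and omits the tangential term that many IFDS variants add; the function and parameter names are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def ifds_modulated_velocity(p: Vec3, u: Vec3, center: Vec3, radius: float, rho: float) -> Vec3:
    """Modulate an original flow velocity u around one spherical obstacle (basic IFDS, no tangential term)."""
    d = [pi - ci for pi, ci in zip(p, center)]
    gamma = sum(x * x for x in d) / radius**2  # Gamma = 1 on the keep-out surface, > 1 outside
    n = [2.0 * x / radius**2 for x in d]        # n = grad(Gamma), pointing away from the obstacle
    nn = sum(x * x for x in n)
    if nn == 0.0:
        raise ValueError("point coincides with the obstacle centre")
    # M u = u - n (n^T u) / (Gamma^(1/rho) n^T n); a larger rho strengthens the repulsion
    scale = sum(ni * ui for ni, ui in zip(n, u)) / (gamma ** (1.0 / rho) * nn)
    return tuple(ui - scale * ni for ui, ni in zip(u, n))
```

On the keep-out surface (Γ = 1) the matrix removes the velocity component along the normal, so the modulated path cannot penetrate the zone; far from the debris Γ^(1/ρ) grows and M tends to the identity, leaving the nominal trajectory untouched.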
  • Item
    Computational study of cellular adhesion in metastasis: Implications for Circulating Tumor Cell Arrest, Extravasation, and Thrombosis Formation
    (University of Waterloo, 2024-12-20) Rahmati, Nahid; Maftoon, Nima
    Cancer metastasis is the process by which cancer cells spread from the primary tumor to distant sites in the body, forming secondary tumors. This process is responsible for the majority of cancer-related deaths, despite significant advances in treating primary tumors. This thesis aims to enhance the understanding of metastasis mechanisms by exploring the roles of circulating tumor cells (CTCs), ultra-large von Willebrand factor (UL-VWF) multimers, and blood vessel configurations. This study focuses on the mechanical, biochemical, and hemodynamic factors that drive metastatic processes and cancer-associated coagulopathies, providing insights into the interactions between CTCs, VWF, and endothelial cells. Using computational modeling and simulation, we first investigate the role of UL-VWF multimers in cancer-associated thrombosis. The computational model integrates the lattice Boltzmann method for simulating blood flow, a coarse-grained model of deformable cells to capture their mechanical behavior, and the immersed boundary method to handle fluid-structure interactions. Additionally, an adhesion model was developed to simulate the binding dynamics between cells. This multi-scale approach allows a detailed analysis of how UL-VWF multimers interact with blood cells to initiate microthrombus formation and progression. The findings reveal that UL-VWF plays a dual role in thrombosis and metastasis, enhancing platelet adhesion and trapping red blood cells. This can significantly change blood flow dynamics, reducing velocity and increasing shear stress near thrombus sites, and can produce a pressure drop up to six times larger than under healthy conditions. The study also explores the impact of blood vessel architecture on CTC dynamics, focusing on how vessel tortuosity influences CTC adhesion and extravasation. 
The same computational methodology was used to analyze CTC interactions with the vessel wall, incorporating adhesion dynamics between the CTCs and the endothelial surface while accounting for the effect of shear rate on adhesion strength. The results indicate that curved vessels create asymmetrical flow patterns and therefore variable wall shear stress (a 25% decrease in low-shear regions and a 58.5% increase in high-shear regions), which significantly affects CTC behavior. Specifically, high-shear regions in curved vessels show a threefold rise in adhesion bond formation compared to straight vessels, enhancing the likelihood of CTC extravasation. Increasing the tortuosity index of the vessel led to a 50% increase in the maximum wall shear stress ratio, a 15.3% decrease in the minimum wall shear stress ratio, and a 58% increase in the transit time of CTCs through the vessel curvature. The adhesion force in these high-shear regions increased by about 171%, indicating a significantly higher risk of CTC adhesion and extravasation in vessels with higher curvature. Additionally, while softer CTCs in low-shear regions showed a higher likelihood of detachment, stiffer cells in high-shear regions exhibited an approximately 12% reduction in adhesion force compared to their behavior in straight vessels. This study identified an optimal range of cellular stiffness for successful CTC extravasation, challenging the assumption that softer cells always extravasate more efficiently. In this thesis, we also employed a stochastic model to analyze the dynamics of CTC adhesion, a crucial driver of metastasis, incorporating uncertainties in cell mechanical properties and adhesion characteristics. This probabilistic approach realistically captures the biological variability inherent in CTC behavior by accounting for a wide range of possible cell adhesion scenarios. 
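The abstract does not state which adhesion kinetics were used. A common choice in cell-adhesion simulations, assumed here purely for illustration, treats each receptor-ligand bond as a linear spring (`spring_k` standing in for the "bond spring constant") whose dissociation rate grows exponentially with tension following Bell's model, with `f_scale` playing the role of a rupture-strength parameter. Function names and units are hypothetical.

```python
import math


def bond_force(length: float, rest_length: float, spring_k: float) -> float:
    """Linear-spring bond tension; in this sketch only stretched bonds carry load."""
    return spring_k * max(length - rest_length, 0.0)


def rupture_probability(force: float, k_off0: float, f_scale: float, dt: float) -> float:
    """Probability that a bond ruptures during one time step dt under Bell's model.

    k_off = k_off0 * exp(F / f_scale), where f_scale = k_B T / gamma is set by the
    reactive compliance gamma; a larger f_scale means a bond that tolerates more load.
    """
    k_off = k_off0 * math.exp(force / f_scale)
    return 1.0 - math.exp(-k_off * dt)
```

In a simulation step, each bond's probability would be compared with a uniform random draw to decide whether it breaks; because tension rises with stretching, bonds on cells pulled by higher wall shear tend to break sooner unless more bonds share the load.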
Our analysis revealed that incorporating parameter variability, with a coefficient of variation of 20%, led to a maximum uncertainty of 12% in cell velocity. This variability manifested in two distinct CTC behaviors: either the cells detached from the vessel wall or continued to roll in a semi-stable manner, emphasizing the non-linear and complex nature of the adhesion dynamics. To efficiently manage computational demands, we developed a Random Forest surrogate model, achieving a high level of accuracy with a maximum error of 4.36% for velocity and 0.63% for stretch ratio. This model enabled comprehensive sensitivity analysis using Sobol' and E-FAST methods, which identified the bond spring constant and rupture strength as the most influential parameters following the initial adhesion phase, while cell membrane elasticity played a critical role during the initial adhesion. We also observed significant interdependencies between bond formation and rupture properties, underscoring their combined impact on CTC dynamics. Furthermore, machine learning techniques, particularly XGBoost, validated the model's predictive capabilities by achieving a classification accuracy of 95.62% and an area under the curve (AUC) value of 0.99 in distinguishing between 'rolling' and 'detached' CTC states. These findings highlight the importance of focusing on key parameter interactions to refine predictive models for metastasis. This comprehensive approach builds on the computational frameworks developed in this thesis, enhancing our understanding of metastasis by offering predictive insights into CTC behavior under different conditions. By integrating these findings into a cohesive framework, the thesis supports the development of more targeted therapeutic strategies to prevent or disrupt cancer progression.
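Sobol' analysis apportions the variance of a model output among its uncertain inputs. The sketch below estimates first-order indices with a Saltelli-style pick-freeze estimator on a toy model; the independent uniform inputs and the toy function are assumptions standing in for the thesis's CTC simulator or its Random Forest surrogate, and the code is not the thesis's implementation.

```python
import random
import statistics
from collections.abc import Callable


def first_order_sobol(model: Callable[[list[float]], float], n_inputs: int,
                      n_samples: int, seed: int = 0) -> list[float]:
    """Estimate first-order Sobol' indices S_i; inputs assumed independent and uniform on [0, 1)."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    f_a = [model(x) for x in a]
    f_b = [model(x) for x in b]
    var_y = statistics.pvariance(f_a + f_b)  # total output variance
    indices = []
    for i in range(n_inputs):
        # A_B^(i): each row of A with column i replaced by the matching row of B
        f_ab = [model(row_a[:i] + [row_b[i]] + row_a[i + 1:]) for row_a, row_b in zip(a, b)]
        # V_i ~ mean of f(B) * (f(A_B^(i)) - f(A))
        v_i = sum(fb * (fab - fa) for fa, fb, fab in zip(f_a, f_b, f_ab)) / n_samples
        indices.append(v_i / var_y)
    return indices
```

For the toy model `lambda x: x[0] + 2 * x[1]` the exact indices are 0.2 and 0.8 (variances 1/12 and 4/12 out of 5/12), and the estimate with 20,000 samples lands close to those values. Total-order indices, which capture the interactions the abstract highlights, need an additional estimator not shown here.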
  • Item
    A Unified and Hybrid Approach for Image-based Scene Change Detection and Pose-agnostic Object Anomaly Detection
    (University of Waterloo, 2024-12-17) Liu, Yizhe; Zelek, John
    Image-based Pose-Agnostic 3D Anomaly Detection is an important emerging task in industrial quality control. This object-centric task seeks to find anomalies in pose-unknown query images of a tested object, given a set of reference images of a standard, anomaly-free object. A similar task, Image-based Scene Change Detection, focuses on differences in a scene rather than an object. Image-based Scene Change Detection is critical for mapping and monitoring a scene, and seeks to find the semantic changes in a scene described by two sets of images (reference and query) captured at different times. For these industrial detection tasks, image sensors are widely used because they are easy to use and inexpensive. However, the most commonly used image sensors, RGB cameras, capture only 2D information: an RGB image of an object or scene from a specific angle. In image-based anomaly detection and change detection, reference and query images are often taken from different poses, and the poses of the query views may be unknown. As a result, reference and query images cannot be compared directly. Recent learning-based methods, such as OmniposeAD and SplatPose, employ Novel View Synthesis (NVS) methods, i.e., NeRF and Gaussian Splatting, to bridge this gap by localizing the query image with respect to the reference images while synthesizing pseudo-reference images for the query views, allowing direct pixel-to-pixel comparison. However, these learning-based methods suffer from long localization overhead during inference, because inverted Neural Radiance Field methods, e.g., INeRF, can take hundreds of gradient descent steps to localize and refine the poses. 
This work introduces SplatPose+, a hybrid approach that combines a learning-based model (Gaussian Splatting) for NVS with a structure-from-motion (SfM) model (Hierarchical Localization) for localization, taking advantage of the fast training and inference of 3D Gaussian Splatting and the fast localization of Hierarchical Localization. On the Image-based Pose-Agnostic 3D Anomaly Detection task, although the proposed pipeline requires computing an additional SfM model, it offers real-time inference and faster training than SplatPose. In terms of quality, we achieve a new state-of-the-art (SOTA) result on the Pose-agnostic Anomaly Detection benchmark with the Multi-Pose Anomaly Detection (MAD-SIM) dataset. On the Image-based Scene Change Detection task, we achieve a higher IoU than previous supervised methods on the binary change detection sub-task without environment variations. Moreover, we demonstrate the potential of combining SAM2 with SplatPose+ to further refine object-level change masks toward higher accuracy.
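Once a pseudo-reference view has been rendered at the localized query pose, the comparison step and its evaluation can be sketched in a few lines. This minimal sketch assumes single-channel images stored as nested lists, a raw absolute-difference score, and a fixed threshold; the thesis's actual scoring may differ, and the function names are hypothetical.

```python
def change_mask(query: list[list[float]], rendered: list[list[float]], threshold: float) -> list[list[bool]]:
    """Pixel-wise change/anomaly mask from a query image and a pose-matched rendered reference."""
    return [[abs(q - r) > threshold for q, r in zip(q_row, r_row)]
            for q_row, r_row in zip(query, rendered)]


def iou(pred: list[list[bool]], gt: list[list[bool]]) -> float:
    """Intersection over union of two binary masks (defined as 1.0 when both are empty)."""
    inter = union = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            inter += p and g
            union += p or g
    return inter / union if union else 1.0
```

Usage: for `query = [[0.1, 0.9], [0.2, 0.8]]` and `rendered = [[0.1, 0.1], [0.2, 0.2]]` with threshold 0.3, the mask flags the right-hand column; scored against a ground truth that also includes the bottom-left pixel, the IoU is 2/3. This is why accurate localization matters: a pose error shifts the rendered view and turns unchanged edges into false positives.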
  • Item
    Synthesis of User Interfaces with Categorical Methods
    (University of Waterloo, 2024-12-16) Zheplinska, Marta; Nehaniv, Chrystopher
    We apply category theory to modeling user interfaces, focusing on the interaction between functional configuration and user perception. By representing user interfaces as directed labeled multigraphs and applying pullback constructions in the category of directed labeled multigraphs, the study formalizes interface structures in a way that encompasses both technical operations and users' perceptual capabilities. Users interact with interfaces by identifying affordances that hint at possible actions. We propose a perception-driven interpretation of the interface as the set of affordances available to the user. Conversely, the structure of the interface is formed by its functional components and presented in different states. This dual strategy aims to provide a tool for studying the usability and unambiguity of interactive systems and for analyzing how interfaces communicate functionality to users through structural design. A major component of this research is the study of user profiles, which define the relationship between the interface and the cognitive and physical characteristics of users. Profiles are encoded in the graph representation of the interface as filters on the pullback graph that match human perception. This approach provides a basis for assessing usability for people with different cognitive and physical abilities. Ambiguity is the presence of multiple possible meanings, interpretations, or outcomes that can cause uncertainty; it affects the user's sense of control over the interface. The thesis examines ambiguity in user interfaces within a categorical formal representation. By imposing conditions on the pullback, the study proposes a formal method to detect interface ambiguity from the perspective of a particular user profile, supported by case studies.
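The abstract does not say how profiles or ambiguity conditions are encoded, so the sketch below shows only the standard pullback construction on finite directed labeled multigraphs. Given morphisms f: G1 → H and g: G2 → H, the pullback's nodes are pairs (v1, v2) with f(v1) = g(v2), and its edges are pairs (e1, e2) with f(e1) = g(e2). The dict-based representation and all names are assumptions, and the morphisms are assumed to be valid (source-, target-, and label-preserving).

```python
from dataclasses import dataclass


@dataclass
class Graph:
    """Directed labeled multigraph: node set plus edge id -> (source, target, label)."""
    nodes: frozenset
    edges: dict


@dataclass
class Morphism:
    """Graph homomorphism given by a node map and an edge map (assumed valid)."""
    node_map: dict
    edge_map: dict


def pullback(g1: Graph, f: Morphism, g2: Graph, g: Morphism) -> Graph:
    """Pullback of g1 --f--> H <--g-- g2 in the category of directed labeled multigraphs."""
    nodes = frozenset((v1, v2) for v1 in g1.nodes for v2 in g2.nodes
                      if f.node_map[v1] == g.node_map[v2])
    edges = {}
    for e1, (s1, t1, label) in g1.edges.items():
        for e2, (s2, t2, _label2) in g2.edges.items():
            if f.edge_map[e1] == g.edge_map[e2]:
                # Morphisms preserve endpoints and labels, so (s1, s2) and (t1, t2) are pullback nodes
                edges[(e1, e2)] = ((s1, s2), (t1, t2), label)
    return Graph(nodes, edges)
```

When H has a single node with one self-loop, the pullback reduces to the product of the two graphs; in general, only node and edge pairs that "agree" in H survive, which is what lets a pullback express the overlap between an interface's functional structure and what a given user can perceive.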
  • Item
    Electromyography-based Biometrics for Secure and Robust Personal Identification and Authentication
    (University of Waterloo, 2024-12-13) Pradhan, Ashirbad; Jiang, Ning; Tung, James
    Recently, the electromyogram (EMG), the electrical activity of skeletal muscles, has been proposed as a novel biometric trait to address the limitations of current biometrics such as fingerprint and facial recognition. A unique property of EMG as a biometric trait is that different limb movements (e.g., hand gestures) produce distinguishable patterns, enabling individuals to set personalized passwords composed of multiple gestures for dual-security systems, i.e., security at both the biometric level and the password level. This is fundamentally different from other physiological signals such as the electrocardiogram (ECG) and electroencephalogram (EEG), which are very difficult for users to control voluntarily with sufficient precision. This advantage supports two applications of EMG-based biometrics: authentication, where a user gains access to personal devices, and identification, where the system determines the closest match within a database. To establish EMG as a novel biometric trait, two properties need to be thoroughly investigated: 1) the ability to accurately distinguish the genuine user from all other users (uniqueness), and 2) retention of biometric performance over multiple sessions and days (robustness). The overarching aim of this PhD research is to investigate these properties by addressing a series of research questions in the following studies. In the first study (Chapter 3), the effects of EMG system parameters, such as the feature extraction method and the number of channels, are investigated to improve biometric performance. Three robust feature extraction methods, Time-domain (TD), Frequency Division Technique (FDT), and Autoregressive (AR) features, and their combinations were investigated while the number of channels was varied from one to eight. The results showed that, for all feature extraction methods, performance plateaued beyond four channels. 
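The abstract names a "Time-domain (TD)" feature set without listing its members. Assuming the widely used Hudgins set (mean absolute value, waveform length, zero crossings, and slope sign changes), a per-window extractor can be sketched as below; the function names, the shared noise `threshold`, and the channel layout are hypothetical.

```python
def td_features(x: list[float], threshold: float = 0.0) -> list[float]:
    """Hudgins-style TD features for one EMG channel window: MAV, WL, ZC, SSC."""
    n = len(x)
    mav = sum(abs(v) for v in x) / n                       # mean absolute value
    wl = sum(abs(x[i + 1] - x[i]) for i in range(n - 1))   # waveform length
    zc = sum(1 for i in range(n - 1)                       # zero crossings above a noise threshold
             if x[i] * x[i + 1] < 0 and abs(x[i] - x[i + 1]) >= threshold)
    ssc = sum(1 for i in range(1, n - 1)                   # slope sign changes
              if (x[i] - x[i - 1]) * (x[i] - x[i + 1]) > threshold)
    return [mav, wl, float(zc), float(ssc)]


def td_feature_vector(window: list[list[float]], threshold: float = 0.0) -> list[float]:
    """Concatenate per-channel TD features, e.g. 4 channels x 4 features = 16 values."""
    return [feat for channel in window for feat in td_features(channel, threshold)]
```

Because the feature vector grows linearly with the channel count, a performance plateau at four channels means extra electrodes mostly add dimensionality without adding discriminative information.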
For a four-channel system, authentication yielded an average equal error rate (EER) of 0.04 for TD features, 0.053 for FDT features, and 0.10 for AR features. In identification mode, the average Rank-1 accuracy was 97% for TD features, 87.6% for FDT features, and 63.7% for AR features. Thus, combining the TD feature set with a four-channel EMG setup is recommended for optimal biometric performance. In the second study (Chapter 4), the dual-security property of EMG is realized through a multi-code framework, which allows hand gestures to be combined into an access code. Three levels of fusion (score, rank, and decision) were investigated for the two biometric applications, and the biometric performance of each fusion scheme was analyzed while varying the code length from one to six. For a code length of four, the authentication EER was 0.006 using a decision-level fusion scheme based on weighted majority voting. In identification mode, the score-level fusion scheme achieved a Rank-2 accuracy of 99.9% for a code length of four. The multi-code biometric system provided improved dual-mode security based on individuals' personalized codes and biometric traits. However, these two studies and most current EMG-based biometric research face two critical limitations: 1) a small subject pool compared with more established biometric traits, and 2) single-session datasets. Moreover, the performance of EMG-based biometrics degrades in multi-day scenarios. To address these limitations, the third study (Chapter 5) collected a large, multi-day dataset. EMG data were collected from 43 participants on three days with long separations (Days 1, 8, and 29) while they performed 16 different static hand/wrist gestures with seven repetitions each. The dataset was made public as the GRABMyo dataset. 
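The equal error rate summarizes authentication performance as the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). A minimal sketch, assuming higher scores mean "more likely genuine" and sweeping thresholds over the observed scores, follows; the function name is hypothetical, and finer interpolation schemes exist.

```python
def equal_error_rate(genuine: list[float], impostor: list[float]) -> float:
    """Approximate EER by sweeping thresholds over all observed scores.

    A sample is accepted when its score >= threshold. FAR is the fraction of
    impostor scores accepted; FRR is the fraction of genuine scores rejected.
    """
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

Usage: with genuine scores `[0.9, 0.8, 0.7, 0.6]` and impostor scores `[0.1, 0.2, 0.3, 0.65]`, the threshold 0.65 gives FAR = FRR = 0.25, so the EER is 0.25; perfectly separated score sets give an EER of 0. In the multi-code setting, fused scores (or fused decisions) from several gestures would be fed to the same computation.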
In the fourth study (Chapter 6), training and testing data from different days of the GRABMyo dataset were used to test the robustness of EMG-based biometrics in practical multi-day scenarios. Cross-day authentication using FDT feature extraction yielded a median EER of 0.039 when the code (gestures) was secure and 0.068 when the code was leaked to intruders. Cross-day identification achieved a median Rank-5 accuracy of 93.0%. Improving multi-day performance calls for robust feature extraction methods based on deep learning. In the fifth study (Chapter 7), a convolutional feature engineering method, MyoBM-Net, is proposed, using a two-stage training paradigm to improve authentication performance. In a cross-day analysis, MyoBM-Net achieved a median EER of 0.003 and 0.008 when the gesture code was safe and compromised, respectively, outperforming the traditional feature extraction method. The findings suggest that EMG-based biometrics perform comparably to conventional biometrics in both authentication and identification applications, showing the potential of EMG signals for biometric identification in real-world scenarios. The multi-code framework facilitates combining gestures into passcodes. The large multi-day dataset will support further research on EMG-based biometrics and other gesture recognition applications. The MyoBM-Net architecture will enable new applications using the GRABMyo dataset, leading to accurate and robust biometric performance. This could lead to EMG-based biometrics being used as an alternative to traditional biometric methods.
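Rank-k accuracy, used above for identification (Rank-1, Rank-2, Rank-5), counts a probe as correct when its true identity appears among the k best-matching enrolled identities. A minimal sketch, assuming a probe-by-identity similarity matrix where higher means more similar; the names are hypothetical.

```python
def rank_k_accuracy(scores: list[list[float]], true_ids: list[int], k: int) -> float:
    """Fraction of probes whose true identity is among the k highest-scoring enrolled identities.

    scores[p][c] is the similarity between probe p and enrolled identity c.
    """
    hits = 0
    for row, true_id in zip(scores, true_ids):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += true_id in ranked[:k]
    return hits / len(true_ids)
```

Rank-k accuracy can only stay the same or increase as k grows, which is why a cross-day Rank-5 figure reads higher than a same-day Rank-1 figure would for the same system.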
  • Item
    Remote Medical Diagnosis in Virtual Reality: A Mixed-methods Approach to Understanding the Perceptions of Patients and Physicians
    (University of Waterloo, 2024-12-11) Momoh, Mustapha Unubi; Burns, Catherine; Mikael Mäkelä, Ville
    Global healthcare faces challenges, including physician shortages and resource limitations. Telehealth has offered solutions through services such as text and video chats. Yet, these methods have their issues: they provide only limited opportunities for diagnoses, and they do not foster solid patient-physician relationships. Virtual reality (VR) offers a promising future alternative, which could facilitate real-time patient-physician interactions that resemble real-life visits through realistic 3D avatars. However, understanding patients’ and physicians’ needs, attitudes, and concerns is crucial for tailoring such VR solutions to healthcare’s unique demands. Therefore, an online patient survey (n = 402) and physician interviews (n = 6) were conducted to understand these needs. Through thematic analysis, common telehealth concerns, including privacy and limited scope of diagnoses in VR, were identified. Unique elevated concerns, mostly around technology reliability, required expertise, accessibility, and integration with existing workflows, also emerged. Furthermore, the study examined the influence of technology affinity on patients’ acceptance of VR telehealth through a Regression Discontinuity Design (RDD) approach. Overall, this study explores the critical concerns in telehealth and proposes evidence-based considerations for developing VR-based telehealth solutions.
  • Item
    Human-aware Autonomous Vehicle Navigation in Pedestrian-rich Unstructured Environments
    (University of Waterloo, 2024-12-11) Golchoubian, Mahsa; Lashgarian Azad, Nasser; Dautenhahn, Kerstin
    Autonomous Vehicles (AVs) have the potential to enhance transportation safety, improve efficiency, and elevate quality of life. Despite significant advancements in AV technology, operating these vehicles in dynamic, crowded environments that require frequent interaction with other decision-making agents remains challenging. A key example is the interaction between AVs and pedestrians. While most research has focused on these interactions in structured road settings, the complexity and diversity of AV navigation among pedestrians in unstructured environments (e.g., shared spaces, airport terminals) have been less explored. In such pedestrian-rich environments, AVs must be human-aware, meeting people's expectations while ensuring both their safety and comfort. At the same time, navigating these spaces requires reasoning about mutual interactions and accounting for the uncertainty in pedestrian behaviour. This thesis introduces a novel approach to address these challenges, presenting an integrated prediction and planning framework for AV navigation among pedestrians in unstructured shared environments. The thesis is structured into two main phases: a design requirement study and an algorithmic development phase. Given the novelty of this application, the first phase focused on understanding the perceptions and preferences of pedestrians regarding AV behaviour in common interactive scenarios within unstructured settings. Additionally, we examined the unique aspects of pedestrian behaviour in these environments, identifying common behaviours AVs must manage and gathering existing datasets that better represent pedestrian behaviour in such settings. This study highlights the importance of considering uncertainty in pedestrian behaviour, shaping the direction of the development phase. In the algorithmic development phase, we propose a novel proactive, uncertainty-aware Deep Reinforcement Learning (DRL) decision-making algorithm. 
This algorithm efficiently accounts for complex interaction effects with multiple pedestrians while maintaining reasonable computational time. The navigation algorithm is made proactive and farsighted by integrating the DRL motion planner with a data-driven pedestrian trajectory predictor. Our novel prediction model is designed to forecast pedestrian trajectories in highly interactive shared environments. It uses a collision risk metric to identify key interacting agents and encodes their effects through a newly engineered interaction feature that guides the learning process. During training, we prevented overconfident predictions and improved estimates of prediction uncertainty using an augmented loss function that incorporates uncertainty awareness. Unlike other DRL algorithms in this area, our model's DRL motion planner accounts for prediction uncertainty, integrating it into the reward function to encourage the AV to minimize collision probability with pedestrians over a prediction horizon. Additionally, the reward function design encourages socially aware behaviours, such as reducing speed during close encounters, respecting pedestrians' personal space, and adhering to social norms identified in our earlier design requirement study. We trained our model in a simulation environment that contains realistic pedestrian trajectory behaviour in the presence of vehicles in shared spaces. The simulation results demonstrate that our uncertainty-aware DRL navigation framework outperforms state-of-the-art DRL crowd navigation and uncertainty-aware Model Predictive Control (MPC) models in terms of both efficiency and social behaviour. Overall, this thesis contributes to the advancement of socially-aware crowd navigation algorithms beyond human-sized mobile robots to autonomous vehicles operating as mobility aids among pedestrians in unstructured environments. 
It demonstrates how agent interactions can be effectively modelled within prediction and planning modules, and how uncertainty in these predictions can be integrated into a DRL-based motion planner.
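The abstract does not define its collision risk metric. One common, simple choice, assumed here only for illustration, is the time to collision (TTC) between the AV and each pedestrian under a constant-velocity assumption, with both agents modeled as discs; pedestrians whose TTC falls within a horizon could then be flagged as key interacting agents. The function names, the disc model, and the horizon are hypothetical.

```python
import math

Vec2 = tuple[float, float]


def time_to_collision(p_av: Vec2, v_av: Vec2, p_ped: Vec2, v_ped: Vec2, radius_sum: float) -> float:
    """Time until two constant-velocity discs first touch; math.inf if they never do."""
    dpx, dpy = p_ped[0] - p_av[0], p_ped[1] - p_av[1]
    dvx, dvy = v_ped[0] - v_av[0], v_ped[1] - v_av[1]
    # Solve |dp + dv t|^2 = radius_sum^2 for the earliest non-negative t
    c = dpx * dpx + dpy * dpy - radius_sum**2
    if c <= 0.0:
        return 0.0  # already overlapping
    a = dvx * dvx + dvy * dvy
    b = 2.0 * (dpx * dvx + dpy * dvy)
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        return math.inf  # no relative motion, or paths never come within radius_sum
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else math.inf  # negative root means the agents are separating


def key_interacting_pedestrians(p_av: Vec2, v_av: Vec2, peds: list[tuple[Vec2, Vec2]],
                                radius_sum: float, horizon: float) -> list[int]:
    """Indices of pedestrians whose TTC with the AV is within the horizon."""
    return [i for i, (p, v) in enumerate(peds)
            if time_to_collision(p_av, v_av, p, v, radius_sum) <= horizon]
```

Usage: an AV at the origin moving at 1 m/s toward a stationary pedestrian 10 m ahead, with a combined radius of 1 m, has a TTC of 9 s; a pedestrian walking away faster than the AV never collides. A TTC threshold is cheap but ignores prediction uncertainty, which is exactly what the thesis's uncertainty-aware reward is meant to add.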
  • Item
    Numerical and experimental investigation of effects of deformability of circulating tumor cells in physical occlusion
    (University of Waterloo, 2024-11-26) Keshavarz Motamed, Pouyan; Maftoon, Nima; Poudineh, Mahla
    The hematogenous spread of metastasis, an indispensable pathway in metastasis progression, occurs when primary tumor cells enter the bloodstream and circulate throughout the body. Understanding the genetic, biochemical, and biomechanical factors contributing to this spread could significantly advance the early diagnosis and treatment of metastasis. Among these factors, hemodynamic forces play a crucial role in spreading metastasis to distant organs, as blood flow is the primary means of transporting circulating tumor cells (CTCs) in the bloodstream. The survival, intravascular arrest, and extravasation of CTCs are significantly influenced by shear stresses arising from their interactions with blood plasma, blood cells, and the endothelial cells that form the inner layer of blood vessels. However, our understanding of these interactions is still limited by the complexity of the phenomena. Advanced numerical methods that account for the high deformability of CTCs and the fluid dynamics in microcapillaries offer a promising approach to deciphering how CTCs respond to the hemodynamic forces in their microenvironment. Physics-based, cellular-scale numerical methods capable of simulating large numbers of highly deformable objects immersed in a fluid domain emerged as advanced tools to help researchers unravel the role of CTCs in metastasis. The discrete nature of these numerical methods comes at the cost of defining several unknown parameters for the cell model, which directly affect how the cell deforms when interacting with the forces present in the CTC microenvironment. Therefore, in the first step of this study, a systematic approach was developed to accurately identify the unknown parameters of the numerical cell model. 
In this step, the identification method was validated against experimental data reported in the literature, such as stretching experiments on red blood cells (RBCs) and lung cancer cell deformability measurements in constricted microchannels. However, the experimental data in the literature are insufficient for identifying cancer cell parameters: the reported measurement durations are too long, making the identification step nearly impossible to perform, and the reported data also lack various inputs for cancer cell deformability measurement, hindering the development of stable cell models. Therefore, in the second step, experiments with cells passing through constricted microfluidic devices were performed, and the deformability of highly invasive breast cancer cell lines was measured. By creating numerical domains of the constricted channels and identifying the cell parameters, cell models with sizes ranging from 13 to 18 µm were obtained, and their motion and deformation behavior were validated against the experimental data. As presented in detail in this study, the developed models can replicate the gradual squeezing and shape change of cancer cells entering the constricted microchannels, as well as the drop in flow rate as a cell enters the constriction. Targeting the mechanical entrapment of CTCs in microcapillaries, the third step developed a predictive numerical tool that can predict a CTC's occlusion fate in an arbitrary microvascular system. The obstacle in this step is that tracking a CTC's fate within a large fluid domain is still beyond the capabilities of cellular-scale in-silico models. Therefore, the key to overcoming this obstacle is a method that divides an arbitrary microvascular system into smaller domains amenable to in-silico calculation of the CTC's fate. 
Accordingly, extensive numerical investigations were performed on smaller models to determine the relationship between cell fate and mechanical factors in the blood, such as plasma flow rate, cell deformability, capillary geometry, and cell size. The outputs of these investigations were stored and later used to predict the cell fate and the site of cell entrapment in an arbitrary microvascular system. Finally, an algorithm was developed for tracking the CTC through the microvasculature and pinpointing where it becomes occluded. The algorithm takes the initial cell position and size, queries the stored data at every microcapillary branch that contains the CTC, and predicts the CTC trajectory from the previously provided information on the microvascular system's anatomical structure and fluid flow.
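The tracking algorithm described above can be sketched as a walk down a tree of capillary segments that consults stored simulation outcomes at each branch. This is a heavily hedged illustration: the `Segment` fields, the nearest-neighbour lookup over (cell-to-capillary diameter ratio, normalized stiffness) records, and the rule that the cell follows the daughter branch with the larger flow rate are all assumptions, not the thesis's predictor.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One microcapillary branch (hypothetical attributes)."""
    name: str
    diameter_um: float
    flow_rate: float
    children: list["Segment"] = field(default_factory=list)


# Hypothetical stored outcomes: (cell/capillary diameter ratio, normalized stiffness, occludes?)
StoredResults = list[tuple[float, float, bool]]


def predicts_occlusion(stored: StoredResults, ratio: float, stiffness: float) -> bool:
    """Nearest-neighbour lookup in stored simulation outcomes (stand-in for the real predictor)."""
    _, _, occludes = min(stored, key=lambda rec: (rec[0] - ratio) ** 2 + (rec[1] - stiffness) ** 2)
    return occludes


def track_ctc(root: Segment, cell_diameter_um: float, stiffness: float,
              stored: StoredResults) -> tuple[list[str], bool]:
    """Follow a CTC through the tree; return the visited path and whether it became trapped."""
    path, seg = [], root
    while True:
        path.append(seg.name)
        if predicts_occlusion(stored, cell_diameter_um / seg.diameter_um, stiffness):
            return path, True
        if not seg.children:
            return path, False  # the cell leaves the modelled network
        # Assumption: the cell enters the daughter branch carrying the larger flow
        seg = max(seg.children, key=lambda s: s.flow_rate)
```

Usage: with stored records `[(1.0, 0.5, False), (1.5, 0.5, False), (2.5, 0.5, True)]`, a 12 µm cell passes a 10 µm parent vessel but is predicted to lodge in a 5 µm daughter branch, while a 6 µm cell exits the network. Decomposing the network this way keeps each expensive cellular-scale simulation small, as the abstract motivates.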
  • Item
    Creation of a Custom Language Model for Pediatric Occupational Therapy Documentation
    (University of Waterloo, 2024-11-20) DiMaio, Rachel; Tripp, Bryan
    KidsAbility is a pediatric rehabilitation center that offers services, including occupational therapy (OT), to youth. Documentation, including writing progress notes for each treatment appointment, is essential to OT treatment but can also be time-consuming and tedious. KidsAbility believes that if the time spent writing progress notes were reduced, its capacity for treatment would increase. This thesis explores the creation of a custom large language model intended to decrease the time clinicians spend writing progress notes by transforming point-form scratch notes from pediatric OT treatment appointments into draft full-form documentation in SOAP format for clinicians to edit. A dataset of thousands of historical progress notes, with personal health information redacted, was used for model training, and different training techniques were explored, including domain-adaptive pre-training and LoRA fine-tuning. Because the dataset contained no corresponding scratch notes, few-shot prompting with a human-in-the-loop evaluation process was used to generate matching scratch notes. The historical progress notes and generated point-form notes were used to fine-tune Llama 2 and Llama 3 models on the desired task. The outputs of different models were evaluated and compared before the final model, a fully fine-tuned Llama 3 8B Instruct model, was selected for a pilot study at KidsAbility in which the custom model was compared against the proprietary Microsoft Co-Pilot model. Ten OTs participated in the study, using Co-Pilot and then the custom model to write their progress notes for three weeks each. Training clinicians to use the custom model effectively was found to be important for reducing the time spent on the process. After training, the average time taken to write a note was 7.6 minutes, compared with 13.8 minutes before training; both figures are based on self-reported times.
The progress notes written during the pilot study were also used in a quality assessment in which four OTs scored the custom model notes, Co-Pilot notes, and manually written notes on multiple criteria. The results showed that notes written with the custom model were of high quality, receiving the highest score for three criteria and the second-highest score for the remaining two. For all criteria, the custom model notes scored higher than the manually written notes. Collection of objective timing data, needed to measure the impact of using the custom model compared with using no model, was limited by the availability of clinicians.
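The synthetic-data step described above runs in the reverse direction of the final task. An existing full progress note is shown to a model alongside a few worked examples, and the model is asked to produce the scratch notes a clinician might have written. The short Python sketch below assembles such a few-shot prompt. The instruction wording, the function name `build_few_shot_prompt`, the default of `k=2` examples, and the placeholder note texts are all hypothetical. The thesis's actual prompts and its human-in-the-loop review of generated notes are not shown.

```python
def build_few_shot_prompt(examples, target_note, k=2):
    """Assemble a hypothetical few-shot prompt asking a model to turn a full
    progress note into point-form scratch notes.

    examples    -- list of (full_note, scratch_notes) pairs, e.g. pairs a
                   reviewer has already approved (assumed workflow)
    target_note -- the historical full note that lacks scratch notes
    k           -- number of worked examples to include
    """
    instruction = "Rewrite the full SOAP progress note as brief point-form scratch notes."
    blocks = [instruction]
    for full_note, scratch in examples[:k]:
        blocks.append(f"Full note:\n{full_note}\nScratch notes:\n{scratch}")
    # The target is left open-ended so the model completes its scratch notes.
    blocks.append(f"Full note:\n{target_note}\nScratch notes:\n")
    return "\n\n".join(blocks)

if __name__ == "__main__":
    # Placeholder texts only; no real clinical data.
    pairs = [("<full note 1>", "- <point 1>"), ("<full note 2>", "- <point 2>")]
    print(build_few_shot_prompt(pairs, "<historical note without scratch notes>"))
```

In a human-in-the-loop setup, generated pairs that a reviewer accepts could be added back to `examples`, so later prompts draw on vetted demonstrations. This is a suggested workflow, not a description of the thesis's procedure.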