Systems Design Engineering
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9914
This is the collection for the University of Waterloo's Department of Systems Design Engineering.
Research outputs are organized by type (e.g., Master Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Recent Submissions
Item: The Human Factors in the Adoption of Ambient Artificial Intelligence Scribe Technology: Towards Informed and User-centered Implementation of AI in Healthcare (University of Waterloo, 2024-10-08) Basha, Iman
The landscape of healthcare documentation has undergone substantial transformations over the past few decades, evolving in parallel with technological advancements and shifts in healthcare delivery models. Central to these changes is the electronic medical record (EMR), a digital iteration of patients' paper charts that has become standard in healthcare settings. While EMRs are instrumental in streamlining data management and accessibility, they have introduced new challenges, particularly in terms of administrative burden on healthcare providers. This thesis explores the integration of ambient artificial intelligence (AI) scribe technology, a solution leveraging advancements in automatic speech recognition (ASR) and natural language processing (NLP), into physicians' workflows. AI scribes semi-automate the documentation process by capturing and synthesizing physician-patient interactions in real time, potentially alleviating the administrative workload on clinicians and improving the quality of care. The potential benefits of this technology are vast, yet its adoption raises significant questions regarding privacy, consent, and trust, especially given its capability to record sensitive interactions in detail. The study aims to (1) explore the integration of ambient scribe technology into physicians' workflows and assess its impact on physician-patient interactions, (2) identify and analyze the concerns related to privacy, consent, and trust among patients and physicians regarding the use of the technology, and (3) develop and evaluate a flexible informed consent protocol for patients and physicians. A mixed-methods approach was employed, integrating quantitative data from surveys and qualitative insights from semi-structured interviews, providing a comprehensive understanding of the multifaceted impact of the technology. The findings reveal that while AI scribes offer efficiency gains, particularly for complex and lengthy encounters, they are less beneficial for simple cases. Further, the documentation efficiency gained with AI scribes varies across individuals: some physicians reported negligible improvements due to extensive post-editing and the need for customization, while others reported substantial gains. Regarding the impact on interaction, patients and physicians reported enhanced interactions due to reduced distractions but noted instances of self-censorship by patients due to discomfort with the recording process. Patients also expressed worry about self-censorship by physicians due to medicolegal concerns and about unintended consequences of over-reliance on the technology. Concerning the second objective, patients and physicians expressed significant privacy concerns due to a lack of understanding and transparency in data handling policies. Patients also expressed concerns regarding control over their private data, unauthorized access, and data breaches. The findings underscore the need for transparent data handling policies and robust security measures. Trust in physicians and pre-established patient-physician relationships also played a notable role in patient consent, with patients more likely to consent to AI scribe use with familiar physicians.
To address these concerns, the thesis proposed a Multi-Tier Granular Informed Consent (MTGIC) framework, integrating tiered and granular consent models to enhance transparency and participant control over personal data. In empirical evaluation, the MTGIC framework was well received by both patients and physicians, though it requires ongoing refinement to improve usability and ensure it aligns with user needs. In conclusion, while ambient scribe technology presents a promising tool for enhancing healthcare delivery, its successful implementation is contingent upon careful consideration of its integration into clinical workflows, the management of privacy concerns, and the development of effective consent processes. This study contributes to the ongoing discussion on best practices for integrating emerging technologies into healthcare systems, aiming to enhance operational efficiency and patient care quality.
Item: Camera Calibration from Out-of-Focus Images (University of Waterloo, 2024-10-01) Schmalenberg, Ryan
For many 3D computer vision applications, accurate camera calibration is a necessary prerequisite. Generally, the objective is to find a camera's intrinsic parameters such as focal lengths, or extrinsic parameters such as the camera's pose in 3D space, or both. Camera calibration using structured calibration targets relies on special patterns which contain features that are used to localize control points with sub-pixel accuracy. The most frequently used patterns are checkerboards and circle grids, and in well-constrained environments these patterns are known to provide accurate feature correspondences for accurate camera calibration results. One challenging case is calibrating a long focal length camera. In this case, the focal plane can be too far away, and the only practical solution is to capture images of the calibration pattern out of focus while it is closer to the camera. Due to the radial distribution of out-of-focus blur, biases created by a lack of distance preservation, and changes in spatial blur with perspective, checkerboard patterns have been shown to lose accuracy when captured in out-of-focus images and, with increased blur, can fail to provide feature correspondences altogether. To address this, phase-shift circular gradient (PCG) patterns have been proposed as a method to encode control point positions into phase distributions rather than pixel intensities. Our work aims to validate previous authors' claims of out-of-focus blur invariance and accuracy when using PCG patterns. Using PCG, small circle, and concentric circle grid patterns, we compared the retrieved focal lengths (in pixels) and the percentage differences between in-focus and out-of-focus calibrations. Initial comparisons showed that PCGs were largely invariant to blur. However, their accuracy was marginally worse than comparable small circles when real-world noise was introduced. In this real-world case, a 7-DOF robot arm was used for repeatable calibration target positioning. The recorded set of poses was also used to mirror conditions in a further synthetic experiment. From this work, PCGs initially showed mixed results, but when extended beyond real-world conditions, PCGs were the only pattern that worked under the most severe levels of out-of-focus blur. This validated their improved detectability under extreme blur and their theoretical effectiveness for use with long focal length cameras.
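The PCG patterns above encode control-point locations in phase rather than intensity. A minimal sketch of the generic N-step phase-shifting relation that such phase-encoded targets build on (the exact PCG decoding and sign conventions are assumptions here, not taken from the thesis):

import numpy as np

def decode_phase(images):
    """Recover a wrapped phase map from N phase-shifted intensity images.

    images: array of shape (N, H, W); the n-th image is assumed to follow the
    generic phase-shifting model I_n = A + B * cos(phi + 2*pi*n/N), with at
    least three evenly spaced shifts (sign conventions vary by implementation).
    """
    images = np.asarray(images, dtype=np.float64)
    n_steps = images.shape[0]
    shifts = 2.0 * np.pi * np.arange(n_steps) / n_steps
    # Least-squares estimate of the sinusoid's phase at every pixel.
    numerator = np.tensordot(np.sin(shifts), images, axes=1)
    denominator = np.tensordot(np.cos(shifts), images, axes=1)
    return -np.arctan2(numerator, denominator)  # wrapped to (-pi, pi]

Because a symmetric defocus kernel mainly attenuates the modulation amplitude B while leaving the recovered phase largely unchanged, phase-encoded targets are expected to tolerate blur better than intensity-based corner or ellipse features, which is the intuition behind the blur robustness examined above.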
From these results, this study acknowledges the trade-offs in calibration pattern selection for respective use cases. It also highlights the importance of ellipse fitting techniques and the role of other learned methods. Finally, this study outlines the benefits observed when using robotic target positioning and our synthetic validation pipeline for experimentation with calibration patterns under various conditions.
Item: An Investigation into Automatic Photometric Calibration (University of Waterloo, 2024-09-20) Feng, Chun-Cheng
Photometric calibration is a critical process that ensures uniformity in brightness across images captured by a camera. It entails the identification of a function that converts scene radiance into the pixel values in an image. The goal of the process is to estimate the three photometric parameters: camera response function, vignette, and exposure. A significant challenge in this field is the heavy reliance of current photometric calibration methods on ground truth information, which is often unavailable in general scenarios. To address this, we investigate our proposed simple method, New Photometric Calibration (NPC), which eliminates the need for ground truth data. Firstly, we integrate our photometric calibration algorithm with MFT, a long-term pixel-wise tracker, enhancing the system's robustness and reliability. Since MFT effectively handles occlusion and reduces drifting, it produces more stable trajectories. By incorporating MFT to track feature points across frames and using the trajectories as corresponding points, we can use the pixel intensities of corresponding points to forgo the need for exposure ground truth during initialization. Subsequently, we independently optimize the photometric parameters to sidestep the exponential ambiguity problem. Our experiments demonstrate that our method achieves results comparable to those utilizing ground truth information, as evidenced by comparable root mean square errors (RMSE) for the three photometric parameters. In scenarios without ground truth data, NPC outperforms existing methods. This indicates that our approach maintains the accuracy of photometric calibration and can be applied to arbitrary videos where ground truth information is not provided. In conclusion, our research represents a significant advancement in the field of photometric calibration. We investigate a novel and effective method that requires no ground truth information during the photometric calibration process. Our approach incorporates a robust tracker, enhancing the trajectories of feature points and thereby improving the overall performance of our method. Furthermore, our model not only bypasses the exponential ambiguity problem inherent in the optimization process but also addresses the challenges associated with the traditional reliance on ground truth information, outperforming previous photometric calibration methods when the input lacks ground truth data.
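The three quantities being estimated enter through a standard image formation model; a minimal sketch of that model and its inversion (the response, vignette, and exposure values below are placeholders, not NPC's outputs):

import numpy as np

def inverse_photometric_model(pixel_values, inv_response, vignette, exposure):
    """Map observed pixel values back to (relative) scene radiance.

    Assumes the image formation model commonly used in photometric calibration
    work, I(x) = f(exposure * V(x) * B(x)), where f is the camera response
    function, V the vignette map and B the radiance. All three quantities here
    stand in for the calibrated estimates the thesis describes.
    """
    energy = inv_response[pixel_values]        # f^{-1}(I) via a 256-entry lookup table
    return energy / (exposure * vignette)      # recover B(x) up to a global scale

# Illustrative use with an 8-bit image, a linear inverse response and a flat vignette:
image = np.random.randint(0, 256, size=(480, 640))
inv_response = np.linspace(0.0, 1.0, 256)      # placeholder f^{-1}
vignette = np.ones((480, 640))                 # placeholder V(x)
radiance = inverse_photometric_model(image, inv_response, vignette, exposure=1 / 60)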
Item: The importance of incidence angle for GLCM texture features and ancillary data sources for automatic sea ice mapping. (University of Waterloo, 2024-09-19) Pena Cantu, Fernando Jose
Sea ice is a critical component of Earth's polar regions. Monitoring it is vital for navigation and construction in the Arctic and crucial for understanding and mitigating the impacts of climate change.
Synthetic aperture radar (SAR) imagery, particularly dual-polarized SAR, is commonly used for this purpose due to its ability to penetrate clouds and provide data in nearly all weather conditions. However, relying solely on HH and HV polarizations for automated sea ice mapping models has limitations, as different ice types and conditions may yield similar backscatter signatures. To enhance the accuracy of these classification models, researchers have explored the integration of additional features, including hand-crafted texture features, learned features, and supplementary data sources. This thesis makes two main contributions to the field of automated sea ice mapping. The first contribution investigates the dependence of gray-level co-occurrence matrix (GLCM) texture features on incidence angle (IA) and its impact on sea ice classification. The methodology involved extracting GLCM features from SAR images in dB units and analyzing their dependence on IA using linear regression and class separability metrics. In addition, a Bayesian classifier was trained to compare classification performance with and without incorporating the IA dependence. The results indicated that the IA effect had a minor impact on classification performance (≈ 1%), and linear regression showed that the IA dependence accounts for less than approximately 10% of the variance in most cases. The second contribution evaluates the importance of various data inputs for automated sea ice mapping using the AI4Arctic dataset. A U-Net based model was trained with SAR imagery, passive microwave data from AMSR2, weather data from ERA5, and ancillary data. Ablation studies and the addition of individual data inputs were conducted to assess their impact on model performance. The results demonstrated that including AMSR2, time, and location data significantly increased model performance, especially the classification accuracy of major ice types in the stage of development (SOD) task. ERA5 data had mixed effects, as it was found not to increase performance when AMSR2 was already included. These findings are critical for the development of more accurate and efficient automated sea ice mapping systems. The minimal impact of IA dependence on GLCM features suggests that accounting for IA may not be necessary, simplifying the feature extraction process. Identifying the most valuable data inputs allows for the optimization of model performance, ensuring better resource allocation and enhanced operational capabilities in sea ice monitoring. This research provides a foundation for future studies and developments in automated sea ice mapping, contributing to more effective climate monitoring and maritime navigation safety.
Item: Automatic Whale Detection using Deep learning (University of Waterloo, 2024-09-17) Patel, Muhammed
Accurate monitoring of whale populations is essential for conservation efforts, yet traditional surveying methods are often time-consuming, expensive, and limited in coverage. This thesis investigates the automation of whale detection using state-of-the-art (SOTA) deep learning techniques applied to high-resolution aerial imagery. By leveraging advancements in computer vision, specifically object detection models, this research aims to develop a robust and efficient system for identifying and counting whales from aerial surveys.
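Aerial survey frames are far larger than the inputs object detectors expect, so pipelines of this kind typically tile each frame into overlapping patches before detection; a minimal sketch (patch size and overlap are illustrative assumptions, not the settings evaluated in the thesis):

def tile_image(height, width, patch, overlap=0.2):
    """Yield (top, left, bottom, right) windows covering a large aerial frame.

    Patch size and overlap are illustrative choices for running a detector such
    as Faster R-CNN on imagery far larger than its input size.
    """
    stride = int(patch * (1.0 - overlap))
    tops = list(range(0, max(height - patch, 0) + 1, stride))
    lefts = list(range(0, max(width - patch, 0) + 1, stride))
    # Make sure the final row/column of patches reaches the image border.
    if tops[-1] + patch < height:
        tops.append(height - patch)
    if lefts[-1] + patch < width:
        lefts.append(width - patch)
    for top in tops:
        for left in lefts:
            yield top, left, top + patch, left + patch

# e.g. a 10000 x 15000 pixel survey frame split into 1024-pixel patches:
windows = list(tile_image(10000, 15000, patch=1024))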
The study formulates whale detection as a small object detection problem and evaluates the performance of various SOTA models, including Faster R-CNN, YOLOv8, and Deformable DETR, paired with modern backbone architectures such as ConvNext-T, Swin-T, and ResNet-50. The influence of input image size and context on model performance is systematically explored by testing patch sizes ranging from 256 to 4096 pixels, marking this study as the first to investigate the efficacy of such large patch sizes in the remote sensing domain. Results indicate that the Faster R-CNN model with a ConvNext-T backbone achieves the highest detection accuracy, with an average precision of 0.878 at an IoU threshold of 0.1, particularly when trained on larger patch sizes. The study also addresses the challenge of domain adaptation by implementing an active learning framework designed to enhance model performance on new survey data with varying environmental conditions. A novel portfolio-based acquisition function, leveraging the social behavior of whales, is introduced to optimize the annotation process. This research significantly contributes to the field of automated whale monitoring, offering a scalable and adaptable solution that reduces annotation costs and improves the accuracy of population estimates. The developed system holds promise for enhancing conservation strategies and providing valuable insights into whale movements and behaviors.
Item: Improving Neural Radiance Fields for More Efficient, Tailored, View-Synthesis (University of Waterloo, 2024-09-17) Nair, Saeejith Muralidharan
Neural radiance fields (NeRFs) have revolutionized novel view synthesis, enabling high-quality 3D scene reconstruction from sparse 2D images. However, their computational intensity often hinders real-time applications and deployment on resource-constrained devices. Traditional NeRF models can require days of training for a single scene and demand significant computational resources for rendering, with some implementations necessitating over 150 million network evaluations per rendered image. While various approaches have been proposed to improve NeRF efficiency, they often employ fixed network architectures that may not be optimal for all scenes. This research introduces NAS-NeRF, a new approach that employs generative neural architecture search (NAS) to discover compact, scene-specialized NeRF architectures. NAS, a technique for automatically designing neural network architectures, is investigated as a potential method for optimizing NeRFs by tailoring network architectures to the specific complexities of individual scenes. NAS-NeRF reformulates the NeRF architecture into configurable field cells, enabling efficient exploration of the architecture space while maintaining compatibility with various NeRF variants. Our method incorporates a scene-specific optimization strategy that considers the unique characteristics of each 3D environment to guide architecture search. We also introduce a quality-constrained generation approach that allows for the specification of target performance metrics within the search process. Experiments on the Blender synthetic dataset demonstrate the effectiveness of NAS-NeRF in generating a family of architectures tailored to different efficiency-quality trade-offs. Our most efficient models (NAS-NeRF XXS) achieve up to 23× reduction in parameters and 22× fewer FLOPs compared to baseline NeRF, with only a 5.3% average drop in structural similarity (SSIM).
Meanwhile, our high-quality models (NAS-NeRF S) match or exceed baseline performance while reducing parameters by 2-4× and offering up to 1.93× faster inference. These results suggest that high-quality novel view synthesis can be achieved with more compact models, particularly when architectures are tailored to specific scenes. NAS-NeRF contributes to the ongoing research into efficient 3D scene representation methods, helping enable applications in resource-constrained environments and real-time scenarios.
Item: Microarray Image Denoising Leveraging Autoencoders and Attention-Based Architectures with Synthetic Training Data (University of Waterloo, 2024-09-16) Czarnecki, Chris
Microarray technology has for many years remained a gold standard in transcriptomics. However, preparation of physical slides in wet labs involves procedures which tend to introduce occasional dirt and noise into the slide. Having to repeat experiments due to environmental noise present in the scanned images leads to increased reagent and labor costs. Motivated by the high costs of repeated wet lab procedures, we explore denoising methods in the narrow subfield of microarray image analysis. We propose SADGE, a domain-relevant metric to quantify the denoising power of the methods considered. We introduce a synthetic data generation protocol which permits the creation of very large microarray image datasets programmatically and provides noise-free ground truth useful for objective quantification of denoising. We also train several deep learning architectures for the denoising task, with several of them beating the current state-of-the-art method on both the PSNR and SADGE metrics. We propose a new training modality leveraging the EATME module to condition the image reconstruction on ground-truth expression values, and we introduce an additional loss term (DEL) which further enhances the denoising capabilities of the model while ensuring minimal information loss. Collectively, the innovations outlined in our work constitute a significant contribution to the field of microarray image denoising, influencing the cost-effectiveness of microarray experiments and thus impacting a wide range of clinical and biotechnological applications.
Item: Financialization of the Housing Market: A Contribution to Modern Urban Rent Theory (University of Waterloo, 2024-09-16) Wright, Kirsten
A great deal of wealth is produced through the economic activity of cities. There is a gap, however, in the formal apparatus of standard economic theory for analyzing the distribution of the enormous value created in cities. In the context of a widely felt housing crisis, we explore how the capture of urban value by financial actors through the financialization of the housing market affects ownership patterns in urban areas, and the ultimate implications of these processes for urban productivity. We hypothesize that financialization induces a shift towards tenancy among the urban workforce that is likely to result in decreased urban productivity through a range of channels. To examine this hypothesis, we construct an agent-based model with a land market and production sector in which productivity scales superlinearly with city population. This work brings together urban agglomeration effects, Ricardian rent theory, and a spatially explicit land market model in a novel way. In our model, transportation costs determine the size of the city and the available locational rents.
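The superlinear scaling assumption at the core of the production sector can be written compactly; a minimal illustration (the exponent is a commonly cited value from the broader urban scaling literature, not a parameter estimated in this thesis):

def urban_output(population, prefactor=1.0, beta=1.15):
    """Aggregate urban output under superlinear scaling, Y = a * N**beta.

    beta > 1 captures the agglomeration effect the model assumes; 1.15 is an
    illustrative figure from the urban scaling literature.
    """
    return prefactor * population ** beta

# Doubling the population more than doubles output (and raises per-capita output):
small, large = urban_output(500_000), urban_output(1_000_000)
print(large / small)                            # ~2.22 rather than 2.0
print((large / 1_000_000) / (small / 500_000))  # per-capita gain, ~1.11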
Rising productivity increases wages and urban land values, so the value of increased productivity is transferred to landowners. Investors attempt to capture these productive gains by purchasing land. These financial actors can bid against residents to purchase urban land. The interaction of agents determines the distribution of property ownership, city size, and wages. City size and wages provide a measure of urban productivity. The evolving pattern of property ownership tells us how residents are distributed between the tenant class and the owner class. We then explore a range of channels through which financialization might result in decreased urban productivity. When we add this link to the model, we see that financialization not only transforms the class structure of the city and the distribution of urban wealth, but also disrupts the relationship between population growth and productivity, reducing the wealth and resilience of the urban system. To illustrate the uses of this kind of computational model for economic policy analysis, we run six policy experiments with and without the productivity link. Contributions of this work include: integrating classical rent theory into an agent-based urban model; linking urban rent dynamics with urban productivity and population growth; incorporating the urban scaling literature into the model framework; examining the impacts of financialization on wealth distribution and urban productivity; creating a framework for a broader understanding of public policies in an urban system; and examining the qualitative effects of various public policies on wealth distribution, productivity, and class.
Item: Assessment of Acoustic Markers of Conversational Difficulty (University of Waterloo, 2024-09-06) Ellag, Menatalla
Human conversation, one of the most complex of human behaviors, requires the real-time coordination of speech production and comprehension, involving cognitive, social, and biological dimensions. There has been a rising need for laboratory and clinical assessments to evolve to capture the essence of everyday interactions. The cognitive demands of interactive conversation, which require listeners to process and store information while simultaneously planning their responses, often exceed those encountered in standard clinical tests. These assessments must encompass diverse contexts and participant groups, including varying hearing statuses, challenging listening environments such as background noise, the use of assistive devices that may alter the listening experience, and different conversation types such as relational versus transactional exchanges, dyadic versus group interactions, and face-to-face versus remote interactions. This study consists of two investigations exploring how different conditions affect acoustic measures of speech production and conversational behavior. The first study extended an earlier study originally conducted for content analysis and participants' subjective rating questionnaires, focusing on hearing-impaired (HI) individuals. It examined the impact of face masks and remote microphones on communication dynamics. Four native English-speaking HI participants engaged in free-form conversations within small groups under constant background noise of 55 dBA. Interestingly, the results showed that using remote microphones shortened floor-transfer offsets (FTOs) and extended conversation durations, suggesting improved communication.
When participants did not wear a face mask, interpausal unit (IPU) durations were shorter with remote microphones than without, indicating easier communication. However, no significant difference was found between the two mask conditions. Face masks are thought to affect both speech perception and production by decreasing inhalation and exhalation volumes, thereby limiting the duration of utterances, and are speculated to increase resistance to airflow, reducing subglottal pressure and consequently lowering fundamental frequency (F0). Despite no significant differences in articulation rate and floor transfer rate, the constant noise environment, presented at lower levels compared to previous studies, may have limited the potential for pronounced effects. The second study involved normal-hearing (NH) individuals, investigating the effects of conversation type (free-form vs. task-based) and noise presence (70 dB SPL) on conversational dynamics. Dyadic interactions among NH participants were examined. Task-based conversations exhibited structured patterns with longer FTOs and higher floor transfer rates, while free-form conversations showed greater FTO variability, more frequent overlaps, longer IPUs, and increased pause durations and rates. Noise presence increased IPU durations and pause lengths but did not significantly alter floor-transfer rates or FTO variability. Both conversation types showed increased articulation rates and speech levels in noise. Contrary to the change expected as part of the Lombard effect, the increase in articulation rates may be attributed to the noise acting as a stressor. Meanwhile, the increase in mean speech levels was less pronounced than expected, possibly due to the specific noise characteristics and the use of closed headphones. These studies shed light on the complexity of communicative interactions and the necessity of accounting for a wide spectrum of factors in experimental designs. The findings highlight the importance of considering both environmental conditions and conversation types when researching speech perception, production, and conversational dynamics. This research provides valuable insights for academic studies and the development of hearing-assistive technologies, emphasizing the need for assessments that reflect the varied nature of everyday communication.
Item: Addressing Data Scarcity in Domain Generalization for Computer Vision Applications in Image Classification (University of Waterloo, 2024-08-30) Kaai, Kimathi
Domain generalization (DG) for image classification is a crucial task in machine learning that focuses on transferring domain-invariant knowledge from multiple source domains to an unseen target domain. Traditional DG methods assume that classes of interest are present across multiple domains (domain-shared), which helps mitigate spurious correlations between domain and class. However, in real-world scenarios, data scarcity often leads to classes being present in only a single domain (domain-linked), resulting in poor generalization performance. This thesis introduces the domain-linked DG task and proposes a novel methodology to address this challenge. It proposes FOND, a "Fairness-inspired cONtrastive learning objective for Domain-linked domain generalization," which leverages domain-shared classes to learn domain-invariant representations for domain-linked classes. FOND is designed to enhance generalization by minimizing the impact of task-irrelevant domain-specific features.
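For context, a generic supervised contrastive objective of the kind FOND builds on is sketched below; this is the standard form from the contrastive-learning literature (in the style of Khosla et al., 2020), not the FOND loss itself:

import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive objective.

    Samples sharing a class label are pulled together and all other samples are
    pushed apart, regardless of which domain they came from.
    """
    z = F.normalize(embeddings, dim=1)               # unit-length features
    sim = z @ z.t() / temperature                    # pairwise scaled similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float('-inf'))  # never contrast a sample with itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1) / pos_counts
    return loss[pos_mask.any(dim=1)].mean()          # average over anchors that have positives

# e.g. a batch of 128 feature vectors with integer class labels:
# loss = supervised_contrastive_loss(torch.randn(128, 256), torch.randint(0, 10, (128,)))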
The theoretical analysis in this thesis extends existing domain adaptation error bounds to the domain-linked DG task, providing insights into the factors that influence generalization performance. Key theoretical findings include the understanding that domain-shared classes typically have more samples and learn domain-invariant features more effectively than domain-linked classes. This analysis informs the design of FOND, ensuring that it addresses the unique challenges of domain-linked DG. Furthermore, experiments are performed across multiple datasets and experimental settings to evaluate the effectiveness of various current methodologies. The proposed method achieves state-of-the-art performance in domain-linked DG tasks, with minimal trade-offs in the performance of domain-shared classes. Experimental results highlight the impact of shared-class settings, total class size, and inter-domain variations on the generalizability of domain-linked classes. Visualizations of learned representations further illustrate the robustness of FOND in capturing domain-invariant features. In summary, this thesis advocates for future DG research on domain-linked classes by (1) theoretically and experimentally analyzing the factors impacting domain-linked class representation learning, (2) demonstrating the ineffectiveness of current state-of-the-art DG approaches, and (3) proposing an algorithm to learn generalizable representations for domain-linked classes by transferring useful representations from domain-shared ones.
Item: The Effects of Stimulus Statistics on Representational Similarity in a Model of Mouse Visual Cortex (University of Waterloo, 2024-08-30) Torabian, Parsa
Deep convolutional neural networks have emerged as convincing models of the visual cortex, demonstrating a remarkable ability to predict neural activity. However, the specific combination of factors that optimally align these models with biological vision remains an open question. Network architecture, training objectives, and the statistics of training data all likely play a role, but their relative contributions and interactions are not fully understood. In this study, we focus on the role of training data in shaping the representations learned by deep networks. We investigate how the degree of 'realism' in the training data affects the similarity between network activations and neural recordings from mouse visual cortex. We hypothesised that training on more naturalistic stimuli would lead to greater brain-model similarity, as the visual system has evolved to process the statistics of the natural world. We leveraged the Unity video-game engine to generate custom training datasets with the ability to control three distinct factors: the realism of the virtual environment, the motion statistics of the simulated agent, and the optics of the modelled eye. Deep networks were trained on datasets generated from all eight permutations of these three experiment variables using a self-supervised learning approach. The trained models were subsequently compared to mouse neural data from the Allen Institute using representational similarity analysis. Our results reveal that the realism of the virtual environment has a substantial and consistent effect on brain-model similarity. Networks trained on the more realistic meadow environment exhibited significantly higher similarity to mouse visual cortex across multiple areas. In contrast, the effects of motion statistics and visual optics were more subtle and area-specific. Furthermore, all possible interactions between these three factors were statistically significant, suggesting complex nonlinear relationships.
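The representational similarity analysis used for the model-brain comparison can be summarized in a short sketch; correlation distance and a Spearman comparison of dissimilarity matrices are common choices, and the thesis's exact settings may differ:

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """Representational dissimilarity matrix entries (condensed form).

    responses: array of shape (n_stimuli, n_units), either model activations or
    recorded neural responses to the same stimuli.
    """
    return pdist(responses, metric='correlation')

def rsa_similarity(model_acts, neural_acts):
    """Spearman correlation between the two RDMs, the usual RSA summary score."""
    rho, _ = spearmanr(rdm(model_acts), rdm(neural_acts))
    return rho

# e.g. 100 stimuli, 512 model units vs. 80 recorded neurons (shapes are illustrative):
score = rsa_similarity(np.random.rand(100, 512), np.random.rand(100, 80))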
Item: Towards an Optical Biopsy Tool Using Photon Absorption Remote Sensing (University of Waterloo, 2024-08-28) Veugen, Jenna
Streamlining diagnosis is more important than ever, as long wait times, resource constraints, and diagnostic inaccuracies place burdens on the healthcare system that climb each year. The development of a tool capable of instantaneous in situ diagnosis would eliminate the excess time and resources used in current diagnostic procedures and thereby relieve some of these burdens. This could be achieved with an optical biopsy by leveraging light-matter interactions for advanced microscopy in an endoscopic form. However, to date there is no technology in an endoscopic form able to provide image quality diagnostically equivalent to the gold standard for diagnosis. Photon Absorption Remote Sensing (PARS) is a novel imaging modality that utilizes optical absorption contrast to achieve label-free, non-contact microscopy. PARS technology holds promising potential for resolving many of the challenges faced in the development of an optical biopsy tool. This thesis explores the initial development of a PARS endoscope capable of in vivo microvascular imaging through multiple phases of development. The first stage investigated the performance of a dual-green PARS bench-top system, utilizing green excitation and detection wavelengths to address chromatic aberrations in the final endoscopic form. The system was confined to a green excitation wavelength in order to target the absorption of hemoglobin for vascular imaging. It was then paired with a green detection wavelength for the first time, unlike typical PARS microscopes that rely on near-infrared (NIR) wavelengths for detection. Both phantom and in vivo samples were imaged to validate the performance of the system, showing functionality and sensitivity comparable to NIR PARS systems. The next phase explored the transition of a stationary PARS bench-top system to a free imaging head using optical fiber. This introduced many challenges, such as high losses and inherent noise, that had to be addressed through careful design, assembly, and optimization. Two types of specialized optical fiber were tested by imaging phantom targets and in vivo chicken embryo samples. The double-clad fiber setup showed strong performance, with excellent contrast, signal-to-noise ratio, and sensitivity in the PARS images. The final stage involved miniaturizing the imaging head to achieve an endoscopic form factor. Various miniature objective lens designs were developed and tested in the system. The successful design was capable of imaging both in phantoms and in vivo, demonstrating, for the first time, vasculature imaged using PARS through optical fiber. This research lays the groundwork for the development of a PARS endoscope capable of providing gold-standard quality, instantaneous diagnosis in situ. It demonstrates a successful design capable of capturing relevant biomarkers in vivo using endoscopic PARS technology. The improved understanding of the design requirements for a more efficient system, and insight into the fundamental limitations, highlight future directions to further improve this device.
This puts us one step closer to a successful optical biopsy tool that could streamline diagnosis, improve patient outcomes, safety, and experience, and significantly reduce the cost burden on the health system.
Item: Language Guided Out-of-Bounding Box Pose Estimation for Robust Ice Hockey Analysis (University of Waterloo, 2024-08-27) Balaji, Bavesh
Accurate estimation of human pose and the pose of interacting objects, such as hockey sticks, is fundamental in vision-driven hockey analytics and crucial for tasks like action recognition and player assessment. Estimating 2D keypoints from monocular video is challenging, particularly in fast-paced sports such as ice hockey, where motion blur, occlusions, bulky equipment, color similarities, and constant camera panning complicate accurate pose prediction. This thesis addresses these challenges with contributions on three fronts. First, recognizing the lack of an existing benchmark, we present a comparative study of four state-of-the-art human pose estimation approaches using a real-world ice hockey dataset. This analysis aims to understand the impact of each model on ice hockey pose estimation and investigate their respective advantages and disadvantages. Building on insights from this comparative study, we develop an ensemble model for jointly predicting player and stick poses. The ensemble comprises two networks: one trained from scratch to predict all keypoints, and another utilizing a unique transfer learning paradigm to incorporate knowledge from large-scale human pose datasets. Despite achieving promising results, we observe that these top-down approaches yield suboptimal outcomes due to constraints such as requiring all keypoints to be within a bounding box and accommodating only one player per bounding box. To overcome these issues, we introduce an image- and text-based multi-modal solution called TokenCLIPose, which predicts stick keypoints without encapsulating them within a bounding box. By capturing only the player in a bounding box and treating their stick as missing, our model predicts out-of-bounding-box keypoints. To incorporate the context of the missing keypoints, we use keypoint-specific text prompts to leverage the rich semantic representations provided by language. This dissertation's findings advance the state of the art in 2D pose estimation for ice hockey, outperforming existing methods by 2.6% on our dataset, and provide a robust foundation for further developments in vision-driven sports analytics.
Item: Scaling Laws for Compute Optimal Biosignal Transformers (University of Waterloo, 2024-08-20) Fortin, Thomas
Scaling laws which predict the optimal balance between the number of model parameters and the number of training tokens given a fixed compute budget have recently been developed for language transformers. These allow model developers to allocate their compute budgets such that they can achieve optimal performance. This thesis develops such scaling laws for the Biosignal Transformer trained separately on both accelerometer data and EEG data. This is done by applying methods used by other researchers to develop similar scaling laws for language transformer models, referred to as the iso-FLOP curve method and the parametric loss function method.
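The parametric loss function method fits a closed-form loss surface and then solves for the compute-optimal split of a budget; a minimal sketch of that generic recipe from the language-model scaling-law literature (the functional form and the C ≈ 6ND approximation follow that literature, and every constant and array name below is a placeholder, not a value fitted in the thesis):

import numpy as np
from scipy.optimize import curve_fit

def parametric_loss(nd, E, A, B, alpha, beta):
    """Chinchilla-style parametric form L(N, D) = E + A/N**alpha + B/D**beta.

    N is the number of model parameters, D the number of training tokens.
    """
    N, D = nd
    return E + A / N**alpha + B / D**beta

# Fit the five constants to a table of (params, tokens, measured pre-training loss):
# popt, _ = curve_fit(parametric_loss, (params, tokens), losses,
#                     p0=[1.0, 400.0, 400.0, 0.3, 0.3], maxfev=20000)

def compute_optimal_split(C, A, B, alpha, beta):
    """Minimize L subject to a FLOP budget C ~= 6*N*D (standard approximation)."""
    a = beta / (alpha + beta)
    G = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    N_opt = G * (C / 6.0) ** a
    D_opt = (C / 6.0) / N_opt
    return N_opt, D_opt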
The Biosignal Transformer is a transformer model designed specifically to be trained on tasks that use biosignals such as EEG, ECG, and accelerometer data as input. For example, the Biosignal Transformer can be trained to detect or classify seizures from EEG signals. The Biosignal Transformer is also of particular interest because it is designed to use unsupervised pre-training on large unlabelled biosignal datasets to improve performance on downstream tasks with smaller labelled fine-tuning datasets. This work develops scaling laws which optimize for the best unsupervised pre-training loss given a fixed compute budget. Results show that the developed scaling laws successfully predict a balance between the number of parameters and the number of training tokens that minimizes pre-training loss, for compute budgets five times larger than those used to develop them. Researchers who intend to scale up the Biosignal Transformer should use these scaling laws to attain optimal pre-training loss from their given compute budgets when applying unsupervised pre-training with the Biosignal Transformer.
Item: Talker Sensitivity to Turn-Taking in Conversation (University of Waterloo, 2024-08-19) Masters, Benjamin
Turn-taking in conversation is a complex phenomenon that requires talkers to, at a minimum, simultaneously plan and produce their own speech and listen to and comprehend the speech of their partner(s). Given this necessary division of attention, the increase in listening difficulty introduced by hearing impairments can have confounding effects on a person's ability to communicate, and evaluating listening effort during communication remains difficult. Because one of the most detrimental effects of hearing loss is its impact on the ability to communicate effectively, the assessment of listening effort in natural environments is especially important. This thesis takes two approaches to evaluating listening effort in conversation. The first analyzes the response of the pupil at the temporal scale of turn-taking to understand how effort and attention are allocated between speaking, listening, and other task demands. Pupillary temporal response functions to turn-taking are derived and analyzed for systematic differences across people and acoustic environmental conditions, and further analyzed to determine differences in pupil response based on the expected difficulty of a conversation. The second approach analyzes behavioral changes related to the timing of turn-taking to understand how talkers identify that communication difficulty is being experienced by a conversational partner. The floor transfer offset (FTO), defined as the time it takes one talker to begin their turn after another has ended theirs, was manipulated during interactive conversations to mimic the observed increase in magnitude and variability of FTOs in difficult listening environments. To enable this, an audio processing framework was developed to track the state of a conversation in near real time and manipulate the perceived response time of talkers. The findings suggest that the timing of turn-taking is not used as a cue by talkers to infer difficulty.
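The FTO defined above is straightforward to compute from a sequence of talker turns; a minimal sketch that follows a literal reading of that definition (not the thesis's processing framework):

def floor_transfer_offsets(turns):
    """Compute floor-transfer offsets (FTOs) from a list of talker turns.

    turns: list of (talker_id, start_s, end_s) tuples sorted by start time.
    The FTO of an exchange is the time between one talker ending their turn and
    the other beginning theirs; negative values indicate overlapping speech.
    """
    offsets = []
    for (talker_a, _, end_a), (talker_b, start_b, _) in zip(turns, turns[1:]):
        if talker_a != talker_b:               # only count actual floor transfers
            offsets.append(start_b - end_a)
    return offsets

# e.g. two talkers with one slightly overlapping exchange:
ftos = floor_transfer_offsets([("A", 0.0, 2.1), ("B", 2.0, 4.5), ("A", 4.9, 6.0)])
# ftos is approximately [-0.1, 0.4] seconds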
Item: Autonomous Robotic System Conducting Nasopharyngeal Swabbing (University of Waterloo, 2024-08-15) Lee, Peter Qiu Jiun
The nasopharyngeal swab test is a procedure where a healthcare worker inserts a swab through the nose until it reaches the nasopharynx at the back of the nasal cavity in order to collect secretions that can later be examined for illnesses. This procedure saw heightened use to detect cases during the COVID-19 pandemic.
Its ubiquity also highlighted fragilities in the healthcare system by way of the hazards to healthcare workers from infectious patients and the pressures a pandemic can inflict on an unprepared system. In this thesis we propose an autonomous robotic system for performing nasopharyngeal swab tests using a collaborative robotic manipulator arm, on the premise that the hardware and techniques could eventually be applied to other types of close-contact tasks to support the healthcare system. We also assume that prospective patients would be standing unrestrained in front of the arm, which adds the challenges of adjusting to arbitrary poses of the head and compensating for natural head motion. We first designed an instrumented end-effector to attach to a robotic arm to enable suitable vision and force sensing capabilities for the task. Next, we developed a finite element modeling simulation environment to describe the deformation of the swab as it moves through the nasal cavity, and solved an optimization problem to find ideal paths through it. A visual servoing system, building on advances in deep learning and state estimation, was designed to align the swab with the nose and was validated in a number of human trials. A torque-controlled force-compliant system was designed and evaluated to determine the feasibility of using force measurements to correct for misalignment when the swab is inserted into a nasal cavity phantom. Finally, we integrated all the system components into a cohesive system for performing nasopharyngeal swab tests. We created a simulator using a nasal cavity phantom and a second robot arm to mimic natural motions of the head. This simulator was leveraged to perform extensive experimentation that found promising controller configurations able to compensate for head motion.
Item: Topic Segmentation of Recorded Meetings (University of Waterloo, 2024-08-13) Lazoja, Ilir
Video chapters make videos more easily digestible and can be an important pre-processing step for other video-processing tasks. In many cases, the creator can easily chapter their own videos, especially well-edited, structured videos. However, some types of videos, such as recorded meetings, are more loosely structured with less obvious breaks, which makes them more cumbersome to chapter; they would therefore benefit greatly from automated chaptering. One approach to chaptering these types of videos is to perform topic segmentation on the transcript of the video, especially if the video is rich in dialogue. Topic segmentation is the task of dividing text based on when its topic changes, and is most commonly performed on large bodies of written text. This thesis details how well state-of-the-art approaches for topic segmentation perform on recorded meetings, presents and evaluates strategies to improve performance for recorded meetings, and discusses shortcomings of the common metrics used for topic segmentation.
Item: Robust 3D Human Modeling for Baseball Sports Analytics (University of Waterloo, 2024-08-12) Bright, Jerrin
In the fast-paced world of baseball, maximizing pitcher performance while minimizing runs relies on understanding subtle variations in mechanics. Traditional analysis methods, reliant on pre-recorded offline numerical data, struggle in the dynamic flow of live games. Although seemingly ideal, broadcast video analysis faces significant challenges due to motion blur, occlusion, and low resolution.
This research proposes a novel 3D human modeling technique and a pitch statistics identification system that are robust to the aforementioned challenges. Specifically, we propose Distribution and Depth-Aware Human Mesh Recovery (D2A-HMR), a depth- and distribution-aware 3D human mesh recovery technique that extracts pseudo-depth from each frame and utilizes a transformer network with self- and cross-attention to create a 3D mesh from which the 3D pose coordinates are extracted. The network is regularized using several loss functions, including a silhouette loss, joint reprojection losses, and a distribution loss which utilizes a normalizing flow to learn the deviation between the underlying predicted and ground truth distributions. Furthermore, we propose a focused augmentation strategy specifically designed to address the motion blur caused by fast movement. Following that, we introduce the PitcherNet system, which is built upon D2A-HMR and the motion blur augmentation strategy. PitcherNet is an automated analysis system that analyzes pitcher kinematics directly from live broadcast video, providing valuable pitch statistics (pitch velocity, release point, pitch position, release extension, and pitch handedness). The system relies solely on broadcast videos as its input and leverages computer vision and pattern recognition to generate reliable pitch statistics from the game. First, PitcherNet isolates the pitcher and batter in each frame using a role classification network. Next, PitcherNet extracts the kinematic information representing the pitcher's joints and surface using a refined version of the D2A-HMR model. Additionally, we enhance the generalizability of the 3D human model by incorporating additional in-the-wild high-resolution videos from the Internet. Finally, PitcherNet employs a Temporal Convolutional Network (TCN) and kinematic-driven heuristics to capture the pitch statistics, which can be used to analyze baseball pitchers.
Item: Planning Renewable Electricity Using Life-Cycle Analysis (University of Waterloo, 2024-07-16) Ali, Mir Sadek
It has been predicted that by the mid-21st century worldwide energy demand will grow to two to three times the current level. Expanding global electric power generation capacity will be problematic using the three predominant methods, namely nuclear fission, fossil fuels, and hydropower. There are few suitable sites left for new large-scale hydropower dams. Both fossil fuels and nuclear fission have widespread environmental consequences, and the supply of fuel for these two technologies is a non-renewable resource. Renewable energy system (RES) technologies have been proposed as the means to expand energy markets in a sustainable manner. A formative step in deploying RES will be the design of a standardized methodology for informing policy and planning decisions to initiate market and government support for these nascent technologies. This thesis outlines the design of a RES planning model based on the life-cycle analysis (LCA) methodology. The proposed model will integrate a climatologically based renewable energy optimization and simulation (REOS) model into the LCA. Goal-attainment algorithms will be used to find feasible installed capacities for power generation which meet a prescribed load demand while simultaneously attempting to meet desired policy targets. The policy targets here will be the per-kilowatt-hour price of power, life-cycle airborne CO2 emissions, and the land requirements of the system. An analysis of the performance of RES technologies in two Canadian cities that already have mature electricity utilities is done to demonstrate the methodology.
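Goal attainment can be posed as minimizing a single attainment variable that relaxes each policy goal while the demand constraint is enforced; a minimal linear sketch in that spirit (every coefficient, technology, goal, and weight below is an illustrative placeholder, not data from the thesis or the REOS model):

import numpy as np
from scipy.optimize import linprog

# Three hypothetical technologies; per-MW generation, cost, life-cycle CO2 and land scores.
gen = np.array([2.6, 1.8, 7.0])        # energy delivered per MW installed
cost = np.array([1.0, 1.4, 0.9])       # objective 1 coefficients (price proxy)
co2 = np.array([12.0, 40.0, 10.0])     # objective 2 coefficients (emissions proxy)
land = np.array([0.3, 0.02, 2.0])      # objective 3 coefficients (land-use proxy)
demand = 5000.0
goals = np.array([3000.0, 60000.0, 900.0])
weights = np.array([1.0, 1.0, 1.0])

# Decision vector z = [x1, x2, x3, gamma]; minimize gamma so that each objective stays
# within its goal relaxed by weight*gamma (Gembicki's goal-attainment formulation),
# while total generation meets the prescribed demand.
c = np.array([0.0, 0.0, 0.0, 1.0])
A_ub = np.vstack([
    np.column_stack([np.vstack([cost, co2, land]), -weights]),  # f_i(x) - w_i*gamma <= goal_i
    np.append(-gen, 0.0),                                       # -gen.x <= -demand
])
b_ub = np.append(goals, -demand)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3 + [(None, None)])
capacities, attainment = res.x[:3], res.x[3]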
Item: Toward Automated Detection of Landfast Ice Polynyas in C-Band Synthetic Aperture Radar Imagery with Convolutional Neural Networks (University of Waterloo, 2024-07-12) Brubacher, Neil
Landfast ice polynyas (areas of open water surrounded by ice) are important features in many Northern coastal communities, and their automated detection from spaceborne synthetic aperture radar (SAR) imagery is positioned to support on-ice travel safety under changing Arctic sea ice and climate conditions. The characteristically small spatial scales and sparse distribution of landfast ice polynyas present key challenges to their detection and limit the suitability of established methods developed for SAR-based sea ice and open water classification at broader spatial scales. This thesis explores the development of deep learning-based object detection networks for landfast ice polynya detection in dual-polarized C-band SAR imagery and makes three main contributions. The first is a characterization of landfast ice polynya signatures and separability in SAR imagery based on datasets of polynyas mapped over several seasons near the communities of Sanikiluaq, NU, and Nain, NL. Results from this analysis highlight the challenging and variable nature of polynya signatures in dual-polarized backscatter intensity, motivating the use of convolutional neural networks (CNNs) to capture relevant textural, geometric, and contextual polynya features. The second contribution is the development and evaluation of CNN-based object detection networks for polynya detection, drawing on advancements in the natural-scene small object detection field to address the challenging size and sparsity characteristics of polynyas. A simplified detection network architecture optimized for polynya detection in terms of feature representation capacity, feature map resolution, and training loss balancing is found to reliably detect polynyas with sufficient size and local contrast, and demonstrates good generalization to regions not seen in training. The third contribution is an assessment of detection model generalizability between imagery produced by the Sentinel-1 (S1) and RADARSAT Constellation Mission (RCM) SAR sensors, illustrating the ability of models trained only on S1 imagery to effectively extract and classify polynya features in RCM imagery despite differences in resolution and noise characteristics. Across regions and sensors, missed polynyas are found to have smaller sizes and weaker signatures than detected polynyas, while false predictions are often caused by boundary areas between smooth and rough landfast ice. These represent fundamental limits to polynya / landfast ice separability in the medium-resolution, dual-polarized C-band SAR imagery used in this thesis, motivating future research into multi-temporal, multi-frequency, and/or higher-resolution SAR imagery for polynya detection. Ongoing and future progress in the development of robust landfast ice hazard detection systems is positioned to support community sea ice safety and monitoring.