Systems Design Engineering
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9914
This is the collection for the University of Waterloo's Department of Systems Design Engineering.
Research outputs are organized by type (e.g., Master Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Recent Submissions
Item Electromyography-based Biometrics for Secure and Robust Personal Identification and Authentication (University of Waterloo, 2024-12-13) Pradhan, Ashirbad
Recently, electromyogram (EMG), the electrical activity of skeletal muscles, has been proposed as a novel biometric trait to address the limitations of current biometrics, such as fingerprint and facial recognition. A unique property of EMG as a biometric trait is that it allows for distinguishable patterns from different limb movements (e.g., hand gestures), enabling individuals to set personalized passwords comprising multiple gestures for dual-security systems, i.e., both biometric-level and password-level. This is fundamentally different from other physiological signals such as electrocardiogram (ECG) and electroencephalogram (EEG), which are highly difficult for the user to voluntarily control with sufficient precision. This unique advantage has facilitated EMG-based biometrics for two different applications: authentication, where a user can access personal devices, and identification, where the system determines the closest match within a database. To establish EMG as a novel biometric trait, the following two properties need to be thoroughly investigated: 1) the ability to accurately detect the genuine user from all other users (uniqueness), and 2) the ability to retain the biometric performance over multiple sessions and multiple days (robustness). The overarching aim of this PhD research is to investigate these properties by addressing a series of research questions in the following studies. In the first study (Chapter 3), the effect of EMG system parameters, such as the feature extraction method and the number of channels, on biometric performance is investigated. Three robust feature extraction methods, Time-domain (TD), Frequency Division Technique (FDT), and Autoregressive (AR) features, and their combinations were investigated, while the number of channels varied from one to eight. The results showed that, for all the feature extraction methods, the performance of a four-channel setup plateaued with a further increase in channels. For a four-channel system, the authentication performance resulted in an average equal error rate (EER) of 0.04 for TD features, 0.053 for FDT features, and 0.10 for AR features. The identification mode resulted in an average Rank-1 accuracy of 97% for TD features, 87.6% for FDT features, and 63.7% for AR features. Thus, combining the TD feature set and a four-channel EMG setup is recommended for optimal biometric performance. In the second study (Chapter 4), the dual-security property of EMG is facilitated by the development of a multi-code framework. Such a framework allows a combination of hand gestures to form an access code. In this study, three levels of fusion (score, rank, and decision) were investigated for the two biometric applications. The biometric performance of the fusion schemes was analyzed while varying the code length from one to six. For a code length of four, the authentication EER was 0.006 using a decision-level fusion scheme based on weighted majority voting. For the identification mode, the score-level fusion scheme resulted in a Rank-2 accuracy of 99.9% for a code length of four. The multi-code biometric system provided improved dual-mode security based on the personalized codes and biometric traits of individuals.
However, the above two studies and the majority of current EMG-based biometric research face two critical limitations: 1) a small subject pool compared to other more established biometric traits, and 2) single-session data sets. In multi-day scenarios, there is performance degradation of EMG-based biometrics. In the third study (Chapter 5), a multi-day and large-sample dataset collection was performed to address these limitations. EMG data was collected from 43 participants over three different days with long separation (Days 1, 8, and 29) while performing 16 different static hand/wrist gestures with seven repetitions. The dataset was made public as the GRABMyo dataset. In study four (Chapter 6), a multi-day analysis involving training data and testing data from different days of the GRABMyo dataset was employed to test the robustness of EMG-based biometrics in practical scenarios. The cross-day authentication using FDT feature extraction resulted in a median EER of 0.039 when the code (gestures) was secure, and an EER of 0.068 when the code (gestures) was leaked to intruders. The cross-day identification achieved a median Rank-5 accuracy of 93.0%. For improving multi-day performance, robust feature extraction methods that employ deep learning are warranted. In study five (Chapter 7), a convolutional feature engineering method, MyoBM-Net, is proposed. It involves a two-stage training paradigm for improving authentication performance. In a cross-day analysis, MyoBM-Net resulted in median EERs of 0.003 and 0.008 when the gesture (code) was secure and compromised, respectively, suggesting superior performance compared to the traditional feature extraction method. The findings suggest that the performance of EMG-based biometrics is comparable to conventional biometrics for both authentication and identification applications. The results show the potential of using EMG signals for biometric identification in real-world scenarios. The multi-code framework facilitates the combination of gestures as passcodes. The large multi-day dataset will support further research on EMG-based biometrics and other gesture recognition applications. The MyoBM-Net architecture will enable the development of new applications using the GRABMyo dataset, leading to accurate and robust biometric performance. This could lead to EMG-based biometrics being used as an alternative to traditional biometric methods.
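For readers unfamiliar with the equal error rate (EER) figures quoted throughout this abstract, the sketch below shows one common way of estimating an EER from genuine and impostor match scores. It is purely illustrative: the score distributions, the threshold sweep, and the assumption that higher scores mean a better match are not taken from the thesis.

```python
# Illustrative EER computation from genuine and impostor match scores (not thesis code).
import numpy as np

def equal_error_rate(genuine, impostor):
    """Estimate the EER, assuming higher scores indicate a better match."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(genuine < t)      # false rejection rate at threshold t
        far = np.mean(impostor >= t)    # false acceptance rate at threshold t
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy example: well-separated score distributions give a low EER.
rng = np.random.default_rng(0)
genuine_scores = rng.normal(0.8, 0.1, 500)     # same-user comparisons (synthetic)
impostor_scores = rng.normal(0.4, 0.1, 5000)   # other-user comparisons (synthetic)
print(f"EER ~ {equal_error_rate(genuine_scores, impostor_scores):.3f}")
```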
Item Creation of a Custom Language Model for Pediatric Occupational Therapy Documentation (University of Waterloo, 2024-11-20) DiMaio, Rachel
KidsAbility is a pediatric rehabilitation center that offers services, including occupational therapy (OT), to youth. Documentation, including writing progress notes for each treatment appointment, is essential to OT treatment but can also be time-consuming and tedious. If the time spent on writing progress notes were reduced, KidsAbility believes that its capacity for treatment would increase. This thesis explores the creation of a custom large language model intended to decrease the amount of time that clinicians spend writing progress notes by transforming point-form scratch notes from pediatric OT treatment appointments into draft full-form documentation in SOAP format for the clinicians to edit.
A dataset of thousands of historical progress notes, with personal health information redacted, was used to train the model; different training techniques were explored, including domain-adaptive pre-training and LoRA fine-tuning. As there were no corresponding scratch notes in the dataset, few-shot prompting with a human-in-the-loop evaluation process was used to generate matching scratch notes. The historical progress notes and generated point-form notes were used to fine-tune Llama 2 and 3 models on the desired task. Different models' outputs were evaluated and compared before the final model, a fully fine-tuned Llama 3 8B Instruct model, was selected for a pilot study at KidsAbility in which the custom model was compared against the proprietary Microsoft Co-Pilot model. Ten OTs participated in the study, using Co-Pilot and then the custom model to write their progress notes for three weeks each. It was found that providing training on how to most effectively use the custom model is important in reducing the amount of time spent on the process. After training, the average time taken to write a note was 7.6 minutes compared to an average of 13.8 minutes before training, both of which are based on subjective reporting. The progress notes written during the pilot study were also used in a quality assessment, in which four OTs scored the custom model notes, Co-Pilot notes, and manually written notes on multiple criteria. Results for this evaluation demonstrated that the notes written with the custom model were of high quality, receiving the highest score for three criteria and the second highest score for the remaining two. For all criteria, the custom model notes scored higher than the manually written notes. Objective timing data collection for determining the impact of using the custom model compared to not using any model was limited by the availability of clinicians.
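The abstract above mentions exploring LoRA fine-tuning of Llama models (even though the final selected model was fully fine-tuned). As a rough, hedged illustration of what a LoRA setup looks like with the Hugging Face peft library, the sketch below attaches low-rank adapters to a causal language model; the model identifier, target modules, and hyperparameters are placeholders, not the thesis's configuration.

```python
# Illustrative LoRA setup with Hugging Face transformers + peft (not the thesis pipeline).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder id (gated); any causal LM works
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=16,                                 # rank of the low-rank update (placeholder)
    lora_alpha=32,                        # scaling factor (placeholder)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections commonly adapted
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()        # only the adapter weights are trainable
```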
Item Combined Action Observation, Motor Imagery and Steady State Motion Visual Evoked Potential Based Brain Computer Interface System (University of Waterloo, 2024-11-12) Ravi, Aravind
Stroke is one of the leading causes of long-term acquired disability in adults worldwide. Gait recovery is a major objective in post-stroke rehabilitation programs. Conventional gait therapy encourages patient involvement, but the results can be slow and/or limited, leading to sub-optimal recovery. Active patient involvement, collaboration, and motivation are key factors that promote efficient motor learning. Therefore, there is a need to develop novel rehabilitation strategies that enhance user engagement by utilizing their movement intent. Brain-computer interfaces (BCIs) based on electroencephalography (EEG) offer an attractive approach for rehabilitation as they enable an alternative method for active participation in therapy. Current visual BCIs provide high decoding accuracy but typically do not activate sensorimotor areas critical for motor recovery. Conversely, BCIs based on motor imagery (MI) activate motor areas but suffer from high inter-subject variability and long user training, resulting in poorer movement intent detection accuracy and potentially leading to high cognitive demand. This thesis proposes a novel BCI paradigm called CAMS: Combined Action Observation (AO), Motor Imagery (MI), and Steady-State Motion Visual Evoked Potentials (SSMVEP).
The CAMS paradigm aimed to induce acute changes in movement-related areas of the cortex through the observation and imagery of gait movements, activating both motor and visual cortices to elicit SSMVEP-like responses. Furthermore, the responses elicited by the CAMS paradigm were investigated in two distinct applications to detect user movement intent with the aim of actively engaging the participant. The research conducted across three studies investigates the efficacy of CAMS in enhancing cortical excitability, decoding gait phases, and improving asynchronous visual BCI performance. Twenty-five healthy volunteers participated in this study, wherein they observed and imagined lower limb movements of gait as part of the CAMS intervention, which was compared with an SSMVEP control condition. Study I aimed to investigate the acute changes in cortical excitability induced by the CAMS intervention. The results demonstrated significant increases in movement-related cortical potential (MRCP) components, indicating enhanced cortical excitability. For instance, the magnitude of BP1 at channel C1 increased from -1.41 ± 0.54 µV pre-intervention to -3.23 ± 0.5 µV post-intervention (p = 0.009), highlighting the potential of CAMS to engage motor-related brain areas and promote neuroplasticity. Study II focused on decoding the phases of gait (swing and stance) from EEG responses elicited by the CAMS paradigm. Using Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) classifiers, the study achieved classification accuracies of 75% and 78%, respectively, in decoding the swing and stance phases of gait. Study III introduced a novel detection algorithm based on Complex Convolutional Neural Networks (C-CNN) for asynchronous offline CAMS BCI. The C-CNN method achieved high F1-scores for asynchronous operation. Median F1-scores for C-CNN were 0.88 (W=1s), 0.92 (W=2s), and 0.96 (W=3s), with corresponding false activation rates (FARs) of 0.34, 0.30, and 0.27. Additionally, larger stimulus frequency differences resulted in stronger visual BCI classification performance, with the combinations (7.5 Hz, 12 Hz) and (8.57 Hz, 12 Hz) yielding the highest accuracies of 87% and 78%, respectively. These findings underscore the potential of the CAMS BCI paradigm in enhancing cortical excitability, eliciting responses for decoding gait phases, and improving asynchronous visual BCI performance while simultaneously engaging the movement-related areas of the cortex. By providing a comprehensive investigation of the CAMS paradigm, this work contributes to existing knowledge and helps guide future clinical applications in neurorehabilitation.
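Study II above reports LDA and SVM classification of gait phases from EEG-derived responses. As a generic illustration of that kind of pipeline, the sketch below cross-validates both classifiers on synthetic feature vectors with scikit-learn; the features, labels, and settings are invented stand-ins, not the thesis's data or preprocessing.

```python
# Generic swing-vs-stance classification sketch with scikit-learn (synthetic features).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# Placeholder EEG feature matrix: 200 windows x 32 features (e.g., band powers).
X = np.vstack([rng.normal(0.0, 1.0, (100, 32)), rng.normal(0.5, 1.0, (100, 32))])
y = np.array([0] * 100 + [1] * 100)  # 0 = swing, 1 = stance (synthetic labels)

for name, clf in [("LDA", LinearDiscriminantAnalysis()), ("SVM", SVC(kernel="rbf"))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean 5-fold accuracy = {acc:.2f}")
```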
Item The Human Factors in the Adoption of Ambient Artificial Intelligence Scribe Technology: Towards Informed and User-centered Implementation of AI in Healthcare (University of Waterloo, 2024-10-08) Basha, Iman
The landscape of healthcare documentation has undergone substantial transformations over the past few decades, evolving in parallel with technological advancements and shifts in healthcare delivery models. Central to these changes is the electronic medical record (EMR), a digital iteration of patients' paper charts that has become standard in healthcare settings. While EMRs are instrumental in streamlining data management and accessibility, they have introduced new challenges, particularly in terms of the administrative burden on healthcare providers.
This thesis explores the integration of ambient artificial intelligence (AI) scribe technology, a solution leveraging advancements in automatic speech recognition (ASR) and natural language processing (NLP), into physicians' workflows. AI scribes semi-automate the documentation process by capturing and synthesizing physician-patient interactions in real time, potentially alleviating the administrative workload on clinicians and improving the quality of care. The potential benefits of this technology are vast, but its adoption raises significant questions regarding privacy, consent, and trust, especially given its capability to record sensitive interactions in detail. The study aims to (1) explore the integration of ambient scribe technology into physicians' workflows and assess its impact on physician-patient interactions, (2) identify and analyze the concerns related to privacy, consent, and trust among patients and physicians regarding the use of the technology, and (3) develop and evaluate a flexible informed consent protocol for patients and physicians. A mixed-methods approach was employed, integrating quantitative data from surveys and qualitative insights from semi-structured interviews, providing a comprehensive understanding of the multifaceted impact of the technology. The findings reveal that while AI scribes offer efficiency gains, particularly for complex and lengthy encounters, they are less beneficial for simple cases. Further, the efficiency of documentation with AI scribes compared to without was found to depend on the individual, with some physicians reporting negligible improvements due to extensive post-editing and the need for customization, while others noted notable gains. Regarding the impact on interaction, patients and physicians reported enhanced interactions due to reduced distractions, but noted instances of self-censorship by patients due to discomfort with the recording process. Patients also expressed worry about self-censorship by physicians due to medicolegal concerns and about unintended consequences of technology over-reliance. Concerning the second objective, patients and physicians expressed significant privacy concerns due to a lack of understanding and transparency in data handling policies. Patients also expressed concerns regarding the autonomy of private data, unauthorized access, and data breaches. The findings underscore the need for transparent data handling policies and robust security measures. Trust in physicians and pre-established patient-physician relationships also played a notable role in patient consent, with patients more likely to consent to AI scribe use with familiar physicians. To address these concerns, the thesis proposed a Multi-Tier Granular Informed Consent (MTGIC) framework, integrating tiered and granular consent models to enhance transparency and participant control over personal data. The empirical evaluation showed that MTGIC was well received by both patients and physicians, though it necessitates ongoing refinement to improve usability and ensure it aligns with user needs. In conclusion, while ambient scribe technology presents a promising tool for enhancing healthcare delivery, its successful implementation is contingent upon careful consideration of its integration into clinical workflows, the management of privacy concerns, and the development of effective consent processes.
This study contributes to the ongoing discussion on best practices for integrating emerging technologies into healthcare systems, aiming to enhance operational efficiency and patient care quality.
Item Camera Calibration from Out-of-Focus Images (University of Waterloo, 2024-10-01) Schmalenberg, Ryan
For many 3D computer vision applications, accurate camera calibration is a necessary prerequisite task. Generally, the objective is to find a camera's intrinsic parameters, such as focal lengths, or extrinsic parameters, such as the camera's pose in 3D space, or both. Camera calibration using structured calibration targets relies on special patterns which contain features that are used to localize control points with sub-pixel accuracy. The most frequently used patterns are checkerboards and circle grids, and in well-constrained environments, these patterns are known to provide accurate feature correspondences for accurate camera calibration results. One challenging case for camera calibration is calibrating a long focal length camera. In this case, the focal plane can be too far away in distance, and the only practical solution is to capture images of the calibration pattern out of focus while it is closer to the camera. Due to the radial distribution of out-of-focus blur, biases created by a lack of distance preservation, and changes in spatial blur with perspective, checkerboard patterns have been shown to lose accuracy when they are captured in out-of-focus images and, with increased blur, can fail to provide feature correspondences altogether. To address this, phase-shift circular gradient (PCG) patterns had been proposed as a method to encode control point positions into phase distributions, rather than through pixel intensities. Our work aims to validate previous authors' claims of out-of-focus blur invariance and accuracy when using PCG patterns. Using PCG, small circle, and concentric circle grid patterns, we made comparisons using their respective retrieved focal lengths (in pixels) and in-focus vs. out-of-focus percentage differences. Initial comparisons showed that PCGs were largely invariant to blur. However, their accuracy was marginally worse than comparable small circles when real-world noise was introduced. In this real case, a 7-DOF robot arm was used for repeatable calibration target positioning. The recorded set of poses was also used to mirror conditions in a further synthetic experiment. From this work, PCGs initially showed mixed results, but when extended beyond real-world conditions, PCGs were the only pattern that worked under the most severe levels of out-of-focus blur. This validated their improved detectability under extreme blur and their theoretical effectiveness for use with long focal length cameras. From these results, this study acknowledges the trade-offs in calibration pattern selection for respective use cases. It also highlights the importance of ellipse fitting techniques, as well as the role of other learned methods. Finally, this study outlines the benefits that were observed when using robotic target positioning and our synthetic validation pipeline for experimentation with calibration patterns under various conditions.
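For context on the calibration workflow discussed in this abstract, the sketch below shows a bare-bones OpenCV calibration loop using a symmetric circle-grid target: detected circle centers from several views are fed to cv2.calibrateCamera, which returns the intrinsic matrix (including the focal lengths compared in the study). The grid size, spacing, and image folder are assumptions for illustration, and this sketch does not implement the PCG pattern evaluated in the thesis.

```python
# Bare-bones OpenCV calibration with a symmetric circle-grid target (illustrative only).
import glob
import cv2
import numpy as np

pattern_size = (7, 6)   # placeholder: 7 x 6 grid of circles
spacing = 20.0          # placeholder spacing between circle centers (mm)
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * spacing

obj_points, img_points, image_size = [], [], None
for path in glob.glob("calib_images/*.png"):   # assumed folder of target views
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    image_size = gray.shape[::-1]
    found, centers = cv2.findCirclesGrid(gray, pattern_size,
                                         flags=cv2.CALIB_CB_SYMMETRIC_GRID)
    if found:
        obj_points.append(objp)
        img_points.append(centers)

# Returns RMS reprojection error, intrinsic matrix K (fx, fy, cx, cy), and lens distortion.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points,
                                                 image_size, None, None)
print("RMS reprojection error:", rms)
print("Focal lengths (px):", K[0, 0], K[1, 1])
```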
Item An Investigation into Automatic Photometric Calibration (University of Waterloo, 2024-09-20) Feng, Chun-Cheng
Photometric calibration is a critical process that ensures uniformity in brightness across images captured by a camera. It entails the identification of a function that converts the scene radiance into the pixel values in an image. The goal of the process is to estimate the three photometric parameters: the camera response function, vignette, and exposure. A significant challenge in this field is the heavy reliance on ground truth information in current photometric calibration methods, which is often unavailable in general scenarios. To address this, we investigate our proposed simple method, New Photometric Calibration (NPC), which eliminates the need for ground truth data. First, we integrated our photometric calibration algorithm with a long-term pixelwise tracker, MFT, enhancing the system's robustness and reliability. Since MFT effectively handles occlusion and reduces drifting, it results in a more stable trajectory. By incorporating MFT to track feature points across frames and using the trajectories as corresponding points, we can utilize the pixel intensities of corresponding points to forgo the need for exposure ground truth during initialization. Subsequently, we independently optimize the photometric parameters to sidestep the exponential ambiguity problem. Our experiments demonstrate that our method achieves results comparable to those utilizing ground truth information, as evidenced by comparable root mean square errors (RMSE) of the three photometric parameters. In scenarios without ground truth data, NPC outperforms existing methods. This indicates that our approach maintains the accuracy of photometric calibration and can be applied to arbitrary videos where ground truth information is not provided. In conclusion, our research represents a significant advancement in the field of photometric calibration. We investigate a novel and effective method that requires no ground truth information during the photometric calibration process. Our approach incorporates the use of a robust tracker, enhancing the trajectories of feature points and thereby improving the overall performance of our method. Furthermore, our model not only bypasses the exponential ambiguity problem inherent in the optimization process but also addresses the challenges associated with the traditional reliance on ground truth information, outperforming previous photometric calibration methods when the input lacks ground truth data.
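As background for the three parameters named above, photometric calibration methods commonly model an observed pixel value as a response function applied to exposure- and vignette-scaled scene radiance, roughly I(x) = f(e · V(x) · L(x)). The sketch below writes that standard model down and fits the exposure to synthetic tracked-point intensities with least squares; the gamma-curve response, the assumption that radiance and vignette are known, and all values are illustrative rather than taken from NPC, which estimates the three parameters jointly.

```python
# Standard photometric image formation model (illustrative notation, not from the thesis):
#   I(x) = f( e * V(x) * L(x) )
# observed intensity I, camera response f, exposure e, vignette V, scene radiance L.
import numpy as np
from scipy.optimize import least_squares

def response(u, gamma=2.2):
    """Toy camera response: a simple gamma curve (an assumption for this sketch)."""
    return np.clip(u, 0.0, None) ** (1.0 / gamma)

def model_intensity(radiance, exposure, vignette):
    return response(exposure * vignette * radiance)

# Synthetic tracked points; radiance and vignette are treated as known in this toy fit,
# whereas the real problem must estimate response, vignette, and exposure together.
rng = np.random.default_rng(0)
L = rng.uniform(0.1, 1.0, 200)
V = rng.uniform(0.6, 1.0, 200)
observed = model_intensity(L, exposure=0.5, vignette=V) + rng.normal(0, 0.005, 200)

fit = least_squares(lambda e: model_intensity(L, e[0], V) - observed, x0=[1.0])
print(f"estimated exposure: {fit.x[0]:.3f}  (true value 0.5)")
```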
Item The importance of incidence angle for GLCM texture features and ancillary data sources for automatic sea ice mapping (University of Waterloo, 2024-09-19) Pena Cantu, Fernando Jose
Sea ice is a critical component of Earth's polar regions. Monitoring it is vital for navigation and construction in the Arctic and crucial for understanding and mitigating the impacts of climate change. Synthetic aperture radar (SAR) imagery, particularly dual-polarized SAR, is commonly used for this purpose due to its ability to penetrate clouds and provide data in nearly all weather conditions. However, relying solely on HH and HV polarizations for automated sea ice mapping models has limitations, as different ice types and conditions may yield similar backscatter signatures. To enhance the accuracy of these classification models, researchers have explored the integration of additional features, including hand-crafted texture features, learned features, and supplementary data sources. This thesis makes two main contributions to the field of automated sea ice mapping.
The first contribution investigates the dependence of gray level co-occurrence matrix (GLCM) texture features on incidence angle (IA) and its impact on sea ice classification. The methodology involved extracting GLCM features from SAR images in dB units and analyzing their dependence on IA using linear regression and class separability metrics. In addition, a Bayesian classifier was trained to compare the classification performance with and without incorporating the IA dependence. The results indicated that the IA effect had a minor impact on classification performance (≈ 1%), with the linear regression results indicating that the IA dependence accounts for less than roughly 10% of the variance in most cases. The second contribution evaluates the importance of various data inputs for automated sea ice mapping using the AI4Arctic dataset. A U-Net based model was trained with SAR imagery, passive microwave data from AMSR2, weather data from ERA5, and ancillary data. Ablation studies and the addition of individual data inputs were conducted to assess their impact on model performance. The results demonstrated that including AMSR2, time, and location data significantly increased model performance, especially for the classification accuracy of major ice types in stage of development (SOD). ERA5 data had mixed effects, as it was found not to increase performance when AMSR2 was already included. These findings are critical for the development of more accurate and efficient automated sea ice mapping systems. The minimal impact of IA dependence on GLCM features suggests that accounting for IA may not be necessary, simplifying the feature extraction process. Identifying the most valuable data inputs allows for the optimization of model performance, ensuring better resource allocation and enhanced operational capabilities in sea ice monitoring. This research provides a foundation for future studies and developments in automated sea ice mapping, contributing to more effective climate monitoring and maritime navigation safety.
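To make the first contribution's analysis concrete, the sketch below extracts a GLCM texture feature with scikit-image and regresses it against incidence angle to estimate the share of variance explained, mirroring the kind of linear-regression check described above. The quantization, patch size, and synthetic SAR-like patches are assumptions, not the study's processing chain.

```python
# GLCM texture feature vs. incidence angle: illustrative sketch (scikit-image + scipy).
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from scipy import stats

rng = np.random.default_rng(0)

def glcm_contrast(patch, levels=32):
    """GLCM contrast of a patch after quantizing it to a small number of gray levels."""
    edges = np.linspace(patch.min(), patch.max() + 1e-6, levels)
    q = (np.digitize(patch, edges) - 1).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1], angles=[0],
                        levels=levels, symmetric=True, normed=True)
    return graycoprops(glcm, "contrast")[0, 0]

# Synthetic SAR-like patches whose statistics drift with incidence angle (toy model).
incidence = np.linspace(20, 45, 60)
contrast = np.array([
    glcm_contrast(rng.gamma(2.0, 1.0 / (1 + 0.02 * ia), (64, 64))) for ia in incidence
])

slope, intercept, r, p, _ = stats.linregress(incidence, contrast)
print(f"R^2 = {r**2:.2f}  (share of GLCM-contrast variance explained by IA)")
```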
Item Automatic Whale Detection using Deep Learning (University of Waterloo, 2024-09-17) Patel, Muhammed
Accurate monitoring of whale populations is essential for conservation efforts, yet traditional surveying methods are often time-consuming, expensive, and limited in coverage. This thesis investigates the automation of whale detection using state-of-the-art (SOTA) deep learning techniques applied to high-resolution aerial imagery. By leveraging advancements in computer vision, specifically object detection models, this research aims to develop a robust and efficient system for identifying and counting whales from aerial surveys. The study formulates whale detection as a small object detection problem and evaluates the performance of various SOTA models, including Faster R-CNN, YOLOv8, and Deformable DETR, paired with modern backbone architectures such as ConvNext-T, Swin-T, and ResNet-50. The influence of input image size and context on model performance is systematically explored by testing patch sizes ranging from 256 to 4096 pixels, marking this study as the first to investigate the efficacy of such large patch sizes in the remote sensing domain. Results indicate that the Faster R-CNN model with a ConvNext-T backbone achieves the highest detection accuracy, with an average precision of 0.878 at an IoU threshold of 0.1, particularly when trained on larger patch sizes.
The study also addresses the challenge of domain adaptation by implementing an active learning framework designed to enhance model performance on new survey data with varying environmental conditions. A novel portfolio-based acquisition function, leveraging the social behavior of whales, is introduced to optimize the annotation process. This research significantly contributes to the field of automated whale monitoring, offering a scalable and adaptable solution that reduces annotation costs and improves the accuracy of population estimates. The developed system holds promise for enhancing conservation strategies and providing valuable insights into whale movements and behaviors.
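Because aerial survey frames are much larger than typical detector inputs, a standard workflow is to tile each image into patches, run a detector per patch, and map detections back to full-image coordinates. The sketch below illustrates that workflow with a COCO-pretrained torchvision Faster R-CNN as a stand-in; the patch size, stride, and model are assumptions, and this is not the thesis's trained whale detector.

```python
# Tile a large aerial image into patches and run a detector on each patch (illustrative).
# The COCO-pretrained torchvision Faster R-CNN is a stand-in, not a whale detector.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_in_patches(image, patch=1024, stride=896, score_thr=0.5):
    """image: float tensor (3, H, W) in [0, 1]; returns boxes in full-image coordinates."""
    _, H, W = image.shape
    all_boxes, all_scores = [], []
    for y in range(0, max(H - patch, 0) + 1, stride):
        for x in range(0, max(W - patch, 0) + 1, stride):
            tile = image[:, y:y + patch, x:x + patch]
            with torch.no_grad():
                out = model([tile])[0]
            keep = out["scores"] > score_thr
            boxes = out["boxes"][keep]
            boxes[:, [0, 2]] += x   # shift patch x-coordinates back to image coordinates
            boxes[:, [1, 3]] += y   # shift patch y-coordinates back to image coordinates
            all_boxes.append(boxes)
            all_scores.append(out["scores"][keep])
    return torch.cat(all_boxes), torch.cat(all_scores)

boxes, scores = detect_in_patches(torch.rand(3, 2048, 2048))
print(boxes.shape, scores.shape)
```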
Item Improving Neural Radiance Fields for More Efficient, Tailored, View-Synthesis (University of Waterloo, 2024-09-17) Nair, Saeejith Muralidharan
Neural radiance fields (NeRFs) have revolutionized novel view synthesis, enabling high-quality 3D scene reconstruction from sparse 2D images. However, their computational intensity often hinders real-time applications and deployment on resource-constrained devices. Traditional NeRF models can require days of training for a single scene and demand significant computational resources for rendering, with some implementations necessitating over 150 million network evaluations per rendered image. While various approaches have been proposed to improve NeRF efficiency, they often employ fixed network architectures that may not be optimal for all scenes. This research introduces NAS-NeRF, a new approach that employs generative neural architecture search (NAS) to discover compact, scene-specialized NeRF architectures. NAS, a technique for automatically designing neural network architectures, is investigated as a potential method for optimizing NeRFs by tailoring network architectures to the specific complexities of individual scenes. NAS-NeRF reformulates the NeRF architecture into configurable field cells, enabling efficient exploration of the architecture space while maintaining compatibility with various NeRF variants. Our method incorporates a scene-specific optimization strategy that considers the unique characteristics of each 3D environment to guide architecture search. We also introduce a quality-constrained generation approach that allows for the specification of target performance metrics within the search process. Experiments on the Blender synthetic dataset demonstrate the effectiveness of NAS-NeRF in generating a family of architectures tailored to different efficiency-quality trade-offs. Our most efficient models (NAS-NeRF XXS) achieve up to 23× reduction in parameters and 22× fewer FLOPs compared to baseline NeRF, with only a 5.3% average drop in structural similarity (SSIM). Meanwhile, our high-quality models (NAS-NeRF S) match or exceed baseline performance while reducing parameters by 2-4× and offering up to 1.93× faster inference. These results suggest that high-quality novel view synthesis can be achieved with more compact models, particularly when architectures are tailored to specific scenes. NAS-NeRF contributes to the ongoing research into efficient 3D scene representation methods, helping enable applications in resource-constrained environments and real-time scenarios.
Item Microarray Image Denoising Leveraging Autoencoders and Attention-Based Architectures with Synthetic Training Data (University of Waterloo, 2024-09-16) Czarnecki, Chris
Microarray technology has for many years remained a gold standard in transcriptomics. However, preparation of physical slides in wet labs involves procedures which tend to introduce occasional dirt and noise into the slide. Having to repeat experiments due to environmental noise present in the scanned images leads to increased reagent and labor costs. Motivated by the high costs of repeated wet lab procedures, we explore denoising methods in the narrow subfield of microarray image analysis. We propose SADGE, a domain-relevant metric to quantify the denoising power of the methods considered. We introduce a synthetic data generation protocol which permits the creation of very large microarray image datasets programmatically and provides noise-free ground truth useful for objective quantification of denoising. We also train several deep learning architectures for the denoising task, with several of them beating the current state-of-the-art method on both PSNR and SADGE metrics. We propose a new training modality leveraging the EATME module to condition the image reconstruction on ground-truth expression values, and we introduce an additional loss term (DEL) which further enhances the denoising capabilities of the model while ensuring minimal information loss. Collectively, the innovations outlined in our work constitute a significant contribution to the field of microarray image denoising, influencing the cost-effectiveness of microarray experiments and thus impacting a wide range of clinical and biotechnological applications.
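As a generic reference point for the autoencoder-based denoisers discussed above, the sketch below defines a small convolutional denoising autoencoder and a PSNR helper in PyTorch and runs one toy training step. The architecture and sizes are placeholders with no relation to the thesis's models, the EATME module, or the DEL loss term.

```python
# Minimal convolutional denoising autoencoder + PSNR helper (illustrative only).
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def psnr(pred, target, max_val=1.0):
    mse = torch.mean((pred - target) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)

# One toy training step on synthetic clean/noisy pairs.
model = DenoisingAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.rand(8, 1, 64, 64)
noisy = (clean + 0.1 * torch.randn_like(clean)).clamp(0, 1)
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
optimizer.step()
print("PSNR of noisy input vs. clean:", psnr(noisy, clean).item())
```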
Item Financialization of the Housing Market: A Contribution to Modern Urban Rent Theory (University of Waterloo, 2024-09-16) Wright, Kirsten
A great deal of wealth is produced through the economic activity of cities. There is a gap, however, in the formal apparatus of standard economic theory for analyzing the distribution of this enormous value created in cities. In the context of a widely felt housing crisis, we explore how the capture of urban value by financial actors through the financialization of the housing market affects ownership patterns in urban areas, and the ultimate implications of these processes for urban productivity. We hypothesize that financialization induces a shift towards tenancy among the urban workforce that is likely to result in decreased urban productivity through a range of channels. To examine this hypothesis, we construct an agent-based model with a land market and production sector in which productivity scales superlinearly with city population. This work brings together urban agglomeration effects, Ricardian rent theory, and a spatially explicit land market model in a novel way. In our model, transportation costs determine the size of the city and the available locational rents. Rising productivity increases wages and urban land values, so the value of increased productivity is transferred to land owners. Investors attempt to capture these productive gains by purchasing land. These financial actors can bid against residents to purchase urban land. The interaction of agents determines the distribution of property ownership, city size, and wages. City size and wages provide a measure of urban productivity. The evolving pattern of property ownership tells us how residents are distributed between the tenant class and the owner class. We then explore a range of channels through which financialization might result in decreased urban productivity.
When we add this link to the model, we see that financialization not only transforms the class structure of the city and the distribution of urban wealth, but also disrupts the relationship between population growth and productivity, reducing the wealth and resilience of the urban system. To illustrate the uses of this kind of computational model for economic policy analysis, we run six policy experiments with and without the productivity link. Contributions of this work include: integrating classical rent theory into an agent-based urban model; linking urban rent dynamics with urban productivity and population growth; incorporating the urban scaling literature into the model framework; examining the impacts of financialization on wealth distribution and urban productivity; creating a framework for a broader understanding of public policies in an urban system; and examining the qualitative effects of various public policies on wealth distribution, productivity, and class.
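As a toy illustration of two ingredients of the agent-based model described above (superlinear urban scaling and investors bidding against residents for land), the snippet below sketches a single land transaction; every functional form and parameter value is invented for illustration and is not the thesis's calibration.

```python
# Toy sketch: superlinear urban scaling plus an investor outbidding a resident for land.
# Every parameter and functional form here is invented for illustration.

def city_output(population, a=1.0, beta=1.15):
    """Superlinear scaling: output grows faster than population when beta > 1."""
    return a * population ** beta

def locational_rent(wage, transport_cost):
    """Ricardian-style rent: what a worker saves by living closer to the center."""
    return max(wage - transport_cost, 0.0)

population = 100_000
wage = city_output(population) / population      # average product per worker
rent = locational_rent(wage, transport_cost=0.3 * wage)

resident_bid = rent / 0.05                       # rent capitalized at a 5% discount rate
investor_bid = 1.1 * resident_bid                # investor anticipates rent growth
owner = "investor" if investor_bid > resident_bid else "resident"
print(f"wage={wage:.2f}, rent={rent:.2f}, parcel sold to: {owner}")
```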
Item Assessment of Acoustic Markers of Conversational Difficulty (University of Waterloo, 2024-09-06) Ellag, Menatalla
Human conversations, one of the most complex behaviors, require the real-time coordination of speech production and comprehension, involving cognitive, social, and biological dimensions. There has been a rising need for laboratory and clinical assessments to evolve to capture the essence of everyday interactions. The cognitive demands of interactive conversation, which require listeners to process and store information while simultaneously planning their responses, often exceed those encountered in standard clinical tests. These assessments must encompass diverse contexts and participant groups, including varying hearing statuses, challenging listening environments such as background noise, the use of assistive devices that may alter the listening experience, and different conversation types such as relational versus transactional exchanges, dyadic versus group interactions, and face-to-face versus remote interactions. This study consists of two investigations exploring how different conditions affect acoustic measures of speech production and conversational behavior. The first study was an extension of a study originally conducted for content analysis and participants' subjective rating questionnaires, focusing on hearing-impaired (HI) individuals. It examined the impact of face masks and remote microphones on communication dynamics. Four native English-speaking HI participants engaged in free-form conversations within small groups under a constant background noise of 55 dBA. Interestingly, the results showed that using remote microphones shortened floor-transfer offsets (FTOs) and extended conversation durations, suggesting improved communication. When participants did not wear a face mask, interpausal unit (IPU) durations were shorter with remote microphones than without, indicating easier communication. However, no significant difference was found between the two mask conditions, suggesting that face masks affect both speech perception and production by decreasing inhalation and exhalation volumes, thereby limiting the duration of utterances. Face masks are speculated to increase resistance to airflow, reducing subglottal pressure and consequently lowering fundamental frequency (F0). Despite no significant differences in articulation rate and floor transfer rate, the constant noise environment, presented at lower levels compared to previous studies, likely limited the potential for pronounced effects.
The second study involved normal-hearing (NH) individuals, investigating the effects of conversation type (free-form vs. task-based) and noise presence (70 dB SPL) on conversational dynamics. Dyadic interactions among NH participants were examined. Task-based conversations exhibited structured patterns with longer FTOs and higher floor transfer rates, while free-form conversations showed greater FTO variability, more frequent overlaps, longer IPUs, and increased pause durations and rates. Noise presence increased IPU durations and pause lengths but did not significantly alter floor-transfer rates or FTO variability. Both conversation types experienced increased articulation rates and speech levels in noise. Contrary to the change expected as part of the Lombard effect, the increase in articulation rates may be attributed to the noise acting as a stressor. Meanwhile, the increase in mean speech levels was less pronounced than expected, possibly due to the specific noise characteristics and the use of closed headphones. These studies shed light on the complexity of communicative interactions and the necessity of accounting for a wide spectrum of factors in experimental designs. The findings highlight the importance of considering both environmental conditions and conversation types when researching speech perception, production, and conversational dynamics. This research provides valuable insights for academic studies and the development of hearing-assistive technologies, emphasizing the need for assessments that reflect the varied nature of everyday communication.
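Both conversation studies rely on turn-taking measures such as interpausal units (IPUs) and floor-transfer offsets (FTOs). The sketch below shows one simple way to derive them from per-talker voice-activity segments; the segment format and the pause threshold are assumptions for illustration, not the studies' processing pipeline.

```python
# Derive IPUs and floor-transfer offsets (FTOs) from per-talker speech segments.
# Segments are (start, end) times in seconds, e.g. from a VAD; thresholds are illustrative.

def interpausal_units(segments, max_pause=0.2):
    """Merge one talker's segments separated by pauses shorter than max_pause seconds."""
    ipus = [list(segments[0])]
    for start, end in segments[1:]:
        if start - ipus[-1][1] <= max_pause:
            ipus[-1][1] = end           # short pause: extend the current IPU
        else:
            ipus.append([start, end])   # long pause: start a new IPU
    return [tuple(u) for u in ipus]

def floor_transfer_offsets(ipus_a, ipus_b):
    """FTO = next talker's IPU start minus current talker's IPU end (negative = overlap)."""
    turns = sorted([(s, e, "A") for s, e in ipus_a] + [(s, e, "B") for s, e in ipus_b])
    return [nxt[0] - cur[1] for cur, nxt in zip(turns, turns[1:]) if cur[2] != nxt[2]]

talker_a = [(0.0, 1.4), (1.5, 2.1), (5.0, 6.2)]
talker_b = [(2.4, 4.8), (6.1, 7.0)]
ftos = floor_transfer_offsets(interpausal_units(talker_a), interpausal_units(talker_b))
print("FTOs (s):", [round(t, 2) for t in ftos])
```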
Item Addressing Data Scarcity in Domain Generalization for Computer Vision Applications in Image Classification (University of Waterloo, 2024-08-30) Kaai, Kimathi
Domain generalization (DG) for image classification is a crucial task in machine learning that focuses on transferring domain-invariant knowledge from multiple source domains to an unseen target domain. Traditional DG methods assume that classes of interest are present across multiple domains (domain-shared), which helps mitigate spurious correlations between domain and class. However, in real-world scenarios, data scarcity often leads to classes being present in only a single domain (domain-linked), resulting in poor generalization performance. This thesis introduces the domain-linked DG task and proposes a novel methodology to address this challenge. This thesis proposes FOND, a "Fairness-inspired cONtrastive learning objective for Domain-linked domain generalization," which leverages domain-shared classes to learn domain-invariant representations for domain-linked classes. FOND is designed to enhance generalization by minimizing the impact of task-irrelevant domain-specific features. The theoretical analysis in this thesis extends existing domain adaptation error bounds to the domain-linked DG task, providing insights into the factors that influence generalization performance. Key theoretical findings include the understanding that domain-shared classes typically have more samples and learn domain-invariant features more effectively than domain-linked classes. This analysis informs the design of FOND, ensuring that it addresses the unique challenges of domain-linked DG. Furthermore, experiments are performed across multiple datasets and experimental settings to evaluate the effectiveness of various current methodologies.
The proposed method achieves state-of-the-art performance in domain-linked DG tasks, with minimal trade-offs in the performance of domain-shared classes. Experimental results highlight the impact of shared-class settings, total class size, and inter-domain variations on the generalizability of domain-linked classes. Visualizations of learned representations further illustrate the robustness of FOND in capturing domain-invariant features. In summary, this thesis advocates for future DG research on domain-linked classes by (1) theoretically and experimentally analyzing the factors impacting domain-linked class representation learning, (2) demonstrating the ineffectiveness of current state-of-the-art DG approaches, and (3) proposing an algorithm to learn generalizable representations for domain-linked classes by transferring useful representations from domain-shared ones.
Item The Effects of Stimulus Statistics on Representational Similarity in a Model of Mouse Visual Cortex (University of Waterloo, 2024-08-30) Torabian, Parsa
Deep convolutional neural networks have emerged as convincing models of the visual cortex, demonstrating remarkable ability to predict neural activity. However, the specific combination of factors that optimally align these models with biological vision remains an open question. Network architecture, training objectives, and the statistics of training data all likely play a role, but their relative contributions and interactions are not fully understood. In this study, we focus on the role of training data in shaping the representations learned by deep networks. We investigate how the degree of 'realism' in the training data affects the similarity between network activations and neural recordings from mouse visual cortex. We hypothesised that training on more naturalistic stimuli would lead to greater brain-model similarity, as the visual system has evolved to process the statistics of the natural world. We leveraged the Unity video-game engine to generate custom training datasets with the ability to control for three distinct factors: the realism of the virtual environment, the motion statistics of the simulated agent, and the optics of the modelled eye. Deep networks were trained on datasets generated from all eight permutations of these three experiment variables using a self-supervised learning approach. The trained models were subsequently compared to mouse neural data from the Allen Institute using representational similarity analysis. Our results reveal that the realism of the virtual environment has a substantial and consistent effect on brain-model similarity. Networks trained on the more realistic meadow environment exhibited significantly higher similarity to mouse visual cortex across multiple areas. In contrast, the effects of motion statistics and visual optics were more subtle and area-specific. Furthermore, all possible interactions between these three factors were statistically significant, suggesting complex nonlinear relationships.
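For readers unfamiliar with representational similarity analysis (RSA), used above to compare networks with Allen Institute recordings, the sketch below shows its core computation: build a representational dissimilarity matrix (RDM) for model activations and for neural responses to the same stimuli, then rank-correlate the two. The data here are random placeholders.

```python
# Core RSA computation with random stand-ins for model and neural data.
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n_stimuli = 50
model_acts = rng.normal(size=(n_stimuli, 512))   # placeholder: stimuli x units (network layer)
neural_resp = rng.normal(size=(n_stimuli, 200))  # placeholder: stimuli x neurons (recordings)

# RDM = pairwise dissimilarity (1 - Pearson correlation) between stimulus patterns.
model_rdm = pdist(model_acts, metric="correlation")
neural_rdm = pdist(neural_resp, metric="correlation")

# Brain-model similarity = rank correlation between the two condensed RDMs.
rho, _ = spearmanr(model_rdm, neural_rdm)
print(f"RSA similarity (Spearman rho) = {rho:.3f}")
```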
Item Towards an Optical Biopsy Tool Using Photon Absorption Remote Sensing (University of Waterloo, 2024-08-28) Veugen, Jenna
Streamlining diagnosis is more important than ever, as long wait times, resource constraints, and diagnostic inaccuracies place burdens on the healthcare system that climb each year. The development of a tool capable of instantaneous in situ diagnosis would eliminate the excess time and resources used in current diagnostic procedures, and thereby relieve some of these burdens.
This could be achieved with an optical biopsy by leveraging light-matter interactions for advanced microscopy in an endoscopic form. However, to date there is no technology able to provide image quality diagnostically equivalent to the gold standard in an endoscopic form. Photon Absorption Remote Sensing (PARS) is a novel imaging modality that utilizes optical absorption contrast to achieve label-free, non-contact microscopy. PARS technology holds promising potential for resolving many of the challenges faced in the development of an optical biopsy tool. This thesis explores the initial development of a PARS endoscope capable of in vivo microvascular imaging through multiple phases of development. The first stage investigated the performance of a dual-green PARS bench-top system, utilizing green excitation and detection wavelengths to address chromatic aberrations in the final endoscopic form. The system was confined to a green excitation wavelength in order to target the absorption of hemoglobin for vascular imaging. It was then paired with a green detection wavelength for the first time, unlike typical PARS microscopes that rely on near-infrared (NIR) wavelengths for detection. Both phantom and in vivo samples were imaged to validate the performance of the system, showing functionality and sensitivity comparable to NIR PARS systems. The next phase explored the transition from a stationary PARS bench-top system to a free imaging head using optical fiber. This introduced many challenges, such as high losses and inherent noise, that had to be addressed through careful design, assembly, and optimization. Two types of specialized optical fiber were tested by imaging phantom targets and in vivo chicken embryo samples. The double-clad fiber setup showed strong performance, with excellent contrast, signal-to-noise ratio, and sensitivity in the PARS images. The final stage included miniaturizing the imaging head to achieve an endoscopic form factor. Various miniature objective lens designs were developed and tested in the system. The successful design was capable of imaging both in phantoms and in vivo, demonstrating, for the first time, vasculature imaged using PARS through optical fiber. This research lays the groundwork for the development of a PARS endoscope capable of providing gold-standard-quality, instantaneous diagnosis in situ. It demonstrates a successful design capable of capturing relevant biomarkers in vivo using endoscopic PARS technology. The improved understanding of the design requirements for a more efficient system, and insight into the fundamental limitations, highlight future directions to further improve this device. This puts us one step closer towards achieving a successful optical biopsy tool that could streamline diagnosis; improve the outcome, safety, and experience of the patient; and significantly reduce the cost burden on the health system.
Item Language Guided Out-of-Bounding Box Pose Estimation for Robust Ice Hockey Analysis (University of Waterloo, 2024-08-27) Balaji, Bavesh
Accurate estimation of human pose and the pose of interacting objects, such as hockey sticks, is fundamental in vision-driven hockey analytics and crucial for tasks like action recognition and player assessment. Estimating 2D keypoints from monocular video is challenging, particularly in fast-paced sports such as ice hockey, where motion blur, occlusions, bulky equipment, color similarities, and constant camera panning complicate accurate pose prediction.
This thesis addresses these challenges with contributions on three fronts. First, recognizing the lack of an existing benchmark, we present a comparative study of four state-of-the-art human pose estimation approaches using a real-world ice hockey dataset. This analysis aims to understand the impact of each model on ice hockey pose estimation and investigate their respective advantages and disadvantages. Building on insights from this comparative study, we develop an ensemble model for jointly predicting player and stick poses. The ensemble comprises two networks: one trained from scratch to predict all keypoints, and another utilizing a unique transfer learning paradigm to incorporate knowledge from large-scale human pose datasets. Despite achieving promising results, we observe that these top-down approaches yield suboptimal outcomes due to constraints such as requiring all keypoints to be within a bounding box and accommodating only one player per bounding box. To overcome these issues, we introduce an image- and text-based multi-modal solution called TokenCLIPose, which predicts stick keypoints without encapsulating them within a bounding box. By focusing on capturing only the player in a bounding box and treating their stick as missing, our model predicts out-of-bounding-box keypoints. To incorporate the context of the missing keypoints, we use keypoint-specific text prompts to leverage the rich semantic representations provided by language. This dissertation's findings advance the state of the art in 2D pose estimation for ice hockey, outperforming existing methods by 2.6% on our dataset, and provide a robust foundation for further developments in vision-driven sports analytics.
Item Scaling Laws for Compute Optimal Biosignal Transformers (University of Waterloo, 2024-08-20) Fortin, Thomas
Scaling laws that predict the optimal balance between the number of model parameters and the number of training tokens given a fixed compute budget have recently been developed for language transformers. These allow model developers to allocate their compute budgets such that they can achieve optimal performance. This thesis develops such scaling laws for the Biosignal Transformer trained separately on both accelerometer data and EEG data. This is done by applying methods used by other researchers to develop similar scaling laws for language transformer models, referred to as the iso-FLOP curve method and the parametric loss function method. The Biosignal Transformer is a transformer model designed specifically to be trained on tasks that use biosignals such as EEG, ECG, and accelerometer data as input. For example, the Biosignal Transformer can be trained to detect or classify seizures from EEG signals. The Biosignal Transformer is also of particular interest because it is designed to use unsupervised pre-training on large unlabelled biosignal datasets to improve performance on downstream tasks with smaller labelled fine-tuning datasets. This work develops scaling laws which optimize for the best unsupervised pre-training loss given a fixed compute budget. Results show that the developed scaling laws successfully predict a balance between the number of parameters and the number of training tokens that minimizes pre-training loss for compute budgets five times larger than those used to develop them. Researchers who intend to scale up the Biosignal Transformer with unsupervised pre-training should use these scaling laws to attain optimal pre-training loss from their given compute budgets.
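For reference, the parametric loss function method mentioned above typically fits a form such as L(N, D) = E + A/N^alpha + B/D^beta to observed (parameters, tokens, loss) triples and then minimizes the fitted loss under a compute constraint. The sketch below does this on synthetic data with scipy; the constants, the C ≈ 6ND compute approximation, and the search bounds are illustrative assumptions, not the thesis's fits.

```python
# Fit a parametric loss L(N, D) = E + A/N**alpha + B/D**beta to synthetic runs,
# then pick a compute-optimal (N, D) split for a fixed budget. Constants are invented.
import numpy as np
from scipy.optimize import curve_fit, minimize_scalar

def loss_fn(ND, E, A, alpha, B, beta):
    N, D = ND
    return E + A / N**alpha + B / D**beta

rng = np.random.default_rng(0)
N = rng.uniform(1e6, 1e9, 200)        # model sizes (parameters)
D = rng.uniform(1e8, 1e11, 200)       # training tokens
true_params = (1.7, 400.0, 0.34, 4e3, 0.28)
L = loss_fn((N, D), *true_params) + rng.normal(0, 0.005, 200)

params, _ = curve_fit(loss_fn, (N, D), L, p0=(2.0, 100.0, 0.3, 1e3, 0.3), maxfev=20000)

# Compute-optimal allocation for a budget C, using the common C ~ 6*N*D approximation.
C = 1e20
res = minimize_scalar(lambda logN: loss_fn((10**logN, C / (6 * 10**logN)), *params),
                      bounds=(6, 11), method="bounded")
N_opt = 10**res.x
print(f"Optimal N ~ {N_opt:.2e} parameters, D ~ {C / (6 * N_opt):.2e} tokens")
```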
Item Talker Sensitivity to Turn-Taking in Conversation (University of Waterloo, 2024-08-19) Masters, Benjamin
Turn-taking in conversation is a complex phenomenon that requires talkers to, at a minimum, simultaneously plan and produce their own speech and listen to and comprehend the speech of their partner(s). Given this necessary division of attention, the increase in listening difficulty introduced by hearing impairments can have confounding effects on a person's ability to communicate, and evaluating listening effort during communication remains difficult. One of the most detrimental effects of hearing loss, though, is its impact on one's ability to communicate effectively; thus, the assessment of listening effort in natural environments is especially important. This thesis takes two approaches to evaluating listening effort in conversation. The first analyzes the response of the pupil at the temporal scale of turn-taking to understand how effort and attention are allocated between speaking, listening, and other task demands. Pupillary temporal response functions to turn-taking are derived and analyzed for systematic differences that exist across people and acoustic environmental conditions, and are further analyzed to determine differences in pupil response based on the expected difficulty of a conversation. The second approach analyzes behavioral changes related to the timing of turn-taking to understand how talkers identify that communication difficulty is being experienced by a conversational partner. The floor transfer offset (FTO), defined as the time it takes one talker to begin their turn after another has ended theirs, was manipulated during interactive conversations to mimic the observed increase in magnitude and variability of FTOs in difficult listening environments. To enable this, an audio processing framework was developed to track the state of a conversation in near real time and manipulate the perceived response time of talkers. The findings suggest that the timing of turn-taking is not used as a cue by talkers to infer difficulty.
Item Autonomous Robotic System Conducting Nasopharyngeal Swabbing (University of Waterloo, 2024-08-15) Lee, Peter Qiu Jiun
The nasopharyngeal swab test is a procedure where a healthcare worker inserts a swab through the nose until it reaches the nasopharynx, located at the back of the nasal cavity, in order to collect secretions that can later be examined for illnesses. This procedure saw heightened use to detect cases during the COVID-19 pandemic. Its ubiquity also highlighted fragilities in the healthcare system by way of the hazards to healthcare workers from infectious patients and the pressures a pandemic can inflict upon an unready healthcare system. In this thesis, we propose an autonomous robotic system for performing nasopharyngeal swab tests using a collaborative robotic manipulator arm, on the premise that the hardware and techniques could eventually be applied to other types of close-contact tasks to support the healthcare system. We also assume that prospective patients would be standing unrestrained in front of the arm, which adds the challenges of adjusting to arbitrary poses of the head and compensating for natural head motion.
We first designed an instrumented end-effector to attach to a robotic arm to enable suitable vision and force sensing capabilities for the task. Next, we developed a finite element modeling simulation environment to describe the deformation of the swab as it moves through the nasal cavity, and solved an optimization problem to find ideal paths through the nasal cavity. A visual servoing system, drawing on advances in deep learning and state estimation, was designed to properly align the swab next to the nose using visual information, and was validated with a number of human trials. A torque-controlled, force-compliant system was designed and evaluated to determine the feasibility of using force measurements to correct for misalignment when the swab is inserted into a nasal cavity phantom. Finally, we integrated all the system components into a cohesive system for performing nasopharyngeal swab tests. We created a simulator using a nasal cavity phantom and a second robot arm to mimic natural motions of the head. This simulator was leveraged to perform extensive experimentation that found promising controller configurations able to compensate for head motion.
Item Topic Segmentation of Recorded Meetings (University of Waterloo, 2024-08-13) Lazoja, Ilir
Video chapters make videos more easily digestible and can be an important pre-processing step for other video-processing tasks. In many cases, the creator can easily chapter their own videos, especially for well-edited, structured videos. However, some types of videos, such as recorded meetings, are more loosely structured with less obvious breaks, which makes them more cumbersome to chapter and means they would benefit greatly from automated chaptering. One approach to chaptering these types of videos is to perform topic segmentation on the transcript of the video, especially if the video is rich in dialogue. Topic segmentation is the task of dividing text based on when its topic changes, and is most commonly performed on large bodies of written text. This thesis details how well state-of-the-art approaches for topic segmentation perform on recorded meetings, presents and evaluates strategies to improve performance on recorded meetings, and discusses shortcomings of the common metrics used for topic segmentation.
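As a pointer to the evaluation issue raised in the last abstract, the sketch below implements P_k, one of the common topic segmentation metrics: the probability that a sliding window of width k disagrees about whether a topic boundary falls inside it. The boundary encodings and the default choice of k are illustrative conventions, not the thesis's evaluation code.

```python
# P_k metric for topic segmentation: lower is better.
# Segmentations are lists of 0/1 boundary markers between adjacent sentences.

def p_k(reference, hypothesis, k=None):
    assert len(reference) == len(hypothesis)
    if k is None:
        # Conventional choice: roughly half the mean reference segment length.
        n_segments = sum(reference) + 1
        k = max(1, round(len(reference) / n_segments / 2))
    errors, count = 0, 0
    for i in range(len(reference) - k):
        ref_cross = any(reference[i:i + k])    # boundary inside the window (reference)?
        hyp_cross = any(hypothesis[i:i + k])   # boundary inside the window (hypothesis)?
        errors += ref_cross != hyp_cross
        count += 1
    return errors / count

reference  = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
hypothesis = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
print(f"P_k = {p_k(reference, hypothesis):.2f}")
```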