Systems Design Engineering

This is the collection for the University of Waterloo's Department of Systems Design Engineering.

Research outputs are organized by type (e.g., Master Thesis, Article, Conference Paper).

Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.

Recent Submissions

  • Item
    The importance of incidence angle for GLCM texture features and ancillary data sources for automatic sea ice mapping.
    (University of Waterloo, 2024-09-19) Pena Cantu, Fernando Jose
    Sea ice is a critical component of Earth’s polar regions. Monitoring it is vital for navigation and construction in the Arctic and crucial for understanding and mitigating the impacts of climate change. Synthetic aperture radar (SAR) imagery, particularly dual polarized SAR, is commonly used for this purpose due to its ability to penetrate clouds and provide data in nearly all weather conditions. However, relying solely on HH and HV polarizations for automated sea ice mapping models has limitations, as different ice types and conditions may yield similar backscatter signatures. To enhance the accuracy of these classification models, researchers have explored the integration of additional features, including hand-crafted texture features, learned features, and supplementary data sources. This thesis makes two main contributions to the field of automated sea ice mapping. The first contribution investigates the dependence of gray level co-occurrence matrix (GLCM) texture features on incidence angle (IA) and its impact on sea ice classification. The methodology involved extracting GLCM features from SAR images in dB units and analyzing their dependence on IA using linear regression and class separability metrics. In addition, a Bayesian classifier was trained to compare the classification performance with and without incorporating the IA dependence. The results indicated that the IA effect had a minor impact on classification performance (≈ 1%), with linear regression results indicating that the IA dependence accounts for less than approximately 10% of the variance in most cases. The second contribution evaluates the importance of various data inputs for automated sea ice mapping using the AI4Arctic dataset. A U-Net based model was trained with SAR imagery, passive microwave data from AMSR2, weather data from ERA5, and ancillary data. Ablation studies and the addition of individual data inputs were conducted to assess their impact on model performance.
The results demonstrated that including AMSR2, time, and location data significantly increased model performance, especially for the classification accuracy of major ice types in stage of development (SOD). ERA5 data had mixed effects, as it was found not to increase performance when AMSR2 was already included. These findings are critical for the development of more accurate and efficient automated sea ice mapping systems. The minimal impact of IA dependence on GLCM features suggests that accounting for IA may not be necessary, simplifying the feature extraction process. Identifying the most valuable data inputs allows for the optimization of model performance, ensuring better resource allocation and enhanced operational capabilities in sea ice monitoring. This research provides a foundation for future studies and developments in automated sea ice mapping, contributing to more effective climate monitoring and maritime navigation safety.
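For readers unfamiliar with GLCM texture features, the sketch below shows how a co-occurrence matrix and two common features derived from it can be computed. It is a minimal illustration, not the thesis's pipeline: the function names are hypothetical, the image is assumed to be already quantized to integer gray levels (SAR backscatter in dB would need a quantization step first), and only a single non-negative pixel offset is handled.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric co-occurrence matrix for one offset (dx, dy >= 0)."""
    image = np.asarray(image)
    rows, cols = image.shape
    # Pair every reference pixel with its neighbour at the given offset.
    ref = image[: rows - dy, : cols - dx]
    nbr = image[dy:, dx:]
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    counts += counts.T  # count each pair in both directions
    return counts / counts.sum()

def glcm_contrast(p):
    """Weights co-occurrences by the squared gray-level difference."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

def glcm_homogeneity(p):
    """Large when most co-occurrences lie near the matrix diagonal."""
    i, j = np.indices(p.shape)
    return float(np.sum(p / (1.0 + (i - j) ** 2)))

# Small 4-level example image (assumed already quantized).
example = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [0, 2, 2, 2],
                    [2, 2, 3, 3]])
p = glcm(example, levels=4)
print(glcm_contrast(p), glcm_homogeneity(p))
```

In a study of incidence-angle dependence, features like these would be computed per window and then regressed against the local incidence angle; that step is not shown here.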
  • Item
    Automatic Whale Detection using Deep learning
    (University of Waterloo, 2024-09-17) Patel, Muhammed
    Accurate monitoring of whale populations is essential for conservation efforts, yet traditional surveying methods are often time-consuming, expensive, and limited in coverage. This thesis investigates the automation of whale detection using state-of-the-art (SOTA) deep learning techniques applied to high-resolution aerial imagery. By leveraging advancements in computer vision, specifically object detection models, this research aims to develop a robust and efficient system for identifying and counting whales from aerial surveys. The study formulates whale detection as a small object detection problem and evaluates the performance of various SOTA models, including Faster R-CNN, YOLOv8, and Deformable DETR, paired with modern backbone architectures such as ConvNext-T, Swin-T, and ResNet-50. The influence of input image size and context on model performance is systematically explored by testing patch sizes ranging from 256 to 4096 pixels, marking this study as the first to investigate the efficacy of such large patch sizes in the remote sensing domain. Results indicate that the Faster R-CNN model with a ConvNext-T backbone achieves the highest detection accuracy, with an average precision of 0.878 at an IoU threshold of 0.1, particularly when trained on larger patch sizes. The study also addresses the challenge of domain adaptation by implementing an active learning framework, designed to enhance model performance on new survey data with varying environmental conditions. A novel portfolio-based acquisition function, leveraging the social behavior of whales, is introduced to optimize the annotation process. This research significantly contributes to the field of automated whale monitoring, offering a scalable and adaptable solution that reduces annotation costs and improves the accuracy of population estimates. The developed system holds promise for enhancing conservation strategies and providing valuable insights into whale movements and behaviors.
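The reported average precision uses an intersection-over-union (IoU) threshold of 0.1, which is much lower than the 0.5 commonly used for larger objects. The sketch below shows the standard IoU computation for axis-aligned boxes; the function name and the `(x1, y1, x2, y2)` box format are assumptions made for illustration, not details taken from the thesis.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# A 10x10 prediction shifted by half its size relative to the ground truth.
overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))
print(overlap >= 0.1, overlap >= 0.5)
```

For small objects, a shift of a few pixels removes a large share of the overlap, which is one reason a permissive threshold can be a reasonable choice when the goal is counting rather than precise localization.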
  • Item
    Improving Neural Radiance Fields for More Efficient, Tailored, View-Synthesis
    (University of Waterloo, 2024-09-17) Nair, Saeejith Muralidharan
    Neural radiance fields (NeRFs) have revolutionized novel view synthesis, enabling high-quality 3D scene reconstruction from sparse 2D images. However, their computational intensity often hinders real-time applications and deployment on resource-constrained devices. Traditional NeRF models can require days of training for a single scene and demand significant computational resources for rendering, with some implementations necessitating over 150 million network evaluations per rendered image. While various approaches have been proposed to improve NeRF efficiency, they often employ fixed network architectures that may not be optimal for all scenes. This research introduces NAS-NeRF, a new approach that employs generative neural architecture search (NAS) to discover compact, scene-specialized NeRF architectures. NAS, a technique for automatically designing neural network architectures, is investigated as a potential method for optimizing NeRFs by tailoring network architectures to the specific complexities of individual scenes. NAS-NeRF reformulates the NeRF architecture into configurable field cells, enabling efficient exploration of the architecture space while maintaining compatibility with various NeRF variants. Our method incorporates a scene-specific optimization strategy that considers the unique characteristics of each 3D environment to guide architecture search. We also introduce a quality-constrained generation approach that allows for the specification of target performance metrics within the search process. Experiments on the Blender synthetic dataset demonstrate the effectiveness of NAS-NeRF in generating a family of architectures tailored to different efficiency-quality trade-offs. Our most efficient models (NAS-NeRF XXS) achieve up to 23× reduction in parameters and 22× fewer FLOPs compared to baseline NeRF, with only a 5.3% average drop in structural similarity (SSIM).
Meanwhile, our high-quality models (NAS-NeRF S) match or exceed baseline performance while reducing parameters by 2-4× and offering up to 1.93× faster inference. These results suggest that high-quality novel view synthesis can be achieved with more compact models, particularly when architectures are tailored to specific scenes. NAS-NeRF contributes to the ongoing research into efficient 3D scene representation methods, helping enable applications in resource-constrained environments and real-time scenarios.
  • Item
    Microarray Image Denoising Leveraging Autoencoders and Attention-Based Architectures with Synthetic Training Data
    (University of Waterloo, 2024-09-16) Czarnecki, Chris
    Microarray technology has for many years remained a gold standard in transcriptomics. However, preparation of physical slides in wet labs involves procedures which tend to introduce occasional dirt and noise into the slide. Having to repeat experiments due to environmental noise present in the scanned images leads to increased reagent and labor costs. Motivated by the high costs of repeated wet lab procedures, we explore denoising methods in the narrow subfield of microarray image analysis. We propose SADGE, a domain-relevant metric to quantify the denoising power of the methods considered. We introduce a synthetic data generation protocol which permits the creation of very large microarray image datasets programmatically and provides noise-free ground truth useful for objective quantification of denoising. We also train several deep learning architectures for the denoising task, with several of them beating the current state-of-the-art method on both PSNR and SADGE metrics. We propose a new training modality leveraging the EATME module to condition the image reconstruction on ground-truth expression values and we introduce an additional loss term (DEL) which further enhances the denoising capabilities of the model while ensuring minimal information loss. Collectively, the innovations outlined in our work constitute a significant contribution to the field of microarray image denoising, influencing the cost-effectiveness of microarray experiments and thus impacting a wide range of clinical and biotechnological applications.
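Of the two metrics named above, PSNR has a standard definition that is easy to state in code; SADGE is specific to the thesis and is not reproduced here. The sketch below is a generic PSNR implementation under stated assumptions: the function name is hypothetical, and `max_value` defaults to 255 for 8-bit images, whereas scanned microarray images may use a different bit depth.

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between a clean image and an estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Synthetic ground truth makes this comparison possible: the clean image is known.
clean = np.zeros((4, 4))
noisy = clean + 1.0  # every pixel off by one gray level
print(psnr(clean, noisy))
```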
  • Item
    Financialization of the Housing Market: A Contribution to Modern Urban Rent Theory
    (University of Waterloo, 2024-09-16) Wright, Kirsten
    A great deal of wealth is produced through the economic activity of cities. There is a gap, however, in the formal apparatus in standard economic theory for analyzing the distribution of this enormous value created in cities. In the context of a widely-felt housing crisis, we explore how the capture of urban value by financial actors through the financialization of the housing market affects ownership patterns in urban areas, and the ultimate implications of these processes for urban productivity. We hypothesize that financialization induces a shift towards tenancy among the urban workforce that is likely to result in decreased urban productivity through a range of channels. To examine this hypothesis, we construct an agent-based model with a land market and production sector in which productivity scales superlinearly with city population. This work brings together urban agglomeration effects, Ricardian rent theory and a spatially explicit land market model in a novel way. In our model, transportation costs determine the size of the city, and the available locational rents. Rising productivity increases wages and urban land values, so the value of increased productivity is transferred to land owners. Investors attempt to capture these productive gains by purchasing land. These financial actors can bid against residents to purchase urban land. The interaction of agents determines the distribution of property ownership, city size, and wages. City size and wages provide a measure of urban productivity. The evolving pattern of property ownership tells us how residents are distributed between the tenant class and the owner class. We then explore a range of channels through which financialization might result in decreased urban productivity. 
When we add this link in the model, we see that financialization not only transforms the class structure of the city and the distribution of urban wealth, it disrupts the relationship between population growth and productivity, reducing the wealth and resilience of the urban system. To illustrate the uses of this kind of computational model for economic policy analysis, we run six policy experiments with and without the productivity link. Contributions of this work include: integrating classical rent theory into an agent-based urban model; linking urban rent dynamics with urban productivity, and population growth; incorporating urban scaling literature into the model framework; examining the impacts of financialization on wealth distribution and urban productivity; creating a framework for a broader understanding of public policies in an urban system; and examining the qualitative effects of various public policies on wealth distribution, productivity, and class.
  • Item
    Assessment of Acoustic Markers of Conversational Difficulty
    (University of Waterloo, 2024-09-06) Ellag, Menatalla
    Human conversations, one of the most complex behaviors, require the real-time coordination of speech production and comprehension, involving cognitive, social, and biological dimensions. There has been a rising need for laboratory and clinical assessments to evolve to capture the essence of everyday interactions. The cognitive demands of interactive conversation, which require listeners to process and store information while simultaneously planning their responses, often exceed those encountered in standard clinical tests. These assessments must encompass diverse contexts and participant groups, including varying hearing statuses, challenging listening environments such as background noise, the use of assistive devices that may alter the listening experience, and different conversation types such as relational versus transactional exchanges, dyadic versus group interactions, and face-to-face versus remote interactions. This study consists of two investigations exploring how different conditions affect acoustic measures of speech production and conversational behavior. The first study was an extension of a study originally conducted for content analysis and participants’ subjective rating questionnaires, focusing on hearing-impaired (HI) individuals. It examined the impact of face masks and remote microphones on communication dynamics. Four native English-speaking HI participants engaged in free-form conversations within small groups under a constant background noise of 55 dBA. Interestingly, the results showed that using remote microphones shortened floor-transfer offsets (FTOs) and extended conversation durations, suggesting improved communication. When participants did not wear a face mask, interpausal unit (IPU) durations were shorter with remote microphones than without, indicating easier communication. 
However, no significant difference was found between the two mask conditions, suggesting that face masks affect both speech perception and production by decreasing inhalation and exhalation volumes, thereby limiting the duration of utterances. Face masks are speculated to increase resistance to airflow, reducing subglottal pressure and consequently lowering fundamental frequency (F0). Despite no significant differences in articulation rate and floor transfer rate, the constant noise environment, presented at lower levels compared to previous studies, may have limited the potential for pronounced effects. The second study involved normal-hearing (NH) individuals, investigating the effects of conversation type (free-form vs. task-based) and noise presence (70 dB SPL) on conversational dynamics. Dyadic interactions among NH participants were examined. Task-based conversations exhibited structured patterns with longer FTOs and higher floor transfer rates, while free-form conversations showed greater FTO variability, more frequent overlaps, longer IPUs, and increased pause durations and rates. Noise presence increased IPU durations and pause lengths but did not significantly alter floor-transfer rates or FTO variability. Both conversation types experienced increased articulation rates and speech levels in noise. Contrary to the expected change as part of the Lombard effect, the increase in articulation rates may be attributed to the noise acting as a stressor. Meanwhile, the increase in mean speech levels was less pronounced than expected, possibly due to the specific noise characteristics and the use of closed headphones. These studies shine a light on the complexity of communicative interactions and the necessity of accounting for a wide spectrum of factors in experimental designs. The findings highlight the importance of considering both environmental conditions and conversation types when researching speech perception, production, and conversational dynamics.
This research provides valuable insights for academic studies and the development of hearing-assistive technologies, emphasizing the need for assessments that reflect the varied nature of everyday communication.
  • Item
    Addressing Data Scarcity in Domain Generalization for Computer Vision Applications in Image Classification
    (University of Waterloo, 2024-08-30) Kaai, Kimathi
    Domain generalization (DG) for image classification is a crucial task in machine learning that focuses on transferring domain-invariant knowledge from multiple source domains to an unseen target domain. Traditional DG methods assume that classes of interest are present across multiple domains (domain-shared), which helps mitigate spurious correlations between domain and class. However, in real-world scenarios, data scarcity often leads to classes being present in only a single domain (domain-linked), resulting in poor generalization performance. This thesis introduces the domain-linked DG task and proposes a novel methodology to address this challenge. This thesis proposes FOND, a "Fairness-inspired cONtrastive learning objective for Domain-linked domain generalization," which leverages domain-shared classes to learn domain-invariant representations for domain-linked classes. FOND is designed to enhance generalization by minimizing the impact of task-irrelevant domain-specific features. The theoretical analysis in this thesis extends existing domain adaptation error bounds to the domain-linked DG task, providing insights into the factors that influence generalization performance. Key theoretical findings include the understanding that domain-shared classes typically have more samples and learn domain-invariant features more effectively than domain-linked classes. This analysis informs the design of FOND, ensuring that it addresses the unique challenges of domain-linked DG. Furthermore, experiments are performed across multiple datasets and experimental settings to evaluate the effectiveness of various current methodologies. The proposed method achieves state-of-the-art performance in domain-linked DG tasks, with minimal trade-offs in the performance of domain-shared classes. Experimental results highlight the impact of shared-class settings, total class size, and inter-domain variations on the generalizability of domain-linked classes. 
Visualizations of learned representations further illustrate the robustness of FOND in capturing domain-invariant features. In summary, this thesis advocates future DG research for domain-linked classes by (1) theoretically and experimentally analyzing the factors impacting domain-linked class representation learning, (2) demonstrating the ineffectiveness of current state-of-the-art DG approaches, and (3) proposing an algorithm to learn generalizable representations for domain-linked classes by transferring useful representations from domain-shared ones.
  • Item
    The Effects of Stimulus Statistics on Representational Similarity in a Model of Mouse Visual Cortex
    (University of Waterloo, 2024-08-30) Torabian, Parsa
    Deep convolutional neural networks have emerged as convincing models of the visual cortex, demonstrating remarkable ability to predict neural activity. However, the specific combination of factors that optimally align these models with biological vision remains an open question. Network architecture, training objectives, and the statistics of training data all likely play a role, but their relative contributions and interactions are not fully understood. In this study, we focus on the role of training data in shaping the representations learned by deep networks. We investigate how the degree of 'realism' in the training data affects the similarity between network activations and neural recordings from mouse visual cortex. We hypothesised that training on more naturalistic stimuli would lead to greater brain-model similarity, as the visual system has evolved to process the statistics of the natural world. We leveraged the Unity video-game engine to generate custom training datasets with the ability to control for three distinct factors: the realism of the virtual environment, the motion statistics of the simulated agent, and the optics of the modelled eye. Deep networks were trained on datasets generated from all eight permutations of these three experiment variables using a self-supervised learning approach. The trained models were subsequently compared to mouse neural data from the Allen Institute using representational similarity analysis. Our results reveal that the realism of the virtual environment has a substantial and consistent effect on brain-model similarity. Networks trained on the more realistic meadow-environment exhibited significantly higher similarity to mouse visual cortex across multiple areas. In contrast, the effects of motion statistics and visual optics were more subtle and area-specific. Furthermore, all possible interactions between these three factors were statistically significant, suggesting complex nonlinear relationships.
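Representational similarity analysis compares two systems through their pairwise stimulus dissimilarities, so the model and the neural recordings do not need the same number of units. The sketch below is a minimal illustration under assumptions not stated in the abstract: correlation distance is used for the dissimilarity matrices, and the two matrices are compared with a Pearson correlation (rank correlations are a common alternative). All function names are hypothetical.

```python
import numpy as np

def rdm(responses):
    """Representational dissimilarity matrix: 1 - correlation between stimuli.

    `responses` has shape (n_stimuli, n_units).
    """
    responses = np.asarray(responses, dtype=float)
    return 1.0 - np.corrcoef(responses)

def rsa_similarity(responses_a, responses_b):
    """Correlate the upper triangles of two RDMs built on the same stimuli."""
    rdm_a, rdm_b = rdm(responses_a), rdm(responses_b)
    upper = np.triu_indices_from(rdm_a, k=1)  # skip the diagonal and duplicates
    return float(np.corrcoef(rdm_a[upper], rdm_b[upper])[0, 1])

rng = np.random.default_rng(0)
model_acts = rng.normal(size=(6, 50))   # 6 stimuli, 50 model units (synthetic)
neural_acts = rng.normal(size=(6, 30))  # same 6 stimuli, 30 recorded units (synthetic)
print(rsa_similarity(model_acts, neural_acts))
```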
  • Item
    Towards an Optical Biopsy Tool Using Photon Absorption Remote Sensing
    (University of Waterloo, 2024-08-28) Veugen, Jenna
    Streamlining diagnosis is more important than ever, as the long wait times, resource constraints, and diagnostic inaccuracies place burdens on the healthcare system that climb each year. The development of a tool capable of instantaneous in situ diagnosis would eliminate the excess time and resources used in current diagnostic procedures, and thereby relieve some of these burdens. This could be achieved with an optical biopsy by leveraging light-matter interactions for advanced microscopy in an endoscopic form. However, to date there is no technology able to provide diagnostically equivalent image quality to the gold standard for diagnosis in an endoscopic form. Photon Absorption Remote Sensing (PARS) is a novel imaging modality that utilizes optical absorption contrast to achieve label-free, non-contact microscopy. PARS technology holds promising potential in resolving many of the challenges faced in the development of an optical biopsy tool. This thesis explores the initial development of a PARS endoscope capable of in vivo microvascular imaging through multiple phases of development. The first stage investigated the performance of a dual green PARS bench-top system, utilizing green excitation and detection wavelengths to address chromatic aberrations in the final endoscopic form. The system was confined to a green excitation wavelength in order to target the absorption of hemoglobin for vascular imaging. It was then paired with a green detection wavelength for the first time, unlike typical PARS microscopes that rely on near-infrared (NIR) wavelengths for detection. Both phantom and in vivo samples were imaged to validate the performance of the system, showing functionality and sensitivity comparable to NIR PARS systems. The next phase explored the transition of a stationary PARS bench-top system to a free imaging head using optical fiber. 
This introduced many challenges, such as high losses and inherent noise, that had to be addressed through careful design, assembly and optimization. Two types of specialized optical fiber were tested by imaging phantom targets and in vivo chicken embryo samples. The double clad fiber setup showed strong performance with excellent contrast, signal to noise ratio and sensitivity in the PARS images. The final stage included miniaturizing the imaging head to achieve an endoscopic form factor. Various miniature objective lens designs were developed, and tested in the system. The successful design was capable of imaging both in phantoms and in vivo, demonstrating, for the first time, vasculature imaged using PARS through optical fiber. This research lays the groundwork in the development of a PARS endoscope capable of providing a gold standard quality, instantaneous diagnosis in situ. It demonstrates a successful design capable of capturing relevant biomarkers in vivo using endoscopic PARS technology. The improved understanding of the design requirements for a more efficient system, and insight into the fundamental limitations, highlight future directions to further improve this device. This puts us one step closer towards achieving a successful optical biopsy tool that could streamline diagnosis, improve the outcome, safety and experience of the patient, and significantly reduce the cost burden on the health system.
  • Item
    Language Guided Out-of-Bounding Box Pose Estimation for Robust Ice Hockey Analysis
    (University of Waterloo, 2024-08-27) Balaji, Bavesh
    Accurate estimation of human pose and the pose of interacting objects, such as hockey sticks, is fundamental in vision-driven hockey analytics and crucial for tasks like action recognition and player assessment. Estimating 2D keypoints from monocular video is challenging, particularly in fast-paced sports such as ice hockey, where motion blur, occlusions, bulky equipment, color similarities, and constant camera panning complicate accurate pose prediction. This thesis addresses these challenges with contributions on three fronts. First, recognizing the lack of an existing benchmark, we present a comparative study of four state-of-the-art human pose estimation approaches using a real-world ice hockey dataset. This analysis aims to understand the impact of each model on ice hockey pose estimation and investigate their respective advantages and disadvantages. Building on insights from this comparative study, we develop an ensemble model for jointly predicting player and stick poses. The ensemble comprises two networks: one trained from scratch to predict all keypoints, and another utilizing a unique transfer learning paradigm to incorporate knowledge from large-scale human pose datasets. Despite achieving promising results, we observe that these top-down approaches yield suboptimal outcomes due to constraints such as requiring all keypoints to be within a bounding box and accommodating only one player per bounding box. To overcome these issues, we introduce an image and text based multi-modal solution called TokenCLIPose, which predicts stick keypoints without encapsulating them within a bounding box. By focusing on capturing only the player in a bounding box and treating their stick as missing, our model predicts out-of-bounding box keypoints. To incorporate the context of the missing keypoints, we use keypoint-specific text prompts to leverage the rich semantic representations provided by language. 
This dissertation’s findings advance the state-of-the-art in 2D pose estimation for ice hockey, outperforming existing methods by 2.6% on our dataset, and provide a robust foundation for further developments in vision-driven sports analytics.
  • Item
    Scaling Laws for Compute Optimal Biosignal Transformers
    (University of Waterloo, 2024-08-20) Fortin, Thomas
    Scaling laws which predict the optimal balance between number of model parameters and number of training tokens given a fixed compute budget have recently been developed for language transformers. These allow model developers to allocate their compute budgets such that they can achieve optimal performance. This thesis develops such scaling laws for the Biosignal Transformer trained separately on both accelerometer data and EEG data. This is done by applying methods used by other researchers to develop similar scaling laws for language transformer models. These are referred to as the iso-FLOP curve method and the parametric loss function method. The Biosignal Transformer model is a transformer model which is designed specifically to be trained on tasks that use biosignals such as EEG, ECG, and accelerometer data as input. For example, the Biosignal Transformer can be trained to detect or classify seizures from EEG signals. The Biosignal Transformer is also of particular interest because it is designed to use unsupervised pre-training on large unlabelled biosignal datasets to improve performance on downstream tasks with smaller labelled fine-tuning datasets. This work develops scaling laws which optimize for the best unsupervised pre-training loss given a fixed compute budget. Results show that the developed scaling laws are successful at predicting a balance between number of parameters and number of training tokens for compute budgets five times larger than those used to develop them such that pre-training loss is minimized. Researchers who intend to scale up the Biosignal Transformer should use these scaling laws to attain optimal pre-training loss from their given compute budgets when applying unsupervised pre-training with the Biosignal Transformer.
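The parametric loss function method, as used for language transformers, fits a loss of the form L(N, D) = E + A/N^alpha + B/D^beta, where N is the number of parameters and D the number of training tokens, and then minimizes it under a compute constraint. The sketch below works through that minimization under clearly hypothetical assumptions: the coefficient values are placeholders rather than fitted values from the thesis, and the compute approximation C = 6·N·D is borrowed from the language-model literature and may not hold for the Biosignal Transformer.

```python
def loss(n, d, E, A, B, alpha, beta):
    """Parametric pre-training loss for n parameters and d training tokens."""
    return E + A / n ** alpha + B / d ** beta

def compute_optimal(c, A, B, alpha, beta, flops_per_param_token=6.0):
    """Closed-form minimizer of the parametric loss subject to c = k * n * d.

    Setting the derivative to zero along the constraint gives
    n = G * (c / k) ** (beta / (alpha + beta)), with
    G = (alpha * A / (beta * B)) ** (1 / (alpha + beta)).
    """
    g = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    budget = c / flops_per_param_token  # the product n * d allowed by the budget
    n = g * budget ** (beta / (alpha + beta))
    d = budget / n
    return n, d

# Placeholder coefficients for illustration only.
coeffs = dict(A=400.0, B=400.0, alpha=0.34, beta=0.28)
n_opt, d_opt = compute_optimal(1e18, **coeffs)
print(f"parameters: {n_opt:.3e}, tokens: {d_opt:.3e}")
```

Because the exponents alpha and beta set how the budget is split, a law fitted at small budgets can be extrapolated to larger ones, which is the kind of prediction the abstract reports testing at five times the fitting budget.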
  • Item
    Talker Sensitivity to Turn-Taking in Conversation
    (University of Waterloo, 2024-08-19) Masters, Benjamin
    Turn-taking in conversation is a complex phenomenon that requires talkers to, at a minimum, simultaneously plan and produce their own speech and listen to and comprehend the speech of their partner(s). Given this necessary division of attention, the increase in listening difficulty introduced by hearing impairments can have confounding effects on a person's ability to communicate, and evaluating listening effort during communication remains difficult. However, one of the most detrimental effects of hearing loss is its impact on one's ability to communicate effectively, so the assessment of listening effort in natural environments is especially important. This thesis takes two approaches to evaluating listening effort in conversation. The first analyzes the response of the pupil at the temporal scale of turn-taking to understand how effort and attention are allocated between speaking, listening, and other task demands. Pupillary temporal response functions to turn-taking are derived and analyzed for systematic differences that exist across people and acoustic environmental conditions, and are further analyzed to determine differences in pupil response based on expected difficulty of a conversation. The second approach analyzes behavioral changes related to the timing of turn-taking to understand how talkers identify that communication difficulty is being experienced by a conversational partner. The floor transfer offset (FTO), defined as the time it takes one talker to begin their turn after another has ended theirs, was manipulated during interactive conversations to mimic the observed increase in magnitude and variability of FTOs in difficult listening environments. To enable this, an audio processing framework was developed to track the state of a conversation in near real-time and manipulate the perceived response time of talkers. The findings suggest that the timing of turn-taking is not used as a cue by talkers to infer difficulty.
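The floor transfer offset defined above can be computed directly from turn boundaries. The sketch below is a minimal offline illustration, not the near-real-time framework described in the thesis: the function name and the `(talker, start, end)` turn format are assumptions, and turns are assumed to be already segmented and sorted by start time.

```python
def floor_transfer_offsets(turns):
    """FTOs in seconds from a list of (talker, start_s, end_s), sorted by start.

    A positive value is a gap between talkers; a negative value is an overlap.
    Consecutive turns by the same talker are not floor transfers and are skipped.
    """
    offsets = []
    for (prev_talker, _, prev_end), (talker, start, _) in zip(turns, turns[1:]):
        if talker != prev_talker:
            offsets.append(start - prev_end)
    return offsets

conversation = [
    ("A", 0.0, 2.0),
    ("B", 2.3, 4.0),  # B starts 0.3 s after A stops: a gap
    ("A", 3.8, 5.0),  # A starts 0.2 s before B stops: an overlap
    ("A", 5.5, 6.0),  # same talker continues: no floor transfer
]
print(floor_transfer_offsets(conversation))
```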
  • Item
    Autonomous Robotic System Conducting Nasopharyngeal Swabbing
    (University of Waterloo, 2024-08-15) Lee, Peter Qiu Jiun
    The nasopharyngeal swab test is a procedure where a healthcare worker inserts a swab through the nose until it reaches the nasopharynx located at the back of the nasal cavity in order to collect secretions that can later be examined for illnesses. This procedure saw heightened use to detect cases during the COVID-19 pandemic. Its ubiquity also highlighted fragilities in the healthcare system by way of the hazards to healthcare workers from infectious patients and the pressures a pandemic can inflict upon an unready healthcare system. In this thesis we propose an autonomous robotic system for performing nasopharyngeal swab tests using a collaborative robotic manipulator arm, on the premise that the hardware and techniques could eventually be applied to other types of close-contact tasks to support the healthcare system. We also assume that prospective patients would be standing unrestrained in front of the arm, which adds the challenges of adjusting to arbitrary poses of the head and compensating for natural head motion. We first designed an instrumented end-effector to attach to a robotic arm to enable suitable vision and force sensing capabilities for the task. Next, we developed a finite element modeling simulation environment to describe the deformation of the swab as it moves through the nasal cavity, and solved an optimization problem to find ideal paths through the nasal cavity. A visual servo system was designed to properly align the swab next to the nose using visual information, drawing on advances in deep learning and state estimation, which we validated with a number of human trials. A torque controlled force compliant system was designed and evaluated to determine the feasibility of using force measurements to correct for misalignment when the swab is inserted into a nasal cavity phantom. Finally, we integrated all the system components into a cohesive system for performing nasopharyngeal swab tests.
We created a simulator using a nasal cavity phantom and a second robot arm to mimic natural motions of the head. This simulator was leveraged to perform extensive experimentation that found promising controller configurations that were able to compensate for head motion.
  • Item
    Topic Segmentation of Recorded Meetings
    (University of Waterloo, 2024-08-13) Lazoja, Ilir
    Video chapters allow videos to be more easily digestible and can be an important pre-processing step for other video-processing tasks. In many cases, the creator can easily chapter their own videos, especially for well-edited structured videos. However, some types of videos, such as recorded meetings, are more loosely structured with less obvious breaks, which makes them more cumbersome to chapter, so they would benefit greatly from automation. One approach to chaptering these types of videos is through performing topic segmentation on the transcript of the video, especially if the video is rich in dialogue. Topic segmentation is the task of dividing text based on when the topic of the text changes, most commonly performed on large bodies of written text. This thesis will detail how well state-of-the-art approaches for topic segmentation perform on recorded meetings, as well as present and evaluate strategies to improve performance for recorded meetings and discuss shortcomings of the common metrics used for topic segmentation.
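    As an illustration of how segmentation quality is typically scored, the sketch below computes the Pk metric, a widely used window-based measure for topic segmentation. The abstract does not name the metrics it examines, so the choice of Pk, the function name `pk_score`, and the per-sentence segment-label input format are all assumptions made for this example.

```python
def pk_score(reference, hypothesis, k=None):
    """Pk segmentation error: lower is better, 0.0 means perfect agreement.

    `reference` and `hypothesis` are equal-length lists holding one segment
    label per sentence (hypothetical input format chosen for this sketch).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")
    n = len(reference)
    if k is None:
        # Conventional default: half the average reference segment length.
        num_segments = 1 + sum(
            reference[i] != reference[i - 1] for i in range(1, n)
        )
        k = max(1, round(n / num_segments / 2))
    if n <= k:
        raise ValueError("sequence must be longer than the window size k")
    errors = 0
    for i in range(n - k):
        # Do the two ends of the window fall in the same segment?
        same_in_reference = reference[i] == reference[i + k]
        same_in_hypothesis = hypothesis[i] == hypothesis[i + k]
        if same_in_reference != same_in_hypothesis:
            errors += 1
    return errors / (n - k)


reference = [0, 0, 0, 0, 1, 1, 1, 1]   # one true topic change
no_breaks = [0, 0, 0, 0, 0, 0, 0, 0]   # hypothesis that misses it
print(pk_score(reference, reference))
print(pk_score(reference, no_breaks))
```

    Because the score depends on the window size k and on where the window happens to fall relative to a boundary, different kinds of mistakes can be penalized unevenly, which is one reason such metrics are often examined critically.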
  • Item
    Robust 3D Human Modeling for Baseball Sports Analytics
    (University of Waterloo, 2024-08-12) Bright, Jerrin
    In the fast-paced world of baseball, maximizing pitcher performance while minimizing runs relies on understanding subtle variations in mechanics. Traditional analysis methods, reliant on pre-recorded offline numerical data, struggle in the dynamic flow of live games. Although seemingly ideal, broadcast video analysis faces significant challenges due to motion blur, occlusion, and low resolution. This research proposes a novel 3D human modeling technique and a pitch statistics identification system that are robust to the aforementioned challenges. Specifically, we propose a technique called Distribution and Depth-Aware Human Mesh Recovery (D2A-HMR), a depth and distribution-aware 3D human mesh recovery technique that extracts pseudo-depth from each frame and utilizes a transformer network with self- and cross-attention to create a 3D mesh from which the 3D pose coordinates are extracted. The network is regularized using various loss functions including a silhouette loss function, joint reprojection loss functions, and a distribution loss function which utilizes normalizing flow to learn the deviation between the underlying predicted and ground truth distributions. Furthermore, we propose a focused augmentation strategy specifically designed to address the motion blur caused by fast motion. Following that, we introduce the PitcherNet system, which is built upon the D2A-HMR and motion blur augmentation strategy. PitcherNet is an automated analysis system that analyzes pitcher kinematics directly from live broadcast video, providing valuable pitch statistics (pitch velocity, release point, pitch position, release extension, and pitch handedness). The system relies solely on the broadcast videos as its input and leverages computer vision and pattern recognition to generate reliable pitch statistics from the game. First, PitcherNet isolates the pitcher and batter in each frame using a role classification network. 
Next, PitcherNet extracts the kinematic information representing the pitcher’s joints and surface using a refined version of the D2A-HMR model. Additionally, we enhance the generalizability of the 3D human model by incorporating additional in-the-wild high-resolution videos from the Internet. Finally, PitcherNet employs a Temporal Convolutional Network (TCN) and kinematic-driven heuristics to capture the pitch statistics, which can be used to analyze baseball pitchers.
  • Item
    Planning Renewable Electricity Using Life-Cycle Analysis
    (University of Waterloo, 2024-07-16) Ali, Mir Sadek
    It has been predicted that by the mid-21st century worldwide energy demand will grow to two to three times the current level of demand. Expanding the global electric power generation capacity will be problematic using the three predominant methods, namely, nuclear fission, fossil fuels and hydropower. There are few suitable sites left for new large-scale hydropower dams. Both fossil fuels and nuclear fission have widespread environmental consequences of their use, and the supply of fuel for these two technologies is a non-renewable resource. Renewable energy system (RES) technologies have been proposed as the means of expanding energy markets in a sustainable manner. A formative step in deploying RES will be the design of a standardized methodology for determining policy and planning decisions to initiate market and government support for these nascent technologies. This thesis outlines the design of a RES planning model based on the life-cycle analysis (LCA) methodology. The proposed model will integrate a climatologically-based renewable energy optimization and simulation (REOS) model into the LCA. Goal-attainment algorithms will be used to find feasible installed capacities for power generation which will meet a prescribed load demand and simultaneously attempt to meet desired policy targets. The policy targets here will be the per-kilowatt hour price of power, life-cycle air-borne CO2 emissions, and the land requirements of the system. An analysis of the performance of RES technologies in two Canadian cities that already have mature electricity utilities is done to demonstrate the methodology.
  • Item
    Toward Automated Detection of Landfast Ice Polynyas in C-Band Synthetic Aperture Radar Imagery with Convolutional Neural Networks
    (University of Waterloo, 2024-07-12) Brubacher, Neil
    Landfast ice polynyas - areas of open water surrounded by ice - are important features in many Northern coastal communities, and their automated detection from spaceborne synthetic aperture radar (SAR) imagery is positioned to support on-ice travel safety under changing Arctic sea ice and climate conditions. The characteristically small spatial scales and sparse distribution of landfast ice polynyas present key challenges to their detection, and limit the suitability of established methods developed for SAR-based sea ice and open water classification at broader spatial scales. This thesis explores the development of deep learning-based object detection networks for landfast ice polynya detection in dual-polarized C-band SAR imagery, having three main contributions. The first is a characterization of landfast ice polynya signatures and separability in SAR imagery based on datasets of polynyas mapped over several seasons near the communities of Sanikiluaq, NU, and Nain, NL. Results from this analysis highlight the challenging and variable nature of polynya signatures in dual-polarized backscatter intensity, motivating the use of convolutional neural networks (CNNs) to capture relevant textural, geometric and contextual polynya features. The second contribution is the development and evaluation of CNN-based object detection networks for polynya detection, drawing on advancements in the natural-scene small object detection field to address the challenging size and sparsity characteristics of polynyas. A simplified detection network architecture optimized for polynya detection in terms of feature representation capacity, feature map resolution, and training loss balancing is found to reliably detect polynyas with sufficient size and local contrast, and demonstrates good generalization to regions not seen in training. 
The third contribution is an assessment of detection model generalizability between imagery produced by Sentinel-1 (S1) and Radarsat Constellation Mission (RCM) SAR sensors, illustrating the ability for models trained only on S1 imagery to effectively extract and classify polynya features in RCM despite differences in resolution and noise characteristics. Across regions and sensors, missed polynyas are found to have smaller sizes and weaker signatures than detected polynyas, while false predictions are often caused by boundary areas between smooth and rough landfast ice. These represent fundamental limits to polynya / landfast ice separability in the medium-resolution, dual-polarized C-band SAR imagery used in this thesis, motivating future research into multi-temporal, multi-frequency, and/or higher-resolution SAR imagery for polynya detection. Ongoing and future progress in the development of robust landfast ice hazard detection systems is positioned to support community sea ice safety and monitoring.
  • Item
    Dynamic Alert Design Based on Driver’s Cognitive State for Take-over Request in Automated Vehicles
    (University of Waterloo, 2024-07-03) Umpaipant, Wachirawit
    This thesis investigates the effectiveness of dynamic alert systems tailored to drivers' cognitive states in automated driving environments, focusing on enhancing takeover readiness during critical transitions. Utilizing a large-scale immersive driving simulation, the study evaluated drivers' response times and physiological measures when reacting to various alert intensities and the presence of a secondary typing task. The experiment revealed that dynamic alerts significantly improved response times and takeover performance, especially in high-distraction scenarios. Drivers responded more effectively when alerts were adjusted to their cognitive load, with strong alerts resulting in the fastest reaction times under distracted conditions. On average, dynamic alerts reduced response times by approximately 1.75 seconds compared to static alerts. Additionally, higher lateral accelerations were observed under strong alerts, indicating more decisive maneuvering. Self-rated attention-capturing scores were notably higher with dynamic alerts, particularly under strong alert conditions and in the presence of secondary tasks. The ANOVA results showed significant improvements in attention capturing and overall alert effectiveness when dynamic alerts were employed, demonstrating the robust design’s ability to capture attention and enhance driver responsiveness. The study confirmed that adaptive alert designs, which adjust based on the driver's cognitive state, can markedly enhance overall driving experience and safety. Participants reported higher levels of confidence with dynamic alerts, especially in scenarios involving secondary tasks. Despite the strong alerts, annoyance levels remained low, indicating that dynamic alerts are effective without causing undue stress. 
These results underscore the potential of using adaptive systems to improve safety and efficiency in automated driving, advocating for a more nuanced approach to system alerts that considers the variable cognitive states of drivers. Future research should validate these findings with on-road studies, explore a broader range of alert modalities, and refine physiological monitoring techniques to further enhance adaptive alert systems.
  • Item
    Practical Application of Machine Learning to Water Pipe Failure Prediction
    (University of Waterloo, 2024-06-24) Laven, Kevin
    As water networks age, many utilities are faced with rising water main break rates and insufficient replacement funds. Machine learning is a promising tool to support efficient water pipe replacement decisions. This thesis explores the practical application of machine learning for water pipe failure prediction using a dataset of over 10 million pipe-year records from four countries. Analysis of predictive factors shows that length, age, diameter, material, and failure history are each significant. Two novel relationships with break rate are observed: with respect to diameter, an inverse linear relationship, and with respect to age, a peak at around 40 years followed by a decline lasting several decades. A method is presented for predicting both probability of failure and the expected number of failures in a given pipe and time period. By inferring units, encoding categorical features, and normalizing for different utility practices, it is proposed that a single model can generalize across utilities, geographies, and time periods without any utility-specific data cleansing. The model is trained and tested on a leave-one-utility-out basis, with training data from time periods strictly prior to test data. The resulting Area Under the Curve for the Receiver Operating Characteristic of over 0.85 and Cumulative Lift at 10% of over 5.0 demonstrate the practical applicability of the model, matching the performance of models trained and tested on each utility’s own data. Within this model, a method of cross-encoding categorical features with numerical features is introduced to enable integration of data sets from diverse contributors. The applicability of these performance metrics and model outputs to common utility water main replacement decision making processes is also shown.
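    To make the lift figure concrete, the sketch below computes a cumulative lift at 10% under one common definition: rank pipes by predicted risk, take the top 10%, and compare the share of failures captured with the 10% that random selection would capture on average. The thesis may define the metric differently (for example, weighting by pipe length), and the function name `cumulative_lift` and the toy data are assumptions made for this example.

```python
def cumulative_lift(scores, failures, fraction=0.10):
    """Lift of the top `fraction` of items when ranked by predicted score.

    `scores` are predicted risks and `failures` are observed failure counts
    (or 0/1 flags) for the same items. A lift of 5.0 at 10% means the top
    tenth of the ranking captures half of all observed failures.
    """
    if len(scores) != len(failures):
        raise ValueError("scores and failures must have equal length")
    total_failures = sum(failures)
    if total_failures == 0:
        raise ValueError("lift is undefined when there are no failures")
    # Indices sorted from highest to lowest predicted risk.
    ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = max(1, round(len(scores) * fraction))
    captured = sum(failures[i] for i in ranking[:top_k])
    share_of_failures = captured / total_failures
    share_of_items = top_k / len(scores)
    return share_of_failures / share_of_items


# Toy data (hypothetical): ten pipes, two of which failed.
scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.4, 0.05, 0.6, 0.7, 0.5]
failures = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(cumulative_lift(scores, failures, fraction=0.10))
```

    In the toy data the single highest-ranked pipe accounts for one of the two failures, so the top 10% of pipes captures 50% of failures.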
  • Item
    Modal Interaction in Electrostatic MEMS Mirrors
    (University of Waterloo, 2024-05-31) Rahmanian, Sasan
    The impetus of this work is to introduce nonlinear modal interactions as a novel actuation mechanism for electrostatic MEMS-based scanning micromirrors. Modal interactions refer to the engagement of two or more modes of vibration in a system, creating a bridge to channel vibration energy from a directly excited mode to one or more of the coupled modes. In chapter two, this report carries out a comprehensive literature review of the different types of mode coupling in nonlinear resonators. First, internal resonance in general nonlinear oscillators is addressed. Second, we limit our focus to mode coupling in electrostatic MEMS. As an initial test-bed, we examine in chapter three the modulation equations governing a system of two nonlinearly coupled 1-DOF oscillators involved in a 2:1 parametric modal interaction. Simulations show that as the excitation frequency varies in the vicinity of the directly excited higher-frequency oscillator, the amplitude of its motion saturates. Meanwhile, the amplitude of the lower-frequency oscillator undergoes large motions under the influence of a parametric ‘energy pump’. The fourth chapter reports on nonlinear modal interaction in a MEMS made of an electrostatically actuated curved beam. We characterize the first few in-plane and out-of-plane bending modes of the beam. Thermal noise excitation is utilized to extract the out-of-plane natural frequencies, whereas the in-plane natural frequencies are captured using pulse excitation. Then, the frequency response of the MEMS is examined in the neighborhood of the first symmetric and second symmetric in-plane modes. Characterization results disclose a 2:1 ratio between the second symmetric and the first anti-symmetric in-plane modes. We show that this anti-symmetric mode can be effectively excited via the energy channel between it and the second symmetric mode when the latter is driven directly by external electrostatic forcing. 
In the fifth chapter, we establish bending-torsional equations of motion for a symmetric electrostatic MEMS actuator that can capture the 2:1 modal interaction between its in-plane bending and out-of-plane rotational motions. Our approach demonstrates that incorporating the linear slopes into the cross-sectional shear strains gives rise to quadratic couplings between the bending and torsional motions whose existence depends on non-vanishing first moments of area of the microbeam's cross-section. To account for imperfections in microdevice fabrication, we assume a minuscule offset between the centroids of the as-fabricated and as-designed cross-sections of the microbeams. An energy approach is used to derive the equations of motion (EoM). The static response of the MEMS actuator together with its tuned eigenmodes are examined in this chapter. Chapter six reports the frequency- and voltage-displacement behaviors of the mirror, addressing the 2:1 and 3:1 flexural-torsional internal resonances experimentally and numerically. The numerical simulation results indicate that the in-plane motion, which is the directly excited mode, saturates upon the initiation of a 2:1 energy pathway between the bending and torsional motions. Through suitable tuning of the AC frequency, the amplitude of the in-plane motion is minimized, while the amplitude of the torsional motion, an indirectly excited mode, is maximized. The numerical simulation results demonstrate that the actuator's torsional motion, when subjected to a 1:2:1 electro-flexural-torsional modal interaction, is triggered by applying a maximum voltage of 10 V, resulting in a rotation angle of about 15 degrees. Further, prolific frequency combs are generated as a result of secondary Hopf bifurcations along the large-amplitude response branches, capturing quasi-periodicity in the MEMS dynamics. 
The experimental results demonstrate the mirror's dynamics exhibiting 3:1 flexural-torsional modal interaction that provides an efficient out-of-plane rotation drive through in-plane excitation. The present study is a platform for the implementation of a novel actuation mechanism of MEMS scanning micromirrors using parametric modal interaction. Concluding remarks and proposed future work are presented in chapter seven.
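    For readers unfamiliar with 2:1 modal interaction, a textbook form of two quadratically coupled oscillators that exhibits the saturation behaviour described in the abstract is sketched below. This is not taken from the thesis: the symbols (u_1, u_2 for the lower- and higher-frequency coordinates, μ_i for damping, α_i for coupling coefficients, F and Ω for the forcing amplitude and frequency) are assumptions chosen for illustration.

```latex
% Generic 2:1 internally resonant pair (illustrative, not the thesis model)
\begin{align}
  \ddot{u}_1 + 2\mu_1 \dot{u}_1 + \omega_1^2 u_1 &= \alpha_1 u_1 u_2, \\
  \ddot{u}_2 + 2\mu_2 \dot{u}_2 + \omega_2^2 u_2 &= \alpha_2 u_1^2 + F\cos(\Omega t),
  \qquad \omega_2 \approx 2\omega_1, \quad \Omega \approx \omega_2 .
\end{align}
```

    In this form the directly forced coordinate u_2 enters the first equation as a time-varying stiffness term, so it drives u_1 parametrically; above a forcing threshold the amplitude of u_2 stops growing and additional input energy is channelled into u_1.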