Systems Design Engineering

This is the collection for the University of Waterloo's Department of Systems Design Engineering.

Research outputs are organized by type (e.g., Master Thesis, Article, Conference Paper).

Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.

Recent Submissions

Now showing 1 - 20 of 746
  • Item
    Assessment of Acoustic Markers of Conversational Difficulty
    (University of Waterloo, 2024-09-06) Ellag, Menatalla
    Human conversation, one of the most complex human behaviors, requires the real-time coordination of speech production and comprehension and involves cognitive, social, and biological dimensions. Laboratory and clinical assessments increasingly need to evolve to capture the essence of everyday interactions. The cognitive demands of interactive conversation, which require listeners to process and store information while simultaneously planning their responses, often exceed those of standard clinical tests. These assessments must encompass diverse contexts and participant groups, including varying hearing statuses, challenging listening environments such as background noise, assistive devices that may alter the listening experience, and different conversation types such as relational versus transactional exchanges, dyadic versus group interactions, and face-to-face versus remote interactions. This work consists of two studies exploring how different conditions affect acoustic measures of speech production and conversational behavior. The first study extended a study originally conducted for content analysis and participants’ subjective rating questionnaires, focusing on hearing-impaired (HI) individuals. It examined the impact of face masks and remote microphones on communication dynamics. Four native English-speaking HI participants engaged in free-form conversations in small groups under constant background noise of 55 dBA. The results showed that using remote microphones shortened floor-transfer offsets (FTOs) and extended conversation durations, suggesting improved communication. When participants did not wear face masks, interpausal unit (IPU) durations were shorter with remote microphones than without, indicating easier communication. 
However, no significant difference was found between the two mask conditions, suggesting that face masks affect both speech perception and production by decreasing inhalation and exhalation volumes, thereby limiting the duration of utterances. Face masks are speculated to increase resistance to airflow, reducing subglottal pressure and consequently lowering fundamental frequency (F0). No significant differences were found in articulation rate or floor transfer rate; the constant noise, presented at a lower level than in previous studies, likely limited the potential for pronounced effects. The second study involved normal-hearing (NH) individuals and investigated the effects of conversation type (free-form vs. task-based) and noise (70 dB SPL) on conversational dynamics in dyadic interactions. Task-based conversations exhibited structured patterns with longer FTOs and higher floor transfer rates, while free-form conversations showed greater FTO variability, more frequent overlaps, longer IPUs, and longer and more frequent pauses. Noise increased IPU durations and pause lengths but did not significantly alter floor transfer rates or FTO variability. In both conversation types, articulation rates and speech levels increased in noise. The increase in articulation rate, contrary to the change expected from the Lombard effect, may be attributable to the noise acting as a stressor. Meanwhile, the increase in mean speech levels was smaller than expected, possibly owing to the specific noise characteristics and the use of closed headphones. These studies illustrate the complexity of communicative interactions and the need to account for a wide spectrum of factors in experimental design. The findings highlight the importance of considering both environmental conditions and conversation types when researching speech perception, production, and conversational dynamics. 
This research provides valuable insights for academic studies and the development of hearing-assistive technologies, emphasizing the need for assessments that reflect the varied nature of everyday communication.
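The turn-timing measures used in these studies can be illustrated with a minimal Python sketch. It assumes each talker's speech has already been segmented into IPUs with start and end times in seconds; the `IPU` class and the function names are hypothetical, and pairing chronologically adjacent IPUs is a simplification that ignores backchannels nested inside another talker's turn.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class IPU:
    """Interpausal unit: one talker's stretch of speech bounded by pauses (hypothetical type)."""
    speaker: str
    start: float  # seconds
    end: float    # seconds


def floor_transfer_offsets(ipus):
    """FTOs (s) at each change of speaker, rounded to the millisecond.

    FTO = start of the incoming talker's IPU minus end of the outgoing
    talker's IPU, so gaps are positive and overlaps are negative.
    """
    ordered = sorted(ipus, key=lambda u: u.start)
    return [
        round(curr.start - prev.end, 3)
        for prev, curr in zip(ordered, ordered[1:])
        if curr.speaker != prev.speaker
    ]


def summarize(ipus):
    """Mean IPU duration plus the mean and spread (standard deviation) of the FTOs."""
    ftos = floor_transfer_offsets(ipus)
    return {
        "mean_ipu_s": mean(u.end - u.start for u in ipus),
        "mean_fto_s": mean(ftos),
        "fto_sd_s": stdev(ftos) if len(ftos) > 1 else 0.0,
    }


ipus = [IPU("A", 0.0, 2.0), IPU("B", 2.3, 4.0), IPU("A", 3.8, 5.0), IPU("A", 5.5, 6.0)]
print(floor_transfer_offsets(ipus))  # [0.3, -0.2]: one gap, then one overlap
```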
  • Item
    Addressing Data Scarcity in Domain Generalization for Computer Vision Applications in Image Classification
    (University of Waterloo, 2024-08-30) Kaai, Kimathi
    Domain generalization (DG) for image classification is a crucial machine learning task that focuses on transferring domain-invariant knowledge from multiple source domains to an unseen target domain. Traditional DG methods assume that classes of interest are present across multiple domains (domain-shared), which helps mitigate spurious correlations between domain and class. However, in real-world scenarios, data scarcity often leads to classes being present in only a single domain (domain-linked), resulting in poor generalization performance. This thesis introduces the domain-linked DG task and proposes FOND, a "Fairness-inspired cONtrastive learning objective for Domain-linked domain generalization," which leverages domain-shared classes to learn domain-invariant representations for domain-linked classes. FOND is designed to enhance generalization by minimizing the impact of task-irrelevant domain-specific features. The theoretical analysis in this thesis extends existing domain adaptation error bounds to the domain-linked DG task, providing insights into the factors that influence generalization performance. A key theoretical finding is that domain-shared classes typically have more samples and learn domain-invariant features more effectively than domain-linked classes. This analysis informs the design of FOND, ensuring that it addresses the unique challenges of domain-linked DG. Furthermore, experiments across multiple datasets and experimental settings evaluate the effectiveness of various current methods. The proposed method achieves state-of-the-art performance on domain-linked DG tasks, with minimal trade-offs in the performance of domain-shared classes. Experimental results highlight the impact of shared-class settings, total class size, and inter-domain variations on the generalizability of domain-linked classes. 
Visualizations of learned representations further illustrate the robustness of FOND in capturing domain-invariant features. In summary, this thesis advocates for future DG research on domain-linked classes by (1) theoretically and experimentally analyzing the factors impacting domain-linked class representation learning, (2) demonstrating the ineffectiveness of current state-of-the-art DG approaches, and (3) proposing an algorithm to learn generalizable representations for domain-linked classes by transferring useful representations from domain-shared ones.
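Whether a class is domain-shared or domain-linked is purely a property of the training labels, so it can be checked before any training. The Python sketch below, with hypothetical function and domain names, partitions classes by the number of source domains they appear in and also returns per-class sample counts; it illustrates the task setting only and is not an implementation of FOND.

```python
from collections import defaultdict


def split_classes(samples):
    """Partition class labels into domain-shared and domain-linked sets.

    samples: iterable of (domain, class_label) pairs from the source domains.
    A class is domain-linked if it occurs in exactly one source domain and
    domain-shared if it occurs in two or more. Per-class sample counts are
    returned as well, since shared classes typically have more samples.
    """
    domains = defaultdict(set)
    counts = defaultdict(int)
    for domain, label in samples:
        domains[label].add(domain)
        counts[label] += 1
    shared = sorted(c for c, d in domains.items() if len(d) > 1)
    linked = sorted(c for c, d in domains.items() if len(d) == 1)
    return shared, linked, dict(counts)


# Hypothetical (domain, class) labels for a toy source set.
toy = [("photo", "dog"), ("sketch", "dog"), ("cartoon", "dog"),
       ("photo", "giraffe"), ("sketch", "house"), ("cartoon", "house"),
       ("sketch", "horse")]
shared, linked, counts = split_classes(toy)
print(shared, linked)  # ['dog', 'house'] ['giraffe', 'horse']
```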
  • Item
    The Effects of Stimulus Statistics on Representational Similarity in a Model of Mouse Visual Cortex
    (University of Waterloo, 2024-08-30) Torabian, Parsa
    Deep convolutional neural networks have emerged as convincing models of the visual cortex, demonstrating a remarkable ability to predict neural activity. However, the specific combination of factors that optimally aligns these models with biological vision remains an open question. Network architecture, training objectives, and the statistics of training data all likely play a role, but their relative contributions and interactions are not fully understood. In this study, we focus on the role of training data in shaping the representations learned by deep networks. We investigate how the degree of 'realism' in the training data affects the similarity between network activations and neural recordings from mouse visual cortex. We hypothesised that training on more naturalistic stimuli would lead to greater brain-model similarity, as the visual system has evolved to process the statistics of the natural world. We leveraged the Unity video-game engine to generate custom training datasets with the ability to control three distinct factors: the realism of the virtual environment, the motion statistics of the simulated agent, and the optics of the modelled eye. Deep networks were trained on datasets generated from all eight combinations of these three experimental variables using a self-supervised learning approach. The trained models were subsequently compared to mouse neural data from the Allen Institute using representational similarity analysis. Our results reveal that the realism of the virtual environment has a substantial and consistent effect on brain-model similarity. Networks trained on the more realistic meadow environment exhibited significantly higher similarity to mouse visual cortex across multiple areas. In contrast, the effects of motion statistics and visual optics were more subtle and area-specific. Furthermore, all possible interactions between these three factors were statistically significant, suggesting complex nonlinear relationships.
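Representational similarity analysis compares two systems through the geometry of their responses rather than unit by unit. The pure-Python sketch below builds a representational dissimilarity matrix (RDM) for each system from responses to the same stimuli and then correlates the two RDMs. Using 1 minus Pearson correlation as the dissimilarity and Spearman correlation to compare RDMs is a common convention assumed here; the thesis may make different choices, and the response values are hypothetical.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rdm_upper(responses):
    """Upper triangle of an RDM: 1 - Pearson r between each pair of stimulus responses.

    responses[i] is the response vector (model units or recorded neurons) to stimulus i.
    """
    return [1 - pearson(responses[i], responses[j])
            for i, j in itertools.combinations(range(len(responses)), 2)]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def rsa_score(model_responses, neural_responses):
    """Spearman correlation between the model RDM and the neural RDM."""
    return pearson(ranks(rdm_upper(model_responses)), ranks(rdm_upper(neural_responses)))


# Hypothetical responses: 4 stimuli x 3 units.
model = [[1, 2, 3], [3, 1, 2], [2, 3, 1], [1, 3, 2]]
print(round(rsa_score(model, model), 6))  # 1.0
```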
  • Item
    Towards an Optical Biopsy Tool Using Photon Absorption Remote Sensing
    (University of Waterloo, 2024-08-28) Veugen, Jenna
    Streamlining diagnosis is more important than ever, as long wait times, resource constraints, and diagnostic inaccuracies place burdens on the healthcare system that grow each year. A tool capable of instantaneous in situ diagnosis would eliminate the excess time and resources consumed by current diagnostic procedures and thereby relieve some of these burdens. This could be achieved with an optical biopsy, which leverages light-matter interactions for advanced microscopy in an endoscopic form. However, no existing endoscopic technology provides image quality diagnostically equivalent to the diagnostic gold standard. Photon Absorption Remote Sensing (PARS) is a novel imaging modality that uses optical absorption contrast to achieve label-free, non-contact microscopy. PARS technology holds promise for resolving many of the challenges in developing an optical biopsy tool. This thesis explores the initial development of a PARS endoscope capable of in vivo microvascular imaging through multiple phases. The first stage investigated the performance of a dual-green PARS bench-top system, which uses green excitation and detection wavelengths to address chromatic aberrations in the final endoscopic form. The system was confined to a green excitation wavelength to target the absorption of hemoglobin for vascular imaging. It was then paired with a green detection wavelength for the first time, unlike typical PARS microscopes, which rely on near-infrared (NIR) wavelengths for detection. Both phantom and in vivo samples were imaged to validate the system, showing functionality and sensitivity comparable to NIR PARS systems. The next phase explored the transition from a stationary PARS bench-top system to a free imaging head using optical fiber. 
This introduced many challenges, such as high losses and inherent noise, that had to be addressed through careful design, assembly, and optimization. Two types of specialized optical fiber were tested by imaging phantom targets and in vivo chicken embryo samples. The double-clad fiber setup performed strongly, with excellent contrast, signal-to-noise ratio, and sensitivity in the PARS images. The final stage miniaturized the imaging head to achieve an endoscopic form factor. Various miniature objective lens designs were developed and tested in the system. The successful design imaged both phantoms and in vivo samples, demonstrating, for the first time, vasculature imaged using PARS through optical fiber. This research lays the groundwork for a PARS endoscope capable of providing instantaneous, gold-standard-quality in situ diagnosis. It demonstrates a successful design capable of capturing relevant biomarkers in vivo using endoscopic PARS technology. An improved understanding of the design requirements for a more efficient system, and insight into its fundamental limitations, highlight future directions for improving this device. This brings us one step closer to an optical biopsy tool that could streamline diagnosis, improve patient outcomes, safety, and experience, and significantly reduce the cost burden on the health system.
  • Item
    Language Guided Out-of-Bounding Box Pose Estimation for Robust Ice Hockey Analysis
    (University of Waterloo, 2024-08-27) Balaji, Bavesh
    Accurate estimation of human pose and the pose of interacting objects, such as hockey sticks, is fundamental in vision-driven hockey analytics and crucial for tasks like action recognition and player assessment. Estimating 2D keypoints from monocular video is challenging, particularly in fast-paced sports such as ice hockey, where motion blur, occlusions, bulky equipment, color similarities, and constant camera panning complicate accurate pose prediction. This thesis addresses these challenges with contributions on three fronts. First, recognizing the lack of an existing benchmark, we present a comparative study of four state-of-the-art human pose estimation approaches using a real-world ice hockey dataset. This analysis aims to understand the impact of each model on ice hockey pose estimation and investigate their respective advantages and disadvantages. Building on insights from this comparative study, we develop an ensemble model for jointly predicting player and stick poses. The ensemble comprises two networks: one trained from scratch to predict all keypoints, and another using a unique transfer learning paradigm to incorporate knowledge from large-scale human pose datasets. Despite achieving promising results, we observe that these top-down approaches yield suboptimal outcomes due to constraints such as requiring all keypoints to be within a bounding box and accommodating only one player per bounding box. To overcome these issues, we introduce an image- and text-based multimodal solution called TokenCLIPose, which predicts stick keypoints without encapsulating them within a bounding box. By capturing only the player in a bounding box and treating their stick as missing, our model predicts out-of-bounding-box keypoints. To incorporate the context of the missing keypoints, we use keypoint-specific text prompts to leverage the rich semantic representations provided by language. 
This dissertation’s findings advance the state-of-the-art in 2D pose estimation for ice hockey, outperforming existing methods by 2.6% on our dataset, and provide a robust foundation for further developments in vision-driven sports analytics.
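The out-of-bounding-box idea can be made concrete with a small geometric check: given a player box, some stick keypoints fall outside it and cannot be localized by a top-down model that sees only the crop. The Python sketch below, whose keypoint names, coordinates, and prompt template are all hypothetical, splits keypoints into inside and outside sets and builds one text prompt per keypoint in the spirit of keypoint-specific prompts; it does not reproduce TokenCLIPose.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned player bounding box in image pixels."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def split_keypoints(player_box, keypoints):
    """Split name -> (x, y) keypoints into those inside and outside the box."""
    inside, outside = {}, {}
    for name, (x, y) in keypoints.items():
        target = inside if player_box.contains(x, y) else outside
        target[name] = (x, y)
    return inside, outside


def keypoint_prompts(names, template="a photo of the {} of an ice hockey player"):
    """One text prompt per keypoint name (hypothetical template)."""
    return {name: template.format(name.replace("_", " ")) for name in names}


box = Box(100, 50, 200, 300)
kps = {"left_wrist": (150, 160), "stick_top": (160, 140), "stick_heel": (240, 310)}
inside, outside = split_keypoints(box, kps)
print(sorted(outside))  # ['stick_heel']
print(keypoint_prompts(outside)["stick_heel"])
```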
  • Item
    Scaling Laws for Compute Optimal Biosignal Transformers
    (University of Waterloo, 2024-08-20) Fortin, Thomas
    Scaling laws that predict the optimal balance between the number of model parameters and the number of training tokens for a fixed compute budget have recently been developed for language transformers. These allow model developers to allocate their compute budgets to achieve optimal performance. This thesis develops such scaling laws for the Biosignal Transformer, trained separately on accelerometer data and on EEG data, by applying two methods that other researchers used to develop scaling laws for language transformers: the iso-FLOP curve method and the parametric loss function method. The Biosignal Transformer is a transformer model designed specifically for tasks that take biosignals such as EEG, ECG, and accelerometer data as input; for example, it can be trained to detect or classify seizures from EEG signals. It is also of particular interest because it is designed to use unsupervised pre-training on large unlabelled biosignal datasets to improve performance on downstream tasks with smaller labelled fine-tuning datasets. This work develops scaling laws that minimize unsupervised pre-training loss for a fixed compute budget. Results show that the developed scaling laws successfully predict the balance between the number of parameters and the number of training tokens that minimizes pre-training loss, even for compute budgets five times larger than those used to develop them. Researchers who intend to scale up the Biosignal Transformer with unsupervised pre-training should use these scaling laws to attain optimal pre-training loss from their given compute budgets.
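The parametric loss function method can be illustrated with the functional form popularized for language models by Hoffmann et al. (2022), L(N, D) = E + A / N^alpha + B / D^beta, together with the common approximation C ≈ 6ND for training FLOPs. Whether the thesis uses exactly this form is an assumption here, and the coefficients below are hypothetical placeholders, not values fitted to accelerometer or EEG data. Minimizing the loss along the compute constraint gives a closed-form allocation, which the sketch checks against nearby allocations.

```python
def parametric_loss(n_params, n_tokens, E, A, B, alpha, beta):
    """Parametric pre-training loss L(N, D) = E + A / N**alpha + B / D**beta."""
    return E + A / n_params**alpha + B / n_tokens**beta


def compute_optimal(compute_flops, A, B, alpha, beta):
    """Closed-form (N, D) minimizing L(N, D) subject to C = 6 * N * D.

    Substituting D = C / (6N) and setting dL/dN = 0 gives
        N_opt = G * (C / 6) ** (beta / (alpha + beta)),
        G     = (alpha * A / (beta * B)) ** (1 / (alpha + beta)),
    and D_opt = C / (6 * N_opt). E does not affect the optimum.
    """
    G = (alpha * A / (beta * B)) ** (1 / (alpha + beta))
    n_opt = G * (compute_flops / 6) ** (beta / (alpha + beta))
    return n_opt, compute_flops / (6 * n_opt)


# Hypothetical coefficients for illustration only (not fitted to biosignal data).
E = 1.7
COEFFS = dict(A=400.0, B=400.0, alpha=0.34, beta=0.28)

for budget in (1e18, 5e18):  # a budget and one five times larger
    n, d = compute_optimal(budget, **COEFFS)
    print(f"C={budget:.0e}: N={n:.3e} params, D={d:.3e} tokens")
```

Fitting the five coefficients to measured training runs is what would make such numbers meaningful; the iso-FLOP method instead fits a parabola to loss versus log N at each fixed budget and reads off the minimum.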
  • Item
    Talker Sensitivity to Turn-Taking in Conversation
    (University of Waterloo, 2024-08-19) Masters, Benjamin
    Turn-taking in conversation is a complex phenomenon that requires talkers, at a minimum, to simultaneously plan and produce their own speech while listening to and comprehending the speech of their partner(s). Given this necessary division of attention, the increased listening difficulty introduced by hearing impairment can have confounding effects on a person's ability to communicate, and evaluating listening effort during communication remains difficult. Because one of the most detrimental effects of hearing loss is its impact on a person's ability to communicate effectively, assessing listening effort in natural environments is especially important. This thesis takes two approaches to evaluating listening effort in conversation. The first analyzes the response of the pupil at the temporal scale of turn-taking to understand how effort and attention are allocated among speaking, listening, and other task demands. Pupillary temporal response functions to turn-taking are derived and analyzed for systematic differences across people and acoustic conditions, and for differences in pupil response related to the expected difficulty of a conversation. The second approach analyzes behavioral changes in the timing of turn-taking to understand how talkers identify that a conversational partner is experiencing communication difficulty. The floor transfer offset (FTO), defined as the time it takes one talker to begin their turn after another has ended theirs, was manipulated during interactive conversations to mimic the increased magnitude and variability of FTOs observed in difficult listening environments. To enable this, an audio processing framework was developed to track the state of a conversation in near real time and manipulate the perceived response time of talkers. The findings suggest that talkers do not use the timing of turn-taking as a cue to infer difficulty.
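Tracking conversation state in near real time typically starts from per-frame voice activity for each talker. The Python sketch below assumes a voice activity detector already supplies one boolean per talker per 10 ms frame (the class name and frame length are hypothetical). It labels each frame and measures the silent gap whenever the floor passes from one talker to the other; transitions through overlap are not measured. It shows the bookkeeping such a framework needs, not the thesis's actual implementation or its method of altering response times.

```python
class ConversationStateTracker:
    """Frame-by-frame state tracker for two talkers, A and B (hypothetical sketch)."""

    def __init__(self, frame_s=0.01):
        self.frame_s = frame_s
        self.holder = None       # talker who most recently spoke alone
        self.silent_frames = 0   # mutual-silence frames since anyone last spoke
        self.gaps = []           # between-talker gaps (s), i.e. positive FTOs

    def update(self, active_a, active_b):
        """Consume one frame of voice activity and return the conversation state."""
        if active_a and active_b:
            self.silent_frames = 0
            return "overlap"
        if active_a or active_b:
            state = "A" if active_a else "B"
            # The floor passed to the other talker after a silence: record the gap.
            if self.holder not in (None, state) and self.silent_frames > 0:
                self.gaps.append(round(self.silent_frames * self.frame_s, 3))
            self.holder = state
            self.silent_frames = 0
            return state
        self.silent_frames += 1
        return "silence"


tracker = ConversationStateTracker()
frames = [(1, 0), (1, 0), (0, 0), (0, 0), (0, 0), (0, 1), (0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
states = [tracker.update(a, b) for a, b in frames]
print(states)
print(tracker.gaps)  # [0.03]
```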
  • Item
    Autonomous Robotic System Conducting Nasopharyngeal Swabbing
    (University of Waterloo, 2024-08-15) Lee, Peter Qiu Jiun
    The nasopharyngeal swab test is a procedure in which a healthcare worker inserts a swab through the nose until it reaches the nasopharynx, at the back of the nasal cavity, to collect secretions that can later be examined for illnesses. This procedure saw heightened use to detect cases during the COVID-19 pandemic. Its ubiquity also exposed fragilities in the healthcare system: the hazards that infectious patients pose to healthcare workers and the pressures a pandemic can inflict on an unprepared healthcare system. In this thesis we propose an autonomous robotic system for performing nasopharyngeal swab tests with a collaborative robotic manipulator arm, on the premise that the hardware and techniques could eventually be applied to other close-contact tasks that support the healthcare system. We also assume that prospective patients would stand unrestrained in front of the arm, which adds the challenges of adjusting to arbitrary head poses and compensating for natural head motion. We first designed an instrumented end-effector for a robotic arm to provide suitable vision and force sensing for the task. Next, we developed a finite element simulation environment to model the deformation of the swab as it moves through the nasal cavity, and solved an optimization problem to find ideal paths through the nasal cavity. A visual servo system, built on advances in deep learning and state estimation, was designed to align the swab with the nose using visual information, and was validated in a number of human trials. A torque-controlled, force-compliant system was designed and evaluated to determine the feasibility of using force measurements to correct for misalignment when the swab is inserted into a nasal cavity phantom. Finally, we integrated all the components into a cohesive system for performing nasopharyngeal swab tests. 
We created a simulator using a nasal cavity phantom and a second robot arm to mimic natural motions of the head. This simulator was leveraged to perform extensive experimentation that found promising controller configurations that were able to compensate for head motion.
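Visual servoing closes a loop between an image measurement and a motion command. The thesis's servo relies on deep-learning-based perception and state estimation, which are not reproduced here; the Python sketch below swaps in the simplest image-based scheme, a proportional controller with a step limit acting on the pixel error between the swab tip and a nostril target. The gain, step limit, tolerance, and the slowly moving target standing in for natural head motion are all hypothetical.

```python
import math


def servo_step(target_px, tip_px, gain=0.5, max_step_px=20.0):
    """Proportional correction (pixels) toward the target, clipped to max_step_px."""
    dx = gain * (target_px[0] - tip_px[0])
    dy = gain * (target_px[1] - tip_px[1])
    norm = math.hypot(dx, dy)
    if norm > max_step_px:
        dx, dy = dx * max_step_px / norm, dy * max_step_px / norm
    return dx, dy


def align(target_at, tip_px, tol_px=1.0, max_iters=100):
    """Iterate servo steps until the tip is within tol_px of the target.

    target_at(k) returns the target position at step k, so a slowly moving
    target can stand in for natural head motion.
    Returns (final tip position, number of steps taken).
    """
    for k in range(max_iters):
        target = target_at(k)
        if math.dist(target, tip_px) <= tol_px:
            return tip_px, k
        dx, dy = servo_step(target, tip_px)
        tip_px = (tip_px[0] + dx, tip_px[1] + dy)
    return tip_px, max_iters


tip, steps = align(lambda k: (320.0 + 0.2 * k, 240.0), (300.0, 200.0))
print(steps)
```

A proportional controller tracks a steadily moving target with a constant lag (here gain and target speed give a lag of 0.4 px), which is one reason practical systems add motion prediction or state estimation.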
  • Item
    Topic Segmentation of Recorded Meetings
    (University of Waterloo, 2024-08-13) Lazoja, Ilir
    Video chapters make videos more easily digestible and can be an important pre-processing step for other video-processing tasks. In many cases, creators can easily chapter their own videos, especially well-edited, structured ones. However, some types of videos, such as recorded meetings, are more loosely structured with less obvious breaks, which makes them more cumbersome to chapter and thus would benefit greatly from automated chaptering. One approach to chaptering such videos is to perform topic segmentation on the video's transcript, especially if the video is rich in dialogue. Topic segmentation is the task of dividing text at the points where its topic changes, and it is most commonly performed on large bodies of written text. This thesis details how well state-of-the-art topic segmentation approaches perform on recorded meetings, presents and evaluates strategies to improve performance on recorded meetings, and discusses shortcomings of the common metrics used for topic segmentation.
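The abstract does not name the metrics, but Pk and WindowDiff are the two most widely used topic segmentation metrics, so the sketch below assumes they are the ones meant. Both slide a window of width k over the transcript and count windows where the reference and hypothesis disagree: Pk asks only whether the window's two ends fall in the same segment, while WindowDiff compares the number of boundaries inside the window. As a convenience assumption, each segmentation is encoded as one non-decreasing, contiguous segment id per transcript unit (for example, per utterance).

```python
def default_window(reference):
    """Conventional k: half the mean reference segment length (at least 1)."""
    n_segments = len(set(reference))
    return max(1, round(len(reference) / (2 * n_segments)))


def pk(reference, hypothesis, k=None):
    """Share of windows where ref and hyp disagree on 'same segment at both ends'."""
    k = k or default_window(reference)
    windows = range(len(reference) - k)
    errors = sum(
        (reference[i] == reference[i + k]) != (hypothesis[i] == hypothesis[i + k])
        for i in windows
    )
    return errors / len(windows)


def window_diff(reference, hypothesis, k=None):
    """Share of windows where ref and hyp contain different numbers of boundaries.

    With non-decreasing segment ids, ids[i + k] - ids[i] counts the boundaries
    between units i and i + k.
    """
    k = k or default_window(reference)
    windows = range(len(reference) - k)
    errors = sum(
        (reference[i + k] - reference[i]) != (hypothesis[i + k] - hypothesis[i])
        for i in windows
    )
    return errors / len(windows)


ref = [0, 0, 0, 1, 1, 1]
missed = [0, 0, 0, 0, 0, 0]      # a segmenter that misses the only boundary
near_miss = [0, 0, 0, 0, 1, 1]   # boundary placed one unit late
print(pk(ref, missed), window_diff(ref, missed))        # 0.5 0.5
print(pk(ref, near_miss), window_diff(ref, near_miss))  # 0.5 0.5
```

In this tiny example a boundary placed one unit late scores the same as a boundary missed entirely, one illustration of how window-based metrics can be coarse on short or sparsely segmented transcripts.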
  • Item
    Robust 3D Human Modeling for Baseball Sports Analytics
    (University of Waterloo, 2024-08-12) Bright, Jerrin
    In the fast-paced world of baseball, maximizing pitcher performance while minimizing runs relies on understanding subtle variations in mechanics. Traditional analysis methods, which rely on pre-recorded offline numerical data, struggle in the dynamic flow of live games. Although seemingly ideal, broadcast video analysis faces significant challenges due to motion blur, occlusion, and low resolution. This research proposes a novel 3D human modeling technique and a pitch statistics identification system that are robust to these challenges. Specifically, we propose Distribution and Depth-Aware Human Mesh Recovery (D2A-HMR), a 3D human mesh recovery technique that extracts pseudo-depth from each frame and uses a transformer network with self- and cross-attention to create a 3D mesh from which the 3D pose coordinates are extracted. The network is regularized with several loss functions, including a silhouette loss, joint reprojection losses, and a distribution loss that uses normalizing flows to learn the deviation between the underlying predicted and ground-truth distributions. Furthermore, we propose a focused augmentation strategy designed specifically to address motion blur caused by fast movement. We then introduce PitcherNet, a system built on D2A-HMR and the motion blur augmentation strategy. PitcherNet is an automated system that analyzes pitcher kinematics directly from live broadcast video, providing valuable pitch statistics (pitch velocity, release point, pitch position, release extension, and pitch handedness). The system takes only broadcast video as input and leverages computer vision and pattern recognition to generate reliable pitch statistics from the game. First, PitcherNet isolates the pitcher and batter in each frame using a role classification network. 
Next, PitcherNet extracts kinematic information representing the pitcher’s joints and surface using a refined version of the D2A-HMR model. Additionally, we enhance the generalizability of the 3D human model by incorporating additional in-the-wild high-resolution videos from the Internet. Finally, PitcherNet employs a temporal convolutional network (TCN) and kinematics-driven heuristics to capture the pitch statistics, which can be used to analyze baseball pitchers.
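A joint reprojection loss ties a predicted 3D body to 2D evidence in the image: project the predicted 3D joints with a camera model and penalize their distance to detected or annotated 2D keypoints. The abstract does not specify D2A-HMR's camera model or weighting; the Python sketch below assumes a pinhole camera with hypothetical intrinsics, a squared-distance penalty, and per-keypoint visibility weights so that occluded joints do not contribute.

```python
def project(joints_3d, focal, cx, cy):
    """Pinhole projection of camera-frame joints (x, y, z), z > 0, to pixel coordinates."""
    return [(focal * x / z + cx, focal * y / z + cy) for x, y, z in joints_3d]


def reprojection_loss(joints_3d, keypoints_2d, focal, cx, cy, visibility=None):
    """Mean squared pixel distance between projected joints and visible 2D keypoints."""
    projected = project(joints_3d, focal, cx, cy)
    if visibility is None:
        visibility = [1] * len(projected)
    total = sum(
        v * ((u - gu) ** 2 + (w - gw) ** 2)
        for (u, w), (gu, gw), v in zip(projected, keypoints_2d, visibility)
    )
    return total / max(1, sum(visibility))


# Hypothetical intrinsics for a 1280x720 broadcast frame.
FOCAL, CX, CY = 1000.0, 640.0, 360.0
joints = [(0.0, 0.0, 5.0), (1.0, -1.0, 5.0)]
print(project(joints, FOCAL, CX, CY))  # [(640.0, 360.0), (840.0, 160.0)]
```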
  • Item
    Planning Renewable Electricity Using Life-Cycle Analysis
    (University of Waterloo, 2024-07-16) Ali, Mir Sadek
    It has been predicted that by the mid-21st century, worldwide energy demand will grow to two to three times its current level. Expanding global electric power generation capacity will be problematic using the three predominant methods: nuclear fission, fossil fuels, and hydropower. Few suitable sites remain for new large-scale hydropower dams. Both fossil fuels and nuclear fission have widespread environmental consequences, and the fuel supply for both technologies is non-renewable. Renewable energy system (RES) technologies have been proposed as a means of expanding energy markets sustainably. A formative step in deploying RES will be the design of a standardized methodology for making the policy and planning decisions that initiate market and government support for these nascent technologies. This thesis outlines the design of an RES planning model based on the life-cycle analysis (LCA) methodology. The proposed model will integrate a climatologically based renewable energy optimization and simulation (REOS) model into the LCA. Goal-attainment algorithms will be used to find feasible installed generation capacities that meet a prescribed load demand while attempting to meet desired policy targets. The policy targets here are the per-kilowatt-hour price of power, life-cycle airborne CO2 emissions, and the land requirements of the system. The methodology is demonstrated by analyzing the performance of RES technologies in two Canadian cities that already have mature electricity utilities.
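Goal attainment turns several competing objectives into a single problem: choose the design x and a scalar gamma that minimize gamma subject to F_i(x) - w_i * gamma <= G_i for each objective F_i, goal G_i, and weight w_i. Over a finite set of candidate designs this reduces to picking the design with the smallest max_i (F_i(x) - G_i) / w_i. The Python sketch below applies that idea to a toy wind/solar capacity choice with the three policy targets named above. Every technology figure is a hypothetical placeholder, and fixed annual capacity factors stand in for the climatologically based REOS simulation and the full LCA.

```python
import itertools

HOURS_PER_YEAR = 8760

# Hypothetical placeholder figures, for illustration only.
TECH = {
    "wind": {"capacity_factor": 0.30, "price_per_kwh": 0.08,
             "co2_g_per_kwh": 10.0, "land_m2_per_kw": 50.0},
    "solar": {"capacity_factor": 0.15, "price_per_kwh": 0.12,
              "co2_g_per_kwh": 40.0, "land_m2_per_kw": 10.0},
}


def evaluate(capacity_kw):
    """Annual energy (kWh) and objectives (price/kWh, gCO2/kWh, land m2) for a mix."""
    energy = {t: capacity_kw[t] * TECH[t]["capacity_factor"] * HOURS_PER_YEAR for t in TECH}
    total = sum(energy.values())
    price = sum(energy[t] * TECH[t]["price_per_kwh"] for t in TECH) / total
    co2 = sum(energy[t] * TECH[t]["co2_g_per_kwh"] for t in TECH) / total
    land = sum(capacity_kw[t] * TECH[t]["land_m2_per_kw"] for t in TECH)
    return total, (price, co2, land)


def attainment(objectives, goals, weights):
    """Smallest gamma with F_i - w_i * gamma <= G_i for every objective i."""
    return max((f - g) / w for f, g, w in zip(objectives, goals, weights))


def goal_attainment(demand_kwh, goals, weights, step_kw=1000, max_kw=20000):
    """Grid search over wind/solar capacities that meet demand; minimize gamma."""
    best = None
    grid = range(0, max_kw + 1, step_kw)
    for wind, solar in itertools.product(grid, grid):
        if wind + solar == 0:
            continue
        mix = {"wind": wind, "solar": solar}
        total, objectives = evaluate(mix)
        if total < demand_kwh:
            continue  # infeasible: does not meet the prescribed load
        gamma = attainment(objectives, goals, weights)
        if best is None or gamma < best[0]:
            best = (gamma, mix, objectives)
    return best


goals = (0.09, 20.0, 400_000.0)  # price, CO2, land targets (hypothetical)
gamma, mix, objectives = goal_attainment(20e6, goals, weights=goals)
print(mix, round(gamma, 3))
```

Setting each weight equal to its goal, as here, makes gamma a relative over- or under-achievement shared by all targets; a negative gamma means every goal is met with margin.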
  • Item
    Toward Automated Detection of Landfast Ice Polynyas in C-Band Synthetic Aperture Radar Imagery with Convolutional Neural Networks
    (University of Waterloo, 2024-07-12) Brubacher, Neil
    Landfast ice polynyas, areas of open water surrounded by ice, are important features in many Northern coastal communities, and their automated detection from spaceborne synthetic aperture radar (SAR) imagery is positioned to support on-ice travel safety under changing Arctic sea ice and climate conditions. The characteristically small spatial scales and sparse distribution of landfast ice polynyas present key challenges to their detection and limit the suitability of established methods developed for SAR-based sea ice and open water classification at broader spatial scales. This thesis explores the development of deep learning-based object detection networks for landfast ice polynya detection in dual-polarized C-band SAR imagery, with three main contributions. The first is a characterization of landfast ice polynya signatures and separability in SAR imagery based on datasets of polynyas mapped over several seasons near the communities of Sanikiluaq, NU, and Nain, NL. Results from this analysis highlight the challenging and variable nature of polynya signatures in dual-polarized backscatter intensity, motivating the use of convolutional neural networks (CNNs) to capture relevant textural, geometric, and contextual polynya features. The second contribution is the development and evaluation of CNN-based object detection networks for polynya detection, drawing on advancements in natural-scene small object detection to address the challenging size and sparsity characteristics of polynyas. A simplified detection network architecture, optimized for polynya detection in terms of feature representation capacity, feature map resolution, and training loss balancing, is found to reliably detect polynyas with sufficient size and local contrast and to generalize well to regions not seen in training. 
The third contribution is an assessment of detection model generalizability between imagery produced by the Sentinel-1 (S1) and Radarsat Constellation Mission (RCM) SAR sensors, illustrating the ability of models trained only on S1 imagery to effectively extract and classify polynya features in RCM imagery despite differences in resolution and noise characteristics. Across regions and sensors, missed polynyas are found to have smaller sizes and weaker signatures than detected polynyas, while false predictions are often caused by boundary areas between smooth and rough landfast ice. These represent fundamental limits to polynya/landfast ice separability in the medium-resolution, dual-polarized C-band SAR imagery used in this thesis, motivating future research into multi-temporal, multi-frequency, and/or higher-resolution SAR imagery for polynya detection. Ongoing and future progress in developing robust landfast ice hazard detection systems is positioned to support community sea ice safety and monitoring.
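Counting missed polynyas and false predictions requires matching predicted boxes to mapped polynyas. The abstract does not give the exact matching criterion; the Python sketch below assumes the common object-detection convention of greedy, score-ordered, one-to-one matching at an intersection-over-union (IoU) threshold of 0.5, with hypothetical box coordinates in image pixels.

```python
def iou(a, b):
    """Intersection over union of boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(predictions, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching of scored predicted boxes to mapped polynyas.

    predictions: list of (score, box); ground_truth: list of boxes.
    Returns (true positives, false predictions, missed polynyas).
    """
    unmatched = list(range(len(ground_truth)))
    tp = fp = 0
    for _score, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, iou_threshold
        for j in unmatched:
            overlap = iou(box, ground_truth[j])
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            fp += 1
        else:
            unmatched.remove(best_j)
            tp += 1
    return tp, fp, len(unmatched)


truth = [(0, 0, 10, 10), (50, 50, 60, 60)]
preds = [(0.9, (1, 1, 10, 10)), (0.8, (100, 100, 110, 110))]
print(match_detections(preds, truth))  # (1, 1, 1)
```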
  • Item
    Dynamic Alert Design Based on Driver’s Cognitive State for Take-over Request in Automated Vehicles
    (University of Waterloo, 2024-07-03) Umpaipant, Wachirawit
    This thesis investigates the effectiveness of dynamic alert systems tailored to drivers' cognitive states in automated driving, focusing on enhancing takeover readiness during critical transitions. Using a large-scale immersive driving simulation, the study evaluated drivers' response times and physiological measures when reacting to various alert intensities, with and without a secondary typing task. The experiment revealed that dynamic alerts significantly improved response times and takeover performance, especially in high-distraction scenarios. Drivers responded more effectively when alerts were adjusted to their cognitive load, with strong alerts producing the fastest reaction times under distracted conditions. On average, dynamic alerts reduced response times by approximately 1.75 seconds compared to static alerts. Additionally, higher lateral accelerations were observed under strong alerts, indicating more decisive maneuvering. Self-rated attention-capturing scores were notably higher with dynamic alerts, particularly under strong alert conditions and in the presence of secondary tasks. The ANOVA results showed significant improvements in attention capturing and overall alert effectiveness when dynamic alerts were employed, demonstrating the design's ability to capture attention and enhance driver responsiveness. The study confirmed that adaptive alert designs, which adjust to the driver's cognitive state, can markedly enhance the overall driving experience and safety. Participants reported higher confidence with dynamic alerts, especially in scenarios involving secondary tasks. Even with strong alerts, annoyance levels remained low, indicating that dynamic alerts are effective without causing undue stress. 
These results underscore the potential of using adaptive systems to improve safety and efficiency in automated driving, advocating for a more nuanced approach to system alerts that considers the variable cognitive states of drivers. Future research should validate these findings with on-road studies, explore a broader range of alert modalities, and refine physiological monitoring techniques to further enhance adaptive alert systems.
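To make the idea of an alert that adapts to cognitive state concrete, here is a minimal sketch. It maps a hypothetical normalized cognitive-load estimate and a secondary-task flag to an alert intensity. The three-level scale (`MILD`, `MODERATE`, `STRONG`), the `high_load` threshold, and the function names are assumptions for illustration only; they are not the alert design used in the thesis.

```python
from enum import Enum


class AlertLevel(Enum):
    """Hypothetical alert intensities; the study's actual levels may differ."""
    MILD = 1
    MODERATE = 2
    STRONG = 3


def select_alert(cognitive_load: float, secondary_task: bool,
                 high_load: float = 0.7) -> AlertLevel:
    """Choose a take-over alert intensity from an estimated driver state.

    cognitive_load: hypothetical normalized load estimate in [0, 1].
    secondary_task: whether a secondary task (such as typing) is detected.
    high_load: hypothetical threshold above which the driver counts as loaded.
    """
    if not 0.0 <= cognitive_load <= 1.0:
        raise ValueError("cognitive_load must be in [0, 1]")
    # Distracted or heavily loaded drivers get the strongest alert.
    if secondary_task or cognitive_load >= high_load:
        return AlertLevel.STRONG
    # Intermediate load: escalate moderately (assumed midpoint rule).
    if cognitive_load >= high_load / 2:
        return AlertLevel.MODERATE
    # Attentive driver: a mild alert avoids unnecessary annoyance.
    return AlertLevel.MILD


print(select_alert(0.2, secondary_task=True))   # AlertLevel.STRONG
print(select_alert(0.2, secondary_task=False))  # AlertLevel.MILD
```

In a real system, the load estimate would come from physiological or behavioral monitoring, and the thresholds would have to be calibrated against experiments like the one summarized above.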
  • Item
    Practical Application of Machine Learning to Water Pipe Failure Prediction
    (University of Waterloo, 2024-06-24) Laven, Kevin
    As water networks age, many utilities face rising water main break rates and insufficient replacement funds. Machine learning is a promising tool to support efficient water pipe replacement decisions. This thesis explores the practical application of machine learning to water pipe failure prediction using a dataset of over 10 million pipe-year records from four countries. Analysis of predictive factors shows that length, age, diameter, material, and failure history are each significant. Two novel relationships with break rate are observed: with respect to diameter, an inverse linear relationship; and with respect to age, a peak at around 40 years followed by a decline lasting several decades. A method is presented for predicting both the probability of failure and the expected number of failures for a given pipe and time period. By inferring units, encoding categorical features, and normalizing for different utility practices, it is proposed that a single model can generalize across utilities, geographies, and time periods without any utility-specific data cleansing. The model is trained and tested on a leave-one-utility-out basis, with training data drawn from time periods strictly prior to the test data. The resulting area under the Receiver Operating Characteristic curve (ROC AUC) of over 0.85 and Cumulative Lift at 10% of over 5.0 demonstrate the practical applicability of the model, matching the performance of models trained and tested on each utility’s own data. Within this model, a method of cross-encoding categorical features with numerical features is introduced to enable integration of datasets from diverse contributors. The applicability of these performance metrics and model outputs to common utility water main replacement decision-making processes is also shown.
  • Item
    Modal Interaction in Electrostatic MEMS Mirrors
    (University of Waterloo, 2024-05-31) Rahmanian, Sasan
    The impetus of this work is to introduce nonlinear modal interactions as a novel actuation mechanism for electrostatic MEMS-based scanning micromirrors. Modal interactions refer to the engagement of two or more modes of vibration in a system, creating a bridge that channels vibration energy from a directly excited mode to one or more coupled modes. In chapter two, this report presents a comprehensive literature review of the different types of mode coupling in nonlinear resonators. First, internal resonance in general nonlinear oscillators is addressed. Second, the focus is narrowed to mode coupling in electrostatic MEMS. As an initial test-bed, chapter three examines the modulation equations governing a system of two nonlinearly coupled single-degree-of-freedom (1-DOF) oscillators involved in a 2:1 parametric modal interaction. Simulations show that as the excitation frequency varies in the vicinity of the natural frequency of the directly excited higher-frequency oscillator, the amplitude of its motion saturates. Meanwhile, the lower-frequency oscillator undergoes large-amplitude motion under the influence of a parametric ‘energy pump’. The fourth chapter reports on nonlinear modal interaction in a MEMS device made of an electrostatically actuated curved beam. We characterize the first few in-plane and out-of-plane bending modes of the beam. Thermal noise excitation is used to extract the out-of-plane natural frequencies, whereas the in-plane natural frequencies are captured using pulse excitation. Then, the frequency response of the MEMS is characterized in the neighborhood of the first and second symmetric in-plane modes. The characterization results reveal a 2:1 ratio between the second symmetric and the first anti-symmetric in-plane modes. We show that this anti-symmetric mode can be effectively excited via the energy channel between it and the second symmetric mode when the latter is driven directly by external electrostatic forcing. 
In the fifth chapter, we establish bending-torsional equations of motion for a symmetric electrostatic MEMS actuator that can capture the 2:1 modal interaction between its in-plane bending and out-of-plane rotational motions. Our approach demonstrates that incorporating the linear slopes into the cross-sectional shear strains gives rise to quadratic couplings between the bending and torsional motions, whose existence depends on non-vanishing first moments of area of the microbeam's cross-section. To account for imperfections in microdevice fabrication, we assume a minuscule offset between the centroids of the as-fabricated and as-designed cross-sections of the microbeams. An energy approach is used to derive the equations of motion (EoM). The static response of the MEMS actuator, together with its tuned eigenmodes, is examined in this chapter. Chapter six reports, both experimentally and numerically, the frequency-displacement and voltage-displacement behaviors of the mirror under the 2:1 and 3:1 flexural-torsional internal resonances. The numerical simulation results indicate that the in-plane motion, which is the directly excited mode, saturates upon the initiation of a 2:1 energy pathway between the bending and torsional motions. Through suitable tuning of the AC frequency, the amplitude of the in-plane motion is minimized, while the amplitude of the torsional motion, an indirectly excited mode, is maximized. The simulations also show that, under a 1:2:1 electro-flexural-torsional modal interaction, the actuator's torsional motion is triggered by a maximum applied voltage of 10 V, producing a rotation angle of about 15 degrees. Further, rich frequency combs are generated as a result of secondary Hopf bifurcations along the large-amplitude response branches, capturing quasi-periodicity in the MEMS dynamics. 
The experimental results show that the mirror's dynamics exhibit a 3:1 flexural-torsional modal interaction that provides efficient out-of-plane rotation through in-plane excitation. The present study provides a platform for implementing a novel actuation mechanism for MEMS scanning micromirrors based on parametric modal interaction. Concluding remarks and proposed future work are presented in chapter seven.
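For background, the classic two-mode model of a 2:1 internal resonance with quadratic coupling is the mechanism behind the saturation and "energy pump" behavior described in this abstract. It can be written as shown below. This is the generic textbook form, offered only for orientation, and is not the set of equations derived in the thesis. The symbols are illustrative: modal coordinates $u_1, u_2$, damping $\mu_i$, coupling coefficients $\alpha_i$, forcing amplitude $F$, and frequency $\Omega$.

```latex
\begin{aligned}
\ddot{u}_1 + 2\mu_1\dot{u}_1 + \omega_1^2 u_1 + \alpha_1 u_1 u_2 &= 0,\\
\ddot{u}_2 + 2\mu_2\dot{u}_2 + \omega_2^2 u_2 + \alpha_2 u_1^2 &= F\cos(\Omega t),
\qquad \omega_2 \approx 2\omega_1,\quad \Omega \approx \omega_2 .
\end{aligned}
```

Only the higher-frequency mode $u_2$ is forced. Once its amplitude passes a threshold set by damping and detuning, the term $\alpha_1 u_1 u_2$ acts on $u_1$ as a parametric excitation near twice its natural frequency. Energy then flows into $u_1$, and the amplitude of $u_2$ stops growing with $F$. This saturation is the qualitative behavior the abstract reports for the chapter-three test-bed.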
  • Item
    Implementing Fairness in Real-World Healthcare Machine Learning through Datasheet for Database
    (University of Waterloo, 2024-05-28) Murugan, Anand
    Healthcare Machine Learning (HML) models are revolutionizing the healthcare industry, promising improved patient outcomes and enhanced public health. However, it is essential to ensure fairness, i.e., that models deliver equitable performance to all individuals, irrespective of their inherent or acquired characteristics. This requires a thorough examination of the data used and the specific applications of these models. To explore the link between data and fairness in HML, this study conducted a six-year systematic survey of models trained on the Medical Information Mart for Intensive Care (MIMIC) clinical research database (CRD), one of the most popular and widely used HML databases. The results were striking: for the popular MIMIC IV ICU mortality task, a naive baseline outperformed the state-of-the-art (SOTA) model in predictive performance and was fairer across subgroups, although it was still somewhat unfair. These findings demonstrate the urgent need to integrate fairness into healthcare machine learning models and to involve practitioners more closely in HML modeling. To this end, we propose a data-centric approach to fairness through our ‘Datasheet for MIMIC IV v2.0 CRD’, modeled after recent work recommending datasheets for datasets. Because MIMIC is large and complex, this datasheet will help practitioners identify data anomalies and task-specific feature-target relationships during modeling, thereby fostering the development of equitable HML models.
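The abstract does not specify which fairness metric was used. The following is a minimal sketch of one common way to quantify subgroup fairness: per-subgroup accuracy and the largest gap between subgroups. The metric choice, the function name, and the group labels are assumptions for illustration. They do not reflect the MIMIC schema or the thesis's evaluation protocol.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, groups):
    """Per-subgroup accuracy and the largest accuracy gap between subgroups.

    A gap of 0 means every subgroup is served equally well by this metric.
    """
    if not (len(y_true) == len(y_pred) == len(groups)) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    accuracy = {g: correct[g] / total[g] for g in total}
    gap = max(accuracy.values()) - min(accuracy.values())
    return accuracy, gap


# Toy mortality labels for two hypothetical patient subgroups "A" and "B".
acc, gap = subgroup_accuracy([1, 0, 1, 0], [1, 0, 0, 0], ["A", "A", "B", "B"])
print(acc, gap)  # {'A': 1.0, 'B': 0.5} 0.5
```

Other metrics, such as per-group AUC or true-positive-rate gaps, plug into the same per-group loop.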
  • Item
    Deep Graph Neural Networks for Spatiotemporal Forecasting of Sub-Seasonal Sea Ice: A Case Study in Hudson Bay
    (University of Waterloo, 2024-05-27) Gousseau, Zacharie
    This thesis introduces GraphSIFNet, a novel graph-based deep learning framework for spatiotemporal sea ice forecasting. GraphSIFNet employs a Graph Convolutional Long Short-Term Memory (GCLSTM) module within a sequence-to-sequence architecture to predict daily sea ice concentration (SIC) and sea ice presence (SIP) in Hudson Bay over a 90-day horizon. The use of graph networks allows the domain to be discretized into arbitrarily specified meshes. This study demonstrates the model's ability to forecast over an irregular mesh with higher spatial resolution near shorelines and lower resolution elsewhere. Using atmospheric data from ERA5 and oceanographic data from GLORYS12, the model is trained to capture complex spatial relationships pertinent to sea ice dynamics. Results demonstrate the model's superior skill over a statistical baseline formed from a linear combination of persistence and climatology. The model showed skill particularly in short- to medium-term (up to 35 days) SIC forecasts, reducing root mean squared error by up to 10% relative to the statistical baseline during the break-up season and by up to 5% during the freeze-up season. Long-term (up to 90 days) SIP forecasts also showed significant improvements over the baseline, with accuracy gains of around 10% even at a lead time of 90 days. A variable importance analysis via feature ablation highlighted current sea ice concentration and thickness as critical predictors. Thickness was important at longer lead times during the melting season, suggesting its role as an indicator of ice longevity, while concentration was more critical at shorter lead times, suggesting it may act as an indicator of immediate ice integrity. The thesis lays the groundwork for future exploration of dynamic mesh-based forecasting, more complex graph structures, and mesh-based forecasting of climate phenomena beyond sea ice.
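The statistical baseline and the RMSE comparison can be sketched as follows, assuming the blend weight is fitted by least squares separately for each lead time and clipped to [0, 1]. The abstract says only that the baseline is "a linear combination of persistence and climatology". The fitting procedure and all names here are assumptions, and the toy values are made up.

```python
import numpy as np


def fit_blend_weight(persistence, climatology, observed):
    """Least-squares weight w for the blend w*persistence + (1 - w)*climatology.

    Writing the blend as c + w*d with d = p - c, the minimizer of
    ||c + w*d - o||^2 is w = d.(o - c) / d.d; it is clipped to [0, 1].
    """
    d = persistence - climatology
    denom = float(np.dot(d, d))
    if denom == 0.0:
        return 0.5  # identical inputs: every weight yields the same blend
    w = float(np.dot(d, observed - climatology)) / denom
    return min(1.0, max(0.0, w))


def blended_forecast(persistence, climatology, w):
    """Baseline forecast: weighted mix of persistence and climatology."""
    return w * persistence + (1.0 - w) * climatology


def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def rmse_reduction(model_pred, baseline_pred, obs):
    """Fractional RMSE reduction of a model relative to the baseline (0.10 = 10%)."""
    return 1.0 - rmse(model_pred, obs) / rmse(baseline_pred, obs)


# Toy SIC values at four mesh nodes for a single lead time (made-up numbers).
p = np.array([1.0, 0.8, 0.2, 0.0])   # persistence: today's SIC
c = np.zeros(4)                      # climatology for the target date
o = 0.5 * p                          # observed SIC at the target date
w = fit_blend_weight(p, c, o)
print(w)  # 0.5
```

Under this reading, an "up to 10% RMSE reduction" corresponds to `rmse_reduction` returning about 0.10 for the best lead times in the break-up season.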
  • Item
    Comparing 2-level and 3-level graded collision warning systems under distracted driving conditions
    (University of Waterloo, 2024-05-16) Shariatmadari, Khatereh
    This study compares driver performance under a 3-level graded collision warning system with performance under a 2-level graded system. Using a mixed (within-between-subjects) design, the experiment examines the impact of graded warning levels (2-stage and 3-stage) on driving performance in both normal and critical driving conditions. Forty participants were tested in a controlled driving simulator environment. Participants were divided into two groups, each exposed to a distinct collision warning paradigm: the first group experienced a two-level graded warning system, while the second encountered a three-level graded warning system, both structured around Time to Collision (TTC). Each participant drove eight scenarios, four normal and four critical. This design allows the influence of warning system complexity on various facets of driving behavior to be evaluated. Dependent variables included eye-tracking data, wristband-derived physiological metrics, driver response times, and the incidence of collisions, giving a multifaceted picture of drivers' reactions under the two warning paradigms. Results indicated that the 3-level graded system significantly reduced response times and collision frequencies compared to the 2-level system across both normal and critical driving conditions. The 3-level system also better mitigated driver distraction. While driving conditions did not significantly affect eye-tracking data, the warning level had a significant impact, with the 3-level system showing superior results. However, neither warning level nor driving condition significantly affected physiological data, including Electrodermal Activity (EDA), Heart Rate (HR), and Heart Rate Variability (HRV). 
Subjective evaluations highlighted the impact of collision warnings on driver performance, particularly in high-speed scenarios. Moreover, auditory warning modalities were preferred by a majority of participants. These findings provide valuable insights for the development of advanced collision warning systems, emphasizing the importance of multi-level warnings and preferred warning modalities in enhancing driver safety and reducing collision risks in diverse driving environments.
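The following minimal sketch shows how a TTC-based graded warning can assign stages. TTC is computed as the gap divided by the closing speed at constant speeds, and each successively lower TTC threshold raises the warning stage. The threshold values in `TWO_LEVEL` and `THREE_LEVEL` are hypothetical and are not the values used in the study.

```python
import math


def time_to_collision(gap_m: float, ego_speed_mps: float,
                      lead_speed_mps: float) -> float:
    """Seconds until the gap closes at constant speeds (inf if not closing)."""
    closing_speed = ego_speed_mps - lead_speed_mps
    if closing_speed <= 0.0:
        return math.inf
    return gap_m / closing_speed


def warning_stage(ttc_s: float, thresholds_s: tuple) -> int:
    """Return 0 (no warning) up to len(thresholds_s) (most urgent).

    thresholds_s must be strictly decreasing, e.g. (5.0, 3.5, 2.0).
    """
    if any(a <= b for a, b in zip(thresholds_s, thresholds_s[1:])):
        raise ValueError("thresholds must be strictly decreasing")
    stage = 0
    for level, limit in enumerate(thresholds_s, start=1):
        if ttc_s <= limit:
            stage = level
    return stage


# Hypothetical TTC thresholds in seconds; not the values used in the study.
TWO_LEVEL = (4.0, 2.0)
THREE_LEVEL = (5.0, 3.5, 2.0)

ttc = time_to_collision(gap_m=30.0, ego_speed_mps=20.0, lead_speed_mps=10.0)
print(ttc, warning_stage(ttc, TWO_LEVEL), warning_stage(ttc, THREE_LEVEL))  # 3.0 1 2
```

At the same TTC of 3 s, the three-level scheme has already escalated to its middle stage, while the two-level scheme is still at its first. The extra intermediate step is what distinguishes the two paradigms.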
  • Item
    Applications of Strongly Coupled Electrostatic NEMS
    (University of Waterloo, 2024-04-30) Mouharrar, Hamza
    This work explores potential applications of electrostatic nanoelectromechanical systems (NEMS) in inertial sensing and frequency comb (FC) generation. NEMS inertial sensors exhibit exceptional sensitivity with low power consumption, making them well suited to portable gas sensors. We functionalize a novel ZnO NEMS with Metal-Organic Frameworks (MOFs) to provide selectivity to volatile organic compounds (VOCs), resulting in a sensor with sensitivity ranging from 0.33 to 0.71 Hz/ppm and limits of detection from 4 to 9 ppb. This high sensitivity is attributed to the high porosity and large surface area of MAF-6, the MOF used. These findings pave the way for the development of MOF-coated NEMS sensors, promising advances in the field of gas sensing. We also present a novel low-power technique for generating frequency combs using modal interactions in electrostatic NEMS. Experimental results show a broadband FC spectrum with coherent phase. The proposed technique is flexible, enabling the generation of multiple frequency combs and fine-tuning of their Free Spectral Range (FSR). Additionally, we show an innovative approach that leverages internal resonances within a NEMS-phononic cavity to generate soliton frequency combs with over 3000 spectral lines, with promising applications in quantum computing and metrology. The soliton generator can be integrated seamlessly into portable devices, in line with the trend toward miniaturized technology.
  • Item
    On Landmarks for Introducing 3D SLAM Structure to VPR
    (University of Waterloo, 2024-04-29) Bradley, Matthew
    Simultaneous Localization and Mapping (SLAM) is a critical foundation for a wide variety of robotic applications. Visual SLAM systems rely on Visual Place Recognition (VPR) for map maintenance and loop closing, so their quality suffers when VPR performance degrades. In most VPR systems, images are described compactly and stored for later comparison, with a match indicating that a scene has been revisited. Changes in illumination are a common difficulty for VPR image descriptors based on vocabularies of local features. Global descriptors that incorporate high-level structure are more robust to illumination but are often sensitive to changes in viewpoint. VPR research focuses overwhelmingly on describing single images, even though SLAM systems recover 3D structure from the environment and this structure is both illumination invariant and independent of vantage point. Work leveraging SLAM-recovered structure in the form of 3D points, in conjunction with LiDAR scan descriptors, has demonstrated superior VPR performance under harsh illumination compared with SoTA visual vocabulary descriptors; however, its overall performance is lower. A significant observed limitation was difficulty matching pseudo-LiDAR scans whose sub-regions differ significantly. This stems from an assumption in the LiDAR descriptors used that the entire volumes of two corresponding scans should match. This assumption fits poorly with point clouds accumulated during traversal by visual SLAM, which are sparse because of differences in route, incomplete coverage, and the inherent sparsity of SLAM feature tracking. What is needed is an approach that matches sub-regions common to both pseudo-scans, in other words, one that performs place recognition based on landmarks. 
Here we explore the generation of landmarks from accumulated SLAM structure through various clustering-based techniques, as well as the application of SoTA Grassmannian Graph-based association to match them. We present the challenges and successes of this approach to introducing 3D structure into VPR and propose various avenues of exploration to address the challenges faced. One of the foremost challenges is that point clouds derived from SLAM are very sparse and uneven, making reliable and repeatable clustering difficult to achieve. We significantly improve landmark quality by using semantic labeling to provide better separation before clustering. While this has a noticeable impact on the number of outlier landmarks, we also find that the association method used is extremely sensitive to outliers. This sensitivity persists across datasets and appears inherent to this method of association. It precludes effective place recognition at this time; however, we expect future work to alleviate it through the use of landmark descriptors for more effective outlier rejection. Descriptors can also provide putative associations, which can benefit landmark matching. We also propose various other enhancements to improve landmark generation and landmark association for place recognition. We firmly expect that incorporating 3D structure from SLAM systems into the underlying VPR will be mutually beneficial, with VPR systems gaining additional descriptive capability that is fully invariant to illumination and more stable than viewpoint-sensitive 2D image structure.
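The landmark-generation step can be illustrated with the minimal sketch below. It first splits SLAM map points by semantic label, then groups each label's points by simple radius-based Euclidean clustering (single linkage), and finally drops small clusters as likely outliers. The thesis explores "various clustering-based techniques", and this particular choice, the `radius` and `min_points` values, and the function name are assumptions.

```python
import math
from collections import defaultdict


def cluster_landmarks(points, labels, radius=0.5, min_points=3):
    """Group labeled 3D SLAM points into landmark clusters.

    points: sequence of (x, y, z) tuples; labels: semantic class per point.
    Points are split by label first, then clustered by single linkage
    (a point within `radius` of any cluster member joins that cluster).
    Clusters with fewer than `min_points` members are dropped as outliers.
    Returns a list of (label, centroid, sorted member indices).
    """
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    landmarks = []
    for label, indices in by_label.items():
        unvisited = set(indices)
        while unvisited:
            seed = unvisited.pop()
            members, frontier = [seed], [seed]
            while frontier:  # flood fill: absorb neighbors until none remain
                current = frontier.pop()
                near = [j for j in unvisited
                        if math.dist(points[current], points[j]) <= radius]
                unvisited.difference_update(near)
                frontier.extend(near)
                members.extend(near)
            if len(members) >= min_points:
                centroid = tuple(sum(points[m][k] for m in members) / len(members)
                                 for k in range(3))
                landmarks.append((label, centroid, sorted(members)))
    return landmarks


points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),  # pole
          (0.15, 0.05, 0.0),                                   # stray car point
          (5.0, 5.0, 0.0), (5.1, 5.0, 0.0), (5.0, 5.1, 0.0),   # car
          (10.0, 10.0, 10.0)]                                  # isolated pole point
labels = ["pole", "pole", "pole", "car", "car", "car", "car", "pole"]
landmarks = cluster_landmarks(points, labels)
print([(lab, members) for lab, _, members in landmarks])
# [('pole', [0, 1, 2]), ('car', [4, 5, 6])]
```

The stray "car" point sits right next to the pole. Without the semantic split, it would be merged into the pole landmark. This is a small instance of the separation benefit the abstract attributes to semantic labeling.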