Spatial-Temporal Computer Vision Methods for Automated Vision-Based Visual Inspection

Files

Midwinter_MaxXuHaoXue.pdf (110.58 MB)

Date

2026-06-08

Authors

Midwinter, Max Xuhao Xue

Advisor

Yeum, Chul Min

Publisher

University of Waterloo

Abstract

The objective of this thesis is to investigate how spatial and temporal context can be leveraged to enhance automated vision-based visual inspection (AVVI). The prevailing paradigm in AVVI is the single-shot supervised deep semantic inference model, in which each image is processed independently and the resulting semantic prediction is compared against labeled data to generate a supervision signal. While these methods have demonstrated strong performance on defect detection tasks, they often neglect the spatial and temporal context in which inspection data are collected. In practice, engineers rarely make decisions based on a single observation in isolation; instead, they rely on contextual information such as multiple viewpoints of a region of interest, geometric cues for estimating defect scale, and comparisons with previous inspection records. This thesis therefore explores how the contextual information inherent in inspection workflows can be incorporated directly into the inference process. Three research challenges are investigated: leveraging multi-view imagery to improve defect segmentation, developing and evaluating spatial inference models for defect quantification in civil infrastructure, and enabling visual change detection between unordered sets of inspection data. In Chapter 3, multi-view spatial relationships between inspection images are used to refine the segmentations produced by an unsupervised feature-clustering semantic segmentation model through a novel iterative stochastic consensus algorithm. In Chapter 4, a civil infrastructure RGB-D dataset covering five short- to medium-span overpass bridges is created using a custom handheld Light Detection and Ranging (LiDAR) scanner and is used to benchmark monocular metric depth estimation methods for defect measurement. In Chapter 5, synchronized pairs of novel view synthesis models are constructed to generate pixel-aligned renders of the same structure across inspection events, enabling visual change detection. Finally, Chapter 6 discusses the implications of this research for industrial inspection workflows and possible directions for future work.