Spatial-Temporal Computer Vision Methods for Automated Vision-Based Visual Inspection

Midwinter, Max Xuhao Xue

dc.contributor.author	Midwinter, Max Xuhao Xue
dc.date.accessioned	2026-06-08T20:18:04Z
dc.date.available	2026-06-08T20:18:04Z
dc.date.issued	2026-06-08
dc.date.submitted	2026-05-25
dc.description.abstract	The objective of this thesis is to investigate how spatial and temporal context can be leveraged to enhance automated vision-based visual inspection (AVVI). The prevailing paradigm in AVVI is the single-shot supervised deep semantic inference model, in which each image is processed independently and the resulting semantic prediction is compared against labeled data to generate a supervision signal. While these methods have demonstrated strong performance on defect detection tasks, they often neglect the spatial and temporal context in which inspection data are collected. In practice, engineers rarely make decisions based on a single observation in isolation; instead, they rely on contextual information such as multiple viewpoints of a region of interest, geometric cues for estimating defect scale, and comparisons with previous inspection records. This thesis therefore explores how the contextual information inherent in inspection workflows can be incorporated directly into the inference process. Three research challenges are investigated: leveraging multi-view imagery to improve defect segmentation, developing and evaluating spatial inference models for defect quantification in civil infrastructure, and enabling visual change detection between unordered sets of inspection data. In Chapter 3, multi-view spatial relationships between inspection images are used to refine the segmentations produced by an unsupervised feature-clustering semantic segmentation model through a novel iterative stochastic consensus algorithm. In Chapter 4, a civil infrastructure RGB-D dataset covering five short- to medium-span overpass bridges is created using a custom handheld Light Detection and Ranging (LiDAR) scanner and is used to benchmark monocular metric depth estimation methods for defect measurement. In Chapter 5, synchronized pairs of novel view synthesis models are constructed to generate pixel-aligned renders of the same structure across inspection events, enabling visual change detection. Finally, Chapter 6 discusses the implications of this research for industrial inspection workflows and possible directions for future work.
dc.identifier.uri	https://hdl.handle.net/10012/23572
dc.language.iso	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	visual inspection
dc.subject	AI
dc.subject	deep learning
dc.subject	computer vision
dc.title	Spatial-Temporal Computer Vision Methods for Automated Vision-Based Visual Inspection
dc.type	Doctoral Thesis
uws-etd.degree	Doctor of Philosophy
uws-etd.degree.department	Civil and Environmental Engineering
uws-etd.degree.discipline	Civil Engineering
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0
uws.contributor.advisor	Yeum, Chul Min
uws.contributor.affiliation1	Faculty of Engineering
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Midwinter_MaxXuHaoXue.pdf
Size: 110.58 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission

Collections

Theses