A Unified and Hybrid Approach for Image-based Scene Change Detection and Pose-agnostic Object Anomaly Detection

Date

2024-12-17

Advisor

Zelek, John

Publisher

University of Waterloo

Abstract

Image-based Pose-Agnostic 3D Anomaly Detection is an important task that has emerged in industrial quality control. It is an object-centric task that seeks to find anomalies in query images of a tested object, taken from unknown poses, given a set of reference images of a standard, anomaly-free object. A related task, Image-based Scene Change Detection, focuses on differences in a scene rather than in an object. Image-based Scene Change Detection is critical for mapping and monitoring a scene: it seeks to find the semantic changes in a scene described by two sets of images (reference and query) captured at different timestamps. For these industrial detection tasks, image sensors are widely used because they are easy to use and inexpensive to acquire. However, the most commonly used image sensors, RGB cameras, capture only 2D information, in the form of an RGB image taken from a specific viewpoint of an object or scene. In image-based anomaly detection and change detection, reference and query images are often taken from different poses, and the poses of the query views may be unknown. As a result, reference and query images cannot be compared directly. Recent learning-based methods, such as OmniposeAD and SplatPose, employ Novel View Synthesis (NVS) methods, namely NeRF and Gaussian Splatting, to bridge this gap: they localize each query image with respect to the reference images and synthesize a pseudo-reference image at the query view, enabling direct pixel-to-pixel comparison. However, these learning-based methods suffer from a long localization overhead at inference time, because inverted Neural Radiance Field methods such as iNeRF can take hundreds of gradient descent steps to localize and refine the poses.
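To make the pixel-to-pixel comparison concrete, here is a minimal sketch, assuming the query image and the synthesized pseudo-reference are already aligned (same pose, same resolution) RGB arrays with values in [0, 1]. It scores each pixel by the mean absolute RGB difference and thresholds the result. This plain photometric difference is an illustrative stand-in, not the comparison used by OmniposeAD, SplatPose, or SplatPose+, which may operate on learned features instead; the function name `pixel_anomaly_map` and the threshold are hypothetical.

```python
import numpy as np


def pixel_anomaly_map(query: np.ndarray, pseudo_ref: np.ndarray) -> np.ndarray:
    """Per-pixel anomaly score: mean absolute RGB difference.

    Hypothetical helper. Assumes `query` and `pseudo_ref` are (H, W, 3)
    arrays in [0, 1] that correspond to the same (estimated) camera pose.
    """
    if query.shape != pseudo_ref.shape:
        raise ValueError("query and pseudo-reference must have the same shape")
    diff = np.abs(query.astype(np.float64) - pseudo_ref.astype(np.float64))
    return diff.mean(axis=-1)  # average over the colour channels -> (H, W)


# Usage: a 4x4 anomaly-free render and a query with a simulated 2x2 defect.
ref = np.zeros((4, 4, 3))
query = ref.copy()
query[1:3, 1:3] = 0.9
scores = pixel_anomaly_map(query, ref)
mask = scores > 0.5  # hypothetical threshold for a binary anomaly mask
print(int(mask.sum()))  # 4
```

In a real pipeline the quality of this map depends on how accurately the query pose is estimated, which is why localization speed and accuracy matter for these methods.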
This paper introduces SplatPose+, a hybrid approach that maintains both a learning-based model (3D Gaussian Splatting) for NVS and a Structure-from-Motion (SfM) model (Hierarchical Localization) for localization, combining the fast training and inference of 3D Gaussian Splatting with the fast localization of Hierarchical Localization. On the Image-based Pose-Agnostic 3D Anomaly Detection task, although our pipeline requires computing an additional SfM model, it offers real-time inference and faster training than SplatPose. In terms of quality, we achieve a new state-of-the-art (SOTA) result on the Pose-agnostic Anomaly Detection benchmark with the Multi-Pose Anomaly Detection (MAD-SIM) dataset. On the Image-based Scene Change Detection task, we achieve a higher IoU than previous supervised methods on the binary change detection sub-task without environment variations. Moreover, we demonstrate the potential of combining SAM2 with SplatPose+ to further refine object-level change masks toward higher accuracy.
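The IoU reported for the binary change detection sub-task compares a predicted change mask with a ground-truth change mask. Below is a minimal sketch of a per-image IoU, assuming both masks are NumPy arrays of the same shape where nonzero means "changed". The function name `binary_iou` and the convention of returning 1.0 when both masks are empty are assumptions; the benchmark used in this work may instead accumulate intersections and unions over the whole dataset.

```python
import numpy as np


def binary_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of the 'changed' class.

    Hypothetical helper. Nonzero entries are treated as changed pixels.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # assumed convention: no change predicted and none present
    inter = np.logical_and(pred, gt).sum()
    return float(inter) / float(union)


# Usage: prediction covers the top row, ground truth covers the right column.
pred = np.array([[1, 1], [0, 0]])
gt = np.array([[0, 1], [0, 1]])
print(binary_iou(pred, gt))  # one shared pixel out of a union of three
```

The two masks share one pixel out of a union of three, so the IoU is 1/3; a refinement step such as SAM2 mask refinement would aim to raise this by tightening the predicted mask around the changed object.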

Keywords

change detection, anomaly detection, 3DGS, novel view synthesis