Zero-Shot Monocular Motion Segmentation: A Fusion of Deep Learning and Geometric Approaches


Date

2024-04-29

Authors

Huang, Yuxiang

Advisor

Zelek, John

Publisher

University of Waterloo

Abstract

Identifying and segmenting moving objects from a moving monocular camera is difficult in the presence of unknown camera motion, diverse object motions, and complex scene structures. Deep learning methods achieve impressive results for generic motion segmentation, but they require massive amounts of training data and do not generalize well to novel scenes and objects. Conversely, recent geometric methods show promising results by fusing different geometric models, but they require manually corrected point trajectories and cannot generate a coherent segmentation mask. This work proposes a zero-shot motion segmentation approach that combines the strengths of deep learning and geometric methods. The proposed method first generates object proposals for every video frame using state-of-the-art foundation models, and then extracts different object-specific motion cues. Finally, the method uses multi-view spectral clustering to fuse the motion cues and cluster objects into distinct motion groups, resulting in a coherent segmentation. The key contributions of this work are as follows: 1) It proposes the first zero-shot motion segmentation pipeline that performs dense motion segmentation on different scenes and object classes without any training. 2) It is the first to combine epipolar geometry and optical flow-based motion models for motion segmentation, using multi-view spectral clustering to combine the motion models and achieve good motion segmentation results in complex scenes. Through extensive experimentation and comparative analysis, we validate the efficacy of the proposed method. Despite not being trained on any data, the method achieves competitive results on real-world datasets, some of which are even better than those of state-of-the-art motion segmentation methods trained in a supervised manner.
This work not only contributes to the advancement of monocular motion segmentation, but also shows that combining different geometric motion models and motion cues is very important in analyzing the motions of objects.
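
As a rough illustration of the final step described in the abstract (fusing several motion cues with multi-view spectral clustering), the following is a minimal sketch and not the thesis implementation. It assumes each motion cue has already been turned into a symmetric object-by-object affinity matrix, and it assumes a simple fusion rule (averaging the normalized affinities) that may differ from the author's formulation. The names fuse_and_cluster, W_epipolar and W_flow, and all numeric values, are hypothetical.

```python
import numpy as np

def normalized_affinity(W):
    """Symmetrically normalize an affinity matrix: D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def kmeans(X, k, n_iter=50):
    """Small deterministic k-means with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq_dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def fuse_and_cluster(affinities, k):
    """Average the normalized per-cue affinities (assumed fusion rule),
    embed objects with the top-k eigenvectors, then cluster the embedding."""
    fused = sum(normalized_affinity(W) for W in affinities) / len(affinities)
    _, eigvecs = np.linalg.eigh(fused)      # eigenvalues in ascending order
    U = eigvecs[:, -k:]                     # eigenvectors of the k largest
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)

# Toy data: six object proposals, where objects 0-2 share one motion
# and objects 3-5 share another. Values are made up for illustration.
same = np.ones((3, 3))
W_epipolar = np.block([[1.0 * same, 0.1 * same], [0.1 * same, 1.0 * same]])
W_flow = np.block([[0.8 * same, 0.3 * same], [0.3 * same, 0.8 * same]])

labels = fuse_and_cluster([W_epipolar, W_flow], k=2)
print(labels)
```

In this toy setup both cues agree on the grouping, so the fused embedding separates the two motion groups cleanly. In practice the cues would disagree on some objects, which is where combining them is expected to help.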

Keywords

computer vision, motion segmentation, monocular motion segmentation, video segmentation
