Zero-Shot Monocular Motion Segmentation: A Fusion of Deep Learning and Geometric Approaches
Date
2024-04-29
Authors
Huang, Yuxiang
Advisor
Zelek, John
Publisher
University of Waterloo
Abstract
Identifying and segmenting moving objects from a moving monocular camera is difficult in the presence of unknown camera motion, diverse object motions, and complex scene structures. Deep learning methods achieve impressive results for generic motion segmentation, but they require massive training data and do not generalize well to novel scenes and objects. Conversely, recent geometric methods show promising results by fusing different geometric models, but they require manually corrected point trajectories and cannot generate a coherent segmentation mask.
This work proposes a zero-shot motion segmentation approach that combines the strengths of deep learning and geometric methods. The proposed method first generates object proposals for every video frame using state-of-the-art foundation models, and then extracts different object-specific motion cues. Finally, the method uses multi-view spectral clustering to fuse the motion cues and cluster objects into distinct motion groups, resulting in a coherent segmentation. The key contributions of this work are as follows:
1) Proposing the first zero-shot motion segmentation pipeline that performs dense motion segmentation on different scenes and object classes without any training.
2) Being the first to combine epipolar geometry and optical flow-based motion models for motion segmentation. Multi-view spectral clustering is used to combine the different motion models effectively and achieve good motion segmentation results in complex scenes.
Through extensive experimentation and comparative analysis, we validate the efficacy of the proposed method. Despite not being trained on any data, the method achieves competitive results on real-world datasets, some of which are even better than those of state-of-the-art motion segmentation methods trained in a supervised manner. This work not only contributes to the advancement of monocular motion segmentation, but also shows that combining different geometric motion models and motion cues is important for analyzing the motions of objects.
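The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes each motion cue has already been converted into a pairwise affinity matrix between object proposals, and it fuses the views by simply averaging their normalized affinities, which is only one of several multi-view spectral clustering variants. The function names, the toy affinity values, and the small k-means helper are all hypothetical.

```python
import numpy as np

def normalized_affinity(W):
    """Symmetric normalization D^{-1/2} W D^{-1/2} of an affinity matrix."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def kmeans(X, k, n_iter=50):
    """Small Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def fuse_and_cluster(affinities, k):
    """Assumed fusion rule: average the normalized affinities of all views,
    embed objects with the top-k eigenvectors, then cluster the embedding."""
    fused = np.mean([normalized_affinity(W) for W in affinities], axis=0)
    _, vecs = np.linalg.eigh(fused)   # eigenvalues in ascending order
    U = vecs[:, -k:]                  # eigenvectors of the k largest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)

def block_affinity(within, across, n=6, split=3):
    """Toy affinity: objects [0, split) and [split, n) form two motion groups."""
    W = np.full((n, n), across)
    W[:split, :split] = within
    W[split:, split:] = within
    return W

# Two hypothetical views of the same six object proposals: one stands in for
# an epipolar-geometry cue, the other for a less discriminative optical-flow cue.
W_epipolar = block_affinity(within=1.0, across=0.05)
W_flow = block_affinity(within=1.0, across=0.4)

labels = fuse_and_cluster([W_epipolar, W_flow], k=2)
print(labels)
```

Averaging is the simplest way to let a sharper cue compensate for a weaker one; a faithful reproduction would need the affinity definitions and fusion scheme given in the thesis itself.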
Keywords
computer vision, motion segmentation, monocular motion segmentation, video segmentation