Instance Segmentation with Occlusion Order Supervision: Two Problems

Date

2025-09-05

Advisor

Veksler, Olga

Publisher

University of Waterloo

Abstract

Joint architectures that predict instance masks together with additional depth-related information have shown improved performance on both the original segmentation tasks and the corresponding depth-related tasks. Occlusion is closely related to depth and can also provide strong cues for segmentation. Although occlusion ordering between instances provides less supervision than full pixel-wise depth, occlusions can be easily annotated for existing datasets. Motivated by this, we pose two problems that incorporate occlusion information into standard segmentation tasks and propose a method for each. For both methods, we explore the level of supervision occlusion provides in terms of segmentation accuracy by comparing our results with baseline segmentation approaches trained without occlusion supervision.
Firstly, we develop an end-to-end framework that performs instance segmentation and per-instance global occlusion order prediction simultaneously by appending a global occlusion order head to standard instance segmentation architectures. Our approach to occlusions differs from most prior work, which performs occlusion estimation in a local pairwise manner: given two instances, classify one of them as the occluder. Because these decisions are local, such approaches can (and do) produce occlusion cycles even for scenes known not to contain them, and these cycles are errors. In contrast, we directly label instances with occlusion-order labels, where an instance with a larger label occludes any neighboring instance with a smaller label. This approach is cycle-free by design. Training with cross-entropy on occlusion-order labels fails because the labels do not have a fixed semantic meaning. Therefore, we develop a novel regularized loss function for successful training. Our framework achieves high occlusion-order accuracy with improved performance in instance segmentation, likely due to the added supervision.
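
The contrast between pairwise occlusion prediction and occlusion-order labels can be illustrated with a small sketch. The function names (`has_cycle`, `edges_from_labels`) and the toy instances are hypothetical and are not taken from the thesis; the sketch only shows why relations derived from a single per-instance label cannot form a cycle, while independent pairwise decisions can.

```python
from collections import defaultdict


def has_cycle(edges):
    """Return True if the (occluder, occludee) pairs form a directed cycle."""
    graph = defaultdict(list)
    nodes = set()
    for occluder, occludee in edges:
        graph[occluder].append(occludee)
        nodes.update((occluder, occludee))

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {n: WHITE for n in nodes}

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge: we returned to the current path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


def edges_from_labels(labels, neighbor_pairs):
    """Derive (occluder, occludee) pairs from per-instance order labels.

    A larger label occludes a neighboring smaller label; equal labels
    yield no relation.
    """
    edges = []
    for a, b in neighbor_pairs:
        if labels[a] > labels[b]:
            edges.append((a, b))
        elif labels[b] > labels[a]:
            edges.append((b, a))
    return edges


neighbors = [("A", "B"), ("B", "C"), ("C", "A")]

# Three independent pairwise decisions can contradict each other.
pairwise_prediction = [("A", "B"), ("B", "C"), ("C", "A")]
cyclic = has_cycle(pairwise_prediction)

# Relations derived from one label per instance always follow the label order.
labels = {"A": 2, "B": 1, "C": 0}
label_edges = edges_from_labels(labels, neighbors)
acyclic = not has_cycle(label_edges)
```

Because every derived edge points from a strictly larger label to a strictly smaller one, following edges can never return to the starting instance, which is the sense in which the labeling is cycle-free by design.
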
Secondly, we develop a new direction for occlusion-supervised amodal instance segmentation (AIS). AIS is an emerging task that segments the complete object instance, both the visible and occluded parts. All prior work for AIS can be divided into two groups: (i) methods constructing a synthetic amodal dataset; (ii) methods using human-annotated amodal masks. The drawback of methods in the first group is that constructing a realistic synthetic dataset is difficult. The drawback of methods in the second group is that human annotation is prone to error, as humans must reason about invisible regions. Our method requires neither a synthetic dataset nor human-annotated amodal masks, but instead uses ground-truth modal masks and the pairwise occlusion order between instances. By using the occlusion order to reason about the visible and invisible parts of the amodal mask, we develop effective loss functions for the visible and occluded parts of the predicted mask. We achieve comparable accuracies to those of the same architecture trained on human-annotated amodal masks, despite not using amodal masks for training.
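
One possible reading of how modal masks and pairwise occlusion order constrain an amodal mask is sketched below. This is an illustrative assumption, not the loss formulation of the thesis: the function `amodal_regions`, the pixel-set representation, and the three-way split are all hypothetical. The idea is that an instance's visible pixels must belong to its amodal mask, its hidden part can only lie under instances that occlude it, and all other pixels cannot belong to it.

```python
def amodal_regions(modal_masks, occludes, target, all_pixels):
    """Split the image into three regions for the instance `target`.

    modal_masks: dict mapping instance name -> set of visible (row, col) pixels
    occludes:    set of (occluder, occludee) pairs
    all_pixels:  iterable of every (row, col) pixel in the image

    Returns (visible, may_be_hidden, excluded) as sets of pixels.
    """
    # Pixels where the target is visible must be inside its amodal mask.
    visible = set(modal_masks[target])

    # The hidden part of the target can only lie under its occluders.
    may_be_hidden = set()
    for occluder, occludee in occludes:
        if occludee == target:
            may_be_hidden |= modal_masks[occluder]
    may_be_hidden -= visible

    # Everywhere else the target cannot be present under this reading.
    excluded = set(all_pixels) - visible - may_be_hidden
    return visible, may_be_hidden, excluded


# Toy 1x4 image: a cup partly hidden behind a book, plus one background pixel.
pixels = [(0, c) for c in range(4)]
masks = {"cup": {(0, 0), (0, 1)}, "book": {(0, 2)}}
order = {("book", "cup")}

visible, may_be_hidden, excluded = amodal_regions(masks, order, "cup", pixels)
```

Under this assumption, a training loss could treat the three regions differently, for example by requiring the prediction to cover `visible`, leaving `may_be_hidden` weakly constrained, and penalizing predictions in `excluded`.
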

Keywords

instance segmentation, occlusion ordering, computer vision
