Multi-Object Tracking using Mamba and an Investigation into Data Association Strategies
Date
2025-03-19
Authors
Advisor
Zelek, John
Publisher
University of Waterloo
Abstract
Multi-Object Tracking (MOT) is a critical component of computer vision, with applications spanning autonomous driving, video surveillance, sports analytics, and more. Despite significant advancements in tracking algorithms and computational power, challenges such as maintaining long-term identity associations, handling dynamic object counts, managing irregular movements, and mitigating occlusions persist, particularly in complex and dynamic environments. This research addresses these challenges by proposing a learning-based motion model that leverages past trajectories to improve motion prediction and object re-identification, and by investigating how data association strategies can maximize tracker performance.
Inspired by recent advancements in state-space models (SSMs), particularly Mamba, we propose a novel learning-based architecture for motion prediction that combines Mamba and self-attention layers to capture non-linear motion patterns within the Tracking-By-Detection (TBD) paradigm. Mamba's input-dependent sequence modeling enables efficient and robust handling of long-range temporal dependencies, making it well suited to complex motion prediction tasks. Building on this foundation, we explore hybrid data association strategies to improve tracking robustness, particularly in scenarios with occlusions and identity switches. By integrating stronger cues such as Intersection over Union (IoU) for spatial consistency and Re-Identification (Re-ID) for appearance-based matching, we enhance the reliability of object associations across frames, reducing errors in long-term tracking. Fast motion and partial overlaps often lead to identity mismatches. Spatial association traditionally relies on IoU, which can struggle in such scenarios. To address this, we enhance the cost matrix by incorporating Height-based IoU to handle partial overlaps more effectively, and we extend the original bounding boxes with a buffer to account for fast motion, improving the robustness and accuracy of spatial association. We also study the impact of dynamically updating the Re-ID feature bank during the matching stage, culminating in a refined weighted cost matrix. To further address identity switching and trajectory consistency, we introduce the concept of virtual detections in overlapping scenarios and explore its effectiveness in mitigating ID switches.
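As a rough illustration of how the spatial and appearance cues described above could be combined, the following is a minimal sketch of a weighted cost matrix built from buffered IoU, a height-based IoU, and a Re-ID cosine distance. It is not the implementation used in this work. The function names, the buffer ratio, the weights, and the specific definition of height-based IoU (a 1-D IoU over the vertical extents of two boxes) are all assumptions made for illustration. Boxes are assumed to be in (x1, y1, x2, y2) format.

```python
import numpy as np

EPS = 1e-9


def expand_boxes(boxes, buffer_ratio):
    """Grow each (x1, y1, x2, y2) box by buffer_ratio of its width and height on every side."""
    boxes = np.asarray(boxes, dtype=float)
    dx = buffer_ratio * (boxes[:, 2] - boxes[:, 0])
    dy = buffer_ratio * (boxes[:, 3] - boxes[:, 1])
    return np.stack(
        [boxes[:, 0] - dx, boxes[:, 1] - dy, boxes[:, 2] + dx, boxes[:, 3] + dy],
        axis=1,
    )


def iou_matrix(a, b):
    """Pairwise IoU between N boxes in a and M boxes in b, returned with shape (N, M)."""
    a = np.asarray(a, dtype=float)[:, None, :]
    b = np.asarray(b, dtype=float)[None, :, :]
    iw = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    ih = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = iw * ih
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / np.maximum(area_a + area_b - inter, EPS)


def height_iou_matrix(a, b):
    """Assumed definition: 1-D IoU over the vertical extents (y1, y2) of each box pair."""
    a = np.asarray(a, dtype=float)[:, None, :]
    b = np.asarray(b, dtype=float)[None, :, :]
    inter = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    union = (a[..., 3] - a[..., 1]) + (b[..., 3] - b[..., 1]) - inter
    return inter / np.maximum(union, EPS)


def cosine_distance(track_feats, det_feats):
    """Pairwise cosine distance (1 - cosine similarity) between appearance embeddings."""
    t = np.asarray(track_feats, dtype=float)
    d = np.asarray(det_feats, dtype=float)
    t = t / np.maximum(np.linalg.norm(t, axis=1, keepdims=True), EPS)
    d = d / np.maximum(np.linalg.norm(d, axis=1, keepdims=True), EPS)
    return 1.0 - t @ d.T


def association_cost(tracks, dets, track_feats, det_feats,
                     buffer_ratio=0.3, w_iou=0.5, w_hiou=0.2, w_reid=0.3):
    """Weighted cost matrix; lower is better. Weights and buffer ratio are illustrative only."""
    biou = iou_matrix(expand_boxes(tracks, buffer_ratio), expand_boxes(dets, buffer_ratio))
    hiou = height_iou_matrix(tracks, dets)
    reid = cosine_distance(track_feats, det_feats)
    return w_iou * (1.0 - biou) + w_hiou * (1.0 - hiou) + w_reid * reid
```

The resulting matrix could then be passed to an assignment solver such as `scipy.optimize.linear_sum_assignment` to obtain track-to-detection matches. In practice the weights would need tuning per dataset, and a gating threshold would typically reject matches whose cost is too high.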
Developing a robust and accurate multi-object tracker demands a critical interplay between accurate motion modeling and a sophisticated combination of stronger and weaker cues in data association. Through extensive experimental evaluations on challenging benchmarks such as DanceTrack and SportsMOT, the proposed approaches achieve significant performance gains, with HOTA scores of 63.16% and 77.26% respectively, surpassing multiple existing state-of-the-art methods. Notably, our approach outperforms DiffMOT by 0.9% on DanceTrack and 0.06% on SportsMOT, while achieving 3-7% improvements over other learning-based motion models. This work contributes to advancing MOT systems capable of achieving high performance across diverse and demanding scenarios.
Description
Keywords
Multi-Object Tracking, Object Detection, Mamba, Data Association