Superpixel Salient Object Detection

dc.contributor.author: Park, Jinman
dc.date.accessioned: 2025-08-28T17:31:57Z
dc.date.available: 2025-08-28T17:31:57Z
dc.date.issued: 2025-08-28
dc.date.submitted: 2025-07-23
dc.description.abstract: Salient object detection (SOD) is a core problem in computer vision that involves identifying and segmenting the most visually prominent regions in an image. Its relevance spans a wide range of applications, including image understanding, object recognition, scene parsing, and human-computer interaction, as well as safety-critical domains such as autonomous driving, robotics, and medical imaging. Despite substantial progress, modern SOD models often rely on dense, pixel-level computation that imposes high computational and memory costs, limiting their deployment in resource-constrained environments. This thesis investigates an alternative paradigm for salient object detection based on superpixel representations: compact, perceptually homogeneous regions that reduce spatial redundancy while preserving boundary structure. Although superpixels offer significant efficiency advantages, their irregular and heterogeneous nature presents unique challenges for integration into modern deep learning frameworks. Furthermore, traditional augmentation strategies and transfer learning pipelines are not readily compatible with superpixel-based models, complicating training and generalization. To address these challenges, we propose SuperFormer, a lightweight vision transformer architecture tailored for superpixel-based saliency detection. Our contributions include: (1) formulating SOD as a superpixel-to-superpixel learning task to reduce computational overhead, (2) introducing a heterogeneity-aware feature representation that fuses color, texture, and shape cues, (3) adapting vision transformer architectures to operate on irregular superpixel inputs via novel positional encodings and mix-attention decoding, (4) designing superpixel-specific augmentation strategies and demonstrating the effectiveness of ImageNet pre-training in this context, and (5) conducting extensive evaluations on seven benchmark datasets, where our models (SF-S, SF-XS, and SF-XXS) achieve state-of-the-art results among lightweight SOD methods in terms of accuracy, FLOPs, and parameter efficiency. Overall, this thesis demonstrates that with the proper architectural adaptations and training strategies, superpixel representations can enable efficient, interpretable, and high-performing salient object detection, laying the groundwork for broader adoption of structured visual abstractions in deep learning.
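For readers less familiar with superpixel-level processing, the short Python sketch below illustrates the kind of pixel-to-superpixel reduction the abstract refers to. It is not code from the thesis: it assumes scikit-image's SLIC segmentation and uses simple mean-color and centroid features as stand-ins for the richer color, texture, and shape cues that SuperFormer is described as fusing.

import numpy as np
from skimage.segmentation import slic
from skimage.color import rgb2lab

def superpixel_features(image_rgb, n_segments=200):
    # Over-segment the image into roughly n_segments superpixels.
    labels = slic(image_rgb, n_segments=n_segments, compactness=10, start_label=0)
    lab = rgb2lab(image_rgb)                      # perceptually uniform color space
    h, w = labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = []
    for sp in range(labels.max() + 1):
        mask = labels == sp
        mean_color = lab[mask].mean(axis=0)       # mean L, a, b over the region
        centroid = np.array([ys[mask].mean() / h, xs[mask].mean() / w])
        feats.append(np.concatenate([mean_color, centroid]))
    # A few hundred 5-D vectors replace the H x W pixel grid as the model's input tokens.
    return labels, np.stack(feats)

A superpixel-to-superpixel SOD model of the kind described above would then predict one saliency value per such region rather than per pixel.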
dc.identifier.uri: https://hdl.handle.net/10012/22314
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: superpixel
dc.subject: salient object detection
dc.subject: vision transformers
dc.subject: heterogeneity
dc.subject: efficiency
dc.subject: feature augmentation
dc.title: Superpixel Salient Object Detection
dc.type: Doctoral Thesis
uws-etd.degree: Doctor of Philosophy
uws-etd.degree.department: Systems Design Engineering
uws-etd.degree.discipline: Systems Design Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.embargo.terms: 0
uws.contributor.advisor: Fieguth, Paul
uws.contributor.advisor: Clausi, David
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text

Files

Original bundle

Name: Park_Jinman.pdf
Size: 11.36 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission
Description: