Addressing Domain Shifts for Computer Vision Applications via Language

Liu, Chang

dc.contributor.author	Liu, Chang
dc.date.accessioned	2025-05-23T13:48:40Z
dc.date.available	2025-05-23T13:48:40Z
dc.date.issued	2025-05-23
dc.date.submitted	2025-05-07
dc.description.abstract	Semantic segmentation is used in safety-critical applications such as autonomous driving and cancer diagnosis, where accurately identifying small and rare objects is essential. However, pixel-level annotations are expensive and time-consuming to produce, and distribution shifts between datasets (e.g., daytime to snowy weather in self-driving, color variations between tumor scans across hospitals) further degrade model generalization. Unsupervised domain adaptation for semantic segmentation (DASS) addresses this challenge by training models on labeled source distributions and adapting them to unlabeled target domains. Existing DASS methods rely on either vision-only approaches or language-based techniques. Vision-only frameworks, such as masking and multi-resolution crops, implicitly learn spatial relationships between image patches but often suffer from noisy pseudo-labels biased toward the source domain. To mitigate noisy predictions, language-based DASS methods leverage generalized representations from large-scale language pre-training. However, these approaches use generic class-level prompts (e.g., "a photo of a {class}") and fail to capture complex spatial relationships between objects, which are key for dense prediction tasks like semantic segmentation. To address these limitations, we propose LangDA, a language-guided DASS framework that enhances spatial context-awareness by leveraging vision-language models (VLMs). LangDA generates scene-level descriptions (e.g., "a pedestrian is on the sidewalk, and the street is lined with buildings") to encode object relationships. At the image level, LangDA aligns an image's feature representation with the corresponding scene-level text embedding, improving the model’s ability to generalize across domains. LangDA eliminates the need for cumbersome manual prompt tuning and expensive human feedback, ensuring consistency and reproducibility.
LangDA achieves state-of-the-art performance on three self-driving DASS benchmarks: Synthia to Cityscapes, Cityscapes to ACDC, and Cityscapes to DarkZurich, surpassing existing methods by 2.6%, 1.4%, and 3.9%, respectively. Ablation studies confirm the effectiveness of context-aware image-level alignment over pixel-level alignment. These results demonstrate LangDA’s capability to leverage spatial relationships encoded in language to accurately segment objects under domain shift.
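The image-level alignment described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not specify the pooling or the loss, so the mean pooling, the cosine-based loss, and the names `mean_pool`, `cosine_similarity`, and `image_level_alignment_loss` are all assumptions made for illustration. The toy vectors stand in for per-pixel features from a segmentation encoder and for a scene-level text embedding from a VLM.

```python
import math

def mean_pool(pixel_features):
    """Average per-pixel feature vectors into one image-level vector (assumed pooling)."""
    n = len(pixel_features)
    dim = len(pixel_features[0])
    return [sum(f[d] for f in pixel_features) / n for d in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def image_level_alignment_loss(pixel_features, text_embedding):
    """Hypothetical loss: 1 - cosine similarity between the pooled image
    feature and the scene-level text embedding (lower means better aligned)."""
    image_vec = mean_pool(pixel_features)
    return 1.0 - cosine_similarity(image_vec, text_embedding)

# Toy data: four "pixels" with 2-D features, and two candidate text embeddings.
pixel_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
aligned_text = [1.0, 1.0]      # points the same way as the pooled image feature
misaligned_text = [1.0, -1.0]  # orthogonal to the pooled image feature

loss_aligned = image_level_alignment_loss(pixel_features, aligned_text)
loss_misaligned = image_level_alignment_loss(pixel_features, misaligned_text)
print(loss_aligned, loss_misaligned)
```

In this sketch, a single loss term is computed per image rather than per pixel, which is the distinction the abstract draws between image-level and pixel-level alignment; the matching description yields a loss close to 0 and the orthogonal one a loss close to 1.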
dc.identifier.uri	https://hdl.handle.net/10012/21775
dc.language.iso	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	Semantic Segmentation
dc.subject	Machine Learning
dc.subject	Unsupervised Domain Adaptation
dc.subject	Deep Learning
dc.subject	Image Segmentation
dc.subject	Self-driving
dc.subject	Domain Shift
dc.subject	Distribution Shift
dc.subject	Computer Vision
dc.subject	Vision Language Models
dc.subject	Large Language Models
dc.subject	Cross-model
dc.subject	Multi-modal
dc.subject	Artificial Intelligence
dc.title	Addressing Domain Shifts for Computer Vision Applications via Language
dc.type	Master Thesis
uws-etd.degree	Master of Applied Science
uws-etd.degree.department	Systems Design Engineering
uws-etd.degree.discipline	System Design Engineering
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0
uws.contributor.advisor	Rambhatla, Sirisha
uws.contributor.advisor	Wong, Alexander
uws.contributor.affiliation1	Faculty of Engineering
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en

Files

Original bundle

Name: Liu_Chang.pdf
Size: 99.67 MB
Format: Adobe Portable Document Format

Collections

Theses
Systems Design Engineering