
Addressing Domain Shifts for Computer Vision Applications via Language

dc.contributor.author: Liu, Chang
dc.date.accessioned: 2025-05-23T13:48:40Z
dc.date.available: 2025-05-23T13:48:40Z
dc.date.issued: 2025-05-23
dc.date.submitted: 2025-05-07
dc.description.abstract: Semantic segmentation is used in safety-critical applications such as autonomous driving and cancer diagnosis, where accurately identifying small and rare objects is essential. However, pixel-level annotations are expensive and time-consuming to produce, and distribution shifts between datasets (e.g., daytime to snowy weather in self-driving, or color variations between tumor scans across hospitals) further degrade model generalization. Unsupervised domain adaptation for semantic segmentation (DASS) addresses this challenge by training models on labeled source distributions and adapting them to unlabeled target domains. Existing DASS methods rely on either vision-only approaches or language-based techniques. Vision-only frameworks, which use techniques such as masking and multi-resolution crops, implicitly learn spatial relationships between different image patches but often suffer from noisy pseudo-labels biased toward the source domain. To mitigate noisy predictions, language-based DASS methods leverage generalized representations from large-scale language pre-training. However, these approaches use generic class-level prompts (e.g., "a photo of a {class}") and fail to capture complex spatial relationships between objects, which are key for dense prediction tasks like semantic segmentation. To address these limitations, we propose LangDA, a language-guided DASS framework that enhances spatial context-awareness by leveraging vision-language models (VLMs). LangDA generates scene-level descriptions (e.g., "a pedestrian is on the sidewalk, and the street is lined with buildings") to encode object relationships. At the image level, LangDA aligns an image's feature representation with the corresponding scene-level text embedding, improving the model’s ability to generalize across domains. LangDA eliminates the need for cumbersome manual prompt tuning and expensive human feedback, ensuring consistency and reproducibility.
LangDA achieves state-of-the-art performance on three self-driving DASS benchmarks: Synthia to Cityscapes, Cityscapes to ACDC, and Cityscapes to DarkZurich, surpassing existing methods by 2.6%, 1.4%, and 3.9%, respectively. Ablation studies confirm the effectiveness of context-aware image-level alignment over pixel-level alignment. These results demonstrate LangDA’s capability to leverage spatial relationships encoded in language to accurately segment objects under domain shift.
dc.identifier.uri: https://hdl.handle.net/10012/21775
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo (en)
dc.subject: Semantic Segmentation
dc.subject: Machine Learning
dc.subject: Unsupervised Domain Adaptation
dc.subject: Deep Learning
dc.subject: Image Segmentation
dc.subject: Self-driving
dc.subject: Domain Shift
dc.subject: Distribution Shift
dc.subject: Computer Vision
dc.subject: Vision Language Models
dc.subject: Large Language Models
dc.subject: Cross-model
dc.subject: Multi-modal
dc.subject: Artificial Intelligence
dc.title: Addressing Domain Shifts for Computer Vision Applications via Language
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Systems Design Engineering
uws-etd.degree.discipline: System Design Engineering
uws-etd.degree.grantor: University of Waterloo (en)
uws-etd.embargo.terms: 0
uws.contributor.advisor: Rambhatla, Sirisha
uws.contributor.advisor: Wong, Alexander
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed (en)
uws.published.city: Waterloo (en)
uws.published.country: Canada (en)
uws.published.province: Ontario (en)
uws.scholarLevel: Graduate (en)
uws.typeOfResource: Text (en)

Files

Original bundle

Name: Liu_Chang.pdf
Size: 99.67 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission