
Fine-Grained Visual Entity Linking through Promptable Segmentation: Applications in Medical Imaging

dc.contributor.author: Carbone, Kathryn
dc.date.accessioned: 2025-09-19T18:22:02Z
dc.date.available: 2025-09-19T18:22:02Z
dc.date.issued: 2025-09-19
dc.date.submitted: 2025-09-15
dc.description.abstract: Image analysis in domains that produce large amounts of complex visual data, like medicine, is challenging due to time and labour constraints on domain experts. Visual entity linking (VEL) is a preliminary image processing task that links regions of interest (RoIs) to known entities in structured knowledge bases (KBs), thereby using knowledge to scaffold image understanding. We study a targeted VEL problem in which a specific user-highlighted RoI within the image is used to query a textual KB for information about the RoI, which can support downstream tasks such as similar case retrieval and question answering. For example, a doctor reviewing an MRI scan may wish to obtain images with similar presentations of a medically relevant RoI, such as a brain tumor, for comparison. By linking this RoI to its corresponding KB document, search of an imaging database with VEL-guided, automatically generated tags can be performed in a knowledge-aware manner based on exact or semantically similar entity tag matching. Cross-modal embedding models like CLIP present straightforward solutions through the dual encoding of KB entries and either whole images or cropped RoIs, which can then be matched by a vector similarity search between these respective learned representations. However, using the whole image as the query may retrieve KB entries related to other aspects of the image besides the RoI; at the same time, using the RoI alone as the query ignores context, which is critical for recognizing and linking complex entities such as those found in medical images. To address these shortcomings, this thesis proposes VELCRO (visual entity linking with contrastive RoI alignment), which adapts an image segmentation model to VEL using contrastive learning by aligning the contextual embeddings produced by its decoder with the KB. This strategy preserves the information contained in the surrounding image while focusing KB alignment specifically on the RoI.
To accomplish this, VELCRO performs segmentation and contrastive alignment in one end-to-end model via a novel loss function that combines the two objectives. Experimental results on medical VEL show that VELCRO achieves an overall linking accuracy of 95.2%, compared to 83.9% for baseline approaches.
dc.identifier.uri: https://hdl.handle.net/10012/22489
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo (en)
dc.relation.uri: https://github.com/carbonkat/VELCRO
dc.relation.uri: https://www.codabench.org/competitions/1847/
dc.subject: artificial intelligence
dc.subject: healthcare
dc.subject: multi-modal
dc.subject: representation learning
dc.title: Fine-Grained Visual Entity Linking through Promptable Segmentation: Applications in Medical Imaging
dc.type: Master Thesis
uws-etd.degree: Master of Mathematics
uws-etd.degree.department: David R. Cheriton School of Computer Science
uws-etd.degree.discipline: Computer Science
uws-etd.degree.grantor: University of Waterloo (en)
uws-etd.embargo.terms: 0
uws.contributor.advisor: Cohen, Robin
uws.contributor.advisor: Golab, Lukasz
uws.contributor.affiliation1: Faculty of Mathematics
uws.peerReviewStatus: Unreviewed (en)
uws.published.city: Waterloo (en)
uws.published.country: Canada (en)
uws.published.province: Ontario (en)
uws.scholarLevel: Graduate (en)
uws.typeOfResource: Text (en)
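
The abstract describes a baseline in which KB entries and images (or cropped RoIs) are embedded into a shared space and matched by vector similarity search. The following is only a minimal sketch of that matching step, not the thesis implementation: the function names `cosine` and `link_roi`, the entity labels, and the three-dimensional toy vectors are all hypothetical, and a real system would obtain the embeddings from trained encoders such as CLIP.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def link_roi(roi_embedding, kb_embeddings):
    """Return the id of the KB entry whose embedding is most similar to the RoI."""
    return max(
        kb_embeddings,
        key=lambda entry_id: cosine(roi_embedding, kb_embeddings[entry_id]),
    )


# Hypothetical toy data: made-up KB entries and vectors, for illustration only.
kb = {
    "glioma": [0.9, 0.1, 0.0],
    "meningioma": [0.1, 0.9, 0.1],
    "healthy tissue": [0.0, 0.2, 0.9],
}
roi = [0.8, 0.2, 0.1]

best = link_roi(roi, kb)
print(best)  # glioma
```

Under these assumptions, linking reduces to a nearest-neighbour lookup over the KB embeddings. What the query embedding should represent (the whole image, the cropped RoI, or a context-aware RoI embedding) is the design question the abstract raises.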

Files

Original bundle

Name: Carbone_Kathryn.pdf
Size: 8.58 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed upon to submission