Fine-Grained Visual Entity Linking through Promptable Segmentation: Applications in Medical Imaging

Carbone, Kathryn

dc.contributor.author	Carbone, Kathryn
dc.date.accessioned	2025-09-19T18:22:02Z
dc.date.available	2025-09-19T18:22:02Z
dc.date.issued	2025-09-19
dc.date.submitted	2025-09-15
dc.description.abstract	Image analysis in domains that produce large amounts of complex visual data, such as medicine, is challenging because of the time and labour constraints on domain experts. Visual entity linking (VEL) is a preliminary image processing task that links regions of interest (RoIs) to known entities in structured knowledge bases (KBs), thereby using knowledge to scaffold image understanding. We study a targeted VEL problem in which a specific user-highlighted RoI within the image is used to query a textual KB for information about the RoI, which can support downstream tasks such as similar case retrieval and question answering. For example, a doctor reviewing an MRI scan may wish to obtain images with similar presentations of a medically relevant RoI, such as a brain tumor, for comparison. Once this RoI is linked to its corresponding KB document, an imaging database carrying automatically generated, VEL-guided tags can be searched in a knowledge-aware manner through exact or semantically similar entity tag matching. Cross-modal embedding models like CLIP offer a straightforward solution: they encode KB entries and either whole images or cropped RoIs with dual encoders, and the resulting learned representations are matched by vector similarity search. However, using the whole image as the query may retrieve KB entries related to aspects of the image other than the RoI; at the same time, using the RoI alone as the query ignores context, which is critical for recognizing and linking complex entities such as those found in medical images. To address these shortcomings, this thesis proposes VELCRO (visual entity linking with contrastive RoI alignment), which adapts an image segmentation model to VEL using contrastive learning by aligning the contextual embeddings produced by its decoder with the KB. This strategy preserves the information contained in the surrounding image while focusing KB alignment specifically on the RoI.
To accomplish this, VELCRO performs segmentation and contrastive alignment in one end-to-end model via a novel loss function that combines the two objectives. Experimental results on medical VEL show that VELCRO achieves an overall linking accuracy of 95.2%, compared to 83.9% for baseline approaches.
dc.identifier.uri	https://hdl.handle.net/10012/22489
dc.language.iso	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.relation.uri	https://github.com/carbonkat/VELCRO
dc.relation.uri	https://www.codabench.org/competitions/1847/
dc.subject	artificial intelligence
dc.subject	healthcare
dc.subject	multi-modal
dc.subject	representation learning
dc.title	Fine-Grained Visual Entity Linking through Promptable Segmentation: Applications in Medical Imaging
dc.type	Master Thesis
uws-etd.degree	Master of Mathematics
uws-etd.degree.department	David R. Cheriton School of Computer Science
uws-etd.degree.discipline	Computer Science
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0
uws.contributor.advisor	Cohen, Robin
uws.contributor.advisor	Golab, Lukasz
uws.contributor.affiliation1	Faculty of Mathematics
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en
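
The abstract describes two ideas that a small example may help make concrete: linking an RoI to a KB entry by vector similarity search, and training with a loss that combines a segmentation objective with a contrastive alignment objective. The following is a minimal sketch only, not the thesis implementation. The toy embeddings, the function names, the use of binary cross-entropy and InfoNCE, and the weighted-sum form of `combined_loss` are all assumptions made for illustration; the record does not state the actual form of VELCRO's loss.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def link_entity(query_emb, kb_embs):
    """Return the index of the KB entry most similar to the query embedding."""
    sims = [cosine(query_emb, e) for e in kb_embs]
    return max(range(len(sims)), key=sims.__getitem__)

def info_nce(roi_emb, kb_embs, target, temperature=0.07):
    """Contrastive loss: cross-entropy over scaled similarities to all KB entries."""
    logits = [cosine(roi_emb, e) / temperature for e in kb_embs]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]

def bce(pred_mask, true_mask, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a 0/1 mask."""
    total = 0.0
    for p, y in zip(pred_mask, true_mask):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(pred_mask)

def combined_loss(pred_mask, true_mask, roi_emb, kb_embs, target, weight=1.0):
    """Hypothetical joint objective (assumed form): segmentation + weight * contrastive."""
    return bce(pred_mask, true_mask) + weight * info_nce(roi_emb, kb_embs, target)

# Toy data (invented for illustration): three KB entry embeddings and one RoI embedding.
kb_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
roi_emb = [0.1, 0.9, 0.2]

linked = link_entity(roi_emb, kb_embs)  # index of the best-matching KB entry
loss = combined_loss([0.9, 0.2, 0.8], [1, 0, 1], roi_emb, kb_embs, target=1)
```

In this sketch the contrastive term is smallest when the RoI embedding is closest to its correct KB entry, so minimizing the joint objective would push the model to segment the RoI and align it with the KB at the same time. In a real system the embeddings would come from learned encoders rather than hand-written lists.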

Files

Original bundle

Name: Carbone_Kathryn.pdf
Size: 8.58 MB
Format: Adobe Portable Document Format

Collections

Theses
Computer Science