Harmonizing the Scale: An End-to-End Self-Supervised Approach for Cross-Modal Data Retrieval in Histopathology Archives

dc.contributor.advisorTizhoosh, Hamid
dc.contributor.advisorRahnamayan, Shahryar
dc.contributor.authorMaleki, Danial
dc.date.accessioned2023-09-18T19:48:41Z
dc.date.issued2023-09-18
dc.date.submitted2023-09-13
dc.description.abstractIn recent years, the exponential growth of data across many domains has necessitated advanced techniques for processing and analyzing multi-modal big data. This is particularly relevant in the medical domain, where data comes in diverse formats such as images, reports, and molecular data. Consequently, bidirectional cross-modal data retrieval has become crucial for numerous research disciplines. Cross-modal retrieval seeks a shared latent space in which different modalities, such as image-text pairs, are closely related; obtaining high-quality vision and text embeddings is vital for this objective. Although training language models is feasible thanks to the availability of public data and the absence of labelling requirements, training vision models to generate effective embeddings is challenging because supervised models depend on scarce labelled data. To address this challenge, an end-to-end approach to learning vision embeddings in a self-supervised manner, coined H-DINO+LILE, is introduced as a modification of the DINO model. The proposed improvement replaces DINO's existing local and global patching scheme with a new harmonizing patching approach, termed H-DINO, in which the magnitude of the various augmentations is kept consistent. This method captures the contextual information of images more consistently, thereby improving feature representation and retrieval accuracy. Furthermore, a unique architecture is proposed that integrates self-supervised learning and cross-modal retrieval modules in a back-to-back configuration, enabling improved representation of cross-modal and individual modalities using self-attention and cross-attention modules. This architecture is trained end-to-end with a new loss term that facilitates image and text representation in the joint latent space.
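The harmonizing-patching idea described above can be contrasted with DINO's mixed local/global multi-crop scheme in a minimal sketch. The function names, crop counts, and scale fractions below are illustrative assumptions for exposition, not the thesis's actual hyperparameters.

```python
def dino_crop_sizes(img_size, n_global=2, n_local=6):
    """DINO-style multi-crop: a few large global crops plus several
    small local crops, i.e. augmentations of mixed magnitude."""
    scales = [0.6] * n_global + [0.2] * n_local  # illustrative scale fractions
    return [int(img_size * s) for s in scales]

def harmonized_crop_sizes(img_size, n_crops=8, scale=0.4):
    """Harmonizing patching (sketch of the H-DINO idea): every crop is
    drawn at the same scale, so augmentation magnitude stays consistent
    across all views fed to the self-supervised objective."""
    return [int(img_size * scale)] * n_crops
```

The contrast is that `dino_crop_sizes` yields two distinct crop scales, while `harmonized_crop_sizes` yields a single one, which is the consistency property the abstract attributes to H-DINO.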
The efficacy of the proposed framework is validated on various private and public datasets across diverse tasks, such as patch-based (sub-image) and WSI-based (whole-slide-image) retrieval, as well as text retrieval. This thesis demonstrates that the proposed framework significantly improves cross-modal retrieval in the medical domain. Moreover, its applicability extends beyond medicine to other domains whose methodologies require cross-modal retrieval and involve patching of gigapixel images.en
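The joint latent space for image-text pairs mentioned in the abstract is commonly trained with a symmetric contrastive objective that pulls matched pairs together and pushes mismatched pairs apart. The sketch below is one standard formulation of such a loss, written in NumPy; it is an assumed illustration of the general technique, not the thesis's actual loss term.

```python
import numpy as np

def symmetric_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired image/text
    embeddings in a shared latent space: row i of img_emb is assumed
    to match row i of txt_emb."""
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (B, B) pairwise similarities
    labels = np.arange(len(logits))     # i-th image matches i-th text

    def cross_entropy(l):
        # numerically stable row-wise softmax cross-entropy on the diagonal
        l = l - l.max(axis=1, keepdims=True)
        probs = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(probs[labels, labels]).mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With such a loss, retrieval at inference time reduces to a nearest-neighbour search in the joint space: embed the query in one modality and rank items of the other modality by cosine similarity.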
dc.identifier.urihttp://hdl.handle.net/10012/19872
dc.language.isoenen
dc.pendingfalse
dc.publisherUniversity of Waterlooen
dc.subjectMachine Learningen
dc.subjectSelf Supervised Learningen
dc.subjectCross Modality Retrievalen
dc.subjectDigital Pathologyen
dc.titleHarmonizing the Scale: An End-to-End Self-Supervised Approach for Cross-Modal Data Retrieval in Histopathology Archivesen
dc.typeDoctoral Thesisen
uws-etd.degreeDoctor of Philosophyen
uws-etd.degree.departmentSystems Design Engineeringen
uws-etd.degree.disciplineSystems Design Engineeringen
uws-etd.degree.grantorUniversity of Waterlooen
uws-etd.embargo2024-09-17T19:48:41Z
uws-etd.embargo.terms1 yearen
uws.comment.hiddenDanial Maleki PhD Thesisen
uws.contributor.advisorTizhoosh, Hamid
uws.contributor.advisorRahnamayan, Shahryar
uws.contributor.affiliation1Faculty of Engineeringen
uws.peerReviewStatusUnrevieweden
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.scholarLevelGraduateen
uws.typeOfResourceTexten

Files

Original bundle

Now showing 1 - 1 of 1
Name:
Maleki_Danial.pdf
Size:
25.81 MB
Format:
Adobe Portable Document Format
Description:
Danial_Maleki_PhD_Thesis

License bundle

Name:
license.txt
Size:
6.4 KB
Format:
Description:
Item-specific license agreed to upon submission