Harmonizing the Scale: An End-to-End Self-Supervised Approach for Cross-Modal Data Retrieval in Histopathology Archives
dc.contributor.advisor | Tizhoosh, Hamid | |
dc.contributor.advisor | Rahnamayan, Shahryar | |
dc.contributor.author | Maleki, Danial | |
dc.date.accessioned | 2023-09-18T19:48:41Z | |
dc.date.issued | 2023-09-18 | |
dc.date.submitted | 2023-09-13 | |
dc.description.abstract | In recent years, the exponential growth of data across various domains has necessitated advanced techniques for processing and analyzing multi-modal big data. This is particularly relevant in the medical domain, where data comes in diverse formats such as images, reports, and molecular data. Consequently, bidirectional cross-modal data retrieval has become crucial for numerous research disciplines. Cross-modal retrieval seeks to identify a shared latent space in which different modalities, such as image-text pairs, are closely related. Obtaining high-quality vision and text embeddings is vital for achieving this objective. While training language models is feasible owing to the availability of public data and the absence of labelling requirements, training vision models to generate effective embeddings is challenging because supervised models depend on scarce labelled data. To address this challenge, an end-to-end approach for learning vision embeddings in a self-supervised manner, coined H-DINO+LILE, is introduced through a modification of the DINO model. The proposed modification replaces DINO's local and global patching scheme with a new harmonizing patching approach, termed H-DINO, in which the magnitude of the various augmentations is kept consistent. This scheme captures the contextual information of images more consistently, thereby improving feature representation and retrieval accuracy. Furthermore, a new architecture is proposed that integrates self-supervised learning and cross-modal retrieval modules in a back-to-back configuration, enabling improved representation of cross-modal and individual modalities through self-attention and cross-attention modules. The architecture is trained end-to-end with a new loss term that aligns image and text representations in the joint latent space. The efficacy of the proposed framework is validated on private and public datasets across diverse tasks, including patch-based (sub-image) and WSI-based (whole slide image) retrieval as well as text retrieval. This thesis demonstrates that the proposed framework significantly improves cross-modal retrieval in the medical domain. Moreover, its applicability extends beyond the medical field to other domains that require cross-modal retrieval and involve patching of gigapixel images in their methodologies. | en
dc.identifier.uri | http://hdl.handle.net/10012/19872 | |
dc.language.iso | en | en |
dc.pending | false | |
dc.publisher | University of Waterloo | en |
dc.subject | Machine Learning | en
dc.subject | Self Supervised Learning | en |
dc.subject | Cross Modality Retrieval | en |
dc.subject | Digital Pathology | en |
dc.title | Harmonizing the Scale: An End-to-End Self-Supervised Approach for Cross-Modal Data Retrieval in Histopathology Archives | en |
dc.type | Doctoral Thesis | en |
uws-etd.degree | Doctor of Philosophy | en |
uws-etd.degree.department | Systems Design Engineering | en |
uws-etd.degree.discipline | Systems Design Engineering | en
uws-etd.degree.grantor | University of Waterloo | en |
uws-etd.embargo | 2024-09-17T19:48:41Z | |
uws-etd.embargo.terms | 1 year | en |
uws.comment.hidden | Danial Maleki PhD Thesis | en |
uws.contributor.advisor | Tizhoosh, Hamid | |
uws.contributor.advisor | Rahnamayan, Shahryar | |
uws.contributor.affiliation1 | Faculty of Engineering | en |
uws.peerReviewStatus | Unreviewed | en |
uws.published.city | Waterloo | en |
uws.published.country | Canada | en |
uws.published.province | Ontario | en |
uws.scholarLevel | Graduate | en |
uws.typeOfResource | Text | en |
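The abstract above describes aligning image and text embeddings in a shared (joint) latent space so that paired samples from the two modalities sit close together, which is what enables bidirectional retrieval. The sketch below is a minimal, illustrative example of such a joint-embedding objective only; it is a generic symmetric contrastive loss, not the thesis's actual H-DINO+LILE architecture or loss term, and the function name, temperature value, and PyTorch framing are assumptions made for illustration.

# Illustrative sketch, not the thesis's method: a symmetric contrastive loss
# that pulls paired image/text embeddings together in a shared latent space.
import torch
import torch.nn.functional as F

def joint_space_contrastive_loss(img_emb: torch.Tensor,
                                 txt_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    # img_emb, txt_emb: (batch, dim) embeddings of paired image patches and report text.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature  # pairwise cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy covers both retrieval directions:
    # image-to-text (rows) and text-to-image (columns).
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example with random tensors standing in for encoder outputs.
loss = joint_space_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))

In a full system, img_emb would come from the self-supervised vision encoder and txt_emb from the language model, with both projected into the same dimensionality before this loss is applied.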
Files
Original bundle
- Name: Maleki_Danial.pdf
- Size: 25.81 MB
- Format: Adobe Portable Document Format
- Description: Danial_Maleki_PhD_Thesis
License bundle
- Name: license.txt
- Size: 6.4 KB
- Format: Item-specific license agreed upon to submission