Histopathology Image Analysis and NLP for Digital Pathology

Allada, Aishwarya Allada

Files

allada_aishwaryakrishna.pdf (9.48 MB)

Date

2021-08-27

Authors

Allada, Aishwarya Allada

Advisor

Crowley, Mark

Publisher

University of Waterloo

Abstract

Information technologies based on machine learning (ML) with quantitative imaging and text are playing an essential role, particularly in general medicine and oncology. Deep learning (DL) in particular has demonstrated significant breakthroughs in computer vision and natural language processing (NLP), which could enhance disease detection and the establishment of efficient treatments. Furthermore, considering the large number of people with cancer and the substantial volume of data generated during cancer treatment, there is significant interest in the use of AI to improve oncologic care. In digital pathology, high-resolution microscope images of tissue samples are stored along with written medical reports in databases that are used by pathologists. The diagnosis is made through tissue analysis of the biopsy sample and is written as a brief unstructured report, which is stored as free text in Electronic Medical Record (EMR) systems. For the digitization of medical records to achieve its maximum benefit, these reports must be accessible and usable, so that medical practitioners can easily understand them and precisely identify the disease. Histopathology images are the basis of the diagnosis and study of diseases of the tissues; image analysis helps to identify the disease's location and to classify the type of cancer. Recently, due to the abundant accumulation of whole slide images (WSIs), there has been an increased demand for effective and efficient gigapixel image analysis, such as computer-aided diagnosis using DL techniques. However, due to the high diversity of shapes and structures in WSIs, conventional DL techniques cannot be used for classification. Moreover, although computer-aided diagnosis using DL has good prediction accuracy, in the medical domain there is a need to explain the model's predictions in order to gain an understanding beyond standard quantitative performance evaluation. This thesis presents three different findings.
Firstly, I provide a comparative analysis of transformer models such as BioBERT, Clinical BioBERT, and BioMed-RoBERTa, together with TF-IDF; the results demonstrate the effectiveness of various word embedding techniques for classifying pathology reports. Secondly, using the slide-level labels of WSIs, I classify them into their disease types with an architecture that combines an attention mechanism and instance-level clustering. Finally, I introduce a method to fuse the features of the pathology reports with the features of their respective images, and investigate the effect of the combined features on classifying histopathology images and their respective reports simultaneously. This proved to be better than the individual classification tasks, achieving an accuracy of 95.73%.
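
The abstract does not say how the attention mechanism or the feature fusion is implemented. As a rough illustration only, the sketch below assumes a common tanh-based attention pooling over patch features to form one slide-level vector, followed by simple concatenation with a report feature vector. The function names, dimensions, random parameters, and the choice of concatenation are all hypothetical and are not taken from the thesis.

```python
import numpy as np

def attention_pool(patch_feats, V, w):
    """Collapse (n_patches, d) patch features into one (d,) slide vector.

    Assumed form: softmax over w^T tanh(V^T h_k), a common attention
    pooling used in multiple-instance learning (not confirmed by the source).
    """
    scores = np.tanh(patch_feats @ V) @ w        # one score per patch
    scores = scores - scores.max()               # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ patch_feats, weights

def fuse(slide_vec, report_vec):
    """Concatenate L2-normalised image and text features (assumed fusion)."""
    def l2(x):
        norm = np.linalg.norm(x)
        return x / norm if norm > 0 else x
    return np.concatenate([l2(slide_vec), l2(report_vec)])

rng = np.random.default_rng(0)
patch_feats = rng.normal(size=(50, 16))  # hypothetical: 50 patches, 16-d each
V = rng.normal(size=(16, 8))             # hypothetical, untrained parameters
w = rng.normal(size=8)
report_vec = rng.random(32)              # stand-in for a report embedding

slide_vec, weights = attention_pool(patch_feats, V, w)
fused = fuse(slide_vec, report_vec)
print(fused.shape)
```

In a real pipeline the attention parameters would be learned, the report vector would come from TF-IDF or a transformer encoder, and the fused vector would feed a classifier; normalising each modality before concatenation is one way to keep either from dominating by scale.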