Explainable AI for retinal OCT diagnosis
Artificial intelligence methods such as deep learning are making great progress on complex tasks usually associated with human intelligence and experience. Deep learning models have matched, and in some cases surpassed, human performance on medical diagnosis tasks, including retinal diagnosis. Given sufficient data and computational resources, these models can perform classification and segmentation, as well as related tasks such as image quality improvement. Yet the adoption of these systems in actual healthcare centers has been limited by the lack of reasoning behind their decisions. This black-box nature, along with upcoming regulations for transparency and privacy, exacerbates the ethico-legal challenges faced by deep learning systems. Attribution methods explain the decisions of a deep learning model by generating a heatmap of the features that contribute most to the model's decision. These methods are generally compared in quantitative terms on standard machine learning datasets. However, their ability to generalize to specific data distributions, such as retinal OCT, has not been thoroughly evaluated.

In this thesis, multiple attribution methods for explaining the decisions of deep learning models for retinal diagnosis are compared. It is evaluated whether the methods considered the best for explainability outperform methods with a relatively simpler theoretical background. A review of current deep learning models for retinal diagnosis and the state-of-the-art explainability methods for medical diagnosis is provided. A commonly used deep learning model is trained on a large public dataset of OCT images, and attributions are generated using various methods. A quantitative and qualitative comparison of these approaches is carried out using several performance metrics and a large panel of experienced retina specialists.
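As a concrete illustration of how an attribution heatmap can be produced, the following is a minimal NumPy sketch of occlusion-based attribution, one of the simpler methods in this family. The function name, patch size, and the toy scoring model are hypothetical choices for illustration, not the thesis's actual experimental setup (which trains a deep network on OCT images).

```python
import numpy as np

def occlusion_attribution(model, image, patch=8, baseline=0.0):
    """Occlusion-based attribution: slide a masking patch over the
    image and record how much the model's score drops when each
    region is hidden. Larger drops indicate more important regions."""
    h, w = image.shape
    base_score = model(image)
    heatmap = np.zeros_like(image, dtype=float)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = baseline
            # Importance of this region = score lost when it is hidden.
            heatmap[y:y + patch, x:x + patch] = base_score - model(occluded)
    return heatmap

# Toy stand-in "model": scores an image by the mean intensity of its
# centre, so only the centre should receive non-zero attribution.
def toy_model(img):
    return img[8:24, 8:24].mean()

rng = np.random.default_rng(0)
img = rng.random((32, 32))
hm = occlusion_attribution(toy_model, img)
```

Gradient-based methods produce analogous heatmaps far more cheaply for deep networks, which is one reason runtime is among the metrics compared below.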
The initial quantitative metrics include the runtime of each method, RMSE, and Spearman's rank correlation for a single instance of the model. Later, two stronger metrics, robustness and sensitivity, are presented. These evaluate, respectively, the consistency among different instances of the same model and the ability to highlight the features with the greatest effect on the model output. Similarly, the initial qualitative analysis compares the heatmaps with a clinician's markings in terms of cosine similarity. Next, a panel of 14 clinicians rated the heatmaps of each method. Their subjective feedback, reasons for preference, and general feedback about using such a system are also documented. It is concluded that explainability methods can make the decision process of deep learning models more transparent, and that the choice of method should account for the preferences of domain experts. There is a high degree of acceptance among the clinicians surveyed for using such systems. Future directions regarding system improvements and enhancements are also discussed.
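The single-instance quantitative metrics named above (RMSE, Spearman's rank correlation) and the cosine similarity used in the qualitative comparison can all be computed directly on a pair of flattened heatmaps. Below is a minimal NumPy sketch; the helper names are hypothetical, and the rank computation assumes no tied values (for continuous heatmaps this is typically the case; `scipy.stats.spearmanr` handles ties properly).

```python
import numpy as np

def rmse(a, b):
    """Root mean squared error between two heatmaps."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks.
    Ranks via double argsort, which assumes no tied values."""
    ra = np.argsort(np.argsort(a.ravel())).astype(float)
    rb = np.argsort(np.argsort(b.ravel())).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

def cosine_similarity(a, b):
    """Cosine of the angle between the flattened heatmaps."""
    a, b = a.ravel(), b.ravel()
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
h1 = rng.random((16, 16))  # e.g. an attribution heatmap
h2 = rng.random((16, 16))  # e.g. a clinician's marking map
```

An identical pair of heatmaps yields an RMSE of 0 and a Spearman and cosine similarity of 1, which is why higher correlation and similarity (and lower RMSE) indicate closer agreement.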
Cite this version of the work
Amitojdeep Singh Brar (2021). Explainable AI for retinal OCT diagnosis. UWSpace. http://hdl.handle.net/10012/16701