Williams, Justine
2023-01-26
2023-01-26
2023-01-26
2023-01-24
http://hdl.handle.net/10012/19123
With recent events such as the COVID-19 pandemic, it is increasingly important to develop strategies to combat viral diseases. Owing to technological advances, computer-aided drug design and machine learning (ML)-based hit identification strategies have gained popularity. Applying these techniques to identify novel scaffolds and/or repurpose existing therapeutics for viral diseases is a promising approach. To improve existing classification models for antiviral applications, this thesis aimed to improve the selection of non-binding data used to train them. We created a classification model based on molecular fingerprints and compared the performance of its predictions when trained on randomly selected versus rationally selected non-binding datasets. Our analyses revealed that machine learning predictions can be improved using a rational selection approach. We then applied this approach to train three machine learning models, based on XGBoost, Random Forest, and Support Vector Machine, to predict potential inhibitors of the SARS-CoV-2 main protease (Mpro) enzyme. Probability-ranked hits from the combined model were further analyzed using classical structure-based methods. The binding modes and affinities of the hits were determined using AutoDock Vina and MM-GBSA calculations based on molecular dynamics simulations. The top hits from this multi-step screening approach are potential candidates that show better affinity and stability than existing non-covalent Mpro inhibitors. Thus, our approach and model could be useful for screening large ligand libraries.
en
Chemistry
Molecular Docking
Molecular Dynamics
Drug Discovery
Machine Learning Model for Repurposing Drugs to Target Viral Diseases
Master Thesis