dc.contributor.author: Hamzeian, Donya
dc.date.accessioned: 2021-02-26 19:23:58 (GMT)
dc.date.available: 2021-02-26 19:23:58 (GMT)
dc.date.issued: 2021-02-26
dc.date.submitted: 2021-02-23
dc.identifier.uri: http://hdl.handle.net/10012/16834
dc.description.abstract: The COVID-19 Open Research Dataset (CORD-19) is a collection of over 400,000 scholarly papers (as of January 11th, 2021) about COVID-19, SARS-CoV-2, and related coronaviruses, curated by the Allen Institute for AI. Carrying out an exploratory literature review of these papers has become a time-sensitive and exhausting challenge during the pandemic. The topic modeling pipeline presented in this thesis helps researchers gain an overview of the topics addressed in the papers. The preprocessing framework identifies Unified Medical Language System (UMLS) entities by using MedLinker, which handles Word Sense Disambiguation (WSD) through a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model. The topic model used in this research is a Variational Autoencoder implementing ProdLDA, which is an extension of the Latent Dirichlet Allocation (LDA) model. Applying the pipeline to the CORD-19 dataset achieved a topic coherence value of 0.7 and topic diversity measures of almost 100%. [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.relation.uri: https://github.com/DonyaHamzeian/BiomedicalTopicModelling [en]
dc.relation.uri: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge [en]
dc.subject: machine learning [en]
dc.subject: topic modelling [en]
dc.subject: prodLDA [en]
dc.subject: Latent Dirichlet Allocation [en]
dc.subject: BERT [en]
dc.subject: MedLinker [en]
dc.subject: CORD-19 [en]
dc.subject: automatic exploratory literature review [en]
dc.subject: scoping review [en]
dc.subject.lcsh: COVID-19 Pandemic, 2020- , in mass media [en]
dc.subject.lcsh: Machine learning [en]
dc.title: Using Machine Learning Algorithms for Finding the Topics of COVID-19 Open Research Dataset Automatically [en]
dc.type: Master Thesis [en]
dc.pending: false
uws-etd.degree.department: Statistics and Actuarial Science [en]
uws-etd.degree.discipline: Statistics [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.degree: Master of Mathematics [en]
uws-etd.embargo.terms: 0 [en]
uws.contributor.advisor: Ghodsi, Ali
uws.contributor.advisor: Chen, Helen (Assistant Professor)
uws.contributor.affiliation1: Faculty of Mathematics [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
