Show simple item record

dc.contributor.authorZhou, Pei-Yuan
dc.date.accessioned2020-09-10 15:38:39 (GMT)
dc.date.available2020-09-10 15:38:39 (GMT)
dc.date.issued2020-09-10
dc.date.submitted2020-08-27
dc.identifier.urihttp://hdl.handle.net/10012/16276
dc.description.abstractIn recent years, machine learning approaches have important empirical successes on analysing data such as images, signals, texts and speeches with applications in biomedical and clinical areas. However, from the perspective of modelling, many machine learning methods still encounter crucial problems such as the lack of transparency and interpretability. Frequent Pattern Mining or Association Mining methods intend to solve the problem of interpretability, but they also encounter serious problems such as requiring exhaustive search and producing overwhelming numbers of patterns. From the perspective of data analysis, they do not render high prediction accuracy particularly for data with low volume, rare or imbalanced groups, rare cases or biases due to subtle overlapping or entanglement of the statistical and functional associations at the data source level. Hence, Professor Andrew K.C. Wong and I have developed a novel Pattern Discovery and Disentanglement (PDD) Method to discover explicit patterns and unveil knowledge from relational datasets even encompassing imbalanced groups, biases and anomalies. The statistically significant high-order patterns, pattern clusters and rare patterns are discovered in the disentangled Attribute Value Association (AVA) Spaces. They may be embedded in a relational dataset but overlapping or entangled with each other so that they are masked or obscured at the data level. The patterns discovered from the disentangled association source can be used for explicitly interpreting the original data, predicting the functional groups/classes and detecting anomalies and/or outliers. When class labels are not given, pattern/entity clustering can be more effectively discovered from the disentangled attribute value association (AVA) space than from the original records. The objective of this Master Thesis is to develop and validate the efficacy of PDD for genomic and clinical data analysis using a) protein sequence data, b) public clinical records from UCI dataset and c) a clinical dataset obtained from the School of Public Health and Health Systems at the University of Waterloo. The experimental results with superior performance in unsupervised and supervised learning than existing methods are presented in interpretable knowledge representation frameworks, interlinking the AVA disentangled sources, patterns, pattern/entity clusters and individual entities. In the clinical cases, it reveals the symptomatic patterns of individual patients, disease complexes/groups and subtle etiological sources. Hence it will have impacts in machine learning on genomic and clinical data with broad applications.en
dc.language.isoenen
dc.publisherUniversity of Waterlooen
dc.subjectpattern discoveryen
dc.subjectclinical data analysisen
dc.subjectmachine learningen
dc.subjectdata miningen
dc.titlePattern Discovery and Disentanglement for Clinical Data Analysisen
dc.typeMaster Thesisen
dc.pendingfalse
uws-etd.degree.departmentSystems Design Engineeringen
uws-etd.degree.disciplineSystem Design Engineeringen
uws-etd.degree.grantorUniversity of Waterlooen
uws-etd.degreeMaster of Applied Scienceen
uws.contributor.advisorWong, Andrew
uws.contributor.affiliation1Faculty of Engineeringen
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.typeOfResourceTexten
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages