Show simple item record

dc.contributor.author: Attia, Omar
dc.date.accessioned: 2021-01-21 17:53:03 (GMT)
dc.date.available: 2021-01-21 17:53:03 (GMT)
dc.date.issued: 2021-01-21
dc.date.submitted: 2021-01-13
dc.identifier.uri: http://hdl.handle.net/10012/16717
dc.description.abstract: Machine learning data repair systems (e.g. HoloClean) have achieved state-of-the-art performance for the data repair problem on many datasets. However, these systems face significant challenges with sparse datasets. In this work, the challenges that such datasets present to machine learning data repair systems are investigated, and dataset-independent methods are presented to mitigate the effects of data sparseness. Finally, experimental results are validated on a large, sparse real-world dataset, Census, showing that the problem size can be reduced by more than 70%, saving significant computational cost while still producing high-accuracy data repairs (94.5% accuracy). [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.subject: data cleaning [en]
dc.subject: data imputation [en]
dc.subject: machine learning [en]
dc.subject: sparse data [en]
dc.subject: structured data [en]
dc.subject: data quality [en]
dc.subject: data science [en]
dc.title: Scaling Machine Learning Data Repair Systems for Sparse Datasets [en]
dc.type: Master Thesis [en]
dc.pending: false
uws-etd.degree.department: David R. Cheriton School of Computer Science [en]
uws-etd.degree.discipline: Computer Science [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.degree: Master of Mathematics [en]
uws-etd.embargo.terms: 0 [en]
uws.contributor.advisor: Ilyas, Ihab
uws.contributor.affiliation1: Faculty of Mathematics [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]



All items in UWSpace are protected by copyright, with all rights reserved.
