
dc.contributor.author: Sharma, Rakshit
dc.date.accessioned: 2024-04-15 14:46:42 (GMT)
dc.date.available: 2026-04-15 00:00:00 (GMT)
dc.date.issued: 2024-04-15
dc.date.submitted: 2024-04-08
dc.identifier.uri: http://hdl.handle.net/10012/20438
dc.description.abstract: In the dynamic landscape of Machine Learning (ML) applications, data quality is an important factor affecting the performance of ML models. This thesis presents a study that proposes methods for enhancing data quality through an iterative data recapture approach. The research focuses primarily on univariate time-series data from which specific patterns can be extracted. We start by discussing existing data capture methods, in which data is collected manually or with hardware devices. The proposed methods, Sessionized Recapture Strategy (SRS) and Robust Single Capture Method (RSCM), are described in detail and offer distinct strategies for iterative data recapture. The Single Capture Method (SCM) and Recapture and Visualize Method (RVM) serve as the two baseline methods, each characterized by its data capture time and resulting False Positive Rate (FPR). SRS is an enhancement of RVM, and RSCM is an enhancement of SCM. This thesis also introduces an outlier detection algorithm named Outlier detection through ParameterlEss Robust Algorithm (OPERA), which, when combined with SCM and RVM, yields RSCM and SRS, respectively. Compared with the baseline methods, the proposed methods show promising results and improve the quality of the captured data. The experiments are performed on two datasets: one captured in the Embedded Systems Lab on one of the ANVIL products for Future Technology Devices International (FTDI) chips, and the other a publicly available Electrocardiogram (ECG) dataset provided by PhysioNet. The thesis concludes by synthesizing key findings and offering recommendations for practitioners seeking to optimize model performance through enhanced data quality. [en]
dc.language.iso: en [en]
dc.publisher: University of Waterloo [en]
dc.subject: data capture verification [en]
dc.subject: outlier detection [en]
dc.subject: anomaly detection [en]
dc.subject: parameterless [en]
dc.subject: robust outlier detection [en]
dc.subject: OPERA [en]
dc.subject: data capture strategies [en]
dc.subject: ANVIL [en]
dc.subject: ECG5000 [en]
dc.subject: data capture issues [en]
dc.title: Impact of data quality on ML models: Improving data quality with Outlier Detection [en]
dc.type: Master Thesis [en]
dc.pending: false
uws-etd.degree.department: Electrical and Computer Engineering [en]
uws-etd.degree.discipline: Electrical and Computer Engineering [en]
uws-etd.degree.grantor: University of Waterloo [en]
uws-etd.degree: Master of Applied Science [en]
uws-etd.embargo.terms: 2 years [en]
uws.contributor.advisor: Fischmeister, Sebastian
uws.contributor.affiliation1: Faculty of Engineering [en]
uws.published.city: Waterloo [en]
uws.published.country: Canada [en]
uws.published.province: Ontario [en]
uws.typeOfResource: Text [en]
uws.peerReviewStatus: Unreviewed [en]
uws.scholarLevel: Graduate [en]




UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.
