dc.contributor.author | Tavakoli, Fatemeh
dc.date.accessioned | 2023-10-19 13:53:31 (GMT)
dc.date.available | 2023-10-19 13:53:31 (GMT)
dc.date.issued | 2023-10-19
dc.date.submitted | 2023-10-11
dc.identifier.uri | http://hdl.handle.net/10012/20046
dc.description.abstract | Historically, forest fire prediction methods have leaned on heuristics, local insights, and basic statistical models, often neglecting the complex interplay of variables such as temperature, humidity, wind speed, and vegetation type. The lack of real-time prediction capabilities, paired with unpredictable weather patterns attributed to climate change, underscores the shortcomings of traditional methods, especially in geographically varied regions like Canada. In contrast, machine learning provides the adaptability needed for real-time responses, effectively harnessing updated data and addressing region-specific forest fire risks. The shift towards machine learning is both timely and revolutionary. This research addresses the urgent need for effective forest fire prediction and management strategies, specifically in the Canadian context, by harnessing machine learning methodologies. Using Copernicus’s reanalysis data, this study establishes a comprehensive predictive framework employing four cutting-edge machine learning algorithms: Random Forest, XGBoost, LightGBM, and CatBoost. The study features a robust data pre-processing pipeline, class imbalance correction, and rigorous model evaluation measures. Key contributions include the creation of a feature-rich dataset, comprehensive methods for addressing the class imbalance in large-scale datasets, and the development of a machine learning framework tailored for forest fire classification. The findings have significant implications for data-driven forest management strategies, with the aim of facilitating proactive fire prevention measures on a large scale. One primary challenge encountered was the inherent class imbalance in fire classification datasets, with a striking 158:1 ratio between "non-fire" and "fire" events. To address this, the study utilized various re-sampling strategies, encompassing under-sampling, over-sampling, and hybrid techniques. Specific methods employed included NearMiss, SMOTE, and SMOTE-ENN. The NearMiss method with a 0.09 sampling ratio was found to be particularly effective in addressing this imbalance. When combined with NearMiss version 3 at a 0.09 ratio, the XGBoost model outperformed its peers, showcasing an accuracy of 98.08%, a sensitivity of 86.06%, and a specificity of 93.03%. The findings indicate that while high recall from NearMiss version 3 optimized sensitivity, there was sometimes a trade-off with precision. | en
dc.language.iso | en | en
dc.publisher | University of Waterloo | en
dc.subject | Fire Prediction | en
dc.subject | Fire Classification | en
dc.subject | Machine Learning | en
dc.subject | Big Data | en
dc.title | Dataset Creation and Imbalance Mitigation in Big Data: Enhancing Machine Learning Models for Forest Fire Prediction | en
dc.type | Master Thesis | en
dc.pending | false
uws-etd.degree.department | Electrical and Computer Engineering | en
uws-etd.degree.discipline | Electrical and Computer Engineering | en
uws-etd.degree.grantor | University of Waterloo | en
uws-etd.degree | Master of Applied Science | en
uws-etd.embargo.terms | 0 | en
uws.contributor.advisor | Naik, Kshirasagar
uws.contributor.affiliation1 | Faculty of Engineering | en
uws.published.city | Waterloo | en
uws.published.country | Canada | en
uws.published.province | Ontario | en
uws.typeOfResource | Text | en
uws.peerReviewStatus | Unreviewed | en
uws.scholarLevel | Graduate | en
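
The abstract's sampling ratio and evaluation metrics can be illustrated with a small sketch. It is not taken from the thesis. It assumes the 0.09 ratio means minority count divided by majority count after under-sampling (the convention imbalanced-learn uses for a float `sampling_strategy`), and the sample counts and confusion-matrix values are hypothetical, chosen only to show the arithmetic.

```python
def majority_to_keep(n_minority: int, ratio: float) -> int:
    """Majority samples to retain so that n_minority / n_majority is about `ratio`.

    Assumption: `ratio` is minority/majority after resampling.
    """
    return round(n_minority / ratio)


def sensitivity(tp: int, fn: int) -> float:
    """Recall on the positive ("fire") class: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Recall on the negative ("non-fire") class: TN / (TN + FP)."""
    return tn / (tn + fp)


def precision(tp: int, fp: int) -> float:
    """Share of predicted "fire" events that are real: TP / (TP + FP)."""
    return tp / (tp + fp)


# Hypothetical dataset with the 158:1 imbalance quoted in the abstract.
n_fire = 1_000
n_non_fire = 158 * n_fire

# Under-sampling the majority class to a 0.09 minority/majority ratio.
kept = majority_to_keep(n_fire, 0.09)
print(f"non-fire samples kept: {kept} of {n_non_fire}")
print(f"imbalance after under-sampling: {kept / n_fire:.1f}:1")

# Hypothetical confusion matrix on an imbalanced test set.
tp, fn, tn, fp = 90, 10, 9_000, 1_000
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
print(f"precision   = {precision(tp, fp):.3f}")
```

With these made-up counts, sensitivity and specificity are both 0.9 while precision is far lower, because false positives from the much larger "non-fire" class outnumber the true positives. This is one way the recall and precision trade-off mentioned in the abstract can arise.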



All items in UWSpace are protected by copyright, with all rights reserved.
