Dataset Creation and Imbalance Mitigation in Big Data: Enhancing Machine Learning Models for Forest Fire Prediction

Tavakoli, Fatemeh

dc.contributor.advisor	Naik, Kshirasagar
dc.contributor.author	Tavakoli, Fatemeh
dc.date.accessioned	2023-10-19T13:53:31Z
dc.date.available	2023-10-19T13:53:31Z
dc.date.issued	2023-10-19
dc.date.submitted	2023-10-11
dc.description.abstract	Historically, forest fire prediction methods have leaned on heuristics, local insights, and basic statistical models, often neglecting the complex interplay of variables such as temperature, humidity, wind speed, and vegetation type. The lack of real-time prediction capabilities, paired with unpredictable weather patterns attributed to climate change, underscores the shortcomings of traditional methods, especially in geographically varied regions like Canada. In contrast, machine learning provides the adaptability needed for real-time responses, effectively harnessing updated data and addressing region-specific forest fire risks. The shift towards machine learning is both timely and revolutionary. This research addresses the urgent need for effective forest fire prediction and management strategies, specifically in the Canadian context, by harnessing machine learning methodologies. Using Copernicus’s reanalysis data, this study establishes a comprehensive predictive framework employing four cutting-edge machine learning algorithms: Random Forest, XGBoost, LightGBM, and CatBoost. The study features a robust data pre-processing pipeline, class imbalance correction, and rigorous model evaluation measures. Key contributions include the creation of a feature-rich dataset, comprehensive methods for addressing class imbalance in large-scale datasets, and the development of a machine learning framework tailored for forest fire classification. The findings have significant implications for data-driven forest management strategies, with the aim of facilitating proactive fire prevention measures on a large scale. One primary challenge encountered was the inherent class imbalance in fire classification datasets, with a striking 158:1 ratio between "non-fire" and "fire" events. To address this, the study utilized various re-sampling strategies, encompassing under-sampling, over-sampling, and hybrid techniques.
Specific methods employed included NearMiss, SMOTE, and SMOTE-ENN. The NearMiss method with a 0.09 sampling ratio was found to be particularly effective in addressing this imbalance. When combined with NearMiss version 3 at a 0.09 ratio, the XGBoost model outperformed its peers, achieving an accuracy of 98.08%, a sensitivity of 86.06%, and a specificity of 93.03%. The findings indicate that while NearMiss version 3 yielded high recall (sensitivity), this sometimes came at the cost of precision.	en
dc.identifier.uri	http://hdl.handle.net/10012/20046
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	Fire Prediction	en
dc.subject	Fire Classification	en
dc.subject	Machine Learning	en
dc.subject	Big Data	en
dc.title	Dataset Creation and Imbalance Mitigation in Big Data: Enhancing Machine Learning Models for Forest Fire Prediction	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Applied Science	en
uws-etd.degree.department	Electrical and Computer Engineering	en
uws-etd.degree.discipline	Electrical and Computer Engineering	en
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0	en
uws.contributor.advisor	Naik, Kshirasagar
uws.contributor.affiliation1	Faculty of Engineering	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en
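
The abstract refers to a 158:1 class ratio, a 0.09 sampling ratio for under-sampling, and sensitivity and specificity as evaluation measures. The following is a minimal Python sketch of how those quantities relate, not the thesis's implementation. It assumes the sampling ratio means minority count divided by majority count after re-sampling (the convention imbalanced-learn uses for a float ratio; this record does not state which library was used). The function names and all sample counts are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: all counts below are hypothetical, not thesis data.

def undersampled_majority_count(n_minority: int, ratio: float) -> int:
    """Number of majority samples to keep so that
    n_minority / n_majority is approximately `ratio` (rounded down).

    Assumption: `ratio` is minority/majority after re-sampling.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    return int(n_minority / ratio)


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics from a binary confusion matrix ("fire" = positive)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall on the "fire" class
        "specificity": tn / (tn + fp),  # recall on the "non-fire" class
        "precision": tp / (tp + fp),
    }


# Hypothetical dataset with the 158:1 imbalance mentioned in the abstract.
n_fire = 1_000
n_non_fire = 158 * n_fire

# Under-sampling to a 0.09 ratio keeps roughly 11 "non-fire" samples per "fire" sample.
kept_non_fire = undersampled_majority_count(n_fire, 0.09)

# Hypothetical confusion matrix, showing high recall alongside lower precision.
metrics = binary_metrics(tp=90, fp=30, tn=970, fn=10)

if __name__ == "__main__":
    print(f"non-fire before: {n_non_fire}, after: {kept_non_fire}")
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Under this assumption, under-sampling discards most "non-fire" samples rather than synthesizing "fire" samples. A classifier trained on the reduced set tends to flag more events as fire, which is one way high sensitivity can coexist with lower precision.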

Files

Original bundle

Name: Tavakoli_Fatemeh.pdf
Size: 3.51 MB
Format: Adobe Portable Document Format
Description: Thesis

Collections

Theses
Electrical and Computer Engineering