Kaur, Parveen
2023-08-18
2023-08-18
2023-08-18
2023-08-17
http://hdl.handle.net/10012/19720
Forest fires pose a significant and urgent threat to ecosystems and human lives, necessitating accurate prediction for effective mitigation strategies. Predicting forest fires has been a longstanding challenge due to the complex and dynamic nature of fire behavior. Traditional approaches to forest fire prediction, dating back to the 1950s, relied on simplistic statistical models and manual observations to identify fire-prone areas. However, these classical solutions were limited in their ability to capture the intricate interplay of various environmental factors that influence fire ignition. Since then, the field of forest fire prediction has undergone remarkable advancements, driven by the availability of heterogeneous data sources, advancements in computing power, and the emergence of machine learning techniques. The advent of remote sensing technologies, weather stations, and geospatial data has provided rich and diverse datasets for analyzing fire-related variables such as weather conditions, vegetation indices, topography, and historical fire records. Furthermore, the rapid progress in machine learning algorithms has enabled the development of sophisticated models capable of extracting meaningful patterns and relationships from these large-scale and complex datasets. These advancements have revolutionized forest fire prediction by improving the performance and reliability of predictive models, facilitating proactive decision-making, and enhancing the effectiveness of mitigation strategies. Our study employs a comprehensive data collection framework to enhance forest fire prediction capabilities. The framework integrates data from remote sensing satellites, ground-based weather stations, and other relevant sources, facilitating the capture of crucial meteorological, biophysical, and topographical attributes.
By leveraging these heterogeneous data sources, we create a unified database that spans an 18-year period and offers a high temporal resolution for detailed analysis. However, one of the primary challenges in forest fire prediction is data imbalance: non-fire instances far outnumber fire instances in the dataset. To address this challenge, we employ advanced spatial subsampling and downsampling techniques, which mitigate the imbalance and give a more balanced representation of fire and non-fire instances for model training. Our study evaluates the performance of three machine learning methods in forest fire prediction: Random Forest, XGBoost, and Multilayer Perceptron. The results reveal the strong performance of XGBoost, which achieves a ROC-AUC score of 87.2% and a sensitivity of 75%. This study highlights the importance of incorporating meteorological data and fire history to improve prediction performance and showcases the potential of machine learning techniques in addressing forest fire prediction challenges. The findings contribute to proactive risk assessment, robust mitigation strategies, and the preservation of ecosystems and human lives.
en
Wildfire; Imbalanced data; Undersampling; Machine Learning; Random Forest; XGBoost
Forest Fire Prediction Using Heterogeneous Data Sources and Machine Learning Methods
Master Thesis