Data Balancing and Hyper-parameter Optimization for Machine Learning Algorithms for Secure IoT Networks
Nowadays, many industries rely on Machine Learning (ML) algorithms and their ability to learn from existing data to make inferences about new, unlabeled data. Applying ML algorithms to network security is not new. However, without proper data pre-processing and proper optimization of their hyper-parameters (HPs), these algorithms might not achieve their full potential. Furthermore, attacks on network infrastructures come in a variety of forms and at different frequencies, and cyber-security experts often need an automated process that filters and classifies them. Classifying the attack type is key to applying specific preventive measures for securing networks. Many ML models have been proposed as a basis for Network Intrusion Detection (NID) systems, but their performance varies with multiple factors. For instance, an ML model fitted on a highly imbalanced dataset may be biased toward over-represented attack types. On the other hand, focusing only on the model's performance in the minority classes can degrade its performance in the majority classes or its overall performance. This research proposes a framework that applies pre-processing steps, including data balancing, and uses optimization techniques to tune the HPs of random forest, gradient boosting machine, and deep neural network models. The experiments compare three optimization algorithms: Tree-structured Parzen Estimator (TPE), Bayesian Optimization and Hyperband (BOHB), and Particle Swarm Optimization (PSO). The results show that data balancing combined with optimization of the HPs and architecture of deep neural networks can improve their performance significantly, yielding false alarm rates of 0% and only 1.79% on the BoT-IoT and ToN-IoT benchmark datasets, respectively.
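PSO is one of the three optimizers compared above for hyper-parameter tuning. As a rough illustration only, and not the thesis's implementation, the sketch below is a minimal global-best PSO in plain Python. The objective `mock_validation_error` is a hypothetical stand-in for "train a model, return its validation error"; its two hyper-parameters (log10 learning rate and hidden-unit count), their bounds, and the coefficient values (commonly used constriction-equivalent defaults) are all assumptions made for the example.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=50,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Minimal global-best particle swarm optimization (minimization).

    objective: callable mapping a list of floats to a scalar loss.
    bounds:    list of (low, high) pairs, one per search dimension.
    Returns (best_position, best_loss).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Start particles uniformly inside the search box with zero velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its personal best; the swarm shares a global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + attraction to personal best + attraction to global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clip back into the allowed range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def mock_validation_error(params):
    """HYPOTHETICAL stand-in for "train a DNN, return validation error".

    params[0]: log10 of the learning rate; params[1]: hidden units (rounded).
    The bowl is centred on lr = 1e-3 and 128 units purely for illustration.
    """
    log_lr, units = params[0], round(params[1])
    return (log_lr + 3.0) ** 2 + ((units - 128) / 64.0) ** 2


if __name__ == "__main__":
    # Assumed search space: lr in [1e-5, 1e-1], hidden units in [16, 512].
    best, best_val = pso_minimize(mock_validation_error, [(-5.0, -1.0), (16, 512)])
    print(f"lr={10 ** best[0]:.2e}, units={round(best[1])}, error={best_val:.4f}")
```

In a real tuning run each objective call would fit a model, so particles times iterations bounds the training budget. Integer hyper-parameters are handled here by rounding inside the objective, which is one simple choice among several.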
To address the issue of imbalanced datasets, this research also proposes a data balancing algorithm and compares its performance with existing approaches based on Random Over-Sampling (ROS), Synthetic Minority Oversampling TEchnique (SMOTE), Adaptive Synthetic Sampling (ADASYN), and Generative Adversarial Networks (GAN). The data balancing algorithm is combined with a Convolutional Neural Network (CNN) that extracts spatial features and classifies the different attack types. Benchmarked on the NSL-KDD and BoT-IoT datasets, the proposed system achieves high performance in the minority classes: recall scores of 70.50% and 72.08% on the User to Root (U2R) and Remote to Local (R2L) attack classes of NSL-KDD, respectively, while maintaining an overall False Alarm Rate (FAR) of 6.50% and a recall of 90.46% on the binary classification task. On the multi-class classification task using BoT-IoT, the proposed system scores a weighted-average F1-score of 99.45%.
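The abstract does not describe the thesis's own balancing algorithm, so the sketch below covers only two of the baselines it is compared against: Random Over-Sampling, which duplicates existing minority samples, and SMOTE, which synthesizes new samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. It is a minimal plain-Python sketch under simple assumptions (numeric feature lists, Euclidean distance, a randomly chosen base sample per synthetic point); the function names and toy data are hypothetical, and library implementations differ in details.

```python
import math
import random


def random_oversample(X, y, seed=0):
    """Random Over-Sampling (ROS): duplicate randomly chosen samples of every
    smaller class until each class has as many samples as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(samples) for samples in by_class.values())
    X_out, y_out = [list(xi) for xi in X], list(y)
    for label, samples in by_class.items():
        for _ in range(target - len(samples)):
            X_out.append(list(rng.choice(samples)))  # exact copy of an existing sample
            y_out.append(label)
    return X_out, y_out


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style synthesis for one minority class.

    Each synthetic point is x + gap * (neighbour - x), where x is a randomly
    chosen minority sample, neighbour is one of its k nearest minority-class
    neighbours (Euclidean distance) and gap is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the *other* minority samples.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Toy 2-D data (assumed): four class-0 samples, three class-1 samples.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5]]
    y = [0, 0, 0, 0, 1, 1, 1]
    X_ros, y_ros = random_oversample(X, y)
    print(y_ros.count(0), y_ros.count(1))  # 4 4
    print(smote([x for x, c in zip(X, y) if c == 1], n_new=3, k=2))
```

ROS only repeats points the model has already seen, which can encourage over-fitting to them; SMOTE instead spreads new points along segments between neighbouring minority samples.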
Cite this version of the work
Omar Elghalhoud (2022). Data Balancing and Hyper-parameter Optimization for Machine Learning Algorithms for Secure IoT Networks. UWSpace. http://hdl.handle.net/10012/18957
Showing items related by title, author, creator and subject.
Qu, Kaige (University of Waterloo, 2020-12-17) The service-oriented fifth-generation (5G) core networks are featured by customized network services with differentiated quality-of-service (QoS) requirements, which can be provisioned through network slicing enabled by ...
Tharaperiya Gamage, Amila Pradeep Kumara (University of Waterloo, 2015-10-15) Demand for high volumes of mobile data traffic with better quality-of-service (QoS) support and seamless network coverage is ever increasing, due to growth of the number of smart mobile devices and the applications that ...
Chowdhury, Shihabur (University of Waterloo, 2021-02-23) Communication networks are undergoing a major transformation through softwarization, which is changing the way networks are designed, operated, and managed. Network Softwarization is an emerging paradigm where software ...