Data Balancing and Hyper-parameter Optimization for Machine Learning Algorithms for Secure IoT Networks
Abstract
Many industries now rely on Machine Learning (ML) algorithms and their ability to learn from existing data to make inferences about new, unlabeled data. Applying ML algorithms to the network security domain is not new. However, without proper data pre-processing and proper optimization of their hyper-parameters (HPs), these algorithms might not achieve their full potential. Furthermore, attacks on network infrastructures come in a variety of forms and at different frequencies. Cyber-security experts often require the help of an automated process that filters and classifies attacks, because identifying the attack type is key to applying specific preventive measures for securing networks. Many ML models have been proposed as a base for Network Intrusion Detection (NID) systems. However, their performance varies based on multiple factors. For instance, an ML model fitted on a highly imbalanced dataset may be biased toward over-represented attack types. On the other hand, paying attention only to the ML model's performance in the minority classes can negatively affect its performance in the majority classes or its overall performance. This research proposes a framework that applies pre-processing steps, including data balancing, and utilizes optimization techniques to tune the HPs of random forest, gradient boosting machine, and deep neural network models. The experiments conducted in this research compare three optimization algorithms: Tree-structured Parzen Estimator (TPE), Bayesian Optimization and Hyperband (BOHB), and Particle Swarm Optimization (PSO). The results show that, through data balancing and optimization of the HPs and architecture of deep neural networks, performance can improve significantly: false alarm rates of 0% and 1.79% on the BoT-IoT and ToN-IoT benchmark datasets, respectively.
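To make the imbalance problem concrete, the following is a minimal sketch of how false alarm rate and per-class recall might be computed. It assumes the usual definition FAR = FP / (FP + TN), where benign traffic is the negative class. The labels, class sizes, and predictions are invented toy values, not results from the thesis.

```python
from collections import Counter


def false_alarm_rate(y_true, y_pred, normal="normal"):
    """Fraction of benign samples that were flagged as any attack class."""
    benign_preds = [p for t, p in zip(y_true, y_pred) if t == normal]
    if not benign_preds:
        return 0.0
    false_alarms = sum(1 for p in benign_preds if p != normal)
    return false_alarms / len(benign_preds)


def recall_per_class(y_true, y_pred):
    """Recall for each label: correctly predicted samples / true samples."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical imbalanced test set: 90 benign, 8 "dos", 2 "u2r" samples.
y_true = ["normal"] * 90 + ["dos"] * 8 + ["u2r"] * 2

# Hypothetical biased model: raises 3 false alarms on benign traffic,
# detects every "dos" sample, and never predicts the rare "u2r" class.
y_pred = ["normal"] * 87 + ["dos"] * 3 + ["dos"] * 8 + ["normal"] * 2

far = false_alarm_rate(y_true, y_pred)
recalls = recall_per_class(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"FAR={far:.4f} accuracy={accuracy:.2f} recalls={recalls}")
```

In this toy case the overall accuracy looks high even though recall on the rarest class is zero, which is the kind of bias toward over-represented classes described above.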
To address the issue of imbalanced datasets, this research proposes a data balancing algorithm and compares its performance to existing approaches that use Random Over-Sampling (ROS), Synthetic Minority Over-sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), and Generative Adversarial Networks (GAN). The data balancing algorithm is combined with Convolutional Neural Networks (CNN) to extract spatial features and classify the different attack types. Using the NSL-KDD and BoT-IoT datasets for benchmarking, the proposed system achieves high performance in the minority classes: recall scores of 70.50% and 72.08% on the User to Root (U2R) and Remote to Local (R2L) attack classes of the NSL-KDD dataset, respectively, while maintaining an overall False Alarm Rate (FAR) of 6.50% and a recall of 90.46% on the binary classification task. On the multi-class classification task using the BoT-IoT dataset, the proposed system achieves a weighted average F1-Score of 99.45%.
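As an illustration of the simplest baseline named above, the following is a minimal sketch of Random Over-Sampling (ROS). It is not the data balancing algorithm proposed in the thesis, which the abstract does not describe. The function name `random_over_sample`, the seed, and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def random_over_sample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical training set: integer ids stand in for feature vectors.
samples = list(range(100))
labels = ["normal"] * 90 + ["dos"] * 8 + ["u2r"] * 2

bal_x, bal_y = random_over_sample(samples, labels)
counts = Counter(bal_y)
print(counts)
```

Over-sampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.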
Cite this version of the work
Omar Elghalhoud (2022). Data Balancing and Hyper-parameter Optimization for Machine Learning Algorithms for Secure IoT Networks. UWSpace. http://hdl.handle.net/10012/18957