Algorizmi: A Configurable Virtual Testbed to Generate Datasets for Offline Evaluation of Intrusion Detection Systems
Intrusion detection systems (IDSes) are an important security measure that network administrators adopt to defend computer networks against malicious attacks and intrusions. The field of IDS research includes many challenges. However, one open problem remains orthogonal to the others: IDS evaluation. In other words, researchers have not yet agreed on a general systematic methodology and/or a set of metrics to fairly evaluate different IDS algorithms. This leads to another problem: the lack of an appropriate IDS evaluation dataset that satisfies common research needs. One major contribution in this area is the DARPA dataset offered by the Massachusetts Institute of Technology Lincoln Lab (MIT/LL), which has been used extensively to evaluate IDS algorithms proposed in the literature. Despite this, the DARPA dataset has drawn considerable criticism for the way it was designed, in particular for its obsolescence and its inability to incorporate new kinds of network attacks. In this thesis, we survey previous research projects that attempted to provide a system for offline IDS evaluation. From the survey, we identify a set of design requirements for such a system based on the research community's needs. We then propose Algorizmi, an open-source configurable virtual testbed for generating datasets for offline IDS evaluation. We provide an architectural overview of Algorizmi and its software and hardware components. Algorizmi provides its users with tools that allow them to create their own experimental testbed using the concepts of virtualization and cloud computing. Algorizmi users can configure the virtual machine instances running in their experiments and select what background traffic those instances will generate and what attacks will be launched against them. 
At any point in time, an Algorizmi user can generate a dataset (network traffic trace) for any of her experiments and use it afterwards to evaluate an IDS in the same way the DARPA dataset is used. Our analysis shows that Algorizmi satisfies more requirements than previous research projects that target the same problem of generating datasets for offline IDS evaluation. Finally, we demonstrate the utility of Algorizmi by building a sample network of machines and generating both background and attack traffic within that network. We then download a snapshot of the dataset for that experiment and run it against the Snort IDS. Snort successfully detected the attacks we launched against the sample network. Additionally, we evaluate the performance of Algorizmi while it processes some of the common operations of a typical user, based on five metrics: CPU time, CPU usage, memory usage, network traffic sent and received, and execution time.
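
To make the idea of offline evaluation concrete, the following is a minimal sketch of how the alerts an IDS raises on a recorded trace could be scored against ground-truth labels. It is an illustration only: the `Flow` record, the `score_alerts` function, and the assumption that each flow in the dataset carries an attack label are hypothetical and are not taken from Algorizmi or Snort. Detection rate and false-positive rate are used here as one commonly reported pair of measures, not as metrics the thesis prescribes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Flow:
    """One flow in a recorded trace (hypothetical record layout)."""
    flow_id: str
    is_attack: bool  # assumed ground-truth label attached to the trace


def score_alerts(flows: List[Flow], alerted_ids: Iterable[str]) -> Dict[str, float]:
    """Compare the flows an IDS flagged with the ground-truth labels."""
    alerted = set(alerted_ids)
    tp = sum(1 for f in flows if f.is_attack and f.flow_id in alerted)
    fn = sum(1 for f in flows if f.is_attack and f.flow_id not in alerted)
    fp = sum(1 for f in flows if not f.is_attack and f.flow_id in alerted)
    tn = sum(1 for f in flows if not f.is_attack and f.flow_id not in alerted)
    return {
        "true_positives": tp,
        "false_negatives": fn,
        "false_positives": fp,
        "true_negatives": tn,
        # Fraction of attack flows that were flagged (0.0 if the trace has none).
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of benign flows that were flagged (0.0 if the trace has none).
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


if __name__ == "__main__":
    # Toy trace: two attack flows and two benign flows.
    trace = [
        Flow("a", True),
        Flow("b", True),
        Flow("c", False),
        Flow("d", False),
    ]
    # Pretend the IDS flagged one attack flow and one benign flow.
    print(score_alerts(trace, ["a", "c"]))
```

Because the same recorded trace can be replayed against any number of detectors, a scoring step of this kind allows different IDS algorithms to be compared on identical input.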