Flow Based Intrusion Detection System Using Multistage Neural Network

With the rapid expansion of computer networks during the past decade, security has become a crucial issue for computer systems. To keep security at the highest level, there is an increasing need for effective security monitors, such as Network Intrusion Detection Systems, to prevent illicit activity. In recent years many researchers have focused on this field, using different approaches to build dependable intrusion detection systems. One of these approaches is the flow-based intrusion detection system, which relies on aggregated network traffic flows. In this paper, a Multistage Neural Network intrusion detection system based on aggregated flow data is proposed for detecting and classifying attacks in network traffic. The first stage of the neural network detects significant changes in the traffic that could indicate an attack, while the second stage recognizes an attack, differentiates one attack from another (i.e., classifies the attack) and, most importantly, detects new attacks with a high detection rate and a low false negative rate. Two different neural network structures with different training algorithms have been used in the proposed Intrusion Detection System. The experimental results show that the designed system is promising in terms of accuracy and low probability of false alarms, with an average overall classification accuracy of 99.25%.


Introduction
Network-based computer services have grown enormously, as has the number of applications running on networked systems. Moreover, the use of computers at home and in business has increased considerably. As a result, security has become an increasingly important issue for all networks and computers in today's enterprise environment. The Internet, like many other things, is double-edged: it is the entrance to many beneficial things, but unfortunately it also opens the way for many harmful things to enter your device. Hackers and intruders have made many successful attempts to bring down the networks and systems of high-profile companies. Many methods have been developed to secure system infrastructure and communication over the Internet, such as firewalls, intrusion detection, and encryption [8], [36].
Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusion. It aims to protect the confidentiality, integrity, and availability of critical networked information systems [37], [39]. An intrusion detection system (IDS) gathers and analyzes information from various areas within a computer or a network to identify attacks made against these components. An IDS is an important component of a security system. It complements other security technologies by providing information for management. It not only detects attacks that are discovered by other security elements, but also attempts to provide notification of new attacks that the other components cannot anticipate. This is done by continuously monitoring and analyzing the events that occur in the computer system or network, from inside or outside.
In general, an IDS uses a number of generic methods for monitoring the exploitation of vulnerabilities. IDSs can be characterized by three main aspects [16], [33]:
The data source: Here we have host-based, network-based, or hybrid IDSs. A host-based IDS monitors computer components (such as the operating system, packets, and system logs). A network-based IDS monitors the network (such as its traffic). A hybrid IDS combines host and network monitoring to cover the computer and the network together.
IDSs have become increasingly important in recent years to handle the growing number of attacks, the rise in the amount of traffic, and the increase in line speed [35]. Well-known systems like Snort [30] and Bro [31] exhibit high resource consumption when confronted with the overwhelming amount of data found in today's high-speed networks [32]. The constant increase in network traffic and the fast introduction of high-speed network equipment [14] make it hard to maintain traditional packet-based intrusion detection systems. Given these problems, flow-based approaches seem to be promising candidates for intrusion detection research. Flows are monitored by specialized accounting modules usually placed in network routers. These modules are responsible for exporting reports on flow activity to external collectors. Flow-based IDSs analyze these flows to detect attacks. Compared to traditional IDSs, flow-based IDSs have to handle a considerably lower amount of data. According to previous research, approaches that rely on aggregated traffic metrics, such as flow-based approaches, show improved scalability and therefore seem more promising. The benefit of flow-based approaches is that only a fraction of the total amount of data needs to be analyzed. A flow-based approach provides an aggregated view of the data transferred over the network and between hosts, in terms of the number of packets, bytes, and the measured flows themselves.
A flow is defined as a unidirectional stream of packets that share common characteristics, such as source and destination addresses, ports, and protocol type. Additionally, a flow includes aggregated information about the number of packets and bytes belonging to the stream, as well as its duration. Flows are often used for network monitoring, allowing us to obtain a real-time overview of the network status; common tools for this purpose are Nfsen [27] and Flowscan [28], while the de facto standard technology in this field is Cisco Netflow, particularly its versions 5 and 9 [5].
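The flow definition above can be illustrated with a minimal sketch that groups packet records by their shared characteristics and accumulates packet count, byte count, and duration. The packet tuple layout, the `Flow` class, and the `aggregate_flows` function are hypothetical names chosen for this illustration; a real NetFlow exporter also applies timeouts and other expiry rules that are not modelled here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Flow:
    """Aggregated counters for one unidirectional flow (illustrative only)."""
    n_packets: int = 0
    n_bytes: int = 0
    first_seen: float = float("inf")
    last_seen: float = float("-inf")

    @property
    def duration(self) -> float:
        return self.last_seen - self.first_seen


def aggregate_flows(packets):
    """Group packet records into unidirectional flows.

    Each packet is assumed to be a tuple:
    (timestamp, src_ip, dst_ip, src_port, dst_port, protocol, size_bytes).
    """
    flows = defaultdict(Flow)
    for ts, src, dst, sport, dport, proto, size in packets:
        # The key is direction-sensitive: A->B and B->A are different flows.
        key = (src, dst, sport, dport, proto)
        flow = flows[key]
        flow.n_packets += 1
        flow.n_bytes += size
        flow.first_seen = min(flow.first_seen, ts)
        flow.last_seen = max(flow.last_seen, ts)
    return dict(flows)


# Three made-up packets: two from client to server, one reply.
packets = [
    (0.00, "10.0.0.1", "10.0.0.2", 40000, 80, "TCP", 60),
    (0.05, "10.0.0.2", "10.0.0.1", 80, 40000, "TCP", 60),
    (0.20, "10.0.0.1", "10.0.0.2", 40000, 80, "TCP", 500),
]
flows = aggregate_flows(packets)
for key, flow in flows.items():
    print(key, flow.n_packets, flow.n_bytes, round(flow.duration, 3))
```

Because the key includes the direction, the request and the reply of one TCP connection end up in two separate flows, which matches the unidirectional definition.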
The computational advances of the last several decades have given rise to new technologies. One of these technologies is artificial neural networks (ANNs). Over the years, ANNs have provided various solutions to industry. Designing and implementing intelligent systems has become an important activity for the innovation and development of better products for human life. Examples include the implementation of artificial life and solutions to problems that linear systems are not able to resolve [34]. Neural Networks have strong discrimination and generalization abilities when used for classification purposes [2]. An increasing amount of research in the last few years has investigated the application of Neural Networks to intrusion detection. If properly designed and implemented, Neural Networks have the potential to address many of the problems encountered by rule-based approaches. Neural Networks were specifically proposed to learn the typical characteristics of a system's users and to identify statistically significant variations from their established behavior. To apply this approach to intrusion detection, data representing attacks and normal network flow must be presented to the Neural Networks so that their coefficients are adjusted automatically during the training phase. In other words, it is necessary to collect data representing normal and abnormal behavior to train the Neural Networks. After training has been accomplished, a number of performance tests with real network traffic and attacks are conducted [26]. In our study, two different neural network methods have been used for the intrusion detection system: the Multi Layer Perceptron (MLP) neural network and the Radial Basis Function Network (RBFN).
The ANN needs to be trained in order to reach the best output. Basically, learning is a process by which the free parameters (i.e., synaptic weights and bias levels) of the ANN are adapted through a continuing process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place. In general, the learning process may be classified as supervised or unsupervised. The most widely used training algorithm is back propagation with gradient descent (GDA), which has the disadvantage of slow training, while Levenberg-Marquardt [11], [12] is one of the more accurate algorithms and is faster than GDA, but consumes more memory. On the other hand, the RBFN offers a viable alternative to the two-layer neural network in many applications of signal processing, decision making algorithms, pattern recognition, control, and function approximation. It has been shown that the RBFN can fit an arbitrary function with just one hidden layer [13], but it cannot quite achieve the accuracy of the back-propagation network. However, the RBFN can be trained several orders of magnitude faster than the back-propagation network, and this is a very important advantage in real-time or near-real-time applications.
In our study, a flow-based intrusion detection and classification system is implemented using a multistage neural network. While in many previous studies [6], [32], [22] the implemented system is a neural network based on the DARPA [7] or KDD'99 [18] dataset with the capability of detecting normal or abnormal connections, in our study a more general problem is considered, in which the attack type is also classified and the training dataset is a flow dataset instead of the DARPA dataset. This paper is organized as follows: section 2 presents an overview of related work, section 3 explains the proposed system, section 4 evaluates the proposed system, and section 5 discusses the experimental results, followed by conclusions and future work.

Journal of Al-Asmarya University for Basic and Applied Sciences (Issue 31, Part Two, December 2017)

Related Work
With the rapid rise in network speeds, flow-based techniques have attracted the interest of researchers, especially for the analysis of high-speed networks. The day-to-day increase in network usage and load has made it clear that scalability is a growing problem. In this situation, flow-based solutions for monitoring and, moreover, for detecting intrusions help to solve the problem. They reduce both the data volume and the processing time, opening the way to high-speed detection on large infrastructures. Sperotto et al. [35] provided a comprehensive survey of current research in the domain of flow-based network intrusion detection. Gao and Chen [10] designed and developed a flow-based intrusion detection system. Further flow-based approaches were reported by Karasaridis et al. [17] and Shahrestani et al. [38]. A sound evaluation of a neural network based IDS requires high-quality training and testing datasets. Unfortunately, the de facto standard is still the DARPA data set created by Lippmann et al. [19]. Despite its severe weaknesses and the critique published by McHugh [20], it is still used. The KDD'99 [18] data set can be regarded as another popular data set. Sperotto et al. [40] contributed the first labelled flow-based dataset intended for evaluating and training network intrusion detection systems.
Several Neural Network approaches have been employed for intrusion detection systems based on netflow and the DARPA [7] dataset. Muna Mhammad T. Jawhar [21] used Neural Network and Fuzzy C-Mean (FCM) clustering algorithms. Rodrigo Braga [4] used OpenFlow and the SOM unsupervised neural network. Vallipuram and Robert [42] used a back-propagation Neural Network with all features of the KDD (Knowledge Discovery in Databases) data [18]. Tie and Li [45] used the back propagation (BP) network, enhanced with Genetic Algorithms (GAs), for selected attacks and with some features of the KDD dataset as input. Mukkamala, Andrew, and Ajith [23] used a Back Propagation Neural Network with many types of learning algorithm. Jimmy and Heidar [15] used a Neural Network for the classification of unknown attacks. Novikov, Roman, and Reznik [25] used MLP and Radial Basis Function (RBF) Neural Networks for the classification of five types of attacks. Ahmed, Ullah and Mohsin [1] used the Resilient Back propagation (RPROP) learning algorithm to detect network intrusion attacks precisely. B. Subba, S. Biswas and S. Karmakar [46] used Neural Networks for attack classification. D, Vrushali & Pawar [47] developed an anomaly detection system based on back propagation Neural Networks.

Proposed Approach
The proposed system for flow-based intrusion detection is composed of four main stages, as depicted in Figure 1.

Feature Extraction Stage
The Feature Extraction stage starts after the monitor finishes capturing the packets that passed through the network. The packets can be captured at the network layer (IP) and/or the transport layer (TCP, UDP, ICMP). All unidirectional streams of packets that share common characteristics, such as source and destination addresses, ports, and protocol type, are collected and extracted as flows according to the Cisco protocol [5]. After receiving the collected flows, the feature extraction module applies predefined processes to extract the features that are most important for the anomaly detection and classification stages, and gathers them into 5-tuples and 11-tuples that are passed to the detection and classification stages, respectively. Table 1 gives a more detailed explanation of all features of both stages. All selected features must be pre-processed before they are passed to the two stages; this phase normalizes the features by mapping the different values of each feature to the [0, 1] range.
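The normalization step described above can be sketched as follows. The paper only states that each feature is mapped to the [0, 1] range; min-max scaling per feature column is assumed here as one common way to do this, and the function name and sample values are hypothetical.

```python
def min_max_normalize(rows):
    """Scale every feature column of `rows` to the [0, 1] range.

    `rows` is a list of equal-length numeric feature vectors.
    Min-max scaling is an assumption; the paper does not name the method.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column has no spread, so it is mapped to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Made-up feature vectors (three features per flow record).
raw = [
    [10, 200, 6],
    [20, 400, 6],
    [40, 1000, 6],
]
scaled = min_max_normalize(raw)
for row in scaled:
    print([round(v, 3) for v in row])
```

In a deployed system the minimum and maximum of each feature would normally be taken from the training data and reused at test time, so that both phases share the same scale.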

Neural Network Based Detection Stage
Anomalies in our system are defined as unusual activities in the network. The purpose of the Neural Network Detection Stage (NNDS), stage one, is to find such activities using a small number of features extracted from the collected raw flow data. The number of input nodes of the NNDS corresponds to the number of selected features (5 features). The implemented NNDS includes one input layer, one hidden layer, and an output layer of 2 nodes, as shown in Figure 2 (01 denotes normal traffic and 10 denotes anomalous traffic). The number of hidden layers and the number of nodes in them were determined based on the back propagation (BP) computation process and by trial and error, which took considerable time. Algorithm 1 below is a simplified general description of the detection process.
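To make the NNDS structure concrete, the following is a minimal sketch of a forward pass through a network with 5 inputs, one hidden layer, and 2 output nodes, decoded with the 01/10 coding given above. The hidden layer size (8), the sigmoid activation, and the random untrained weights are assumptions for illustration only; the paper determines the hidden layer size by trial and error and trains the weights in MATLAB.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer with sigmoid activation (assumed)."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a real system would load trained ones."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def nnds_forward(features, hidden, output):
    """Return the 2-node output vector and its decoded label."""
    h = layer(features, *hidden)
    out = layer(h, *output)
    # Output coding from the text: 01 = normal, 10 = anomaly,
    # so a dominant first node means "anomaly".
    label = "anomaly" if out[0] > out[1] else "normal"
    return out, label


rng = random.Random(0)
N_FEATURES, N_HIDDEN, N_OUTPUT = 5, 8, 2  # N_HIDDEN = 8 is an arbitrary placeholder
hidden = init_layer(N_FEATURES, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_OUTPUT, rng)

out, label = nnds_forward([0.1, 0.0, 0.7, 0.3, 1.0], hidden, output)
print([round(o, 3) for o in out], label)
```

With untrained weights the label is meaningless; the sketch only shows how the layer sizes and the two-node output coding fit together.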

Neural Network Based Classification Stage
There are several classification techniques that can be used for classifying attacks based on flow data, such as Neural Networks, statistical methods, genetic algorithms, and others. In our system, a neural network is used for classification. Results can only be obtained after both the training and testing phases have been completed. The output of the Neural Network Classification Stage (NNCS) falls into one of five possible categories. The number of input nodes of the NNCS corresponds to the number of selected features (11 features). The implemented neural network includes one input layer, one hidden layer, and an output layer of 5 nodes, as shown in Figure 3 (Table 2 contains the descriptions of the outputs). The number of nodes in the hidden layer was determined based on the back propagation (BP) computation process and by trial and error. Algorithm 2 describes the classification procedure.

Alert Stage
This is the final stage of the proposed system. It identifies whether the events that occurred are abnormal, then sends the required signals, according to the output of the NNCS, to alert the administrator, and raises alarms when appropriate.

Experimental Results
The proposed system has been implemented and experimentally evaluated using the MATLAB (R2011b) Neural Network Toolbox. Figure 4 represents the block diagram of the implemented system. The scenarios considered in our experiments are as follows:

A. Packet capturing process: this is the first step in the system operation; it captures the incoming and outgoing packets in the network.

B. Collecting and exporting flows: in this phase, all unidirectional streams of packets that share common characteristics, such as source and destination addresses, ports, and protocol type, are collected and extracted as flows according to the Cisco protocol [5].

C. Feature extraction process: all selected features must be pre-processed, for example by mapping and normalization, before they are passed to the two stages.

D. Machine training: the ANN was trained on the pre-processed NetFlow dataset with different numbers of iterations and hidden units, to determine the level of training and to find out when the neural network was trained well enough to detect attacks. A number of algorithms were also used for training and testing the neural networks to detect and classify various actions. After training the ANN and finding the best detection rate, the best weights were saved to a file to be used during the testing phase.
The Detection Rate (DR) and False Positive rate (FP) were calculated for the different scenarios. Figure 5 shows the performance of the detection module. The results in Table 3 show that the detection rate is 94.1% with a false positive rate of 5.9%. The results from the classification stage (NNCS), on the other hand, show a significantly higher prediction accuracy than the detection stage. Figure 6 shows that the best validation performance of 0.0031 was met at epoch 148. Table 4 shows that the detection rate is relatively high, at 99.25% for the MLP and 95.3% for the RBF detection algorithm. The false alarm rate was as low as 0.588% for the MLP neural network and 4.6% for the RBF neural network. Table 5 shows that the classification rate is comparatively high: DoS attacks, port scan attacks, land attacks, and unknown attacks were detected and classified correctly by the multistage neural networks. The analysis of the results of both stages shows that the MLP with Levenberg-Marquardt is fast compared to Resilient Back propagation, has low memory consumption compared to Radial Basis Function, and produces few false alarms.
Comparing the recently published results [42], [23], [25], [41], [21], [4], [9], [29], [3] with our own neural network results, we found that our proposed IDS is highly competitive with the others. Figure 7 indicates that our system is capable of detecting and classifying computer attacks with a minimal number of features extracted from the flow dataset.

Conclusion and Future Work
A flow-based intrusion detection and classification system using multistage neural networks was proposed. One neural network detects traffic anomalies and the other classifies the type of attack. The system can easily be extended, configured, and modified by replacing some features or adding new features for new types of attacks. The experimental results with our proposed IDS showed that using a flow dataset and extracting only the features that contribute significantly to intrusion detection gives promising results. The obtained detection rate (94.1% for anomaly detection at stage one, and 99.25% for classification at stage two) is remarkably good compared to other work that follows a similar approach with the same type of training dataset. The MLP network has better classification ability than the RBFN, but its memory and time consumption is 3-5 times greater. The RBFN, in contrast, has a simple architecture and a hybrid learning algorithm, which lead to lower time and memory consumption, and it is better suited to real-time operation and to retraining with new data. Our future research will be directed towards developing a more accurate model that can be used in real time for detecting and classifying anomalies with a minimum of features and less training time.


The model of intrusion detection: Here we have anomaly detection, misuse detection, or hybrid detection. Anomaly-based IDS monitoring depends on the behaviour of the system. Misuse-based IDS monitoring depends on matching the data against signatures. Hybrid techniques combine anomaly detection with misuse detection.
The audit collection and analysis: Here IDSs are divided into either centralized or decentralized (distributed) IDSs. In centralized IDSs, monitoring, detection, and reporting are controlled directly from a central location. In decentralized IDSs, monitoring and detection are controlled from a local control node with hierarchical reporting to one or more central locations.

Algorithm 1: Detection Module
While (new data available) do
  Read 5-tuple inputs for NNDS
  Feed parameters to the NNDS
  NNDS creates the following results:
  If the data is "normal", then
    Assign 01 to the output of NNDS
  Else
    Assign 10 to the output of NNDS as anomaly traffic
    Call neural network classification (NNCS)
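The control flow of Algorithm 1, together with the hand-off to the classification stage of Algorithm 2, can be sketched as below. This is an illustration under stated assumptions rather than the authors' MATLAB implementation: `run_pipeline`, `toy_detect`, and `toy_classify` are hypothetical names, and the two stand-in functions use arbitrary thresholds in place of the trained NNDS and NNCS networks.

```python
def run_pipeline(flow_records, detect, classify):
    """Two-stage dispatch: stage two runs only when stage one flags an anomaly.

    flow_records: iterable of (five_tuple, eleven_tuple) feature vectors.
    detect:   callable returning "01" (normal) or "10" (anomaly).
    classify: callable returning a category label for a flagged record.
    """
    results = []
    for five, eleven in flow_records:
        if detect(five) == "01":
            results.append("normal")          # stage two is skipped
        else:
            results.append(classify(eleven))  # NNCS is called on anomalies only
    return results


# Stand-ins for the trained networks (arbitrary thresholds, illustration only).
def toy_detect(five):
    return "10" if five[0] > 0.5 else "01"


def toy_classify(eleven):
    return "port scan" if eleven[9] > 0.5 else "unknown"


records = [
    ([0.1] * 5, [0.0] * 11),
    ([0.9] * 5, [0.9] * 11),
    ([0.9] * 5, [0.1] * 11),
]
labels = run_pipeline(records, toy_detect, toy_classify)
print(labels)
```

Keeping the two stages as separate callables mirrors the design in the text: the 5-feature detector screens all traffic, and the 11-feature classifier is applied only to the records it flags.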

Figure 3: one input layer, one hidden layer, and an output layer neural network.

Algorithm 2: Classification Procedure
While (activated from NNDS) do
Begin
  Read 11-tuple inputs for NNCS
  Feed parameters to the NNCS
  NNCS creates the following results:
  If data is "normal", then
    Assign 00001 to the output of NNCS
  Else
    Assign appropriate attack type to the output of NNCS according to the table
  Enable

The Detection Rate (DR) and False Positive rate (FP) have been calculated for different scenarios according to the following formulas:
DR = NA / TA * 100 [%]
FP = CA / NT * 100 [%]
Where:
DR: Detection Rate.
NA: Number of detected Attacks.
TA: Total number of Attacks.
FP: False Positive rate.
CA: number of normal records Classified as Attack.
NT: total Number of normal Traffic records.
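The two formulas above translate directly into code. The sketch below uses the same symbols (NA, TA, CA, NT) as parameter names; the counts in the example are made up for illustration and are not taken from the experiments.

```python
def detection_rate(na, ta):
    """DR = NA / TA * 100, where NA = detected attacks, TA = total attacks."""
    if ta <= 0:
        raise ValueError("total number of attacks must be positive")
    return 100.0 * na / ta


def false_positive_rate(ca, nt):
    """FP = CA / NT * 100, where CA = normal records classified as attack,
    NT = total number of normal traffic records."""
    if nt <= 0:
        raise ValueError("total number of normal records must be positive")
    return 100.0 * ca / nt


# Hypothetical counts, for illustration only.
dr = detection_rate(na=150, ta=200)
fp = false_positive_rate(ca=5, nt=400)
print(f"DR = {dr:.2f}%  FP = {fp:.2f}%")
```

Note that the two rates use different denominators: DR is measured over attack records only, while FP is measured over normal records only, so the two values need not sum to 100%.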

Figure 4. Block diagram of the implemented system.

Figure 5: Performance of the detection module

Figure 6: Performance of the classification module

An intrusion detection system using a multistage neural network and based on a flow dataset has been proposed and tested. Three different training algorithms (Levenberg-Marquardt, Radial Basis Function net, and Resilient Back propagation) were used for training both neural networks. The Detection Stage (NNDS) was trained until the best validation performance of 0.0410 was met at epoch 247, as shown in Figure 5. The results are reported in Table 3.

Table 1: Proposed system feature description.
8. FSSIP (NNCS): Number of Flows from the Same Source IP. An attacker can, for example, send an ICMP ping packet to every possible address within a subnet.
9. FDSIP (NNCS): Number of Flows from Different Source IPs. IP spoofing is widely used by attackers; a high number of different IP addresses within a short period of time could be a strong sign of an attack (DoS).
10. FSDP (NNCS): Number of Flows to the Same Destination Port. In some cases the attacker sends GET requests to certain ports only (e.g., port 80) to crash the server.
11. PT (NNCS): Protocol Type (TCP, UDP, or ICMP). In combination with all the previous features, it can help to determine the type of attack.

Table 2: Neural network classified categories.

Table 4: Results of Classification Stage (NNCS)

Discussion and Comparison of Results