A survey of existing IDS implementations was carried out before finalizing the solution, and the advantages and disadvantages of each system were noted. The survey covers a few successfully implemented IDS solutions and is organized as follows:
Intrusion detection system using support vector machine (SVM)
In [7], Halimaa et al. presented an intrusion detection system model based on a machine learning approach. Two classifiers are used: SVM and naïve Bayes. The authors conclude that SVM provides better accuracy than naïve Bayes. Feature selection is used to retain the useful and relevant features for model development. These accuracy results are valid only for known attack detection and cannot be assumed to hold for unknown attack detection. Timeliness is another major area not addressed in this work.
In [8], Yang et al. studied the support vector machine (SVM) and its performance. Their experiments indicate that SVM is a fast machine learning technique. The analysis uses three indicators: sensitivity, specificity, and time consumption.
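As a minimal sketch of the first two indicators, sensitivity and specificity can be computed from the confusion counts of a binary classifier. The function name, the label encoding (1 for attack, 0 for normal), and the toy labels below are assumptions made for illustration and are not taken from [8].

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN), the share of attacks that are detected
    specificity = TN / (TN + FP), the share of normal traffic left alone
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical labels: 1 = attack, 0 = normal traffic.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

The third indicator, time consumption, would be measured separately, for example by timing the training and prediction calls.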
When SVM performance is evaluated on different data sets with different kernels (linear, polynomial, sigmoid, and RBF), the RBF kernel gives good performance [9]. The observed results are also encouraging compared with other techniques. SVM provides better results for limited data set sizes, but its time complexity increases for very large data sets [10]. SVM is also less suited to imbalanced data and to data with unknown labels [10]. Hence, models cannot be developed with a standalone SVM method.
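For reference, the four kernels named above can be sketched with their commonly used textbook definitions. The parameter names (gamma, coef0, degree) and their default values are illustrative assumptions and are not the settings evaluated in [9].

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x, y) = x . y
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=2):
    # K(x, y) = (gamma * x . y + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # K(x, y) = tanh(gamma * x . y + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)


def rbf_kernel(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


u, v = [1.0, 2.0], [3.0, 4.0]
for kernel in (linear_kernel, polynomial_kernel, sigmoid_kernel, rbf_kernel):
    print(kernel.__name__, kernel(u, v))
```

The RBF kernel depends only on the distance between the two points, so it equals 1 for identical points and decays toward 0 as they move apart.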
In [11], Agarap et al. designed an intrusion detection system using a neural network architecture that combines a Gated Recurrent Unit (GRU) with a Support Vector Machine (SVM). In the presented model, the SVM is used at the output layer of the GRU-RNN, and it is faster in terms of time complexity. Training and testing are done only for binary classification with the SVM; results for multiclass classification are an important missing piece. Also, classification using SVM gives better results for known attack traffic and should be verified with unknown attack traffic.
In [12], Jha et al. designed an intrusion detection system using Support Vector Machine (SVM). The NSL KDD dataset is used for model training and testing. Feature selection picks the best features based on k-means algorithm criteria. Reducing the number of features also reduces the time required to execute the SVM. The disadvantage is that SVM is a classification technique, which requires prior knowledge of events and data. Hence, in intrusion detection systems, an SVM model is useful only for known attack detection and fails to detect unknown attack attempts.
SVM gives good performance when an effective feature selection algorithm is used. In [13], for a cloud intrusion detection system (CIDS), SVM is used to build a model with correlation-based feature selection (COFS), which helped to achieve better results.
In [14], Teng presented CAIDM, a Collaborative and Adaptive Intrusion Detection Model that uses two machine learning classification techniques, SVM and decision tree (DT). The KDDCUP99 dataset is used for training and testing the CAIDM model, and it is concluded that the model combining SVM and DT generates better results than a single SVM. It is important to note that both methods in CAIDM are classification methods, which are useful for attacks known in advance and not for identifying new attack types. Also, timeliness is not considered for the model execution.
Intrusion detection system using k nearest neighbor (kNN)
In [15], Cover et al. presented the nearest neighbor pattern classification technique, known as k-NN classification. A data point is classified based on its distance to the labelled data points of the different classes: it is assigned to the class of the data points that lie closest to it. The work states that the error is minimal for 1-NN and that the same can be inferred for k-NN classification with minimized error. It is also a fast pattern classification technique.
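The classification rule described above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and a simple majority vote among the k nearest points; the function name and the toy traffic records are hypothetical and not taken from [15].

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Assign `query` to the majority class among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    # Sort the training points by Euclidean distance to the query, keep k.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature records with known labels.
train = [
    ((0.0, 0.0), "normal"),
    ((1.0, 0.5), "normal"),
    ((0.5, 1.0), "normal"),
    ((5.0, 5.0), "attack"),
    ((5.5, 4.5), "attack"),
    ((4.5, 5.5), "attack"),
]

print(knn_classify(train, (4.8, 5.1)))  # nearest points are all labelled "attack"
print(knn_classify(train, (0.4, 0.3)))  # nearest points are all labelled "normal"
```

With k=1 the rule reduces to the 1-NN case, where the query simply takes the label of its single closest point.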
In [16], Imandoust et al. studied kNN for the prediction of economic events. The same method is used for regression: the property value of an object is set to the average of the property values of its nearest neighbors. Hence, kNN is used as an efficient method for predicting economic events. kNN is a fast algorithm and is very useful even in the absence of prior knowledge of the events.
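A minimal sketch of kNN used for regression in this way is given below: the predicted value is the plain average of the target values of the k nearest neighbors. The function name and the one-feature toy data are assumptions for illustration, not data from [16].

```python
import math


def knn_regress(train, query, k=3):
    """Predict a numeric value as the mean target of the k nearest points.

    `train` is a list of (feature_vector, target_value) pairs.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(value for _, value in neighbors) / len(neighbors)


# Hypothetical one-feature observations with a numeric target.
train = [
    ((1.0,), 10.0),
    ((2.0,), 20.0),
    ((3.0,), 30.0),
    ((10.0,), 100.0),
]

prediction = knn_regress(train, (2.0,), k=3)
print(prediction)  # mean of the targets at features 1.0, 2.0 and 3.0
```

The distant observation at 10.0 does not influence the prediction because it is not among the three nearest neighbors.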
In [17], Ali et al. presented a detailed study of the performance of the k nearest neighbor algorithm on heterogeneous data sets. kNN performance is evaluated using the Euclidean and Manhattan distance formulas. It is found that the Euclidean distance formula does not provide better kNN results. Also, for heterogeneous data sets, little difference is observed in the performance of the kNN algorithm.
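The two distance formulas can be sketched as below, together with a small example showing that the choice of metric can change which point counts as the nearest neighbor. The points are hypothetical and only illustrate the formulas; they do not reproduce the experiments in [17].

```python
import math


def euclidean(x, y):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))


query = (0.0, 0.0)
points = {"A": (3.0, 3.0), "B": (0.0, 5.0)}

# Nearest point to the query under each metric.
nearest_euclidean = min(points, key=lambda name: euclidean(points[name], query))
nearest_manhattan = min(points, key=lambda name: manhattan(points[name], query))

print(nearest_euclidean)  # A: sqrt(18) is about 4.24, which is less than 5
print(nearest_manhattan)  # B: 5 is less than 6
```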
In [18], Benaddi et al. proposed a robust intrusion detection system model that combines Principal Component Analysis (PCA), fuzzy clustering, and kNN. The NSL KDD data set is used to build the IDS model. The authors conclude that reducing the set of features in the data set is essential to achieve the desired IDS performance. With kNN, the model was able to classify the different attack types effectively. However, the accuracy of kNN still needs to be verified properly, and the time complexity of the system remains an area of concern in the given IDS model.
In [19], Li et al. presented an intrusion detection system based on binary classification and kNN. The model works in two steps: the first step uses binary classification to detect abnormal connections, and the second step uses kNN to detect unusual and new input types. The authors conclude that the IDS model is more accurate with kNN than with binary classification alone. However, the accuracy is not verified after step 2, as kNN requires, and timeliness, which cannot be ignored in an IDS model, is not tested in the reported results.
In [20], Li et al. presented an intrusion detection model for wireless sensor networks using kNN classification. The targeted attack type is the flooding attack, and the model proved effective in detecting it. However, kNN alone is not effective when different or new attack types are to be detected.
Intrusion detection system using decision tree
In [21], Yihunie et al. presented work on anomaly intrusion detection using machine learning techniques. Five classification techniques are compared with each other: SGD, logistic regression, random forest, SVM, and a sequential model. The NSL KDD dataset is used for training and testing the models. The results show that random forest provides better accuracy than the other algorithms. It is important to note that this accuracy is valid only for known attack detection; the work does not cover unknown attack detection, which is an important consideration. The work also does not discuss the speed of the system, timeliness of detection, or the fault tolerance of the IDS. Hence, this IDS needs improvement with respect to these properties, and the model should be enhanced with better techniques.
In [22], Chang et al. presented network intrusion detection using support vector machine and random forest. Random forest is used for feature selection to improve accuracy and reduce execution time. Using random forest, 14 of the 41 features are selected, which gave better accuracy than using all 41 attributes. These results are for known input types, and hence the accuracy cannot be generalized to new attack types.
In [23], Kumar et al. presented an intrusion detection system using a decision tree, and the reported results are good. The decision tree uses preexisting knowledge for model building and gives better results for known input types. For unknown input types it cannot perform well and should be combined with other methods to achieve better results.
Intrusion detection system using naïve Bayes
In [24], Panda et al. presented an intrusion detection system using the naïve Bayes technique. The results obtained by the model are better than those of neural network architectures. The model is built with two layers, and the distance between the information nodes is minimized. The results also show that the naïve Bayes approach gives good results in less time and at low cost. The drawback is that it generates more false positives than other systems. Hence, it can be concluded that naïve Bayes should be used along with other techniques for better results and fewer false positives.
In [25], Sharmila et al. presented an intrusion detection system using a PCA-based naïve Bayes algorithm. The results show that the PCA-based naïve Bayes algorithm achieves better accuracy than the traditional naïve Bayes algorithm. This approach also provides useful results even when the data sets contain missing values. However, as the size of the data increases, the accuracy decreases and the system slows down. Hence, naïve Bayes can be used with other techniques to achieve better results.
Intrusion detection system using artificial intelligence and deep learning techniques
In [26], Rassam et al. proposed an IDS solution based on smart and generic rule construction. A smart rule is a single rule that takes the place of 2–3 rules and can detect multiple attacks; hence, such rules are also called generic rules. In this work, the first step is data preprocessing, followed by smart and generic rule construction, after which the constructed rules are learned. The advantage of the system is that a smaller number of smart rules detects the maximum number of attacks, which lowers power consumption. However, the system is proposed under the assumption that most attacks originate from the internal systems of the network. It therefore cannot be relied on for detecting attacks coming from outside, where the size of the network is large.
In [27], Amudha et al. presented an intrusion detection system using hybrid swarm intelligence. The system is composed of two intelligence algorithms, Particle Swarm Optimization (PSO) and Artificial Bee Colony (ABC). The KDDCUP 99 data set is used for this work. Data preprocessing techniques are applied first for better results, and feature selection is done using SFSM (Single Feature Selection Method) and RFSM (Random Feature Selection Method). PSO and ABC are then applied as intelligence algorithms to develop a hybrid model, which gives 99.5% accuracy for the detection of known attacks. However, detection of unknown attacks is not tested or presented by the researchers in this article, and the timeliness of attack detection is not considered in the implementation. Hence, the model cannot deal with unknown attack identification, and detection time remains a concern that needs to be addressed.
In [28], Bahlali et al. carried out research on anomaly-based intrusion detection using machine learning techniques. The work implements and compares three machine learning techniques (logistic regression, decision tree, and random forest) along with ANN as a deep learning technique. The UNSW-NB15 dataset is used, which suffers from issues such as imbalanced classes. Still, the accuracy claimed for these classifiers is good. Among the algorithms used, ANN is found to give the best accuracy for the IDS model. The problem with the IDS model is that performance in terms of timeliness and speed is not considered in the work. Also, the results cannot be relied on to compare the models, as the dataset is old and does not reflect new attacks.
In [29], Zamani provided a detailed study of the design of intrusion detection systems using machine learning techniques. The machine learning techniques are divided into two groups, Artificial Intelligence and Computational Intelligence, which share many features. The study claims that an effective intrusion detection system can be designed using a machine learning approach, and that such a system can be accurate, fault tolerant, and efficient.
In [30], Malliga et al. presented a network intrusion detection system for IoT systems using machine learning and deep learning algorithms. Naïve Bayes is used as the machine learning technique and is compared with ANN and RNN as deep learning techniques. It is concluded that the accuracy achieved using deep learning techniques such as ANN and RNN is higher than the accuracy achieved using naïve Bayes. The time requirement is also reported. However, an important issue is that deep learning requires a huge amount of data and has a higher training time complexity than machine learning techniques. ANN and RNN are also black-box techniques: the basis for decision making in the model is not transparent. In addition, the presented model is specific to IoT and is a sequential model.
Survey studies in intrusion detection system using machine learning and artificial intelligence (deep learning)
In an intrusion detection system, the major areas of concern are the quality attributes, which cannot be compromised. The important measures are accuracy, performance, completeness, fault tolerance, and timeliness [31]. These properties should be considered when developing a better intrusion detection system.
In [32], Khan et al. carried out a survey of the performance of different decision tree algorithms in data mining. The survey highlights the importance of decision tree algorithms and examines the performance of the ID3, C4.5, and CART algorithms, which are used to construct the decision tree in tree-based classification problems. In this process, the speed of decision tree construction is the critical parameter, and the survey also reports the performance of the algorithms in terms of speed. According to the survey, C4.5 is the fastest of the given algorithms. This survey is useful for researchers intending to address classification problems using decision trees.
In [33], Khraisat et al. presented a survey of intrusion detection systems. The paper outlines intrusion detection systems with their advantages and disadvantages, the data sets, and the challenges in IDS model development. IDSs for zero-day attack identification are reviewed, and it is found that IDSs give poor accuracy for the detection of new attacks. The survey also examines the data sets and their effectiveness. The datasets used by different researchers to generate test results for IDS models do not contain new attacks. These datasets, e.g., DARPA and KDD99, were developed in 1999 and are very old for testing IDSs developed in recent years. Hence, the use of such old datasets leads to inaccurate claims about the effectiveness of IDSs, and the results cannot be considered genuinely accurate.
In [34], Saranya et al. presented a survey on the performance analysis of machine learning techniques in intrusion detection systems. It briefly describes various machine learning techniques and algorithms, and explains intrusion detection systems in different application areas and their implementation using machine learning algorithms. The survey reports different IDS implementations using machine learning techniques such as naïve Bayes, decision tree, SVM, kNN, k-means, deep learning, ensemble learning, ANN, and DBN. The survey concludes that machine learning techniques are useful for developing accurate and effective intrusion detection system models compared with other techniques.
In [35], Maniriho et al. presented a survey of machine learning techniques for anomaly-based intrusion detection systems. The generic model of an IDS using machine learning techniques is explained, followed by IDS implementations using different machine learning techniques and their results. The results are generated using the WEKA tool. Random forest, decision stump, naïve Bayes, and SGD algorithms are compared, and the conclusion is that random forest generates the best results for intrusion detection among the techniques reviewed. It is important to note that the methods and data set used are suitable for known attack detection; unknown attack detection is important but is not considered. Hence, the results cannot be generalized to new attack types that are not present in the data set. Also, this survey does not outline the time performance of the algorithms in IDS, which is one of the important properties to be considered in intrusion detection.
In [36], Haq et al. presented a useful survey of machine learning techniques for intrusion detection systems. The study provides statistics on the number of IDS design attempts using a machine learning approach: a few IDSs are designed using a standalone machine learning technique, a few using hybrid methods, and a few using an ensemble approach. It also states that the dataset used plays an important role in generating results. Irrelevant and redundant features should be removed, and the best feature selection algorithm should be used to avoid slowing down the system. Finally, it concludes that hybrid approaches give better results than approaches based on a standalone technique.
In [37], Labonne et al. presented a survey of intrusion detection systems using machine learning techniques. It summarizes the work carried out over the years by different researchers. The study reveals that better solutions are possible with machine learning in intrusion detection. It also concludes that the NSL KDD dataset is better than the KDD99 data set and is used by many researchers.
In [38], Mazini et al. proposed anomaly-based intrusion detection using a machine learning approach. The methods used to design this intrusion detection system are the Artificial Bee Colony (ABC) and AdaBoost algorithms, and execution follows a sequential approach. The NSL KDD dataset is used to test the system. The results obtained show better numbers than conventional machine learning methods. However, the feature selection used in this approach discards many features based on inappropriate data values rather than on the importance of the feature. Also, only classification of known attack samples is reported, so the accuracy cannot be generalized to unknown malicious behaviors. The approach is implemented in a simulation environment, whose assumptions may not hold in a real-time environment. The efficiency and time complexity figures are given for sequential execution, which is not suitable for achieving timeliness in current systems, where the volume of data and the number of connection requests per second are huge.
In [39], Amouri et al. presented an intrusion detection system using machine learning techniques for the mobile Internet of Things. The model is designed using regression techniques, and feature selection is used to remove redundant and irrelevant features from the dataset. The results are generated in a simulation-based environment. The detection accuracy is claimed to be good, but the major concern is the false positive rate (FPR), which could not be kept at a minimum and varied to a concerning level. This model is designed for IoT systems with a limited scope. Hence, it relies on several assumptions within that scope and cannot be considered valid in other application scenarios.
In [40], Aslam et al. presented a hybrid approach for a network intrusion detection system that combines machine learning with a rule-based system. Machine learning techniques such as decision tree, Sequential Minimal Optimization, and simple logistic are used in the hybrid approach together with the rule-based system. The accuracy observed is for known attacks from the dataset and hence cannot be generalized to unknown attack identification. Also, the model does not consider important IDS properties such as speed, timeliness, and fault tolerance.