Reference | Objective | Feature selection method | Dataset | Advantage | Disadvantage |
---|---|---|---|---|---|
[6] | Solving the “nesting effect” problem found in the original SFS | Wrapper | KDD Cup 99 dataset | High anomaly-intrusion detection rate with a reduced feature set | Focused only on anomaly detection |
[2] | Designing a new technique to binarize a continuous pigeon-inspired optimizer | Wrapper | KDD Cup 99, NSL-KDD, and UNSW-NB15 datasets | The model had a better learning rate and outperformed other models in terms of TPR, FPR, accuracy, and F-score | The model was evaluated using outdated datasets |
[7] | To develop an IDS for a fog environment | Wrapper | NSL-KDD dataset | An excellent detection rate of 99.73% | Lower F-score compared with the SVM, Random Forest, and Decision Tree algorithms |
[8] | Developing a model to diagnose different cancer diseases from big data | Filter and Wrapper | Four cancerous microarray datasets (leukemia, ovarian cancer, small round blue cell tumor, and lung cancer) | The model selected a small set of relevant genes with high accuracy | The model was tested using relatively small microarray datasets |
[9] | Proposing an ensemble-filter-based hybrid feature selection model for disease detection | Filter and Wrapper | Twenty benchmark medical datasets | The model was evaluated using four classifiers, namely Naïve Bayes, Support Vector Machine with Radial Basis Function, Random Forest, and k-Nearest Neighbor | The study used only two performance metrics, namely accuracy and AUROC; the authors propose additional metrics for future work |
[10] | Investigation of various feature selection techniques | Wrapper | Aegean Wi-Fi Intrusion Dataset (AWID) | The model achieved a high detection accuracy of up to 99.95% | Model building takes longer |
[11] | To develop semi-distributed and distributed IDSs | Wrapper | AWID | Using a multi-layer perceptron (MLP) classifier, the distributed IDS had the lowest CPU running time of 73.52 s and the best detection accuracy of 97.80% | The authors noted the need for up-to-date datasets for further evaluation of the model |
[12] | To select the best features for accurate classification of anomalous and intrusion traffic in smart IoT networks | Wrapper | Bot-IoT dataset | The research reduced the 39 original features to 7 without affecting the model's accuracy | The research focused only on Bot-IoT attacks |
[5] | To develop a feature selection technique based on a differential evolution algorithm | Wrapper | NSL-KDD dataset | The selected features improved the accuracy and the running time of the model | The results still leave room for improvement |
[13] | To develop a wrapper-based feature selection method based on a modified whale optimization algorithm (WOA) | Wrapper | CICIDS2017 and ADFA-LD standard datasets | The improved WOA outperformed the traditional WOA in terms of detection rate and accuracy | Further feature reduction is left for future work |
[14] | To improve IDS performance through a two-phase framework that increases the detection rate and reduces the false alarm rate | Wrapper | NSL-KDD, ISCX2012, UNSW-NB15, KDD Cup 1999, and CICIDS2017 datasets | Introduction of a new metaheuristic algorithm (MOBBAT), a binary version of the BAT algorithm | The researchers did not consider computational cost as an evaluation metric |
[1] | Selection of key features using an evolutionary algorithm | Filter and Wrapper | Wine, Ada, Sonar, Sylva, Madelon, and Gina datasets | The model was evaluated using several datasets to avoid bias | The model was evaluated using only one evaluation metric |
[15] | To develop a novel feature selection algorithm named hybrid improved dragonfly algorithm (HIDA) | Filter and Wrapper | 10 gene expression datasets and 8 UCI datasets | HIDA performed excellently on imbalanced classification problems | Higher computational complexity than the wrapper algorithm |
[16] | Combination of genetic algorithms (GA) and particle swarm optimization (PSO) for best feature selection | Filter and Wrapper | Lung, Hill-Valley, Gas 6, Musk 1, Madelon, and Isolet 5 datasets | The model had superior performance in both feature reduction and classification accuracy | The authors did not compare the model's computational requirements with those of other models |
[17] | Implementation of a binary version of the hybrid grey wolf optimization (GWO) and particle swarm optimization (PSO) for feature selection | Wrapper | 18 standard benchmark datasets from UCI | The proposed model outperformed other models in accuracy, best-feature selection, and computational time | The model was tested with only one classification algorithm (KNN) |
[18] | Implementation of a new feature selection technique named GAWA | Wrapper | Twitter datasets | The technique reduced the feature subset by up to 61.95% without affecting the model's accuracy | The model was evaluated using only one dataset; the authors propose validating it on other datasets in future work |
[19] | Implementation of a novel feature selection technique based on GA and PI | Wrapper | UNSW-NB15 dataset | The proposed model yielded better performance in terms of accuracy and execution time | The model focused on the detection of only two types of attacks |
[20] | Combination of embedded and wrapper methods for feature selection | Embedded and Wrapper | NSL-KDD dataset | Use of two classification algorithms to test the selected feature subset | The model used the NSL-KDD dataset, a classical dataset that does not capture current intrusion threats |
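
Most of the surveyed methods fall into two families: filter approaches, which rank features with a classifier-independent score, and wrapper approaches such as SFS, which search feature subsets by repeatedly retraining a classifier. The sketch below contrasts the two on the Wine dataset (one of the benchmarks used in [1]) with scikit-learn; the KNN classifier (as in [17]), the mutual-information filter score, and the choice of five features are illustrative assumptions rather than the exact configuration of any surveyed paper. Note that plain sequential forward selection never discards a feature once it has been added, which is the “nesting effect” that [6] addresses.

```python
# Minimal sketch contrasting filter- and wrapper-based feature selection.
# The dataset (Wine), classifier (KNN), and k=5 are illustrative assumptions.
from sklearn.datasets import load_wine
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                       mutual_info_classif)
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=5)

# Filter: rank features by mutual information with the label,
# independently of any classifier, and keep the top 5.
X_filter = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)

# Wrapper (sequential forward selection): greedily add whichever
# feature most improves cross-validated classifier accuracy.
sfs = SequentialFeatureSelector(clf, n_features_to_select=5,
                                direction="forward", cv=5).fit(X, y)
X_wrapper = sfs.transform(X)

for name, Xs in [("filter", X_filter), ("wrapper", X_wrapper)]:
    acc = cross_val_score(clf, Xs, y, cv=5).mean()
    print(f"{name}: {Xs.shape[1]} features, CV accuracy = {acc:.3f}")
```

The wrapper pass is markedly slower because every candidate feature triggers a full cross-validated fit, which is the computational-cost disadvantage several rows of the table point to.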