Reference | Objective | Feature selection method | Dataset | Advantage | Disadvantage |
---|---|---|---|---|---|
[6] | Solving the “nesting effect” problem found in the original SFS | Wrapper | KDD Cup 99 dataset | High anomaly-intrusion detection rate with a reduced feature set | Focused only on anomaly detection |
[2] | Designing a new technique to binarize a continuous pigeon-inspired optimizer | Wrapper | KDD Cup 99, NSL-KDD, and UNSW-NB15 datasets | The model had a better learning rate and outperformed other models in terms of TPR, FPR, accuracy, and F-score | The model was evaluated using outdated datasets |
[7] | To develop an IDS for a fog environment | Wrapper | NSL-KDD dataset | An excellent detection rate of 99.73% | Lower F-score compared with the SVM, Random Forest, and Decision Tree algorithms |
[8] | Developing a model to diagnose different cancer diseases from big data | Filter and Wrapper | Four cancerous microarray datasets (leukemia, ovarian cancer, small round blue cell tumor, and lung cancer) | The model selected a small set of relevant genes with high accuracy | The model was tested using relatively small microarray datasets |
[9] | Proposing an ensemble-filter-based hybrid feature selection model for disease detection | Filter and Wrapper | Twenty benchmark medical datasets | The model was evaluated using four classifiers, namely Naïve Bayes, Support Vector Machine with Radial Basis Function, Random Forest, and k-Nearest Neighbor | The study used only two performance metrics, namely accuracy and AUROC; the authors propose additional metrics for future work |
[10] | Investigation of various feature selection techniques | Wrapper | Aegean Wi-Fi Intrusion Dataset (AWID) | The model achieved a high detection accuracy of up to 99.95% | Model building takes longer |
[11] | To develop semi-distributed and distributed IDSs | Wrapper | AWID | Using a multi-layer perceptron (MLP) classifier, the distributed IDS had the lowest CPU running time of 73.52 s and the best detection accuracy of 97.80% | The authors noted the need for up-to-date datasets for further evaluation of the model |
[12] | To select the best features for accurate classification of anomalous and intrusion traffic in smart IoT networks | Wrapper | Bot-IoT dataset | The research reduced the 39 original features to 7 without affecting the model's accuracy | The research focused only on Bot-IoT attacks |
[5] | To develop a feature selection technique based on a differential evolution algorithm | Wrapper | NSL-KDD dataset | The selected features improved the accuracy and the running time of the model | The results still leave room for improvement |
[13] | To develop a wrapper-based feature selection method based on a modified whale optimization algorithm (WOA) | Wrapper | CICIDS2017 and ADFA-LD standard datasets | The improved WOA outperformed the traditional WOA in terms of detection rate and accuracy | Further feature reduction is left for future work |
[14] | To improve IDS performance through a two-phase framework that increases the detection rate and reduces the false alarm rate | Wrapper | NSL-KDD, ISCX2012, UNSW-NB15, KDD Cup 1999, and CICIDS2017 datasets | Introduction of a new metaheuristic algorithm (MOBBAT), a binary version of the BAT algorithm | The researchers did not consider computational cost as an evaluation metric |
[1] | Selection of key features using an evolutionary algorithm | Filter and Wrapper | Wine, Ada, Sonar, Sylva, Madelon, and Gina datasets | The model was evaluated using several datasets to avoid bias | The model was evaluated using only one evaluation metric |
[15] | To develop a novel feature selection algorithm named hybrid improved dragonfly algorithm (HIDA) | Filter and Wrapper | 10 gene expression datasets and 8 UCI datasets | HIDA performed excellently on imbalanced classification problems | Higher computational complexity than the wrapper algorithm |
[16] | Combination of genetic algorithms (GA) and particle swarm optimization (PSO) for best feature selection | Filter and Wrapper | Lung, Hill-Valley, Gas 6, Musk 1, Madelon, and Isolet 5 datasets | The model had superior performance in both feature reduction and classification accuracy | The authors did not compare the model's computational requirements with those of other models |
[17] | Implementation of a binary version of the hybrid grey wolf optimization (GWO) and particle swarm optimization (PSO) for feature selection | Wrapper | 18 standard benchmark datasets from UCI | The proposed model outperformed other models in accuracy, best-feature selection, and computational time | The model was tested with only one classification algorithm (KNN) |
[18] | Implementation of a new feature selection technique named GAWA | Wrapper | Twitter datasets | The technique reduced the feature subset by up to 61.95% without affecting the model's accuracy | The model was evaluated using only one dataset; the authors propose validating it on other datasets in future work |
[19] | Implementation of a novel feature selection technique based on GA and PI | Wrapper | UNSW-NB15 dataset | The proposed model yielded better performance in terms of accuracy and execution time | The model focused on the detection of only two types of attacks |
[20] | Combination of embedded and wrapper methods for feature selection | Embedded and Wrapper | NSL-KDD dataset | Use of two classification algorithms to test the selected feature subset | The model used the NSL-KDD dataset, a classical dataset that does not capture current intrusion threats |
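
Most of the surveyed methods fall into two families: filter approaches, which rank features with a classifier-independent score, and wrapper approaches such as SFS, which search feature subsets by repeatedly retraining a classifier. The sketch below contrasts the two on the Wine dataset (one of the benchmarks used in [1]) with scikit-learn; the KNN classifier (as in [17]), the mutual-information filter score, and the choice of five features are illustrative assumptions rather than the exact configuration of any surveyed paper. Note that plain sequential forward selection never discards a feature once it has been added, which is the “nesting effect” that [6] addresses.

```python
# Minimal sketch contrasting filter- and wrapper-based feature selection.
# The dataset (Wine), classifier (KNN), and k=5 are illustrative assumptions.
from sklearn.datasets import load_wine
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                       mutual_info_classif)
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=5)

# Filter: rank features by mutual information with the label,
# independently of any classifier, and keep the top 5.
X_filter = SelectKBest(mutual_info_classif, k=5).fit_transform(X, y)

# Wrapper (sequential forward selection): greedily add whichever
# feature most improves cross-validated classifier accuracy.
sfs = SequentialFeatureSelector(clf, n_features_to_select=5,
                                direction="forward", cv=5).fit(X, y)
X_wrapper = sfs.transform(X)

for name, Xs in [("filter", X_filter), ("wrapper", X_wrapper)]:
    acc = cross_val_score(clf, Xs, y, cv=5).mean()
    print(f"{name}: {Xs.shape[1]} features, CV accuracy = {acc:.3f}")
```

The wrapper pass is markedly slower because every candidate feature triggers a full cross-validated fit, which is the computational-cost disadvantage several rows of the table point to.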