Optimizing IoT intrusion detection system: feature selection versus feature extraction in machine learning

Li, Jing; Othman, Mohd Shahizan; Chen, Hewan; Yusuf, Lizawati Mi

doi:10.1186/s40537-024-00892-y

Research
Open access
Published: 24 February 2024

Optimizing IoT intrusion detection system: feature selection versus feature extraction in machine learning

Jing Li¹,
Mohd Shahizan Othman¹,
Hewan Chen² &
…
Lizawati Mi Yusuf¹

Journal of Big Data volume 11, Article number: 36 (2024) Cite this article

2607 Accesses
3 Citations
Metrics details

Abstract

Internet of Things (IoT) devices are widely used but also vulnerable to cyberattacks that can cause security issues. To protect against this, machine learning approaches have been developed for network intrusion detection in IoT. These often use feature reduction techniques like feature selection or extraction before feeding data to models. This helps make detection efficient for real-time needs. This paper thoroughly compares feature extraction and selection for IoT network intrusion detection in machine learning-based attack classification framework. It looks at performance metrics like accuracy, f1-score, and runtime, etc. on the heterogenous IoT dataset named Network TON-IoT using binary and multiclass classification. Overall, feature extraction gives better detection performance than feature selection as the number of features is small. Moreover, extraction shows less feature reduction compared with that of selection, and is less sensitive to changes in the number of features. However, feature selection achieves less model training and inference time compared with its counterpart. Also, more space to improve the accuracy for selection than extraction when the number of features changes. This holds for both binary and multiclass classification. The study provides guidelines for selecting appropriate intrusion detection methods for particular scenarios. Before, the TON-IoT heterogeneous IoT dataset comparison and recommendations were overlooked. Overall, the research presents a thorough comparison of feature reduction techniques for machine learning-driven intrusion detection in IoT networks.

Introduction

The Internet of Things (IoT) refers to the technology of connecting everyday life and devices to the internet. IoT is growing and changing quickly, with the goal of linking things like wireless sensors, smart cameras, televisions, and other smart home devices online [1]. The number of Internet-connected IoT devices is rising rapidly, with over 2 billion connected in 2017. Experts predict there will be over 7.5 billion IoT devices generating 73.1 zettabytes of data by 2025 [2]. While IoT devices are becoming widespread and assisting people in many areas, they often have very limited security capabilities. This is despite the huge growth of IoT and the large amounts of data it creates. In summary, IoT adoption is surging, connecting billions of devices and generating massive data. However, IoT devices typically lack strong security protections even as their use proliferates.

Due to the security limitations of IoT devices, it is crucial to create network intrusion detection systems (NIDS) that can quickly and dependably detect and prevent attacks on IoT networks [3]. For this purpose, many machine learning techniques have been developed for intrusion detection in IoT, along with public datasets of network traffic [4]. However, these datasets frequently contain numerous irrelevant or redundant features, which negatively impacts the complexity and accuracy of machine learning models [5]. A common approach to develop efficient NIDS is through feature reduction, which decreases the dimensionality of network traffic data fed into the machine learning model. This helps lower computational costs and latency while enhancing model generalization.

Two of the most common are feature selection and feature extraction, which help address the issues caused by excessive features. Feature selection selects a subset of the most informative features from the original set [6]. It reduces dimensionality while retaining the semantic interpretability of the selected features. In contrast, feature extraction transforms the original features into a new low-dimensional space via mathematical projection [7]. While it can effectively reduce dimensionality, however, the extracted features lose intuitive meanings. In the realm of IoT security, feature selection enables the creation of lightweight and efficient IDS by judiciously choosing a subset of the most relevant original features. On the other hand, feature extraction techniques offer a valuable means to transform and distill the essence of the original feature set, reducing overall data dimensionality while retaining critical information. By optimizing the efficiency and interpretability of intrusion detection models, both feature selection and feature extraction become indispensable tools for enhancing the cybersecurity posture of IoT ecosystems, ensuring effective threat detection in a manner tailored to the limitations and intricacies of IoT devices and networks.

While existing works have focused on using either feature selection, feature extraction or hybrid method of the two, to improve certain performance metrics for NIDS [8, 9], there remains a research gap in comprehensively comparing these two methods, especially on modern IoT datasets [10]. Very few studies have evaluated the trade-offs between detection accuracy and computational complexity under the same experimental settings. However, such a comparison is essential to provide guidelines for choosing the appropriate feature reduction technique based on the IoT system constraints and intrusion detection requirements.

Therefore, this research aims to conduct an in-depth investigation of feature selection and feature extraction for building lightweight NIDS tailored to IoT environments. We focus on comparing the two techniques because they take contrasting approaches to reducing dimensionality, and may have different advantages and limitations in the context of IoT-based NIDS [11]. The findings can provide data-driven insights to guide the selection of feature reduction methods for optimal efficiency and detection performance in IoT network protection systems. In summary, our work addresses the gap in comparative studies on feature reduction techniques for machine learning-driven NIDS on IoT data. By benchmarking feature selection and extraction head-to-head, we derive valuable guidelines for striking the right balance between detection accuracy and complexity in IoT environments.

This comparative study reveals that feature selection and feature extraction have different strengths and weaknesses for building lightweight NIDS on IoT data. Our experiments demonstrate that when a substantial number of features are reduced, feature selection generally achieves higher detection accuracy, demanding less training and inference time. Conversely, as the number of features decreases, feature extraction excels over feature selection. Additionally, examining the F1-scores for different attack classes under various feature quantities using various machine learning classifiers provides a deeper insight into the detection capabilities of both methods. This analysis reveals that while feature extraction shows less sensitivity to changes in the number of reduced features, it also demonstrates the ability to detect a wider array of attack types compared to feature selection. Moreover, both methods favorite Decision Tree classifier considering both classification metrics and run time performance, which is more suitable for NIDS in IoT network. Based on these observations, we present a detailed theoretical guide, elaborated in Table 20 within “Result verification statistically” section, to aid in the selection of the most appropriate intrusion detection method for distinct scenarios.

The key contributions in this paper are provided as follows.

1)
A comprehensive performance evaluation between feature selection and feature extraction, involving performance metrics and run-time using the IoT data set, is conducted and evaluated.
2)
The 3-phase machine learning pipeline framework, involving data preprocessing, feature reduction, and classification with multiple machine learning classifiers, is created for performance evaluation.
3)
The NIDS for IoT is tested using public IoT datasets, named Network TON-IoT [10], to build models and compare performance between two feature reduction methods.

The subsequent sections are structured as follows: “Related works” section explores previous studies linked to this research, “Methodology” section explains the proposed methodology, “Experimental setup and analysis” section details the experimental setup and analysis, “Result and analysis” section displays the outcomes and discussions of two feature reduction techniques, and lastly, “Conclusion” section concludes this paper.

Related works

In this section, related studies on NIDSs that were implemented using feature reduction methods are discussed.

In the realm of NIDS, there has been widespread use of feature selection to reduce the complexity of the original traffic data. Many studies employ filter-based feature selection method to select the discriminate feature towards target class. For instance, in study [8], Mutual Information (MI)-based approach was proposed to select the features for NIDS, the study compared both linear and non-linear, specifically correlation-based and MI-based feature selection techniques, while MI-based outperform correlation-based approach on accuracy of attack detection. After that, Ambusaidi et al. [12] introduced a feature selection algorithm that utilized MI in combination with an variant support vector machine classifier. This approach exhibited enhanced accuracy and decreased model complexity compared to prior methods, on datasets such as KDD Cup 99, NSL-KDD [10] and Kyoto 2006+ [13].

In study [14], the authors conducted an analysis of a dataset named UNSW-NB15 [15] for NIDS. The filter-based feature reduction technique using machine learning algorithm such as XGBoost algorithm was applied to select features. In the same way, Disha and Waheed [16] designed feature ranking based on Gini Impurity by Random Forest (RF) to analyze the classification performance for NIDS using the latest TON-IoT dataset, while did not consider too much on computational cost for feature reduction process. However, most of the datasets as networking dataset are outdated as the benchmark data sets to evaluate classification models in NIDS for IoT security.

Furthermore, many studies use wrapper-based feature selection to find out the best feature subsets to improve the classification performance. Shafiq et al. [17] introduced a feature selection method called CorrAUC and a wrapper-based FS algorithm that employs the area under the curve (AUC) metric to choose effective features for machine learning (ML) algorithms. The method was tested on the Bot-IoT dataset [18] with four ML algorithms and the approach effectively selected informative features, however, it had lower precision for certain attacks like keylogging attack.

In addition, various techniques employing heuristic optimization algorithms, such as genetic algorithms (GA) as a search strategy to identify optimal feature subsets are detailed in [19,20,21]. These methods demonstrated lower false alarm rates compared to baseline approaches, using datasets like UNSW-NB15 and KDD99. In study [22], The researchers utilized the Pigeon Inspired Optimizer (PIO) for the feature selection process, binarizing the continuous pigeon inspired optimizer and contrasting it with the conventional approach for binarizing continuous swarm intelligent algorithms. The evaluation was conducted on datasets including KDDCUP99, NLS-KDD, and UNSW-NB15, showcasing outcomes that demonstrated a high detection rate and accuracy while minimizing false alarms. In addition, some studies designed lightweight models to meet the characteristic of IoT network, Liu et al. [23] proposed Particle Swam Optimization (PSO) with one-class Support Vector Machine (SVM) [24] optimized PSO for feature selection with light GBM to build lightweight models for detecting attack. However, it is worth noting that these feature selection strategies often come at a high computational cost, especially when relying on GA, PSO, or machine learning-based classifiers, as a result, which have negative impact on resource-constraint IoT system and networks.

Moreover, many studies employed hybrid feature selection methods to improve the performance of the attack classifiers while reducing overfitting in model training task. In study [25], the authors utilized association rule mining and central attribute values outperformed NSLKDD when tested on the UNSW-NB15 dataset. In addition, some studies employed ensemble feature selection techniques to find out the significant features, for example, Moustafa et al. [26] employed an ensemble Intrusion detection technique, which combined DT, ANN and NB as the base learners to learn the optimal features from statistic flow features, while Leevy et al. [27] employed information gain, information gain ratio, and Chi-squared (Chi²) feature ranking techniques for feature selection. However, the cost for computation is overlooked with the purpose of improving the performance metrics.

As a response to this challenge, researchers investigated a correlation-based feature selection method that offers a more computationally efficient solution for NIDS, considering the correlation among features [6]. This approach was initially applied to the KDD99 and UNSW-NB15 datasets in [28]. More recently, Moustafa et al. [26] proposed correlation-based method which was improved for multivariate correlation-based network anomaly detection systems, moreover, Gavel et al. [29] employed correlation-based fitness function using the ant lion optimization to select features using AWID dataset for wireless network. Zhou et al. [30] chose the optimal features by removing the redundant features and selecting the most informative features based on the threshold of correlation. These works lead to a substantial improvement in NIDS accuracy, albeit with increased complexity. In light of the need for real-time and low-latency attack detection solutions, this study will place greater emphasis on the correlation-based feature selection method.

Unlike feature selection, which maintains a subset of initial features in Network Intrusion Detection Systems (NIDS), feature extraction focuses on condensing the original features into a lower-dimensional vector while preserving much of the data and applied in various research domains. In the research domain of image processing and pattern recognition, feature extraction involves transforming raw data, such as images, into a reduced and more meaningful representation [31]. The primary goal is to capture essential information that is relevant for subsequent analysis, classification, or recognition tasks. For example, Miseikis et al. [32] employed a multi-objective convolutional neural network to extract features, identify and precisely localize the robot in 2D camera images, allowing flexibility in camera movement and providing accurate 3D position estimates for the robot base and joints. Aggarwal [33] explored the use of the Grey-level Co-occurrence Matrix (GLCM) feature extractor in classifying brain tumor MRI images with a random forest classifier. The results indicate that GLCM features with optimal parameters can achieve promising accuracy in capturing significant texture components. Various methods, such as principal component analysis (PCA), linear discriminant analysis (LDA), and autoencoders (AE) based on neural networks, have been utilized for reducing dimensions in NIDS.

For example, in [34], the KDD99 dataset’s dimensionality was greatly reduced by PCA, improving NIDS performance and accuracy while handling attack classification via support vector machines. Various PCA variants, such as hierarchical PCA neural networks using 1998 DARPA dataset [35] and kernel PCA with genetic algorithms [36] have been adopted for intrusion detection to improve precision for less common attacks. PCA is also employed to recent network traffic datasets like UNSW-NB15 and CICIDS2017 can be found in [37, 38]. Additionally, LDA has been utilized as a feature reduction method in NIDS to notably decrease computational complexity, as seen in [39]. In [40, 41] the combination of PCA and LDA were employed to build a two-layer dimension reduction approach, effectively reducing dimensionality, and detecting low-frequency malicious activities over the NSLKDD dataset.

To improve efficiency of feature extraction in NIDS, various research works have applied AE-based neural networks. In particular, Yan and Han [7] introduced a stacked sparse AE approach to build non-linear mapping of high-dimensional to low-dimensional data over the NSLKDD dataset. Khan et al. [42] employed a deep stacked AE to reduce the number of features for both binary and multiclass classification, achieving higher accuracy than previous methods. Several AE-based networks on long short-term memory (LSTM), including variational LSTM [43] and bidirectional LSTM [44], have been developed for dimensionality reduction in NIDS, addressing imbalances and high-dimensional problems effectively. However, it's worth noting that AE-based methods, derived from deep neural networks, entail higher computational costs in both training and testing compared to statistical-based PCA and LDA algorithm.

To mitigate the computational costs issue, a network pruning algorithm was recently proposed to build lightweight detection model in [45] to significantly reduce the complexity of AE structures for feature extraction in NIDS, using UNSW-NB15 and CICIDS data sets. Moreover, in [46], a network design integrates an autoencoder (AE) network using convolutional and recurrent neural networks to extract spatial and temporal features without human intervention.

Since there is a wide range of studies that employed various feature reduction or dimensionality reduction techniques, which can be classified into two methods, namely feature selection and feature extraction, to build lightweight detection models for NIDS. However, few studies conduct comprehensive comparison for the performance and efficiency between the two methods, particularly for IoT data. For example, Aminanto et al. [9] combined AE-based feature extraction and supervised machine learning feature selection to learning representations of the original features, without performance comparison between them, while [47] only conducted comparison for two methods using traditional networking data set UNSW-NB15.

It's important to highlight that most of the previously mentioned studies have concentrated on enhancing either the accuracy of detection or reducing the computational complexity of Network Intrusion Detection Systems (NIDS). They accomplished this by utilizing machine learning classifications and feature engineering methods such as FS and FE to minimize data complexity. Nonetheless, the existing literature lacks a comprehensive comparison between these two feature reduction methods with current datasets in IoT networks. Our study endeavors to fill this gap.

In particular, we initiate the creation of a machine learning-driven NIDS framework utilizing diverse IoT data, emphasizing the feature reduction evaluation phase. Within this context, we identify feature selection through the correlation matrix and feature extraction using PCA as promising approaches for practical low-latency NIDS operations. We then perform an extensive assessment using the contemporary TON-IoT dataset derived from a heterogenous IoT network, comparing performance measures for detection. This includes accuracy, precision, recall, F1-score, and runtime intricacies such as feature reduction time, model training time, and inference time for these methodologies. Our evaluation encompasses both binary and multiclass classifications while maintaining consistency in the quantity of selected or extracted features.

Methodology

In this section, we put FS or FE technique as module factor into the pipeline of machine learning-based network intrusion detection system (NIDS) respectively, according to the final performance metrics of the classification models. Here is the framework of the methodology according to Fig. 1, which can be divided into three phases, data pre-processing, feature reduction, and classification. A detailed explanation of the three workflow of the proposed model is provided as following, particularly for two feature reduction methods.

(1)
Data preprocessing

During this phase, the data is processed by cleansing, partitioning, and normalization to standardize the data format. The dataset is divided into two sets, training for feature reduction and testing for final model prediction. A detailed description is presented in “Phase 1 data preprocessing” section.
(2)
Feature reduction

This critical stage employs FS or FE techniques to identify the most crucial attributes, thereby reducing data dimensionality. The transformed data through both methods is then utilized in subsequent classification tasks. “Phase 2 feature reduction” section offers an in-depth description of feature reduction methods.
(3)
Classification modeling

Various machine learning models, involving Decision Tree, Random Forest, k-Nearest Neighbors, Naive Bayes, and Multiple Layer Perception, are employed to validate the impact of the two feature reduction methods. These models perform binary and multiple classifications, offering a comprehensive comparison based on multiple performance metrics.

Dataset

Below is the key information about the TON-IoT Network dataset, which will be employed in our experiments detailed in “Experimental setup and analysis” section. Subsequently, a comprehensive discussion on data preprocessing for this dataset will be provided.

TON-IoT dataset was generated from heterogeneous data sources collected from Telemetry datasets of IoT and IIoT sensors, operating systems datasets of Windows as well as Ubuntu network traffic datasets. It was first introduced in [48], and this dataset comprises 22,339,021 instances of data and includes two target classes: the “label” class, containing normal and attack data, and another class with ten categories—normal and nine attack types, such as Backdoor, DDoS, DoS, Injection, Password, Ransomware, Scanning, XSS, and MITM. There are six feature groups: Connection, Statistical, DNS, SSL, HTTP, Violation, and Labeling, holding a total of 45 features in the original data. However, in this research, our data analysis involves the “Train_Test_Network.csv” dataset, comprising both training and testing sets, totaling 461,043 records. Table 1 displays the distribution of labels for the binary class and types for the multiple-class, while Table 2 shows the dataset's respective features.

Table 1 Classes description in network TON_IoT

Optimizing IoT intrusion detection system: feature selection versus feature extraction in machine learning

Abstract

Introduction

Related works

Methodology

Dataset

Phase 1 data preprocessing

Feature elimination

Missing value handling

Duplicates removal

Non-numerical features encoding

Data splitting

Normalization

Phase 2 feature reduction

Feature selection

Feature extraction

Phase 3 attack classification

Decision tree (DT)

Random forest (RF)

k-Nearest neighbors (kNN)

Naive Bayes (NB)

Multi-layer perceptron (MLP)

Experimental setup and analysis

Experimental setup

Performance evaluation

Hyperparameter settings of classifiers

Features selected based on correlation thresholds

Features extracted based on PCA

Result and analysis

Binary classification

Multiclass classification

Result verification statistically

Conclusions

Availability of data and materials

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords