Online Feature Selection (OFS) with Accelerated Bat Algorithm (ABA) and Ensemble Incremental Deep Multiple Layer Perceptron (EIDMLP) for big data streams
Journal of Big Data volume 6, Article number: 103 (2019)
Abstract
Feature selection is mainly used to reduce the processing load of data mining models. To reduce the time needed to process voluminous data, parallel processing is carried out with the MapReduce (MR) technique. However, with the existing algorithms, the performance of the classifiers still needs substantial improvement. The MR method recommended in this research work performs feature selection in parallel, which improves performance. To enhance the efficacy of the classifier, this work proposes an Online Feature Selection (OFS)–Accelerated Bat Algorithm (ABA) method and a framework for applications in which features stream in without full prior knowledge of the feature space. The OFS-ABA method selects significant and non-redundant features within the MapReduce (MR) framework. Finally, an Ensemble Incremental Deep Multiple Layer Perceptron (EIDMLP) classifier is applied to classify the dataset samples. The EIDMLP classifier combines the outputs of homogeneous IDMLP classifiers. The proposed feature selection method and the classifier are evaluated extensively on three high-dimensional datasets. The MR-OFS-ABA method shows better performance than the existing feature selection methods PSO, APSO, and ASAMO (Accelerated Simulated Annealing and Mutation Operator). The EIDMLP classifier is compared with existing classifiers such as Naïve Bayes (NB), Hoeffding tree (HT), and Fuzzy Minimal Consistent Class Subset Coverage (FMCCSC)-KNN (K Nearest Neighbour). The methodology is applied to three datasets, and the results are compared with four classifiers and three state-of-the-art feature selection algorithms. The results show improved accuracy and shorter processing time.
Introduction
Big data refers to any collection of data so large and complex that conventional database management systems and data processing tools cannot process it. Big data can be characterized by 5 V’s [1, 2], namely “Volume”, “Variety”, “Velocity”, “Variability”, and “Veracity” (Fig. 1).
In view of these challenges, conventional data mining methods are neither appropriate for data streams of diverse characteristics nor able to achieve the required analytical efficiency, because they rely on periodic analytics whereas big data calls for real-time analytics. Moreover, the induction model has to be rerun and rebuilt every time recent data are added. To ensure scalability, the MapReduce framework [3,4,5] is used to parallelize the classification algorithm. At present, many Machine Learning (ML) techniques are designed for big data analytics [6]. For problems involving big data sets of varied type and nature, deep learning (DL) techniques have been formulated to improve classification performance.
DL algorithms are beneficial because they handle multifaceted, complex, and unstructured data through greedy layer-wise learning [7, 8]. DL has made significant contributions to ML applications such as speech recognition [9], computer vision [10], and NLP [11]. DL has also been used proficiently to address vital issues in big data analytics.
Feature selection (FS) is an important step in any classification application. It becomes a complex task when the number of features is huge. For several years, many research works have focused on FS methods [12, 13]. FS essentially involves the removal of irrelevant and redundant features, thereby creating a prediction model with higher efficiency, interpretability, and speed. FS has been applied to many applications involving high-dimensional data. Although comprehensive FS methods are available, handling rapid big data streams that require instantaneous processing remains an open challenge.
The current literature features many FS methods, but they operate in batch mode rather than online. In offline mode, all features are available for training before the FS process. In real-time applications, features may arrive online, and accumulating all training examples becomes expensive. Hence, OFS [14] was introduced to select features using an online learning approach. To extract significant insights from big data sets during online learning, the relevant features must be identified efficiently and effectively so that accurate prediction models can be built in real time. OFS with data streams is closely associated with data stream mining [15].
When data become huge or unmanageable, parallel processing is employed to reduce the processing time. In this research work, a scalable and efficient OFS method using the parallel Accelerated Bat Algorithm (ABA) is proposed to select features from the data set online. In addition, the proposed Ensemble Incremental Deep Multiple Layer Perceptron (EIDMLP) classifier is used for large-scale data. To work with large-scale datasets, the distributed programming model MapReduce is used, which divides the dataset into smaller portions. The scalability of OFS-ABA on an extremely high-dimensional, big dataset is demonstrated through an empirical study, which also shows that the algorithm outperforms other state-of-the-art FS methods.
This paper is structured as follows. The “Literature review” section outlines related work in the field of feature selection and classification, and the motivation behind this research work. The “Proposed methodology” section describes the proposed method step by step, from preprocessing to classification. The “Experimental work” section describes the datasets and the evaluation metrics, and presents the outcomes as tables and diagrams. The final section, “Conclusion”, summarizes the entire work.
Literature review
Feature selection (FS) for big data analytics aims to select significant features with reduced time complexity and enhanced accuracy. Recent progress in OFS with MapReduce has brought major advances in this domain. In recent years, bio-inspired algorithms have been used for various problems in big data analytics [16].
Hoi et al. [14] designed an effective algorithm for OFS, provided a theoretical analysis, and assessed its performance empirically on benchmark datasets. OFS was applied to real-world problems, where it scaled significantly better than other FS algorithms. The outcomes validated the efficacy of the proposed techniques for extensive and varied large-scale applications.
Peralta et al. [17] proposed a MapReduce approach to derive a subset of features from large data sets. The FS method was assessed with classifiers such as support vector machine (SVM), Naïve Bayes (NB), and logistic regression (LR). The evaluations showed that the Spark-based framework was beneficial for performing evolutionary FS on large data sets, with improved classification precision and runtime. Tsamardinos et al. [18] assessed the Parallel, Forward–Backward with Pruning (PFBP) algorithm for FS on huge datasets. The experimental study demonstrated increased scalability in the number of features, together with speedup.
Tan et al. [19] presented a novel FS algorithm for big datasets. The algorithm was based on convex semi-infinite programming (SIP) and a multiple kernel learning (MKL) subproblem handled with an adaptive accelerated proximal gradient technique, where each base kernel is associated with a set of features. The results showed improved training efficiency on big data with ultra-large sample sizes.
De la Iglesia et al. [20] studied diverse Evolutionary Computation (EC) techniques for FS in classification problems. The advantage of EC is its ability to search large populations efficiently. The assessment and implementation demonstrated the competency of these algorithms and pointed to new research directions in FS problems. Nazar and Senthilkumar [21] contributed an efficient, scalable OFS method that used the Sparse Gradient (SGr) for the online selection of features. In this approach, the feature weights were proportionally decremented based on a threshold value, which zeroed the weights of irrelevant features. The experimental results demonstrated an accuracy improvement of 15% over other methods.
Hu et al. [22] elaborated on online FS for streaming data. They provided a comprehensive review of current OFS methods, compared them with other methods, and discussed open issues in FS. Yu et al. [23] built Scalable and Accurate Online Feature Selection (SAOLA), an online FS model based on pairwise comparison techniques and extended to online group FS. The SAOLA algorithms scaled well on high-dimensional data sets and exhibited superior performance compared to other prevailing algorithms.
Feature selection methods for handling data streams have been discussed in many recent works [24,25,26,27,28,29]. Fong et al. [24] proposed a novel, lightweight Accelerated Particle Swarm Optimization (APSO) feature selection algorithm for big data streams. The APSO algorithm is based on swarm intelligence, and the reported results show that it performed well in terms of accuracy and time complexity. Five benchmark datasets were used in the experiments.
Said and Alimi [25] developed Multi-Objective Automated Negotiation based Online Feature Selection (MOANOFS). The results demonstrated that the MOANOFS system can be applied successfully to diverse domains and achieves high accuracy in real-time applications. Lin et al. [26] proposed an improved cat swarm optimization (ICSO) algorithm for big data classification. The algorithm is applied to FS in text classification problems in big data analytics, and ICSO is compared with CSO. Its disadvantage is that it is applicable only to text classification problems.
Gu et al. [27] proposed the competitive swarm optimizer, a variant of the PSO algorithm that overcomes the shortcomings of conventional PSO when handling large-scale datasets, at lower computational cost. The algorithm performs FS to select minimal subsets, followed by classification. Future work is to explore multi-objective metaheuristic FS algorithms that handle huge dimensionality with enhanced accuracy.
Manoj et al. [28] proposed the ACO–ANN algorithm for FS in big data analytics for text classification. The remaining challenge for this approach is to apply it to other types of data, such as images and video. The work emphasized the use of population-based hybrid algorithms for FS problems. Devi et al. [29] proposed Multi-Objective Firefly and Simulated Annealing for online feature selection, with the KSVM classifier used for classification. This scheme was limited to a single classifier, and its performance was not compared with that of other classifiers.
For classification problems, DL techniques are considered to be efficient [30, 31]. Wan et al. [30] proposed a deep multilayer perceptron classifier for Parkinson’s disease behaviour analysis. The proposed classifier demonstrated better accuracy than other algorithms. Young et al. [31] outlined a DL-based ensemble approach for prediction in big data analytics. This work highlighted the issues of conventional mining and demonstrated the improved performance of deep neural networks.
From the existing literature, it can be deduced that bio-inspired algorithms combined with the MapReduce approach prove effective and competent for feature selection (FS) in big data analytics. It is also evident that DMLP is used for classification problems.
Proposed methodology
The MapReduce model is applied to big datasets, which are divided into smaller partitions. In the proposed method, an efficient and scalable Online Feature Selection (OFS) approach using the Accelerated Bat Algorithm (ABA) is used. In this approach, the feature weights are proportionally decremented based on threshold values, and the Clustering Coefficients of Variation (CCV) zero the weights of irrelevant features. This work proposes an Ensemble Incremental Deep Multiple Layer Perceptron (EIDMLP) classifier for large-scale data. The impact of the penalty and kernel parameters on the performance of the EIDMLP classifier is also analyzed. The scalability of OFS-ABA on an extremely high-dimensional, big dataset is demonstrated through an empirical study, which also shows that the algorithm performs better than other known FS methods. The proposed model is shown in Fig. 2.
Preprocessing
Normalization is commonly used to balance the significance of attributes when they are on different scales. Datasets whose attributes have diverse ranges are preprocessed with the min–max normalization method. In this process, all values are mapped onto the same scale between 0 and 1, so that attributes with a small value range still receive importance.
Min–max normalization scales the given dataset into a specified range, here between 0 and 1. The normalized feature is derived from Eq. (1):
\(n^{\prime} = \frac{{n - min_{ds} }}{{max_{ds} - min_{ds} }}\left( {nf_{max} - nf_{min} } \right) + nf_{min}\) (1)
where n is the current value of the feature and n’ is the normalized value of the feature. min_{ds} is the minimum value and max_{ds} is the maximum value of the given dataset. nf_{max} and nf_{min} are the upper and lower bounds of the normalization range, 1 and 0 respectively.
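The min–max rescaling described above can be illustrated with a short sketch. It assumes that the minimum and maximum are taken per feature column and that a constant column is mapped to the lower bound; the function name `min_max_normalize` is illustrative and not part of the original implementation.

```python
from typing import List


def min_max_normalize(values: List[float],
                      nf_min: float = 0.0,
                      nf_max: float = 1.0) -> List[float]:
    """Rescale one feature column into [nf_min, nf_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature has no spread, so map it to nf_min.
        return [nf_min for _ in values]
    scale = (nf_max - nf_min) / (hi - lo)
    return [(v - lo) * scale + nf_min for v in values]


column = [2.0, 4.0, 10.0]
normalized = min_max_normalize(column)
```

After rescaling, the smallest value of the column maps to 0 and the largest to 1, so a feature with a narrow raw range carries the same weight as one with a wide range.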
OFS
OFS [14] deals with streaming features. Following [21], OFS is described with the notation Ds = [Ds_{1}, Ds_{2},…,Ds_{n}]^{T} \(\in\) R^{n×d}, where Ds_{1},…,Ds_{n} are the instances of the given dataset, Fs = [fs_{1}, fs_{2},…, fs_{d}]^{T} \(\in\) R^{d} is the feature set, and Cl = [cl_{1}, cl_{2},…,cl_{m}]^{T} \(\in\) R^{m} denotes the class label vector. The number of features d is unknown a priori, and a best feature subset of size s is selected such that s < d. Accuracy is achieved by selecting only the most relevant feature subset for classification. For every Ds_{i}, a feature weight vector \({\text{we}}_{\text{i}} \in {\text{R}}^{\text{d}}\) is learned, which classifies the instance. After classification, \(we_{n}\) is updated to \(we_{n + 1}\). Because the number of streaming features is unknown a priori, this issue is well handled by OFS. OFS acquires the dataset instances one at a time. For every instance, a weight vector is learned and the class label of the instance is predicted using the function sign \(\left( {we_{{n^{\prime}}} \times Ds_{n} } \right)\). The predicted class is then compared with the target class. When the method misclassifies, the weight vector is updated using the stochastic gradient rule given in Eq. (2):
In Eq. (2), \(C^{\prime}\left( {we_{n} ,Ds_{n} , y_{n} } \right)\) denotes the cost function and α denotes the learning rate. This procedure is concisely specified by the EIDMLP classifier.
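A minimal sketch of one mistake-driven online update is given below. It is not the exact rule of Eq. (2): it assumes a perceptron-style gradient step with learning rate `alpha`, followed by a truncation that keeps at most `s` non-zero weights so that only `s < d` features remain selected. The names `ofs_step` and `truncate` are hypothetical.

```python
from typing import List, Sequence


def predict(we: Sequence[float], ds: Sequence[float]) -> int:
    """Predict the class label as the sign of the weighted sum."""
    score = sum(w * x for w, x in zip(we, ds))
    return 1 if score >= 0 else -1


def truncate(we: List[float], s: int) -> List[float]:
    """Keep the s largest-magnitude weights and zero the rest (assumed step)."""
    order = sorted(range(len(we)), key=lambda j: abs(we[j]), reverse=True)
    keep = set(order[:s])
    return [w if j in keep else 0.0 for j, w in enumerate(we)]


def ofs_step(we: List[float], ds: Sequence[float], y: int,
             alpha: float = 0.1, s: int = 2) -> List[float]:
    """Update the weight vector only when the instance is misclassified."""
    if predict(we, ds) != y:
        # Assumed perceptron-style gradient step on the misclassified instance.
        we = [w + alpha * y * x for w, x in zip(we, ds)]
        we = truncate(we, s)
    return we


# Instances arrive one at a time as (feature vector, label in {-1, +1}).
stream = [([1.0, 0.0, 0.2], 1), ([0.0, 1.0, 0.1], -1), ([0.9, 0.1, 0.0], 1)]
we = [0.0, 0.0, 0.0]
for ds, y in stream:
    we = ofs_step(we, ds, y)
selected = [j for j, w in enumerate(we) if w != 0.0]
```

The indices in `selected` are the features whose weights survive truncation after the stream has been processed once.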
MapReduce
In the MapReduce model, the given dataset \(\left( {Ds} \right)\) is split into a number of smaller sets that are distributed across the network [32], and the feature selection algorithm is applied to every partition in parallel. The examples are distributed equally and processed in parallel so as to preserve the class balance. In MR, each partition \(\left( {Ds_{i} } \right)\) is assigned to the corresponding \(map_{i}\) task. During the map phase, OFS (in this case, based on ABA) is applied to \(Ds_{i}\).
ABA is applied to each partition, and the output of each map function is represented as \(fe_{i} = \left( {fe_{i1} , \ldots ,fe_{iD} } \right)\), where the number of selected features is denoted by ‘D’. The reduce phase combines the features selected from each partition, obtaining the vector ‘sf’ given in Eq. (3), where sf_{j} denotes the jth feature.
This is the outcome of the complete OFS process, which is used for further ML process:
where n is the number of map tasks in MapReduce. Generally, the reduce phase is carried out by a separate process, which reduces the execution time in MR [33]. The entire execution is done with a single MR process, which eliminates additional disk accesses.
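The map and reduce phases can be sketched as follows. This is one plausible reading, not the exact form of Eq. (3): each map task is assumed to return a binary selection vector, the reduce phase is assumed to average these vectors over the n map tasks, and a threshold `tau` (an assumption) turns the averages into the final subset. `toy_selector` is a hypothetical stand-in for OFS-ABA, and the partitions are processed sequentially here rather than on a cluster.

```python
from typing import Callable, List

Partition = List[List[float]]  # rows of feature values for one data split


def toy_selector(partition: Partition, min_range: float = 0.5) -> List[int]:
    """Hypothetical stand-in for OFS-ABA: flag a feature with 1 when its
    value range inside the partition exceeds min_range, otherwise 0."""
    d = len(partition[0])
    flags = []
    for j in range(d):
        column = [row[j] for row in partition]
        flags.append(1 if max(column) - min(column) > min_range else 0)
    return flags


def map_phase(partitions: List[Partition],
              select: Callable[[Partition], List[int]]) -> List[List[int]]:
    """Apply the selector to every partition (one map task per partition)."""
    return [select(p) for p in partitions]


def reduce_phase(selections: List[List[int]]) -> List[float]:
    """Average the per-partition selection vectors over the n map tasks."""
    n = len(selections)
    d = len(selections[0])
    return [sum(fe[j] for fe in selections) / n for j in range(d)]


def final_features(sf: List[float], tau: float = 0.5) -> List[int]:
    """Keep features selected by at least a share tau of the map tasks."""
    return [j for j, share in enumerate(sf) if share >= tau]


partitions = [
    [[0.1, 0.9, 0.5], [0.9, 0.8, 0.5]],
    [[0.2, 0.1, 0.5], [0.9, 0.9, 0.4]],
]
sf = reduce_phase(map_phase(partitions, toy_selector))
chosen = final_features(sf)
```

Averaging keeps the reduce step cheap, because it only handles one short vector per map task rather than the data itself.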
Accelerated Bat Algorithm (ABA)
The Accelerated Bat Algorithm (ABA) is based on the echolocation behaviour of bats. Here, the bats collect information about the streaming features. Microbats use echolocation, and this characteristic is exploited to find and classify optimal streaming features. The idealized rules are as follows [34, 35]:
1. Bats discover food and prey using echolocation.
2. Every bat flies with velocity ve_{i} at feature position fp_{i}, with a fixed frequency freq_{min}, varying wavelength \(\lambda\), and loudness A_{0}. The pulse emission rate varies within \(er \in \left[ {0, 1} \right]\). The wavelength is adjusted accordingly.
3. The loudness varies from A_{0} to A_{min}.
The frequency freq takes values in the range [freq_{min}, freq_{max}], which corresponds to the wavelength range \([\lambda_{min} , \lambda_{max}]\). At time step t, the rules for updating the feature positions fp_{i} and velocities ve_{i} in a high-dimensional search space are given by Eqs. (4) to (6) [36].
where \(\beta \in \left[ {0, 1} \right]\) is a random vector drawn from a uniform distribution.
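The frequency, velocity, and position updates can be sketched with the commonly stated form of the bat algorithm, which is assumed here to correspond to Eqs. (4) to (6). The function name `bat_step` is illustrative, and a single scalar β is drawn per bat for simplicity.

```python
import random
from typing import List, Tuple


def bat_step(fp: List[float], ve: List[float], fp_best: List[float],
             freq_min: float, freq_max: float,
             rng: random.Random) -> Tuple[List[float], List[float], float]:
    """One update of a single bat: frequency, then velocity, then position."""
    beta = rng.random()  # uniform draw in [0, 1)
    freq = freq_min + (freq_max - freq_min) * beta
    # The velocity changes in proportion to the distance from the global best.
    new_ve = [v + (x - xb) * freq for v, x, xb in zip(ve, fp, fp_best)]
    new_fp = [x + v for x, v in zip(fp, new_ve)]
    return new_fp, new_ve, freq


rng = random.Random(0)
fp_best = [1.0, 0.0]
new_fp, new_ve, freq = bat_step([0.2, 0.7], [0.0, 0.0], fp_best, 0.0, 2.0, rng)
```

A bat that already sits on the global best with zero velocity does not move under this rule, which is why a separate local random walk is needed to keep exploring around the best solution.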
Here \(fp_{*}\) is the current global best solution, which is updated on every iteration by comparison with the current positions for ‘n’ features, with the velocity increment \(\lambda_{i} freq_{i}\). The best feature is selected from among the current best features using a random walk:
where \({\epsilon} {\in}\) [− 1, 1] is a random number and \(A^{t} = \left\langle {A_{i}^{t} } \right\rangle\) is the average loudness. Both the loudness A_{i} and the pulse emission rate \(er_{i}\) are adapted as the iterations proceed. The loudness and the pulse emission rate are inversely related. To set the starting position for ABA, a feature ranker function and a mutation operator are applied to improve the accuracy of the classifier. The initial position of the BA is rearranged using the Gaussian mutation operator. Let \(f_{i} \in \left[ {a_{i} , b_{i} } \right]\) be a real variable. Then the truncated Gaussian mutation operator changes \(f_{i}\) to a neighboring value using the following probability distribution [37]:
where \(\varphi \left( z \right) = \frac{1}{{\sqrt {2\pi } }} exp \left( { - \frac{1}{2}z^{2} } \right)\) is the probability density function of the standard normal distribution and Φ(·) is the cumulative distribution function.
This mutation operator has a mutation strength parameter σ_{i} for every feature, which should be related to the bounds a_{i} and b_{i}; \(\sigma = \sigma_{i} /\left( {b_{i} - a_{i} } \right)\) is used as a fixed non-dimensionalized parameter for all m features. To implement the above concept, Eqs. (9) to (11) are used to compute the offspring \(f_{i}^{\prime }\):
where \(\alpha = \frac{{8\left( {\pi - 3} \right)}}{{3\pi \left( {4 - \pi } \right)}}\) ≈ 0.140012 and sign \(\left( {u^{\prime}_{i} } \right)\) is − 1 if \(u^{\prime}_{i}\) < 0 and is + 1 if \(u^{\prime}_{i}\) ≥ 0. Also, u_{L} and u_{R} are calculated as follows:
Thus, the Gaussian mutation procedure for mutating the ith feature variable f_{i} is as follows:
Step 1: Create a random number u_{i} ∈ [0, 1].
Step 2: Use Eq. (9) to create offspring \(f^{\prime}_{i}\) from parent \(f_{i}\).
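The two steps can be illustrated with a simplified sketch. Instead of the closed-form approximation with the constant α ≈ 0.140012, it samples from the truncated Gaussian by inverse-CDF sampling with `statistics.NormalDist` from the Python standard library, so it is a substitute technique with the same goal rather than the operator of Eqs. (9) to (11). The function name and the clamping constants are assumptions.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and deviation 1


def truncated_gaussian_mutation(f: float, a: float, b: float,
                                sigma: float, rng: random.Random) -> float:
    """Mutate f inside [a, b]; sigma is the non-dimensional strength."""
    s = sigma * (b - a)  # absolute strength, sigma_i = sigma * (b_i - a_i)
    u_l = _STD_NORMAL.cdf((a - f) / s)  # probability mass below the lower bound
    u_r = _STD_NORMAL.cdf((b - f) / s)  # probability mass below the upper bound
    u = rng.random()                    # Step 1: random number in [0, 1)
    u_prime = u_l + u * (u_r - u_l)     # restrict the draw to the interval
    # Keep the argument strictly inside (0, 1), as required by inv_cdf.
    u_prime = min(max(u_prime, 1e-12), 1.0 - 1e-12)
    child = f + s * _STD_NORMAL.inv_cdf(u_prime)  # Step 2: create the offspring
    return min(max(child, a), b)        # guard against rounding at the bounds


rng = random.Random(42)
children = [truncated_gaussian_mutation(0.9, 0.0, 1.0, 0.1, rng)
            for _ in range(1000)]
```

Every offspring stays within the bounds even when the parent lies close to one of them, which is the property that motivates truncating the distribution.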
CCV is used as the fitness function [38] to select the optimal features while balancing the classes and avoiding the overfitting problem. This function is mainly applied for building an accurate prediction model. Features with a higher CV are selected.
Let Ds be a dataset with n instances and m features. The values of an instance \(\left( {f_{1} ,f_{2} , \ldots, f_{m} } \right)\) are divided into groups by class \(c\), where C is the total number of prediction target classes. For each \(f_{a} ,a \in \left[ {1..m} \right]\), \(f_{a} \in \left\{ {f_{a}^{1} ,f_{a}^{2} , \ldots, f_{a}^{C} } \right\}\)
v_{d} is the sum of the coefficients of variation over all classes c, where \(c \in \left[ {1..C} \right]\), for the ath feature.
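A small sketch of this fitness computation is shown below. It assumes that the coefficient of variation is the population standard deviation divided by the absolute mean, and that a zero mean contributes 0; neither choice is stated in the text. The function names are illustrative.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Hashable, List, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Standard deviation relative to the mean (0.0 when the mean is 0)."""
    m = mean(values)
    if m == 0:
        return 0.0  # assumption: avoid division by zero
    return pstdev(values) / abs(m)


def ccv_scores(rows: List[List[float]],
               labels: List[Hashable]) -> List[float]:
    """For each feature, sum the per-class coefficients of variation."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    d = len(rows[0])
    return [
        sum(coefficient_of_variation([r[a] for r in group])
            for group in by_class.values())
        for a in range(d)
    ]


rows = [[1.0, 5.0], [3.0, 5.0], [2.0, 5.0], [6.0, 5.0]]
labels = [0, 0, 1, 1]
scores = ccv_scores(rows, labels)
```

In this toy data the second feature is constant within every class, so its score is zero and it would be ranked below the first feature.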
Ensemble Incremental Deep Multiple Layer Perceptron (EIDMLP)
The Deep Multiple Layer Perceptron (DMLP) is used to improve the classification results for big data streams. An ensemble method combines the outputs of individual homogeneous classifiers applied to the given big datasets. The outputs of the ensemble members are combined using the majority voting rule [39]. DMLP is a multilayer feed-forward artificial neural network that maps input vectors to output vectors. It is a connected graph with several layers, namely the input, hidden, and output layers. In this fully connected network, each layer is connected to the next. One or more hidden layers are allowed. Except for the input layer nodes, every neuron is associated with a nonlinear activation function. DMLP, an extension of the single-layer perceptron, overcomes the weakness that a single-layer perceptron cannot separate nonlinear data. In this work, 3 ensembles are used. DMLP can learn nonlinearly separable decisions. It is shown in Fig. 3.
In DMLP [30], five or ten hidden layers are used, in contrast to the conventional simple two-layer MLP. Generally, sigmoid and tanh activation functions perform well in small to medium-sized networks. An activation function that hard-limits the output of inactive hidden nodes to zero lets the network obtain sparse representations. A shortcut is a connection that spans multiple layers. In DMLPs, shortcuts are generally avoided, so all nodes of one layer are connected only to the subsequent layer.
The following sets of nodes j are defined for each node i:
Succ(i) is the set of nodes j for which a connection i \(\to\) j exists.
Pre(i) is the set of nodes j for which a connection j \(\to\) i exists.
For every connection between node i and node j, a weight we_{ji} is assigned. All hidden and output nodes have a network input net_{i} and an activation output ac_{i}.
In mining streaming data, data instances are generated continuously over time. The problem is to update the model each time without reloading the entire batch. Frequent updating becomes critical when the data examples are massive, so the model should be updated incrementally. To solve this problem, an incremental DMLP classification model is proposed. The method is also called an anytime algorithm, because the samples of the big dataset are read only once without the need to store or reload them. A tree is built by the induction method, which selects the attribute to split on by maintaining sufficient statistics that register the counts of every attribute value. To compute the frequency of attribute value at_{ij} of attribute at_{i} corresponding to class y_{k}, the Hoeffding Bound (HB) [38] is calculated using Eq. (15).
where R is the range of the class distribution and n is the number of instances observed for a class. At any particular time, the attribute value with the top value of H(·) is at_{ia} = argmax H(at_{ij}).
Similarly, the second top value is at_{ib}, where at_{ib} = argmax H(at_{ij}), ∀ j ≠ a. The two top values are taken incrementally as induction leaves as new data arrive. The value ΔH(at_{i}) = H(at_{ia}) − H(at_{ib}) is calculated for each attribute at_{i}, where i \(\in\) I, as the difference between the two top values. Using Eq. (16), a confidence interval for r_{true} is computed from the n instances observed so far, to confirm the relation of attribute value at_{ij} to class y_{k}. The confidence interval is maintained incrementally as the only statistic for each attribute at_{i}: \(\bar{r}\) − HB ≤ r_{true} < \(\bar{r}\) + HB, where \(\bar{r}\) = (1/n)Σr_{i}. When this inequality and r_{true} < 1 hold over the observed samples, the tested \({\text{at}}_{\text{i}}\) is the best statistical candidate, with good accuracy on the part of the data stream seen so far.
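The role of the Hoeffding Bound can be sketched as follows, using its commonly stated form, the square root of R² ln(1/δ) / (2n). The confidence parameter `delta` does not appear in the text and is an assumption, as are the function names.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Margin within which the observed mean of n samples lies from the
    true mean, with probability 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def confident_choice(h_best: float, h_second: float,
                     value_range: float, delta: float, n: int) -> bool:
    """True when the gap between the two top values exceeds the bound, so
    the best candidate can be accepted without seeing more data."""
    return (h_best - h_second) > hoeffding_bound(value_range, delta, n)


hb_small_sample = hoeffding_bound(1.0, 1e-7, 100)
hb_large_sample = hoeffding_bound(1.0, 1e-7, 1000)
clear_gap = confident_choice(0.9, 0.5, 1.0, 1e-7, 1000)
narrow_gap = confident_choice(0.9, 0.85, 1.0, 1e-7, 1000)
```

The bound shrinks as more instances are observed, so a decision that cannot be made confidently on a short prefix of the stream may become possible later without revisiting earlier samples.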
The outputs of the IDMLP classifiers are now combined using majority voting. Let \(Tr\) be the set of N examples and CL be the set of Q output classes. Let S = {A_{1}, A_{2}, …, A_{M}} be the set of M classifiers used for voting. For every example tr ∈ \(Tr\), each classifier makes a prediction. The final class assigned to an example is the class predicted by the majority of the classifiers, as explained below. Let cl_{l} ∈ CL denote the class of an example ‘tr’ predicted by classifier A_{l}, and let the counting function F_{k} be defined as:
where \(cl_{l}\) and \(cl_{k}\) are classes in CL. The total number of votes for class \(cl_{k}\) is defined by the majority vote function (\(mv_{Mk}\)):
The set of classes that gained the majority of votes, from which the class cl for example tr is taken, is given as:
Two strategies are used when more than one class receives the same number of votes. In the first strategy (SMV), the class is chosen arbitrarily among the tied classes, whereas in the second strategy, Influence Majority Vote (IMV), the class given by the ensemble’s best classifier is chosen.
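The voting rule and the two tie-breaking strategies can be sketched as follows. The function name, the `best_index` argument that identifies the ensemble's best classifier, and the injected random generator are illustrative assumptions.

```python
import random
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(predictions: Sequence[Hashable],
                  strategy: str = "SMV",
                  best_index: int = 0,
                  rng: Optional[random.Random] = None) -> Hashable:
    """Combine the class predictions of M classifiers for one example."""
    counts = Counter(predictions)          # votes per class
    top = max(counts.values())
    tied = [cl for cl in counts if counts[cl] == top]
    if len(tied) == 1:
        return tied[0]                     # clear majority
    if strategy == "IMV":
        # Tie: follow the prediction of the ensemble's best classifier.
        return predictions[best_index]
    # SMV tie: choose arbitrarily among the tied classes.
    rng = rng or random.Random()
    return rng.choice(tied)


clear = majority_vote(["active", "inactive", "active"])
tie_imv = majority_vote(["active", "inactive"], strategy="IMV", best_index=1)
tie_smv = majority_vote(["active", "inactive"], rng=random.Random(0))
```

With an odd number of classifiers and two classes a tie cannot occur, so the tie-breaking strategies matter mainly for multi-class problems or even-sized ensembles.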
Experimental work
Dataset
The proposed methodology is implemented with benchmark datasets from the “UCI repository”. The characteristics of the big datasets used in the experiments are shown in Fig. 4. The Arcene dataset consists of 10,000 features and 900 instances. This dataset was merged from mass spectrometry datasets. Based on the numerous feature characteristics, cancer patients have to be distinguished from healthy ones.
The Dorothea dataset consists of various molecular properties of drug combinations. Each combination is labelled as either active or inactive for drug formation. The classification task is to identify whether or not a molecule is binding. Identifying the binding property further leads to designing new drug compounds with added properties such as absorption and duration of action.
The gisette dataset, used for handwritten digit recognition, has 13,500 instances and 5000 features. The challenge here is to classify the digits four and nine. Distractor features were added to the dataset for feature selection.
Performance evaluation
The performance is measured in terms of accuracy, precision, recall, F-measure, and processing time.
Sensitivity (recall), also known as the true positive rate (TPR), measures the proportion of actual positives that are correctly identified as positive.
From the confusion matrix (Fig. 5), precision is interpreted as follows:
F-measure is the harmonic mean of precision and recall, given in Eq. (21).
Accuracy (Eq. 22) quantifies how well the classifiers perform. It is the ratio of correctly predicted samples to the total number of tested samples.
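The four metrics can be computed from the entries of a binary confusion matrix, as in the following sketch, which uses the standard definitions. The function name and the convention of returning 0.0 for an empty denominator are assumptions.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Precision, recall, F-measure, and accuracy from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity, TPR
    if precision + recall:
        # Harmonic mean of precision and recall.
        f_measure = 2.0 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

Because the F-measure is a harmonic mean, it is pulled towards the lower of precision and recall, so a classifier cannot score well by maximizing only one of them.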
Results and discussion
The results of the experiment are discussed in this section. The experiment was carried out in the MATLAB environment using the Parallel Computing Toolbox, on a system with a 1 TB HDD and 16 GB of RAM. Precision, recall, F-measure, and accuracy are the metrics used to assess performance. The MR-OFS-ABA method has shown better performance than the existing feature selection methods PSO, APSO, and ASAMO (Accelerated Simulated Annealing and Mutation Operator) [37, 40]. The EIDMLP classifier is compared with existing classifiers such as Naïve Bayes (NB), Hoeffding Tree (HT), and Fuzzy Minimal Consistent Class Subset Coverage (FMCCSC)-KNN (K Nearest Neighbour). The methodology is applied to three datasets, and the results are compared with four classifiers and three state-of-the-art feature selection algorithms. All the datasets are preprocessed first. Feature selection is then carried out, followed by classification, as discussed in the “Proposed methodology” section. Tables 1, 2, and 3 summarize the performance on the Dorothea, arcene, and gisette datasets.
Figure 6 depicts the accuracy comparison of MR-OFS-ABA with the EIDMLP classifier. The accuracies for classifying active drug compounds in Dorothea are measured as 98.6%, 97.37%, 96.8%, and 96.62%. The execution time is also substantially reduced with the MR approach. From Fig. 7, the execution times are 0.056, 0.068, 0.18, and 1.25.
Figure 8 depicts the accuracy comparison of MR-OFS-ABA with the EIDMLP classifier. The accuracy on the arcene dataset, where the task is to determine whether or not a patient is affected by cancer, is measured as 99%. The execution times, shown in Fig. 9, are 0.053, 0.062, 0.13, and 1.68.
The performance metrics for the gisette dataset are shown in Table 3.
Figure 10 depicts the accuracy comparison of MR-OFS-ABA with the EIDMLP classifier. The accuracies on gisette, where the task is to identify the digits 4 or 9, are measured as 98.6%, 98%, 96.7%, and 96.3%. The execution times, presented in Fig. 11, are 0.44, 0.05, 0.19, and 4.72.
A receiver operating characteristic (ROC) curve is a graphical representation of classification model performance. The plot is drawn with FPR on the x-axis and TPR on the y-axis. In Fig. 12, the MR-OFS-ABA-EIDMLP curve lies higher, which indicates that the proposed model performs better.
Conclusion
This paper proposes a feature selection mechanism, the OFS Accelerated Bat Algorithm (ABA), to choose the most important features from online streaming features. The proposed OFS-ABA algorithm employs the MapReduce (MR) concept in a streaming setting to improve the run time of feature selection. Finally, an Ensemble Incremental Deep Multiple Layer Perceptron (EIDMLP) classifier is proposed to classify the dataset samples. The methodology is applied to three datasets, and the results are compared with four classifiers and three state-of-the-art feature selection algorithms. The MR-OFS-ABA method has shown better performance than the existing feature selection methods PSO, APSO, and ASAMO (Accelerated Simulated Annealing and Mutation Operator). The EIDMLP classifier is compared with other prevailing classifiers such as Naïve Bayes (NB), Hoeffding tree (HT), and Fuzzy Minimal Consistent Class Subset Coverage (FMCCSC)-KNN (K Nearest Neighbour). The results show higher accuracy and shorter processing time. In big data analytics, it is challenging to address all the characteristics of big data. In the proposed model, Volume, Variety, and Velocity are handled proficiently. However, these characteristics may evolve into new dimensions in the near future. The research challenge is to develop feature selection models for these upcoming challenges and complexities.
Availability of data and materials
The datasets used in this work are available in the UCI Machine Learning Repository.
Abbreviations
OFS: Online Feature Selection
ABA: Accelerated Bat Algorithm
EIDMLP: Ensemble Incremental Deep Multiple Layer Perceptron
MR: MapReduce
ML: machine learning
DL: deep learning
FS: feature selection
NLP: natural language processing
SVM: support vector machine
NB: Naïve Bayes
LR: logistic regression
PFBP: Parallel, forward–backward with pruning
SIP: semi-infinite programming
MKL: multiple kernel learning
EC: evolutionary computation
SAOLA: Scalable and Accurate Online Feature Selection
MOANOFS: Multi-Objective Automated Negotiation based Online Feature Selection
DMLP: Deep Multiple Layer Perceptron
HT: Hoeffding tree
References
 1.
AlNuaimi N, et al. Streaming feature selection algorithms for big data: a survey. Appl Comput Inform. 2019. https://doi.org/10.1016/j.aci.2019.01.001.
 2.
Oussous Ahmed, et al. Big data technologies: a survey. J King Saud Univ Comput Inf Sci. 2018;30(4):431–48.
 3.
Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Commun ACM. 2008;51(1):107–13.
 4.
Chu CT, Kim SK, Lin YA, Yu Y, Bradski G, Olukotun K, Ng AY. Mapreduce for machine learning on multicore. In: Advances in neural information processing systems. p. 281–288; 2007.
 5.
Dean J, Ghemawat S. MapReduce: a flexible data processing tool. Commun ACM. 2010;53(1):72–7.
 6.
Athmaja S, Hanumanthappa M, Kavitha V. A survey of machine learning algorithms for big data analytics. In: International conference on innovations in information, embedded and communication systems (ICIIECS). p 1–4; 2017.
 7.
Hinton GE, Osindero S, Teh YW. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527–54.
 8.
Bengio Y, Lamblin P, Popovici D, Larochelle H. Greedy layerwise training of deep networks. In: Advances in neural information processing systems. p. 153–160; 2007.
 9.
Dahl G, Ranzato M, Mohamed AR, Hinton GE. Phone recognition with the meancovariance restricted Boltzmann machine. In: Advances in neural information processing systems. Curran Associates, Inc; p. 469–77; 2010.
 10.
Krizhevsky A, Sutskever I, Hinton G. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, vol. 25. Curran Associates, Inc; p. 1106–1114; 2012.
 11.
Mikolov T, Deoras A, Kombrink S, Burget L, Cernock`y J (2011) Empirical evaluation and combination of advanced language modeling techniques. In: INTERSPEECH. ISCA. p. 605–608.
 12.
Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics. 2007;23(19):2507–17.
 13.
Liu H, Yu L. Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng. 2005;17(4):491–502.
 14.
Hoi SC, Wang J, Zhao P, Jin R. Online feature selection for mining big data. In: Proceedings of the 1st international workshop on big data, streams and heterogeneous source mining: algorithms, systems, programming models and applications. p. 93–100; 2012.
 15.
Stefanowski J, Cuzzocrea A, Slezak D. Processing and mining complex data streams. Inf Sci. 2014;285:63–5.
 16.
Gill SS, Rajkumar B. Bioinspired algorithms for big data analytics: a survey, taxonomy, and open challenges. In: Big data analytics for intelligent healthcare management. Academic Press; p. 1–17; 2019.
 17.
Peralta D, del Río S, RamírezGallego S, Triguero I, Benitez JM, Herrera F. Evolutionary feature selection for big data classification: a MapReduce approach. Math Prob Eng. 2015;2015:246139.
 18.
Tsamardinos I, Borboudakis G, Katsogridakis P, Pratikakis P, Christophides V. A greedy feature selection algorithm for Big Data of high dimensionality. Mach Learn. 2019;108(2):149–202.
 19.
Tan M, Tsang IW, Wang L. Towards ultrahigh dimensional feature selection for big data. J Mach Learn Res. 2014;15:1371–429.
 20.
de La Iglesia B. Evolutionary computation for feature selection in classification problems. Wiley Interdiscip Rev Data Min Knowl Discov. 2013;3:381–407.
 21.
Nazar NB, Senthilkumar R. An online approach for feature selection for classification in big data. Turk J Electr Eng Comput Sci. 2017;25(1):163–71.
 22.
Hu X, Zhou P, Li P, Wang J, Wu X. A survey on online feature selection with streaming features. Front Comput Sci. 2018;12(3):479–93.
 23.
Yu K, Wu X, Ding W, Pei J. Scalable and accurate online feature selection for big data. ACM Trans Knowl Discov Data (TKDD). 2016;11(2):16.
 24.
Fong S, Wong R, Vasilakos A. Accelerated PSO swarm search feature selection for data stream mining big data. IEEE Trans Serv Comput. 2016;1:1–1.
 25.
Said FB, Alimi AM(2018) MOANOFS: Multiobjective automated negotiation based online feature selection system for big data classification. arXiv preprint arXiv:1810.04903.
 26.
Lin KC, Zhang KY, Huang YH, Hung JC, Yen N. Feature selection based on an improved cat swarm optimization algorithm for big data classification. J Supercomput. 2016;72(8):3210–21.
 27.
Gu Shenkai, Cheng Ran, Jin Yaochu. Feature selection for highdimensional classification using a competitive swarm optimizer. Soft Comput. 2018;22(3):811–22.
 28.
Manoj RJ, Praveena MA, Vijayakumar K. An ACO–ANN based feature selection algorithm for big data. Cluster Comput. 2019;22:3953–60.
 29.
Devi SG, Sabrigiriraj M. A hybrid multiobjective firefly and simulated annealing based algorithm for big data classification. Concurr Comput Pract Exp. 2019;31(14):e4985.
 30.
Wan S, Liang Y, Zhang Y, Guizani M. Deep multilayer perceptron classifier for behavior analysis to estimate Parkinson’s disease severity using smartphones. IEEE Access. 2018;6:36825–33.
 31.
Young S, Tamer A, Ayse B. Deep super learner: a deep ensemble for classification problems. In: Canadian conference on artificial intelligence. Springer, Cham; 2018.
 32.
Triguero I, Peralta D, Bacardit J, García S, Herrera F. MRPR: a MapReduce solution for prototype reduction in big data classification. Neurocomputing. 2015;150:331–45.
 33.
Chu CT, Kim SK, Lin YA et al. Mapreduce for machine learning on multicore. In: Advances in neural information processing systems. p. 281–288; 2007.
 34.
Yang XS. A new metaheuristic batinspired algorithm. In: Nature inspired cooperative strategies for optimization (NICSO 2010). Berlin: Springer; p. 65–74; 2010.
 35.
Yang XS, Hossein Gandomi A. Bat algorithm: a novel approach for global engineering optimization. Eng Comput. 2012;29(5):464–83.
 36.
Akhtar S, Ahmad AR, AbdelRahman EM. A metaheuristic batinspired algorithm for full body human pose estimation. In: Ninth conference on computer and robot vision. p. 369–75; 2012.
 37.
Renuka Devi D, Sasikala S. Accelerated simulated annealing and mutation operator feature selection method for big data. Int J Recent Technol Eng. 2019;8:910–6.
 38.
Fong S, Wong R, Vasilakos AV. Accelerated PSO swarm search feature selection for data stream mining big data. IEEE Trans Serv Comput. 2016;9(1):33–45.
 39.
Bouziane H, Messabih B, Chouarfia A. Profiles and majority votingbased ensemble method for protein secondary structure prediction. Evol Bioinform. 2011;7:EBOS7931.
 40.
Sasikala S, Renuka Devi D. A review of traditional and swarm search based feature selection algorithms for handling data stream classification. In: Third international conference on sensing, signal processing and security (ICSSS), New York: IEEE; 2017.
Acknowledgements
Not applicable.
Funding
No external funding.
Author information
Contributions
Both authors read and approved the final manuscript.
Corresponding author
Correspondence to D. Renuka Devi.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Renuka Devi, D., Sasikala, S. Online Feature Selection (OFS) with Accelerated Bat Algorithm (ABA) and Ensemble Incremental Deep Multiple Layer Perceptron (EIDMLP) for big data streams. J Big Data 6, 103 (2019). doi:10.1186/s40537-019-0267-3
Keywords
 Feature selection
 Online Feature Selection (OFS)
 Big data
 Ensemble Incremental Deep Multiple Layer Perceptron (EIDMLP)
 Accelerated Bat Algorithm (ABA)
 MapReduce (MR)