 Research
 Open access
Gene selection via improved nuclear reaction optimization algorithm for cancer classification in high-dimensional data
Journal of Big Data volume 11, Article number: 46 (2024)
Abstract
RNA Sequencing (RNA-Seq) has been considered a revolutionary technique in gene profiling and quantification. It offers a comprehensive view of the transcriptome, making it a more expansive technique in comparison with microarray. Genes that discriminate malignancy from normal tissue can be deduced using quantitative gene expression. However, these data form a high-dimensional dense matrix; each sample has a dimension of more than 20,000 genes. Dealing with such data poses challenges. This paper proposes RBNRO-DE (Relief Binary NRO based on Differential Evolution) to handle the gene selection strategy on (rnaseqv2 illuminahiseq rnaseqv2 un edu Level 3 RSEM genes normalized) data with more than 20,000 genes, picking the most informative genes and assessing them across 22 cancer datasets. The k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM) classifiers are applied to assess the quality of the selected genes. Binary versions of the most common metaheuristic algorithms have been compared with the proposed RBNRO-DE algorithm. In most of the 22 cancer datasets, the RBNRO-DE algorithm based on k-NN and SVM classifiers achieved optimal convergence and classification accuracy up to 100%, combined with a feature reduction size down to 98%, which is very evident when compared to its counterparts, according to Wilcoxon's rank-sum test (5% significance level).
Introduction
DNA contains our recipe, “our genetic code”. Although each cell’s DNA is the same, each tissue structure is distinct and has a unique function, as DNA expresses which genes in a cell are active and which are not engaged through a mechanism called RNA transcription. This RNA is then translated into a protein responsible for cell structure and function. Therefore, analyzing a transcriptome profile is our method for determining the genetic changes in each cell, from which we can evaluate disease biomarkers. Differential expression analysis aims to discover quantitative changes in expression levels through statistical analysis to classify genes whose expression levels vary under different conditions, which helps us understand diseases and control them. In this manner, gene expression profiling technologies have developed significantly. There are two leading popular technologies: the hybridization-based technique “microarray”, which is the older one, and the next-generation sequencing-based “RNA-Seq” [1]. Both techniques are meant to quantify gene expression for statistical analysis and classification. The quantification data based on the next-generation sequencing-based RNA-Seq technique is chosen in this paper because it can detect RNA quantification levels more accurately than microarray data. Accuracy is not the only advantage of the RNA-Seq technique; the earlier technique also has many limitations that have been overcome thanks to next-generation sequencing technology [2], on which the RNA-Seq method is based, as mentioned above. One primary obstacle in the microarray was its reliance upon existing sequencing knowledge, which limited the detection range; this obstacle is no longer a problem in RNA-Seq, as it requires no previous knowledge and offers a wide dynamic detection range. This choice improves the accuracy of our results and the set of genes we obtain, and gives us a close understanding of the disease’s accurate biomarkers.
Lyu et al. [3] presented the scope of determining cancer genetic biomarkers depending on RNA-Seq gene expression data; they worked on normalized Level 3 RNA-Seq gene expression data of 33 tumor types in the Pan-Cancer Atlas, which we have also worked on in this paper. However, it was noted that the researchers in paper [3] used mixed samples, treating non-tumor samples as if they were all tumors. Therefore, we wrote code to separate samples based on their type for binary classification and more accurate tumor data. It is noteworthy that every record of data comprises a set of 20,531 genes “features”, which includes an abundance of extraneous genes and extra information.
The curse of dimensionality [4] is a well-known challenge arising from the current era of data availability, which has driven progress in Feature Selection (FS) algorithms and techniques. Generally, FS techniques follow four approaches: the filter approach, the wrapper approach, the embedded approach, and the hybrid approach [5, 6]. All these approaches aim to select the best features to distinguish the classes, which are, in our case, the informative genes related to their tumor.
The filter approach depends on the individual relationship of each gene with the class, using statistical scores to represent its strength; it achieves high accuracy and selects a good group of genes. However, working on each gene separately discards the reality of the interrelationships between genes, and the approach can be trapped in a local optimum. It is also worth mentioning that the filter approach includes univariate and multivariate subtypes; the main difference is that the multivariate subtype considers correlation in its ranking. Examples of the filter approach are the t-test [7], Fisher score [8], signal-to-noise ratio [9], information gain [10], and Relief [11].
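As an illustration of a univariate filter score, a minimal signal-to-noise ratio ranking (one of the scores listed above) might look as follows; the function names and the two-class 0/1 encoding are ours, for illustration only:

```python
import numpy as np

def snr_scores(X, y):
    """Signal-to-noise ratio of each gene for a binary class vector y.

    A higher absolute score means the gene separates the two classes better."""
    pos, neg = X[y == 1], X[y == 0]
    mu_diff = pos.mean(axis=0) - neg.mean(axis=0)
    sd_sum = pos.std(axis=0) + neg.std(axis=0)
    return mu_diff / (sd_sum + 1e-12)  # small epsilon avoids division by zero

def top_k_genes(X, y, k):
    """Indices of the k genes with the largest absolute SNR score."""
    scores = np.abs(snr_scores(X, y))
    return np.argsort(scores)[::-1][:k]
```

Because each gene is scored in isolation, this runs in a single pass over the matrix, which is exactly the efficiency (and the interrelationship blindness) described above.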
The wrapper approach can be seen as an exploration of all possible subsets; the principle is to create and test subsets of genes. A particular classifier determines the output of a given subset, and the classification algorithm is run many times, once for each evaluation. This approach achieves higher performance than the filter approach because it uses a classification algorithm that guides the learning process. However, that classifier requires a high computational cost and slows the process, especially with our high-dimensional data.
A metaheuristic is a higher-level procedure, used in computer science and mathematical optimization, that finds, generates, or selects a heuristic (partial search algorithm) which may offer a good enough solution to an optimization problem, particularly when information is incomplete or imperfect or computing power is limited. Metaheuristics sample a subset of solutions that would otherwise be too numerous to be fully enumerated or otherwise investigated. Because they make only a few assumptions about the optimization problem, they are useful for a wide variety of issues. Unlike exact optimization algorithms and iterative techniques, metaheuristics do not guarantee that a globally optimal solution will be found. Many metaheuristics use stochastic optimization, meaning that the outcome depends on the collection of generated random variables. In combinatorial optimization, metaheuristics are generally more effective than exact algorithms, iterative techniques, or basic heuristics because they search a much more extensive range of feasible solutions, which makes them advantageous strategies for optimization problems, and several publications and research papers have been released on the issue. Among wrapper solutions, metaheuristic approaches can successfully address the FS problem: stochastic techniques may produce optimal (or nearly optimal) answers quickly, and researchers have begun to adopt them. These techniques have many benefits, such as flexibility regarding dynamic changes, the ability to self-organize without requiring specific mathematical properties, and the capacity to evaluate multiple solutions simultaneously. For these reasons, metaheuristic algorithms have attracted researchers’ attention for tackling optimization problems, and several metaheuristic-based algorithms for solving the FS issue have recently been developed [12]. These algorithms yield trustworthy (near-optimal) solutions at a drastically decreased computational cost.
Evolutionary Approaches (EA), Swarm Intelligence (SI) approaches, and Physics-based Approaches (PHA) are the main classes of metaheuristic approaches. SI approaches are a group inspired by the behavioral habits of swarms and animals [13]. Multiple SI methods have been proposed in the literature and have obtained reliable outcomes in a broad range of optimization issues, such as Particle Swarm Optimization (PSO) [14], Artificial Bee Colony (ABC) [15], Sparrow Search Algorithm (SSA) [16], Grey Wolf Optimization (GWO) [17], Bat Algorithm (BA) [18], Whale Optimization Algorithm (WOA) [19], Grasshopper Optimization Algorithm (GOA) [20], Sailfish Optimizer (SFO) [21], Bird Swarm Algorithm (BSA) [22], and Harris Hawks Optimization (HHO) [23]. EA approaches are designed by simulating biological evolutionary patterns such as mutation, crossover, and selection. Genetic Algorithm (GA) [24], Differential Evolution (DE) [25], the COVIDOA optimization algorithm [26], the invasive tumor growth optimizer [27], and the biogeography-based optimizer [28] are significant EA-based metaheuristic methods that have demonstrated their effectiveness in multiple optimization areas. PHA methods have been created using the rules of physics found in nature, including Simulated Annealing (SA) [29], the Gravitational Search Algorithm (GSA) [30], Atom Search Optimization (ASO) [31], and Henry Gas Solubility Optimization (HGSO) [32].
The embedded approach uses a learning algorithm to choose the relevant genes, directly interacting with the classification; the FS algorithm is integrated as part of the learning algorithm. The learning model is trained using an initial feature set to establish a criterion for measuring the rank values of features. The main objective is to reduce the computation time spent reclassifying different subsets, as is done in wrapper methods, by incorporating FS into the training process. The most common embedded techniques are tree algorithms such as Random Forest (RF). Some embedded methods perform feature weighting based on regularization models whose objective functions minimize fitting errors while forcing the feature coefficients to be small or exactly zero. Such methods include LASSO [33] with the L1 penalty, Ridge with the L2 penalty for constructing a linear model, and Elastic Net [34]. Examples of the embedded approach are SVM based on Recursive Feature Elimination (SVM-RFE) [35], RF [36], and the First Order Inductive Learner (FOIL) rule-based feature subset selection algorithm.
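A minimal sketch of the L1-penalized (LASSO-style) embedded selection just described, implemented here with a plain proximal-gradient (ISTA) loop rather than any particular library solver; features whose coefficients are driven to exactly zero by the soft threshold are discarded:

```python
import numpy as np

def lasso_ista(X, y, lam=0.01, n_iter=500):
    """L1-penalized least squares via ISTA (proximal gradient descent).

    The soft-threshold step pushes uninformative coefficients to exactly
    zero, so feature selection falls out of the training itself."""
    n, d = X.shape
    lr = n / (np.linalg.norm(X, 2) ** 2)  # safe step size: 1 / Lipschitz constant
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n          # gradient of the squared error
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft threshold
    return w

def selected_features(w, tol=1e-8):
    """Indices of features whose coefficient survived the L1 penalty."""
    return np.flatnonzero(np.abs(w) > tol)
```

The selection is a by-product of fitting the model once, which is the computational advantage over wrapper methods noted above.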
The hybrid approach combines the filter and wrapper approaches to maximize the benefits of each. The feature space dimension is first reduced using a filter approach, which may produce numerous candidate subsets with moderate complexity. Then, a wrapper is used as a learning strategy to determine the best candidate subset. The high efficiency of filters and the high accuracy of wrappers are typically achieved via hybrid approaches. Many intriguing methodologies, such as hybrid genetic algorithms [37], hybrid ant colony optimization [38], and the mixed gravitational search algorithm [39], have recently been proposed. Practically any combination of filter and wrapper can be used to create a hybrid methodology.
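A toy sketch of the filter-then-wrapper pipeline described above, assuming a variance filter for stage one and greedy forward selection guided by a nearest-centroid classifier for stage two; both component choices are illustrative, not the paper's:

```python
import numpy as np

def nearest_centroid_acc(X, y):
    """Resubstitution accuracy of a nearest-centroid rule on binary labels."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)).astype(int)
    return (pred == y).mean()

def hybrid_select(X, y, n_filter=50, n_final=5):
    """Stage 1 (filter): keep the n_filter highest-variance genes.
       Stage 2 (wrapper): greedy forward selection by classifier accuracy."""
    candidates = list(np.argsort(X.var(axis=0))[::-1][:n_filter])
    chosen = []
    while candidates and len(chosen) < n_final:
        best_g, best_acc = None, -1.0
        for g in candidates:
            acc = nearest_centroid_acc(X[:, chosen + [g]], y)
            if acc > best_acc:
                best_g, best_acc = g, acc
        chosen.append(best_g)
        candidates.remove(best_g)
    return chosen
```

The expensive classifier is only ever evaluated on the small filtered pool, which is the cost/accuracy trade-off the hybrid design aims for.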
Motivation and contributions
Nuclear Reaction Optimization (NRO) [40] is a recent metaheuristic algorithm for global optimization that mimics the nuclear reaction process. The NRO algorithm can be divided into two phases, nuclear fission (NFi) and nuclear fusion (NFu), in accordance with the characteristics of nuclear reactions. The nuclear fission phase primarily mimics the fission mechanism: the Gaussian walk and differential operators between the nucleus and neutron are used for exploitation and exploration, based on the types of nuclei and the probability of decay following bombardment. The NFu phase primarily mimics the fusion part of nuclear reactions; the ionization and fusion processes of the NFu are included in this phase.
In order to address the Gene Selection (GS) problem, this paper suggests an improved binary version of the NRO algorithm, known as the RBNRO-DE algorithm, which is a promising method with precise performance. The suggested algorithm has a good chance of avoiding local optima while achieving sufficient search accuracy, rapid convergence, and enhanced stability. In contrast to state-of-the-art metaheuristic algorithms, the suggested RBNRO-DE achieves improved efficacy by obtaining optimal or nearly optimal outcomes for many of the investigated problems. Furthermore, RBNRO-DE uses a transfer function to convert continuous values into discrete ones, and it incorporates the Relief algorithm and the DE technique to boost exploration capacity and improve the best outcomes found inside the solution space through iterations. The rationale for applying the RBNRO-DE approach to FS is that it is easy to understand and implement, can handle a wide range of optimization problems, achieves worthwhile outcomes in a reasonable amount of time at a lower computational cost, and utilizes few control parameters. The fundamental contributions of this paper can be presented in the following:

RNA-Seq next-generation sequencing-based Level 3 data is preprocessed.

The proposed NRO algorithm is a novel type of metaheuristic algorithm that has not been applied before to RNA-Seq gene expression data. Thus, its ability to resolve this issue has not previously been examined.

NRO is modified and then reformulated to develop a binary version called the RBNRO-DE algorithm.

To improve the feature space exploration capacity and enhance the acquired optimal outcomes, the proposed RBNRO-DE algorithm embeds the Relief algorithm and the DE technique in the binary version of the NRO algorithm. This embedding enhances the algorithm’s performance by producing a new population that maintains the fundamental structure but has more appropriate positions.

As GS has a broad search space, most current algorithms frequently become trapped in local optima. The RBNRO-DE can efficiently explore large spaces to locate optimal or near-optimal solutions while avoiding falling into local optima.

The final results are estimated based on various performance metrics, including the mean fitness rate, the mean accuracy rate, and the mean number of selected features.

The influence of the proposed RBNRO-DE algorithm using the two suggested classifiers (k-NN and SVM) is compared with its peer algorithms from the literature.

The proposed RBNRO-DE algorithm is evaluated on 22 different types of cancer datasets, and the results are displayed.

The selected genes are linked to cancer-type biomarkers.
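The transfer-function binarization mentioned among the contributions (converting continuous nucleus positions into 0/1 gene-selection masks) is commonly done with an S-shaped function in binary metaheuristics; the sigmoid used below is an assumed, illustrative choice, not necessarily the paper's exact function:

```python
import numpy as np

rng = np.random.default_rng(42)

def s_shaped_transfer(x):
    """S-shaped (sigmoid) transfer function: maps a continuous position
    component to the probability that the corresponding bit is 1."""
    return 1.0 / (1.0 + np.exp(-x))

def binarize(position):
    """Turn a continuous position vector into a 0/1 gene-selection mask:
    bit d is set to 1 with probability T(x_d)."""
    probs = s_shaped_transfer(np.asarray(position, dtype=float))
    return (rng.random(probs.shape) < probs).astype(int)
```

A 1 in the resulting mask means the corresponding gene is kept in the candidate subset evaluated by the classifier.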
Structure
The rest of the paper consists of five sections as follows: the “Related work” section discusses the literature on FS with genome data; the “Background details” section analyzes and elaborates the base concepts of the presented methodology; the “Proposed relief binary NRO based on DE (RBNRO-DE) for gene selection” section provides a detailed explanation of the proposed RBNRO-DE algorithm, the improved version of NRO, and its parameters for handling GS; the “Experimental results and discussion” section presents the experimental results and comparisons with some competitive algorithms; and finally, the “Conclusion and future work” section contains the conclusions and suggestions for future research.
Related work
This section demonstrates the literature on researchers’ techniques for handling the high dimensionality of genome data for accurate classification. Deleting irrelevant genes plays an essential role in the performance of classification algorithms, so selecting genes is a necessary step before using any Machine Learning (ML), Deep Learning (DL), or other classification methods. With this in mind, we have studied some related studies in this scope to reach the goal of RNA-Seq classification for cancer detection.
Li et al. [41] had an interest in finding tumors’ biomarkers; they worked on the pan-cancer public data set for 31 different types. GA/KNN was the method they used to extract the genes. In this method, they carried out multiple iterations over subsets of genes and then assessed the accuracy with the k-NN algorithm. Using the resultant accuracy, they chose the best set of features. This method achieved 90% success across the 31 types of cancer.
Lyu et al. [3] presented work to find specific cancer biomarkers; they relied on the importance of genes according to their contribution to the classification. They followed these steps: preprocessing the data and applying tumor-type classification using a convolutional neural network; generating heat maps for each class to pick out the genes corresponding to the pixels with top intensities in the heat maps; and finally validating the pathways of the selected genes. In preprocessing, as the GS step they used a variance threshold of 1.19 to delete the gene expression levels that had not changed, which reduced the number of genes to 10,381 out of 19,531; this is a filtering approach. The final accuracy they obtained was 95.59%. Although the accuracy was good, it can still be much better, which can be achieved using a better FS approach to reduce that dimensionality.
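The variance-threshold step used by Lyu et al. can be sketched in a few lines; the 1.19 threshold is the value quoted above, and the function name is ours:

```python
import numpy as np

def variance_filter(X, threshold=1.19):
    """Keep only the columns (genes) whose variance across samples exceeds
    the threshold; returns the filtered matrix and the kept column indices."""
    keep = np.flatnonzero(X.var(axis=0) > threshold)
    return X[:, keep], keep
```

scikit-learn's `VarianceThreshold` transformer performs the same operation for pipelines built on that library.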
Khalifa et al. [42] followed the paper mentioned above [3]; however, they focused on five cancer types: lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), breast invasive carcinoma (BRCA), kidney renal clear cell carcinoma (KIRC), and uterine corpus endometrial carcinoma (UCEC). The total dataset is 2086 rows and 972 columns; each row contains a specific sample and the RPKM RNA-Seq values of particular genes [43]. They used a hybrid approach for preprocessing the data, proposing the binary particle swarm optimization with decision trees (BPSO-DT) algorithm; 615 features out of 971 were chosen as the best features of RNA-Seq. The presented results and the performance metrics reported in this research showed that the proposed approach achieved an overall testing accuracy of 96.90%. Comparative results were introduced, and the accuracy achieved in that work outperforms other related work on the testing accuracy for five classes of tumors. Moreover, the proposed approach is less complex and requires less training time.
Xiao et al. [44] evaluated their method on three RNA-Seq gene expression data sets: lung adenocarcinoma, Stomach Adenocarcinoma (STAD), and breast invasive carcinoma. They depended on the DL technique, using five different classification models followed by a DL model to ensemble the results of the five models, which improved all the prediction evaluations as follows: LUAD \(99.20\%\), BRCA \(98.4\%\), and STAD \(98.78\%\).
Liu et al. [45] also investigated genetic data, but not RNA-Seq; they used microarray data and followed the hybrid approach as well. Unlike the papers mentioned above, they worked on each cancer type independently. They used four gene datasets, of colon cancer, small round blue cell tumors, leukemia, and lung cancer, to evaluate the algorithm’s performance. The algorithm depends on Relief as the feature pre-filter to remove the genes with low relevance to the cancer type. PSO is used as the search algorithm, and finally, the classification accuracy of SVM is used as the evaluation function of the feature subset to obtain the final optimal gene subset for each cancer.
Danaee et al. [46] worked on gene expression data using the power of the encoders and decoders of neural networks, employing a Stacked Denoising Autoencoder (SDAE) as the FS method. The effectiveness of the extracted representation was then assessed using supervised classification models to confirm the usefulness of the new features in cancer detection. Finally, by studying the SDAE connection matrices, they discovered a collection of highly interactive genes. They used RNA-Seq expression data for both tumor and healthy breast samples from The Cancer Genome Atlas (TCGA) database; these data comprise 113 healthy samples and 1097 breast cancer samples. The findings and analyses show that the highly interactive genes may serve as breast cancer indicators that merit further investigation. After training the SDAE, they chose a layer with a low dimension and low validation error compared to other encoder stacks. The network has four layers with dimensions of 15,000, 10,000, 2,000, and 500, respectively. The chosen layer’s features are fed into the classification algorithms. Deep learning models can therefore easily handle vast amounts of input data; hence, they anticipate this model will perform better and highlight more insightful patterns if additional gene expression data becomes available.
According to the related work, most research with genetic data is at its beginning, and all of the work consists of trials to conduct and apply concepts in this promising field. The research literature is filled with experiments on different methods, such as FS and state-of-the-art deep learning techniques. However, due to the very high dimensionality of genetic data, there is no perfect technique. FS on genome data detects the link of a gene to its class, which is a critical preprocessing task to overcome the curse of dimensionality and to verify the gene biomarkers of cancer. Because of this, the objective of this study is to use a new wrapper approach, the RBNRO-DE algorithm, apply it for the first time to RNA-Seq data, and compare the influence of the algorithm with other FS methods.
Background details
Relief algorithm
The Relief algorithm [47, 48] is a highly effective, simple, and rapid filtering method for determining the features relevant to the classes. The essential idea of this algorithm is to identify features whose values are close for similar samples that are near each other and clearly different between samples of different classes. Therefore, the algorithm relies on a weighted ordering of features: the higher the feature’s weight, the better the feature for classification, and vice versa.
The Relief algorithm begins by selecting a sample at random, after which it investigates two types of nearest samples: one associated with same-class samples, called NearHit, and the other related to different-class samples, called NearMiss. Each feature’s weight can be assessed from the values of both NearHit and NearMiss. The features are arranged according to their weights, and the features with the highest weights are chosen in the end. The weight W for the feature A can be measured using the following equation:
$$\begin{aligned} W_A=\sum _{j=1}^{N}\Big (x_A^j-NM_A(x^j)\Big )^2-\Big (x_A^j-NH_A(x^j)\Big )^2, \end{aligned}$$(1)
where \(W_A\) is the weight of feature A, \(x_A^j\) is the value of feature A for data point \(x^j\), and N represents the number of samples. \(NH(x^j)\) and \(NM(x^j)\) are the closest data points to \(x^j\) that belong to the same and different classes, respectively.
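The NearHit/NearMiss update described above can be sketched as a minimal Relief for two classes; the squared-difference weight update is one common variant, and the exact diff function the paper uses may differ:

```python
import numpy as np

def relief_weights(X, y, n_iter=100, seed=0):
    """Basic Relief for a two-class label vector y.

    The weight of feature A grows when A differs on the NearMiss
    (nearest sample of the other class) and shrinks when it differs
    on the NearHit (nearest sample of the same class)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        j = rng.integers(n)                            # random sample x^j
        dist = np.linalg.norm(X - X[j], axis=1)
        dist[j] = np.inf                               # exclude x^j itself
        same = y == y[j]
        nh = np.argmin(np.where(same, dist, np.inf))   # NearHit index
        nm = np.argmin(np.where(~same, dist, np.inf))  # NearMiss index
        w += (X[j] - X[nm]) ** 2 - (X[j] - X[nh]) ** 2
    return w / n_iter
```

Sorting the features by descending weight and keeping the top ones then realizes the filter described above.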
NRO algorithm
The idea of the nuclear reaction arose after neutrons were found to be derived from boron and nitrogen, a result of research into the interaction of uranium with neutrons [49]. Nuclear fission and nuclear fusion are the two processes that make up the nuclear reaction [50]. As shown in Fig. 1, nuclear fission occurs when a heated neutron shells a weighty nucleus, which transforms into lighter nuclei as fission outcomes and other molecules. When heated neutrons shell weighty nuclei, new neutrons are produced to shell other weighty nuclei; this methodology is called the nuclear fission chain reaction. As a result, a significant amount of power is released, proportional to the difference between the mass of the atom and the total mass of the fission fragments.
Nuclear fusion, on the other hand, occurs when a nucleus is warmed until it reaches a plasma state, where the strong nuclear force causes nuclear particles to get close enough to join together and overcome the Coulomb repulsion force, as seen in Fig. 2.
The nuclear fission process is used first in the presented approach, in which nuclei fragments absorb hot neutrons and then form odd or even-even nuclei. Subaltern fission products, which might be utilized for exploitation, and essential fission products, which can be used for exploration, are the two types derived from odd nuclei. The even-even nuclei, which do not undergo fission, can be sought near the existing positions (the current optimal solution). After that, the presented approach utilizes the process of nuclear fusion, whereby the energy generated during nuclear fission is used to heat the nuclei, causing atomic fusion. Some nuclei constrained by the Coulomb repulsion force slow down the upcoming velocity for exploitation or reject one another for exploration. Other nuclei can explore by overcoming the Coulomb repulsion and bonding together through strong nuclear forces. The heated neutron or the energy generated in the nuclear interaction gives each nucleus kinetic energy.
According to the above illustration, a physics-based optimization algorithm known as the NRO algorithm [50] has been developed to mimic the two nuclear reaction processes, namely the fission and fusion processes. The nuclear fission process involves nuclear fission operators comprising two cases: essential and subaltern fission of odd nuclei, and nearby searching for a solution around the even-even nucleus. As for the nuclear fusion process, ionization and fusion phases make up its nuclear fusion operators. Since the NRO algorithm might slip into the local optima trap, the fusion process incorporates a Levy flight methodology to jump out of the local optimal value.
Base processes of NRO algorithm
According to the NRO algorithm, the cycle generated by fission energy and fusion neutrons might be employed to find the most stable nucleus (optimal fitness value). Hence, nuclear fusion can arise from heating lighter nuclei with the energy emitted by nuclear fission. In contrast, nuclear fission can result from shelling the weighty nuclei with thermal neutrons from nuclear fusion. For exploitation and exploration of a search solution area, the NRO algorithm considers the nuclear fission and nuclear fusion processes to occur in a closed container where all nuclei interact. The NRO algorithm considers a nucleus characterized by elements like position, potential energy, nucleus mass number, and charge property, which represents a solution in a search solution area. The specific binding energy of each nucleus is assessed as the energy per mass, which describes the nucleus’ stability. The essential processes of the NRO algorithm are depicted below.

1.
Nuclear fission process: According to the cycle between nuclear fission and nuclear fusion, it is thought that the hot neutrons shelling a weighty nucleus for nuclear fission may be created by the nuclear fusion of two separate arbitrary nuclei. In order to model nuclear fission mathematically, the Gaussian walk [51] is utilized to mimic the various fission elements in diverse cases. In general, two cases can be used to distinguish the attributes of the various products. The first case is associated with forming subaltern fission products for exploitation and essential fission products for exploration. These products are created when nuclear fission is applied to odd nuclei. The odd nuclei from which the subaltern fission products are generated are activated for fission utilizing the energy emitted by heated neutrons and can become highly steady through \(\beta\) decay. In this situation, the existing solution uses the information of the neutron and the present best solution to find a more satisfactory solution based on the Gaussian walk. As for the odd nuclei from which the essential fission products are produced, they may not be steady following the absorption of a hot neutron because the fission fragment may not afford \(\beta\) decay. In the first case, \(rand \le P_{Fi}\) holds, where rand signifies an arbitrary number distributed uniformly within the range [0, 1], and \(P_{Fi}\) is the probability of nucleus fission. For the subaltern fission products of odd nuclei, \({rand} \le P_\beta\) holds, where \(P_\beta\) is the likelihood of \(\beta\) decay; \({rand} > P_\beta\) applies for the essential fission products of odd nuclei. The composition process of subaltern and essential fission products of odd nuclei can be expressed as follows:
$$\begin{aligned}{} & {} {X}_{i}^{Fi}= \left\{ \begin{array}{ll} Gaussian(X_{best},\sigma _1)+(randn \cdot X_{best}-P_{ne}^s \cdot Ne_i), &{} \,\,\, \text{if}\,{rand} \le P_\beta ,\\ Gaussian(X_{i},\sigma _2)+(randn \cdot X_{best}-P_{ne}^e \cdot Ne_i), &{}\,\,\, \text{if}\,{rand} > P_\beta , \end{array} \right\} \text{if}\,\,\,{rand} \le P_{Fi}, \end{aligned}$$(2)$$\begin{aligned}{} & {} \sigma _1=\Big (\frac{log(g)}{g}\Big ) \cdot \left| {X}_{i} - {X}_{best}\right| , \end{aligned}$$(3)$$\begin{aligned}{} & {} \sigma _2=\Big (\frac{log(g)}{g}\Big ) \cdot \left| {X}_{r} - {X}_{best}\right| , \end{aligned}$$(4)$$\begin{aligned}{} & {} P_{ne}^s=round(rand+1), \end{aligned}$$(5)$$\begin{aligned}{} & {} P_{ne}^e=round(rand+2), \end{aligned}$$(6)$$\begin{aligned}{} & {} Ne_i = \frac{(X_i + X_j)}{2}. \end{aligned}$$(7)where \({X}_{i}^{Fi}\) means the \(i{\text{th}}\) fission product nucleus, randn means a normally distributed arbitrary number, and \(X_{best}\) denotes the present most suitable nucleus. The Gaussian distribution’s parameters for subaltern fission products are \(X_{best}\) and \(\sigma _1\), while the parameters of the Gaussian distribution for essential fission products are \(X_{i}\) and \(\sigma _2\); \(\sigma _1\) and \(\sigma _2\) signify the step sizes, g represents the present generation number, and \({X}_{r}\) means the \(r{\text{th}}\) nucleus, whose index r is picked randomly from the population of nuclei. Additionally, \(P_{ne}^s\) represents a mutation factor, indicating that the subaltern fission product can exploit the smaller searching range, whereas \(P_{ne}^e\) indicates that the essential fission product can explore the larger searching range, in which round returns the closest integer and rand is an arbitrary number distributed uniformly within the range [0, 1]. \(Ne_i\) is the \(i{\text{th}}\) heated neutron, and \(X_i\) and \(X_j\) represent the distinct random \(i{\text{th}}\) nucleus and \(j{\text{th}}\) nucleus, respectively.
The second case is related to an even-even nucleus, which cannot be activated for fission. The status of the nucleus is altered even though there is no fission: the present nucleus’ information might be kept, and the update comes from the Gaussian walk. In the second case, \(rand > P_{Fi}\) holds, where \(P_{Fi}\) is the probability of nucleus fission. It is expressed as follows:
$$\begin{aligned} {X}_{i}^{Fi}= \left\{ \begin{array}{ll} Gaussian(X_{i},\sigma _2),&\,\,\, \text{if}\,{rand} > P_{Fi}. \end{array} \right. \end{aligned}$$(8) 
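Eqs. (2)–(8) amount to the following single-nucleus update; this is an illustrative sketch in which the probability constants \(P_{Fi}\) and \(P_\beta\) and the NumPy random generator are our assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def fission_update(X_i, X_r, X_j, X_best, g, P_Fi=0.75, P_beta=0.5):
    """One nuclear-fission update of nucleus X_i (Eqs. (2)-(8)).

    Odd nuclei (rand <= P_Fi) yield subaltern or essential fission
    products via a Gaussian walk around X_best or X_i; even-even
    nuclei (rand > P_Fi) only take a Gaussian walk around X_i."""
    Ne_i = (X_i + X_j) / 2.0                  # heated neutron, Eq. (7)
    step = np.log(g) / g
    sigma1 = step * np.abs(X_i - X_best)      # Eq. (3)
    sigma2 = step * np.abs(X_r - X_best)      # Eq. (4)
    if rng.random() <= P_Fi:                  # odd nucleus: fission occurs
        if rng.random() <= P_beta:            # subaltern product, smaller range
            P_ne = round(rng.random() + 1)    # Eq. (5)
            return rng.normal(X_best, sigma1) + rng.standard_normal() * X_best - P_ne * Ne_i
        P_ne = round(rng.random() + 2)        # essential product, Eq. (6)
        return rng.normal(X_i, sigma2) + rng.standard_normal() * X_best - P_ne * Ne_i
    return rng.normal(X_i, sigma2)            # even-even nucleus, Eq. (8)
```

Note that the Gaussian step sizes shrink as the generation counter g grows, which narrows the walk around the best nucleus over time.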
2.
Nuclear fusion process: Whenever nuclei are heated to a plasma state, they can merge to form nuclei heavier than the initial light nuclei, a process known as hot nuclear fusion. The nuclear fusion process includes two steps: the ionization and fusion steps.

The ionization step: It supposes that nuclear fission causes the emission of thermal ionization energy, which yields the motion of a nucleus. Differential operators can be involved in the ionization step. Firstly, each nucleus is ranked according to its fitness function value, from the largest to the smallest. For exploitation, the nucleus with a higher fitness function value is kept for guiding, whereas the nucleus with a lower fitness function value is utilized for exploration.
In the ionization step, when \(rand > Pa_{i}\), where \(Pa_{i}\) is the probability value of the nucleus’s ionization (a higher probability value means a better nucleus), the ionization step can be described mathematically, to enhance the exploration quality, as follows:
$$\begin{aligned}{} & {} {X}_{i,d}^{Ion}= \left\{ \begin{array}{ll} X_{r1,d}^{Fi} + rand \cdot (X_{r2,d}^{Fi} - X_{i,d}^{Fi}), &{}\,\,\, \text{if}\,{rand} \le 0.5,\\ X_{r1,d}^{Fi} - rand \cdot (X_{r2,d}^{Fi} - X_{i,d}^{Fi}), &{}\,\,\, \text{if}\,{rand}> 0.5, \end{array} \right\} \,\,\,\text{if}\,rand > Pa_{i}, \end{aligned}$$(9)$$\begin{aligned}{} & {} Pa_{i}=\frac{rank(fit({X}_{i}^{Fi}))}{N}. \end{aligned}$$(10)where \({X}_{i,d}^{Ion}\) is the \(d{\text{th}}\) variable of the \(i{\text{th}}\) ion after ionization. The \(d{\text{th}}\) variables of the \({r1}{\text{th}}\), \({r2}{\text{th}}\) and \(i{\text{th}}\) fission nuclei are represented by \(X_{r1,d}^{Fi}\), \(X_{r2,d}^{Fi}\) and \(X_{i,d}^{Fi}\), respectively, and rand implies an arbitrary number between 0 and 1. \(Pa_{i}\) denotes the probability value of the nucleus’s ionization, \(fit({X}_{i}^{Fi})\) is the fitness function value of \({X}_{i}^{Fi}\), \(rank(fit({X}_{i}^{Fi}))\) means the rank of \({X}_{i}^{Fi}\) in the population, and N is the overall number of nuclei. In contrast, when \(rand \le Pa_{i}\), the thermal fission’s energy cannot ionize the more stable nucleus. As a result, \(X_{i,d}^{Fi}\) is adjusted to improve the exploitation performance using the following formula:
$$\begin{aligned} {X}_{i,d}^{Ion}= \left\{ \begin{array}{ll} X_{i,d}^{Fi} + round(rand) \cdot rand \cdot (X_{worst,d}^{Fi} - X_{best,d}^{Fi}),&\,\,\, \text{if}\,rand \le Pa_{i}. \end{array} \right. \end{aligned}$$(11)where \(X_{worst,d}^{Fi}\) and \(X_{best,d}^{Fi}\) denote the \(d{\text{th}}\) variable of the worst and best fission product nucleus, respectively. The algorithm is sometimes susceptible to falling into a local optimum when two solutions are almost identical and the difference term becomes zero. In this case, the search strategy is the most challenging part; therefore, finding an approach that supports the current solution in leaping out of a local optimum and investigating the global optimum is critical. This approach is the Levy flight distribution [52]. Regarding Eq. (9), which was formed to improve exploration in the ionization step, the equation can be applied appropriately when \(X_{r2,d}^{Fi}\) is not equal to \(X_{i,d}^{Fi}\). However, when \(X_{r2,d}^{Fi}\) equals \(X_{i,d}^{Fi}\), the Levy flight distribution approach should be employed to avoid a locally optimal solution as follows:
$$\begin{aligned}{} & {} {X}_{i,d}^{Ion}=X_{i,d}^{Fi} + \Big (\alpha \otimes Levy(\beta )\Big )_{d} \cdot \Big (X_{i,d}^{Fi} - X_{best,d}^{Fi}\Big ), \end{aligned}$$(12)$$\begin{aligned}{} & {} Levy(\beta )=\frac{\mu }{\nu ^{1/\beta }}, \end{aligned}$$(13)$$\begin{aligned}{} & {} \mu =N(0,\sigma _{\mu }^2),\,\,\,\,\,\, \nu =N(0,\sigma _{\nu }^2), \end{aligned}$$(14)$$\begin{aligned}{} & {} \sigma _{\mu }=\Big (\frac{\Gamma (1+\beta )\sin (\pi \beta /2)}{\Gamma [(1+\beta )/2]\beta 2^{(\beta -1)/2}}\Big )^{1/\beta },\,\,\,\,\,\,\, \sigma _{\nu }=1. \end{aligned}$$(15)where \(\alpha\) is a scale factor whose value is determined by the problem's scales (\(\alpha = 0.01\)), and \(Levy(\beta )\) denotes the Levy flight step size. \(\mu\) and \(\nu\) are drawn from the normal distributions \(N(0,\sigma _{\mu }^2)\) and \(N(0,\sigma _{\nu }^2)\), respectively, and \(\beta = 1.5\). As for Eq. (11), which was formed to improve exploitation in the ionization step, this equation can be applied appropriately when \(X_{worst,d}^{Fi}\) is not equal to \(X_{best,d}^{Fi}\). However, in case \(X_{worst,d}^{Fi}\) is equal to \(X_{best,d}^{Fi}\), then the Levy flight distribution approach should be utilized as follows:
$$\begin{aligned} {X}_{i,d}^{Ion}=X_{i,d}^{Fi} + \Big (\alpha \otimes Levy(\beta )\Big )_{d} \cdot \Big (UB_{d}-LB_{d}\Big ). \end{aligned}$$(16)
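As an illustration only, the ionization update above (Eqs. 9–12 and 16) can be sketched in NumPy. The ranking direction in Eq. (10) (better nuclei receiving larger \(Pa_i\) under minimization) and all helper names are assumptions made here, not the authors' implementation:

```python
import math
import numpy as np

def levy_step(dim, beta=1.5, alpha=0.01, rng=None):
    # Levy flight step size (Eqs. 13-15): mu ~ N(0, sigma_mu^2), nu ~ N(0, 1)
    rng = rng or np.random.default_rng()
    sigma_mu = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
                / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    mu = rng.normal(0.0, sigma_mu, dim)
    nu = rng.normal(0.0, 1.0, dim)
    return alpha * mu / np.abs(nu) ** (1 / beta)

def ionization(pop, fit, lb, ub, rng=None):
    """One ionization pass over the nuclei population (Eqs. 9-12 and 16)."""
    rng = rng or np.random.default_rng()
    N, D = pop.shape
    # Rank fitness values from largest to smallest, so in a minimization
    # setting the best nucleus gets rank N and Pa close to 1 (an assumption).
    rank = np.empty(N)
    rank[np.argsort(-fit)] = np.arange(1, N + 1)
    Pa = rank / N                                    # Eq. (10)
    best = pop[np.argmin(fit)]
    worst = pop[np.argmax(fit)]
    out = pop.copy()
    for i in range(N):
        r1, r2 = rng.choice([j for j in range(N) if j != i], 2, replace=False)
        for d in range(D):
            if rng.random() > Pa[i]:                 # exploration branch, Eq. (9)
                diff = pop[r2, d] - pop[i, d]
                if diff == 0:                        # stagnation -> Levy escape, Eq. (12)
                    out[i, d] = pop[i, d] + levy_step(1, rng=rng)[0] * (pop[i, d] - best[d])
                elif rng.random() <= 0.5:
                    out[i, d] = pop[r1, d] + rng.random() * diff
                else:
                    out[i, d] = pop[r1, d] - rng.random() * diff
            else:                                    # exploitation branch, Eq. (11)
                diff = worst[d] - best[d]
                if diff == 0:                        # Levy escape, Eq. (16)
                    out[i, d] = pop[i, d] + levy_step(1, rng=rng)[0] * (ub - lb)
                else:
                    out[i, d] = pop[i, d] + round(rng.random()) * rng.random() * diff
    return out
```

The branch structure mirrors the text: each variable is either explored around two random nuclei or exploited along the worst–best direction, with Levy flight used whenever the difference term degenerates to zero.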
The fusion step: It attempts to combine an ion with information from different ions and modify the status of the ions. Initially, all ions acquired from the ionization are ranked by their fitness function values, from largest to smallest. In the fusion step, if \(rand > Pc_{i}\), where \(Pc_{i}\) is a probability value of the \(i{\text{th}}\) ion, the ions of two light nuclei defeat the Coulomb repelling force and are fused through a strong nuclear force. Additional differential operators are used in the fusion stage to simulate the collision and fusion and to boost the diversity of the nuclei population, allowing for more effective exploration. This situation can be depicted mathematically through the following equation:
$$\begin{aligned}{} & {} {X}_{i}^{Fu}= \left\{ \begin{array}{ll} X_{i}^{Ion} + rand \cdot (X_{r1}^{Ion} - X_{best}^{Ion}) + rand \cdot (X_{r2}^{Ion} - X_{best}^{Ion}) \\ \,\,\,\,\,\,\,\,\, - e^{-norm(X_{r1}^{Ion} - X_{r2}^{Ion})} \cdot (X_{r1}^{Ion} - X_{r2}^{Ion}), &{}\,\,\, \text{if}\,rand>Pc_{i}, \end{array} \right. \end{aligned}$$(17)$$\begin{aligned}{} & {} Pc_{i}=\frac{rank(fit({X}_{i}^{Ion}))}{N}. \end{aligned}$$(18)where \({X}_{i}^{Fu}\) is the \(i{\text{th}}\) product of fusion, \(X_{i}^{Ion}\) represents the current ion, and \(X_{r1}^{Ion}\) and \(X_{r2}^{Ion}\) denote the \(r1{\text{th}}\) and \(r2{\text{th}}\) ions, respectively, where r1 and r2 are distinct. The difference expression \((X_{r1}^{Ion} - X_{best}^{Ion})\) describes a portion of the fusion process, the expression \((X_{r2}^{Ion} - X_{best}^{Ion})\) uses the difference to clarify another part of the fusion information, and the final expression \((X_{r1}^{Ion} - X_{r2}^{Ion})\) means that the ions defeat the Coulomb repelling force. The exponential coefficient seeks to accomplish an equilibrium between exploration and exploitation. \(Pc_{i}\) stands for the fusion probability of a nucleus, \(fit({X}_{i}^{Ion})\) is the fitness function value of \({X}_{i}^{Ion}\), and \(rank(fit({X}_{i}^{Ion}))\) stands for the rank of \({X}_{i}^{Ion}\) in the population. On the other hand, when \(rand \le Pc_{i}\), ions cannot defeat the Coulomb force and fail to be fused by a nuclear force. If fusion does not occur, the Coulomb force may lessen the approach speed or repel the opposing motion. The mathematical formula is recommended as follows:
$$\begin{aligned} {{X}_{i}^{Fu}= \left\{ \begin{array}{ll} X_{i}^{Ion} - 0.5 \cdot \Big (\sin (2 \pi \cdot freq \cdot g + \pi ) \cdot \frac{G_{max} - g}{G_{max}} + 1 \Big ) \cdot (X_{r1}^{Ion} - X_{r2}^{Ion}), &{}\\ \text{if}\,{rand} > 0.5,\\ X_{i}^{Ion} - 0.5 \cdot \Big (\sin (2 \pi \cdot freq \cdot g + \pi ) \cdot \frac{g}{G_{max}} + 1 \Big ) \cdot (X_{r1}^{Ion} - X_{r2}^{Ion}), &{}\\ \text{if}\,{rand} \le 0.5, \end{array} \right\} \text{if}\,rand \le Pc_{i}.} \end{aligned}$$(19)where freq denotes the sine function's frequency, g represents the present generation number, \(G_{max}\) is the permissible maximum generation number, and \(X_{r1}^{Ion}\) and \(X_{r2}^{Ion}\) represent the \(r1{\text{th}}\) and \(r2{\text{th}}\) ions, respectively, with distinct indexes. In the first row of Eq. (19), for the state where the Coulomb force might lower the approach speed, the non-adaptive sine adjustment is used to exploit the solution space and converge toward the optimal solution. The case in which the two ions repulse and move away from each other to explore is in the second row of Eq. (19). The Levy flight distribution approach is applied to enhance the algorithm's capability to avoid getting stuck in a local optimum in the fusion step. If \(X_{r1}^{Ion} = X_{r2}^{Ion}\) in the fusion step, the Levy flight distribution approach should be utilized to avoid a locally optimal solution as follows:
$$\begin{aligned} {X}_{i}^{Fu}=X_{i}^{Ion} + \alpha \otimes Levy(\beta ) \otimes (X_{i}^{Ion} - X_{best}^{Ion}). \end{aligned}$$(20)The fission nucleus with the best fitness function value in the present generation should be saved as guiding information for the following process, while the fusion nucleus with the best fitness function value is taken as the globally acquired best solution. The individuals outside the search boundary are reformed using the boundary control approach.
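The fusion pass (Eqs. 17–20) can be sketched in the same spirit. The default value of `freq`, the ranking direction in Eq. (18), and the helper names are assumptions for illustration, not the paper's settings:

```python
import math
import numpy as np

def _levy(dim, beta=1.5, rng=None):
    # Levy flight step (Eqs. 13-15), reused by the fusion escape move (Eq. 20)
    rng = rng or np.random.default_rng()
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.normal(0, sigma, dim) / np.abs(rng.normal(0, 1, dim)) ** (1 / beta)

def fusion(ions, fit, g, g_max, freq=0.05, alpha=0.01, rng=None):
    """One fusion pass over the ion population (Eqs. 17-20)."""
    rng = rng or np.random.default_rng()
    N, D = ions.shape
    rank = np.empty(N)
    rank[np.argsort(-fit)] = np.arange(1, N + 1)   # best (lowest) fitness -> rank N
    Pc = rank / N                                   # Eq. (18)
    best = ions[np.argmin(fit)]
    out = ions.copy()
    for i in range(N):
        r1, r2 = rng.choice([j for j in range(N) if j != i], 2, replace=False)
        d12 = ions[r1] - ions[r2]
        if np.all(d12 == 0):                        # identical ions -> Levy escape, Eq. (20)
            out[i] = ions[i] + alpha * _levy(D, rng=rng) * (ions[i] - best)
        elif rng.random() > Pc[i]:                  # fusion succeeds, Eq. (17)
            out[i] = (ions[i]
                      + rng.random() * (ions[r1] - best)
                      + rng.random() * (ions[r2] - best)
                      - math.exp(-np.linalg.norm(d12)) * d12)
        else:                                       # Coulomb repulsion, Eq. (19)
            decay = (g_max - g) / g_max if rng.random() > 0.5 else g / g_max
            s = math.sin(2 * math.pi * freq * g + math.pi) * decay + 1
            out[i] = ions[i] - 0.5 * s * d12
    return out
```

Note how the exponential coefficient in the successful-fusion branch shrinks toward zero as the two ions move apart, which is the exploration–exploitation balance described in the text.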

Suggested classifiers
kNN classifier
The kNN [53, 54] is a pattern classification algorithm that predicts the class of a new sample instance based on the classes of the cases closest to it [55]. Within the wrapper setting, the kNN generates classification rules from training samples. By computing the distances between a new unclassified instance and its k closest training neighbours, it locates the cases in the training set most comparable to the new instances in the test set. Finally, the new instance is assigned to the most likely category among those neighbours.
However, when training kNN, the choice of k is fundamental and the sole factor to consider when categorizing a novel test set; therefore, it is picked after a series of trial-and-error runs. The kNN classifier (k = 5 [56, 57]) with the Euclidean distance metric was utilized to assess the feature subsets in the literature experiments.
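The decision rule above (k = 5, Euclidean distance) can be written as a minimal, dependency-free sketch; the paper's experiments use a library classifier, so this is purely illustrative:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=5):
    """Plain k-NN: majority vote among the k nearest training samples."""
    preds = []
    for x in X_test:
        dist = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to all training samples
        nearest = np.argsort(dist)[:k]               # indices of the k closest samples
        votes = Counter(y_train[nearest])            # class counts among the neighbours
        preds.append(votes.most_common(1)[0][0])     # most frequent class wins
    return np.array(preds)
```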
SVM classifier
The maximum-margin hyperplanes in the space can be found using the SVM [58] to accurately classify training instances into different classes. SVM can analyze high-dimensional data with a fast training period and minimal computational resources, even with few training examples.
SVM employs a margin maximization strategy to avoid estimating the statistical distributions of the distinct classes in the high-dimensional space. It creates hyperplanes to produce decision boundaries for linear or nonlinear classification. Since the classes cannot be divided along a straight line in nonlinear classification, SVM makes the data linearly separable by using the so-called kernel function [59] as a scalar product. SVM is used in a variety of fields, including bioinformatics [60], face detection [61], image classification [62], and text categorization [63].
Proposed relief binary NRO based on DE (RBNRODE) for gene selection
Disease classification is one of the most valuable uses of RNA-Seq gene expression data, yet ML algorithms may be misled by the high dimensionality of the data. Therefore, an enhanced version of NRO called RBNRODE, which denotes a Relief Binary NRO based on DE, is proposed to ignore irrelevant genes and identify the smallest subsets of relevant genes for the classification process.
The main characteristic of RBNRODE is that it achieves the best accuracy with the smallest subset of features. The proposed RBNRODE consists of two main phases. Firstly, a preprocessing phase uses the Relief algorithm to identify the relevant features by computing a weight for every feature to describe its relevance and then ignoring the irrelevant features with the lowest weights. The second phase applies the binary NRO algorithm combined with the DE technique to determine the most relevant and non-redundant features. When solving large-scale problems, the NRO algorithm is susceptible to the local optimum trap; to prevent this, the DE technique is embedded in the NRO algorithm.
The stages required for the proposed RBNRODE to be able to handle the GS strategy include filtration, initialization, position improvement depending on the NRO algorithm, binary conversion, fitness estimation, and hybridization with DE. The following subsections describe these stages.
Filtration of features
As illustrated in subsection “Relief algorithm”, the Relief algorithm is used to preprocess the population by filtering the features and choosing the relevant features. The weight of each feature is first evaluated by Eq. (1), and then the weights are ordered from the largest to the smallest weights to determine relevance for the classification process. By concentrating only on the relevant features and minimizing the initial search space, the Relief algorithm supports the NRO algorithm to obtain better features faster.
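The weighting idea can be sketched as a basic Relief pass for binary classes; the helper name is illustrative, features are assumed pre-scaled, and details of Eq. (1) (such as the diff normalization) are simplified here:

```python
import numpy as np

def relief_weights(X, y, n_iter=None, rng=None):
    """Basic Relief sketch: a feature gains weight when it differs on the
    nearest miss (other class) and agrees on the nearest hit (same class)."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    n_iter = n_iter or n
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf                                 # exclude the instance itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / n_iter
    return w
```

The 500 features with the largest weights would then be kept, e.g. via `np.argsort(-w)[:500]`, mirroring the preprocessing described above.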
Initialization of nuclei population
The suggested RBNRODE initiates by randomly producing a population of N nuclei. Each nucleus represents a potential solution within its restricted lower and upper limits, depicted by a vector of D dimensions equal to the original dataset's feature count. The randomly generated position of each nucleus is employed in this initialization step and is confined within the \([-1, 1]\) range at each variable of the position vector.
Improvement and adjustment of position
Positions are improved using equations linked to the NRO algorithm presented in Subsection “NRO algorithm”. These equations are repeated until a certain stopping condition is fulfilled. This paper’s acceptable stopping condition for suitably assessing the proposed algorithm’s quality is the maximum number of generations \({G}_{max}\).
Some nuclei may fall outside the search space's boundaries when optimizing the position utilizing the NRO algorithm. This paper offers a procedure for repairing these invalid nuclei by adjusting them to an arbitrary position inside the permitted boundaries. By randomly varying the optimal position, this procedure will improve the exploitation of the NRO algorithm. This procedure can be expressed as follows:
$$\begin{aligned} {X}_{i,d}^{adjust}=rand({X}_{d}^{LB},{X}_{d}^{UB}), \,\,\, \text{if}\,{X}_{i,d} > {X}_{d}^{UB}\,\,\text{ or }\,\,{X}_{i,d} < {X}_{d}^{LB}. \end{aligned}$$(21)where \({X}_{i,d}^{adjust}\) refers to the proper product nucleus, \({X}_{i,d}\) is the value that surpasses the variable's boundaries, \({X}_{d}^{LB}\) denotes the lower boundary of product nuclei, and \({X}_{d}^{UB}\) denotes the upper boundary of product nuclei. An arbitrary value between \({X}_{d}^{LB}\) and \({X}_{d}^{UB}\) is returned by \(rand({X}_{d}^{LB},{X}_{d}^{UB})\) with uniform distribution.
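This boundary-control step amounts to re-sampling out-of-range variables uniformly inside the bounds; a minimal sketch (illustrative helper name):

```python
import numpy as np

def adjust_bounds(X, lb, ub, rng=None):
    """Replace every out-of-range variable with a uniform draw from [lb, ub];
    in-range variables are left untouched."""
    rng = rng or np.random.default_rng()
    out = X.copy()
    mask = (out < lb) | (out > ub)                  # variables that surpass the boundaries
    out[mask] = rng.uniform(lb, ub, size=mask.sum())
    return out
```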
Continuous to binary conversion
The nuclei positions are represented as continuous (real) values in the NRO. Therefore, they can’t be utilized directly for the GS binary problem. To fit in with the binary character of GS, a binary conversion strategy for transforming the continuous (real) values of the nucleus’ positions into binary values is required. At the same time, the original algorithm’s structure is preserved.
In the binary vector, the continuous (real) values of the relevant selected features are expressed by ones, whereas zeros express the continuous values of the irrelevant unselected features. The mathematical formulation to transform the continuous nucleus position \({X}_{i}^{g}\) to a binary position \(({X}_{i}^{g})_{bin}\), at each generation g, is as follows:
$$\begin{aligned} ({X}_{i,d}^{g})_{bin}= \left\{ \begin{array}{ll} 1, &{} \text{if}\,{X}_{i,d}^{g} > \delta ,\\ 0, &{} \text{otherwise}, \end{array} \right. \end{aligned}$$(22)
where \(\delta\) represents a random threshold value within the range [0, 1]. This essential binary conversion strategy implies that if the continuous value of a position variable is bigger than \(\delta\), it is changed to the binary “one” (a selected feature for the classification process). In contrast, the continuous value is adjusted to the binary “zero” (an unselected feature for the classification process) if it is less than \(\delta\).
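The conversion is a one-line thresholding; in the sketch below, whether \(\delta\) is drawn once per conversion or once per variable is an assumption, since the text only calls it a random threshold:

```python
import numpy as np

def to_binary(X, rng=None):
    """Map continuous nucleus positions to a 0/1 feature mask via a random
    threshold delta in [0, 1]; 1 = selected feature, 0 = unselected."""
    rng = rng or np.random.default_rng()
    delta = rng.random()            # single random threshold per conversion (an assumption)
    return (X > delta).astype(int)
```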
Estimation of fitness function
Two clashing goals should be considered to estimate the goodness of a solution and reach the optimal one: maximizing the classification accuracy of the classifiers (kNN and SVM) while searching for the smallest set of selected features, which enhances the algorithm's predictive capacity. The fitness function balances the size of the selected features against the accuracy of the (kNN and SVM) classifiers, since accuracy may be impaired if the number of selected features is reduced more than desired. To cast both goals as minimization, the fitness function focuses on reducing the classification error rate instead of the accuracy, as follows:
$$\begin{aligned} fitness = w_{1} \cdot {Err}_{rate} + w_{2} \cdot \frac{{feat}_{selected}}{D}, \end{aligned}$$(23)
where \({Err}_{rate}\) reflects the classification error rate of the (kNN and SVM) classifiers, \({feat}_{selected}\) signifies the number of selected features, and D indicates the dataset's overall feature count. The weight parameters \(w_{1}\) and \(w_{2}\) refer to the significance of classification accuracy and of the number of selected features, respectively. Based on the comprehensive trials executed in prior research [64, 65], \(w_1\) is set to 0.99 and \(w_2\) to 0.01. Minimizing the classification error rate \({Err}_{rate}\) (maximizing classification accuracy) is given more preference than shortening the list of selected features \({feat}_{selected}\), which is why \(w_1\) is given more weight than \(w_2\).
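The weighted trade-off described above reduces to a one-line scalarization (illustrative helper name):

```python
def fitness(err_rate, n_selected, n_total, w1=0.99, w2=0.01):
    """Weighted fitness combining error rate and feature-subset size; lower is better.
    w1=0.99 and w2=0.01 follow the weights stated in the text."""
    return w1 * err_rate + w2 * (n_selected / n_total)
```

At equal error rates, the solution that selects fewer features always wins, but a 1% drop in error outweighs any possible reduction in subset size.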
Embedding of the DE approach
DE [25] is one of the most influential and straightforward stochastic, population-based trial-and-error approaches for acquiring a preferable solution to complicated optimization problems. The DE approach requires few control parameters, is simple to learn and use, and can handle a variety of optimization problems while producing valuable results quickly and at a reduced computational cost. DE depends on three primary stages: mutation, crossover, and selection, as follows:

Mutation stage: It is also known as differential mutation. In each iteration, this stage creates a mutated vector \(\upsilon _i\) for each solution vector. To create \(\upsilon _i\), the indices of three distinct nominee vectors \({X}_{r_{1}},{X}_{r_{2}}, {X}_{r_{3}}\) are randomly selected from the range [1, population size]. The difference between two of the nominee vectors, \({X}_{r_{2}}\) and \({X}_{r_{3}}\), is estimated, multiplied by a mutation weighting factor (\(W_M\)) within the range [0, 1] [66], and added to the third nominee vector \({X}_{r_{1}}\). The following is a mathematical representation of \(\upsilon _i\):
$$\begin{aligned} \overrightarrow{\upsilon }_{i}=\overrightarrow{X}_{r_{1}} +W_M(\overrightarrow{X}_{r_{2}}\overrightarrow{X}_{r_{3}}) \end{aligned}$$(24) 
Crossover stage: DE uses the crossover stage to enhance population diversity after the differential mutation stage. Combining values from the target vector \(X_i\) and the mutated vector \(\upsilon _i\) yields creating an offspring vector \(u_i\). The binary crossover is characterized as the most popular and straightforward crossover search operator in DE, which is mathematically expressed as:
$$\begin{aligned} {u}_{i,d}= \left\{ \begin{array}{ll} \upsilon _{i,d}, &{} \text{if}\,{rand} \le C_R\,\,\,\,\, {or}\,\,\,\, \,d=j_{rand},\\ X_{i,d}, &{} \text{otherwise}. \end{array} \right. \end{aligned}$$(25)where \(j_{rand}\in [1,\,2,\,\ldots , D_{X}]\) is a uniformly distributed arbitrary index that guarantees the offspring vector inherits at least one dimension from the mutated vector. The crossover rate \(C_R\) determines the likelihood of each element being crossed; it is often set to a high value (\(C_R = 0.9\)). It is evident from Eq. (25) that \(C_R\) and rand are compared: \(u_i\) is derived from \(\upsilon _i\) if the value of rand is less than or equal to \(C_R\); otherwise, \(u_i\) is taken from \(X_i\).

Selection stage: Eventually, the selection stage is performed, as illustrated in Eq. (26), where the target vector's fitness \(fit(X_i)\) and the corresponding offspring vector's fitness \(fit(u_i)\) are compared; the vector with the lowest fitness value is retained, and the best possible solution is carried into the next generation.
$$\begin{aligned} {X}_{i}= \left\{ \begin{array}{ll} u_{i}, &{} \text{if}\,fit(u_{i}) < fit(X_{i})\\ X_{i}, &{} \text{otherwise}. \end{array} \right. \end{aligned}$$(26)\(X_{i}\) is replaced by \(u_{i}\) if \(fit(u_{i})\) yields a value smaller than \(fit(X_{i})\). Otherwise, the previous target vector \(X_{i}\) remains in place.
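The three stages above can be sketched as one DE generation; the default \(W_M\) is an assumed value, and the helper names are illustrative:

```python
import numpy as np

def de_step(pop, fit_fn, W_M=0.5, C_R=0.9, rng=None):
    """One DE generation: mutation (Eq. 24), binomial crossover (Eq. 25),
    greedy selection (Eq. 26). Requires a population of at least 4 vectors."""
    rng = rng or np.random.default_rng()
    N, D = pop.shape
    out = pop.copy()
    for i in range(N):
        # Three distinct nominees, all different from the target i
        r1, r2, r3 = rng.choice([j for j in range(N) if j != i], 3, replace=False)
        v = pop[r1] + W_M * (pop[r2] - pop[r3])   # differential mutation
        j_rand = rng.integers(D)                  # dimension inherited unconditionally
        cross = rng.random(D) <= C_R
        cross[j_rand] = True
        u = np.where(cross, v, pop[i])            # binomial crossover
        if fit_fn(u) < fit_fn(pop[i]):            # greedy selection keeps the better vector
            out[i] = u
    return out
```

Because selection is greedy, the best fitness in the population can never get worse from one generation to the next.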
After illustrating the main stages of DE, the pseudocode for these stages is presented in Algorithm 1.
The exhaustive RBNRODE algorithm
Finally, after describing the steps of the suggested RBNRODE algorithm in the preceding Subsections to handle the GS strategy, Algorithm 2 provides the pseudocode for the proposed RBNRODE algorithm. In addition, Fig. 3 includes a flowchart of the proposed RBNRODE algorithm to show its essential steps.
Experimental results and discussion
The experimental results for the proposed RBNRODE algorithm and its peers are presented in this section. The models are evaluated using training and testing datasets. The final findings are derived using the average values of the evaluation metrics. The datasets used to verify the efficacy of the proposed model are described in Subsection “Dataset description”, the parameters utilized in the working environments are presented in Subsection “Parameters setting”, the evaluation criteria are shown in Subsection “Evaluation criteria”, and the analysis of the experimental results is explained in Subsection “Comparison results of the proposed RBNRODE against popular ML classifiers”.
Dataset description
Extensive experiments and other wrapper algorithms are conducted on 22 gene expression datasets. The data used is the normalized level-3 RNA-Seq gene expression data of 22 tumor types from the Broad Institute, publicly available at [67]. We followed the whole process applied in paper [3] and noticed a discrepancy between the data that paper used from GitHub and the numbers reported in the paper itself, which were copied from the site: the data from the site was a mixture of tumor and normal samples, whereas the mentioned paper treated it entirely as tumor samples. Therefore, we investigated the data closely. First of all, the site contains different forms of the same data that we chose to work on; we explored the data and found these challenges:

Some genes are identified by ID only, without a symbol.

Some genes are not found in the annotation file.

Samples are a mixture of normal, tumor, and other types.
As a result, we needed some preprocessing to separate and identify samples to get normal samples versus tumor samples that could be used in binary classification and to facilitate the process of FS. We faced the mentioned challenges as follows:

We searched the annotation file for each ID and retrieved the corresponding gene symbol.

We compared the gene list with the annotation file; as a result, more than 100 genes were removed.

Depending on the samples report, we separated each row by sample type in an Excel sheet for binary classification.
Furthermore, the Relief algorithm, described in subsection “Relief algorithm”, is employed for preprocessing by computing the weight for each feature in the dataset; the weights are then sorted from biggest to smallest, and the features with small weights are eliminated. After applying the Relief algorithm on the 22 gene expression datasets, only the 500 features with the largest weights were retained; the remaining irrelevant features with small weights were ignored, and only these 500 relevant features were chosen for use in the FS process. The Relief algorithm can thereby eliminate features that are irrelevant to classification.
After preprocessing, the resulting file became clean enough for use in the FS process. Still, unlike paper [3], which provided multiclassification of all cancer types, we worked on each type separately to be more specific. Table 1 shows a detailed list of all 22 tumour types and the corresponding number of samples.
Parameters setting
The proposed RBNRODE algorithm has been compared with binary conversions of distinct metaheuristic algorithms, which include Binary SSA (BSSA) [68], Binary ABC (BABC) [15], Binary BA (BBA) [69], Binary PSO (BPSO) [70], Binary WOA (BWOA) [57], Binary GWO (BGWO) [17], Binary GOA (BGOA) [20], Binary SFO (BSFO) [71], Binary BSA (BBSA) [22], Binary ASO (BASO) [31], Binary HHO (BHHO) [23], and Binary HGSO (BHGSO) [32]. The main parameters of the ML classifiers suggested in this paper are depicted in Table 2.
To ensure a fair comparison between different metaheuristic algorithms, each method was subjected to thirty separate experiments on each dataset due to their stochastic nature. The resulting performance measures, which include accuracy, fitness, selected features, and standard deviation, were based on the average results of these experiments. To maintain consistency across all methods, each experiment’s population size and maximum number of iterations were set to 10 and 100, respectively. Furthermore, the number of attributes in the datasets used in this study indicated the problem size. To enable individuals to search within a continuous search space, the domain was set to [− 1, 1], allowing them to explore a relatively wide but constrained search range.
In the presented framework, the optimality degrees of the outcomes are confirmed using a 10-fold cross-validation method to assure the reliability of the values received. This involves a data-splitting strategy that employs random sampling without replacement to distribute the training and testing groups. Each benchmark is divided into two separate subsets through this method. Specifically, 80% of the benchmark data is randomly selected without replacement for training, ensuring that each data point is unique and not duplicated in the training set, and the remaining 20% of the data is uniquely chosen for testing. The training subset is utilized to train the ML classifier through optimization, while the testing subset is employed to assess the performance of the chosen features. By using random sampling without replacement, we ensure that there is no overlap between training and testing data, thus maintaining the integrity of the evaluation process. Each method's remaining parameters are set considering the original variants and the data presented in their first publications. Standard configurations for all techniques and parameter settings for each method are shown in Table 3. Python is utilized in the computing environment to execute the runs with an Intel Core i7 processor, 16 GB of RAM, and an NVIDIA GTX 1050i GPU.
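The 80/20 split by random sampling without replacement described above can be sketched as follows (illustrative helper, not the experiment code):

```python
import numpy as np

def holdout_split(n_samples, test_frac=0.2, rng=None):
    """Return disjoint train/test index arrays: a random permutation guarantees
    sampling without replacement, so no sample appears in both subsets."""
    rng = rng or np.random.default_rng(0)
    idx = rng.permutation(n_samples)
    n_test = int(round(test_frac * n_samples))
    return idx[n_test:], idx[:n_test]   # (train indices, test indices)
```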
Evaluation criteria
To assess the performance of the proposed RBNRODE algorithm compared to others, each approach is independently verified 30 times in each dataset to validate the results statistically. To this end, the following standard performance measures for the FS problem are utilized.

Average accuracy \((AVG_{Acc})\): this metric is the rate of correct data classification and is obtained by executing the algorithm independently 30 times, and is computed as follows:
$$\begin{aligned} AVG_{Acc} = \frac{1}{30} \sum _{k=1}^{30} \frac{1}{m} \sum _{r=1}^{m}match(PL_r,AL_r) \end{aligned}$$(27)where m represents the size of the samples in the testing dataset, \(PL_r\) and \(AL_r\) are the predicted and actual class labels for sample r, respectively, and \(match(PL_r, AL_r)\) represents a comparison discriminant function. If \(PL_r == AL_r\), then \(match(PL_r,AL_r) = 1\); otherwise, \(match(PL_r,AL_r) = 0\).

Average fitness value \((AVG_{Fit})\): this metric measures the average fitness value obtained by executing the proposed algorithm independently 30 times, which defines the synergy between minimizing the error rate of classification and reducing the number of selected features. The lower value represents the better solution, which is evaluated in terms of fitness as follows:
$$\begin{aligned} AVG_{Fit} = \frac{1}{30} \sum _{k=1}^{30}f_*^k \end{aligned}$$(28)where \(f_{*}^{k}\) represents the optimal fitness value obtained in the \(k{\text{th}}\) run.

Average size of selected features \((AVG_{Feat})\): this metric represents the average size (or FS ratio) of the number of features selected by executing the algorithm independently 30 times and is determined as:
$$\begin{aligned} AVG_{Feat} = \frac{1}{30} \sum _{k=1}^{30}\frac{d_*^k}{D} \end{aligned}$$(29)where \(d_*^k\) is the absolute number of selected features in the best solution for the \(k{\text{th}}\) run, and D is the absolute total number of features in the original dataset.

Standard deviation (STD): corresponding to the above results, the final average results obtained from the 30 independent runs of each algorithm on every dataset are evaluated in terms of stability as follows:
$$\begin{aligned} STD = \sqrt{{\frac{1}{29}} \sum _{k=1}^{30} (Y_*^k-AVG_Y)^2} \end{aligned}$$(30)where Y denotes the metric to be measured, \(Y_*^k\) represents the value of the metric Y in the \(k{\text{th}}\) run, and \(AVG_Y\) is the average of the metric over 30 independent runs.
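Under the definitions in Eqs. (27)–(30), the per-run results could be aggregated as in the following sketch (illustrative helper names; the STD uses the sample form with the 1/29 factor, matching Eq. 30):

```python
import numpy as np

def average_metrics(acc_runs, fit_runs, nfeat_runs, D):
    """Aggregate 30 independent runs into the reported averages."""
    acc = np.asarray(acc_runs, dtype=float)
    fit = np.asarray(fit_runs, dtype=float)
    ratio = np.asarray(nfeat_runs, dtype=float) / D   # FS ratio per run
    return {
        "AVG_Acc": acc.mean(),          # Eq. (27)
        "AVG_Fit": fit.mean(),          # Eq. (28)
        "AVG_Feat": ratio.mean(),       # Eq. (29)
        "STD_Acc": acc.std(ddof=1),     # Eq. (30): sample std with 1/(K-1)
    }
```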
The results presented in the following tables are the average values over 30 independent runs in terms of the fitness value \((AVG_{Fit})\), classification accuracy \((AVG_{Acc})\), and the number of selected features \((AVG_{Feat})\). The experimental results are closely analyzed and discussed in the subsequent subsections, where bold numbers indicate the bestrequired results.
Comparison results of the proposed RBNRODE against popular ML classifiers
This section introduces a comparison of the proposed RBNRODE with kNN and SVM classifiers against the most popular ML classifiers [kNN, SVM, Decision Tree (DT), RF, and eXtreme Gradient Boosting (XGBoost)] in terms of classification accuracy, fitness values, precision, recall, and F1-score.
Table 4 shows the results of the proposed RBNRODE with kNN and SVM classifiers compared with other popular ML classifiers regarding the classification accuracy values. The empirical results show that the proposed RBNRODE with SVM is ranked first by achieving the best results in 7 out of 22 datasets and identical results in 14 datasets as other techniques. The proposed RBNRODE with kNN is ranked second by yielding identical results in 15 datasets as other techniques. It should also be noted that SVM ranked third by yielding identical results in 11 datasets as others, while kNN ranked fourth by yielding identical results in 10 datasets as other techniques. Finally, DT is ranked last by yielding identical results in 8 datasets as other techniques.
Table 5 shows the results of the proposed RBNRODE with kNN and SVM classifiers compared with other popular ML classifiers regarding fitness values. The empirical results show that the proposed RBNRODE with SVM is ranked first by achieving the best results in 9 out of 22 datasets and identical results in 11 datasets as the proposed RBNRODE with kNN. The proposed RBNRODE with kNN is ranked second by achieving the best results in 2 out of 22 datasets and identical results in 11 datasets as the proposed RBNRODE with SVM. Finally, it should also be noted that other methods did not achieve optimal results on any of the datasets used regarding fitness values.
Table 6 shows the results of the proposed RBNRODE with kNN and SVM classifiers compared with other popular ML classifiers regarding the precision values. The empirical results show that the proposed RBNRODE with SVM is ranked first by achieving the best results in 4 out of 22 datasets and identical results in 17 datasets as other techniques. The proposed RBNRODE with kNN is ranked second by achieving the best results in 1 out of 22 datasets and identical results in 17 datasets as other techniques. It should also be noted that SVM and kNN ranked third by yielding identical results in 14 datasets as others, while RF and XGBoost ranked fourth by yielding identical results in 12 datasets as other techniques. Finally, DT is ranked last by yielding identical results in 10 datasets as other techniques.
Table 7 shows the results of the proposed RBNRODE with kNN and SVM classifiers compared with other popular ML classifiers regarding the recall values. The empirical results show that the proposed RBNRODE with SVM is ranked first by achieving the best results in 2 out of 22 datasets and identical results in 20 datasets as other techniques. The proposed RBNRODE with kNN is ranked second by achieving identical results in 19 datasets as other techniques. It should also be noted that RF ranked third by yielding identical results in 18 datasets as others, while kNN ranked fourth by yielding identical results in 17 datasets as other techniques. Finally, DT is ranked last by yielding identical results in 13 datasets as other techniques.
Table 8 shows the results of the proposed RBNRODE with kNN and SVM classifiers compared with other popular ML classifiers regarding the F1score values. The empirical results show that the proposed RBNRODE with SVM is ranked first by achieving the best results in 7 out of 22 datasets and identical results in 14 datasets as other techniques. The proposed RBNRODE with kNN is ranked second by achieving the best results in 1 out of 22 datasets and identical results in 14 datasets as other techniques. It should also be noted that SVM ranked third by yielding identical results in 11 datasets as others, while kNN and RF ranked fourth by yielding identical results in 10 datasets as other techniques. Finally, DT is ranked last by yielding identical results in 8 datasets as other techniques.
Comparison results of different versions of the proposed RBNRODE
This section introduces a comparison between different versions of the proposed RBNRODE (RBNRODE with kNN, RBNRODE with SVM, and RBNRODE with XGBoost) in terms of classification accuracy, fitness values, and number of selected features.
Table 9 shows the results of different versions of the proposed RBNRODE (RBNRODE with kNN, RBNRODE with SVM, and RBNRODE with XGBoost) regarding classification accuracy values. The empirical results show that the proposed RBNRODE with SVM is ranked first by achieving the best results in 4 out of 22 datasets and identical results in 14 datasets as other versions. The proposed RBNRODE with XGBoost is ranked second by achieving the best results in 2 out of 22 datasets and identical results in 13 datasets as other versions. Finally, RBNRODE with kNN is ranked third by achieving the best results in 1 out of 22 datasets and identical results in 14 datasets as other versions.
Table 10 shows the results of different versions of the proposed RBNRODE (RBNRODE with kNN, RBNRODE with SVM, and RBNRODE with XGBoost) regarding the fitness values. The empirical results show that the proposed RBNRODE with SVM is ranked first by achieving the best results in 8 out of 22 datasets and identical results in 11 datasets as other versions. The proposed RBNRODE with kNN is ranked second by achieving the best results in 1 out of 22 datasets and identical results in 11 datasets as other versions. Finally, RBNRODE with XGBoost is ranked third by achieving the best results in 2 out of 22 datasets.
Table 11 shows the results of different versions of the proposed RBNRODE (RBNRODE with kNN, RBNRODE with SVM, and RBNRODE with XGBoost) regarding the number of selected features. The empirical results show that the proposed RBNRODE with SVM is ranked first by achieving the best results in 8 out of 22 datasets and identical results in 10 datasets as other versions. The proposed RBNRODE with kNN is ranked second by achieving the best results in 4 out of 22 datasets and identical results in 10 datasets as other versions. Finally, RBNRODE with XGBoost did not achieve the best results in any one of the utilized datasets. Therefore, the experimental results in this research will be conducted using kNN and SVM classifiers due to their superiority and efficiency, as described in the following subsections.
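Across these comparisons, "fitness" jointly rewards low classification error and a small gene subset. The exact weighting is defined in the paper's methodology; purely as an illustrative assumption, a widely used wrapper fitness of this form can be sketched as:

```python
# Hypothetical sketch of a wrapper fitness that trades classification error
# against subset size. The weight alpha and the error source are assumptions,
# not the paper's exact definition. Lower fitness is better.
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted sum of classification error and the selected-feature ratio."""
    return alpha * error_rate + (1 - alpha) * (n_selected / n_total)

# A subset with 2% error using 10 of 500 genes:
f = fitness(0.02, 10, 500)
```

Under such a formulation, two subsets with equal accuracy are ranked by size, which is why the tables report both accuracy and feature counts alongside fitness.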
Comparison results of the proposed RBNRODE against other state-of-the-art metaheuristic algorithms
To demonstrate the dominance of RBNRODE over its counterparts in the literature, the best-performing RBNRODE algorithm with the two suggested classifiers, kNN and SVM, is compared with other state-of-the-art metaheuristic algorithms executed under identical conditions. The comparison with RBNRODE incorporates binary versions of some optimization algorithms, such as BSSA, BABC, BBA, BPSO, BWOA, BGWO, BGOA, BSFO, BBSA, BASO, BHHO, and BHGSO. Note that the 22 original gene expression datasets are first subjected to the Relief algorithm, and only the 500 relevant features with the largest weights are chosen for use in the FS process. Subsequently, the suggested RBNRODE and the other state-of-the-art metaheuristic algorithms are applied only to these 500 pertinent features.
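The Relief filtering step described above rewards features that separate each sample from its nearest neighbor of the other class. A minimal, simplified sketch (single nearest hit/miss, no distance normalization, toy matrix sizes, and a top-5 cut standing in for the paper's top-500 are all assumptions; the paper's exact Relief variant may differ):

```python
import numpy as np

def relief_weights(X, y, n_iter=100, rng=None):
    """Basic Relief: for random samples, reward features that are close to the
    nearest same-class neighbor (hit) and far from the nearest other-class
    neighbor (miss). A simplified sketch of the filter, not the paper's code."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)   # L1 distance to every sample
        dist[i] = np.inf                      # never match the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w

# Keep only the highest-weight genes (top 5 of a toy matrix here).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 20))
y = np.array([0, 1] * 20)
top = np.argsort(relief_weights(X, y, rng=0))[::-1][:5]
X_filtered = X[:, top]
```

Only the filtered matrix is then handed to the wrapper search, which is what keeps the metaheuristic dimension at 500 instead of more than 20,000.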
Comparisons based on the suggested kNN classifier
Table 12 reveals the results of the proposed RBNRODE compared with other optimizers based on the kNN classifier regarding the classification accuracy values, evaluated under the same implementation conditions. The empirical results show that the proposed RBNRODE and BSFO each achieved the best result in one dataset. It should also be noted that all competitive algorithms yielded results identical to RBNRODE with kNN in the remaining 20 datasets.
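The kNN-based evaluations underlying these tables score a candidate gene mask by classification accuracy. A minimal sketch, assuming leave-one-out validation and k = 1 for brevity (the paper's actual k and validation scheme are defined in its experimental setup):

```python
import numpy as np

def knn_loo_accuracy(X, y, mask, k=1):
    """Leave-one-out accuracy of a kNN classifier restricted to the genes
    flagged in `mask`. k = 1 and LOO validation are illustrative assumptions."""
    Xs = X[:, mask]
    n = len(y)
    correct = 0
    for i in range(n):
        dist = np.linalg.norm(Xs - Xs[i], axis=1)
        dist[i] = np.inf                    # exclude the held-out sample
        nearest = np.argsort(dist)[:k]
        pred = np.bincount(y[nearest]).argmax()
        correct += int(pred == y[i])
    return correct / n

# Toy data where gene 0 perfectly separates the two classes:
y = np.array([0] * 10 + [1] * 10)
X = np.random.default_rng(4).normal(size=(20, 30))
X[:, 0] = 5.0 * y
mask = np.zeros(30, dtype=bool)
mask[0] = True
acc = knn_loo_accuracy(X, y, mask)   # 1.0: gene 0 alone suffices
```

The optimizer's job is exactly to find such masks: high accuracy from as few flagged genes as possible.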
Table 13 reveals the average fitness and STD values of the proposed RBNRODE algorithm and its peers based on kNN under identical implementation requirements. The proposed RBNRODE with the kNN classifier demonstrates higher quality than the other algorithms. By investigating Table 13, the results reveal that kNN-based RBNRODE produced the lowest fitness values and competitive STD over all datasets. Furthermore, all the used datasets are large-scale, which verifies that the proposed RBNRODE can execute consistently on all datasets regardless of dataset size. It can therefore be inferred that the proposed RBNRODE is promising, with a demonstrated ability to balance exploitation and exploration of the search space across iterations and to escape the local optima in which standard algorithms may become trapped.
Table 14 shows the number of extracted features using the proposed RBNRODE and its other counterparts for training the kNN classifier. The proposed RBNRODE surpassed the other algorithms in all datasets regarding the number of extracted features. Furthermore, the RBNRODE’s capability to identify the most informative features is attributable to the ability to search within feasible regions while considering improved classification accuracy.
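For a binary optimizer, each continuous position must first be mapped to a 0/1 gene mask before feature counts like those in Table 14 can be measured. A common convention for this mapping, assumed here for illustration rather than taken from the paper, is the S-shaped sigmoid transfer function:

```python
import numpy as np

def binarize(position, rng=None):
    """Map a continuous position vector to a 0/1 gene mask via the S-shaped
    sigmoid transfer function. This is a common convention in binary
    metaheuristics, assumed here; the paper defines its own mapping."""
    rng = np.random.default_rng(rng)
    prob = 1.0 / (1.0 + np.exp(-position))   # per-gene selection probability
    return (rng.random(position.shape) < prob).astype(int)

# Strongly negative coordinates almost never select their gene,
# strongly positive ones almost always do:
mask = binarize(np.array([-10.0, 0.0, 10.0]), rng=0)
```

Driving coordinates toward large negative values is thus how an optimizer shrinks the selected-feature count while the fitness term keeps accuracy in check.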
Table 15 displays the average precision values of the proposed RBNRODE algorithm with kNN and its counterparts. Out of 22 datasets, the proposed RBNRODE performed better than other methods in terms of mean precision values for 3 datasets. Alternatively, BABC, BPSO, BGWO, BGOA, BSFO, and BHHO achieved identical results as the proposed RBNRODE in 19 datasets, while BWOA performed similarly in 18 datasets. BASO and BHGSO ranked fourth by achieving identical results as the proposed RBNRODE in 17 datasets. Finally, BBA yielded identical results as the proposed RBNRODE in 15 datasets, ranking it last among all methods.
Table 16 displays the average recall values of the proposed RBNRODE algorithm with kNN and its counterparts. Out of 22 datasets, the proposed RBNRODE performed better than other methods in terms of mean recall values for 3 datasets. Alternatively, BSSA, BABC, BPSO, BGWO, BWOA, BGOA, BSFO, and BBSA achieved identical results as the proposed RBNRODE in 19 datasets, while BHHO performed similarly in 18 datasets. BHGSO ranked fourth by achieving identical results as the proposed RBNRODE in 17 datasets. Finally, BASO and BBA yielded identical results as the proposed RBNRODE in 16 datasets, ranking them last among all methods.
Table 17 displays the average F1-score values of the proposed RBNRODE algorithm with kNN and its counterparts. Out of 22 datasets, the proposed RBNRODE performed better than other methods in terms of mean F1-score values for 4 datasets. Alternatively, BSSA, BABC, BPSO, BGWO, BGOA, BSFO, and BBSA achieved identical results as the proposed RBNRODE in 18 datasets, while BWOA and BHHO performed similarly in 17 datasets. BASO and BHGSO ranked fourth by achieving identical results as the proposed RBNRODE in 16 datasets. Finally, BBA yielded identical results as the proposed RBNRODE in 14 datasets, ranking it last among all methods.
Comparisons based on the suggested SVM classifier
Table 18 shows the results of the proposed RBNRODE compared with other optimizers based on the SVM classifier regarding the classification accuracy values that are fairly evaluated under the same implementation conditions. The empirical results show that the proposed RBNRODE is ranked first by achieving the best results in 2 out of 22 datasets. BSFO is ranked second with the best results in only one dataset. It should also be noted that all competitive algorithms yielded identical results in 19 datasets as the proposed RBNRODE with SVM.
Table 19 reveals the average fitness and STD values of RBNRODE and its peers, based on SVM, under identical implementation requirements. Notably, the proposed RBNRODE with the SVM classifier demonstrates higher quality than the other algorithms. By investigating Table 19, the results reveal that SVM-based RBNRODE produced the lowest fitness values along with competitive STD in 21 out of 22 datasets, accounting for 95% of all datasets. Furthermore, all the used datasets are large-scale, which verifies that the proposed RBNRODE is capable of executing consistently on all datasets regardless of dataset size. For the only dataset that BSFO won, its mean fitness value is very close to that of RBNRODE, and none of the other compared algorithms ranked first in any of the 22 datasets. It can therefore be inferred that RBNRODE is promising, with a demonstrated ability to balance exploitation and exploration of the search space across iterations and to escape the local optima in which common algorithms may become trapped.
Based on the number of extracted features, the outcomes of the proposed RBNRODE and its counterparts for training the SVM classifier are revealed in Table 20. By investigating the scores, an attractive observation is made for the proposed RBNRODE based on SVM, which outperformed the other algorithms on all 22 datasets used in this paper. Furthermore, the excellence of the proposed RBNRODE with SVM in this context confirms its ability to identify the most significant regions of the search space and to avoid searching through non-feasible regions.
Table 21 displays the average precision values of the proposed RBNRODE algorithm with SVM and its counterparts. Out of 22 datasets, the proposed RBNRODE performed better than other methods in terms of mean precision values for 3 datasets. Alternatively, BSFO achieved identical results as the proposed RBNRODE in 19 datasets, while BABC, BGWO, BWOA, BBSA, and BASO performed similarly in 18 datasets. BSSA, BPSO, BGOA, and BHHO ranked fourth by achieving identical results as the proposed RBNRODE in 17 datasets. Finally, BBA yielded identical results as the proposed RBNRODE in 12 datasets, ranking it last among all methods.
Table 22 displays the average recall values of the proposed RBNRODE algorithm with SVM and its counterparts. Out of 22 datasets, the proposed RBNRODE performed better than other methods in terms of mean recall values for 3 datasets. Alternatively, BSSA, BABC, BGWO, BWOA, BGOA, BSFO, and BBSA achieved identical results as the proposed RBNRODE in 19 datasets, while BPSO and BHHO performed similarly in 18 datasets. BHGSO ranked fourth by achieving identical results as the proposed RBNRODE in 17 datasets. Finally, BASO and BBA yielded identical results as the proposed RBNRODE in 16 datasets, ranking them last among all methods.
Table 23 displays the average F1-score values of the proposed RBNRODE algorithm with SVM and its counterparts. Out of 22 datasets, the proposed RBNRODE performed better than other methods in terms of mean F1-score values for 5 datasets. Alternatively, BWOA and BSFO achieved identical results as the proposed RBNRODE in 17 datasets, while BSSA, BABC, BPSO, BGWO, BGOA, and BBSA performed similarly in 16 datasets. BHHO ranked fourth by achieving identical results as the proposed RBNRODE in 15 datasets. Finally, BBA yielded identical results as the proposed RBNRODE in 12 datasets, ranking it last among all methods.
Convergence analysis
Figures 4, 5, 6, 7, 8, 9, 10, 11 show the convergence performance of the proposed RBNRODE with kNN and SVM classifiers in comparison with its counterparts, all implemented under identical conditions of iteration number and population size. From Figs. 4, 5, 6, 7, 8, 9, 10, 11, it is obvious that the proposed RBNRODE with kNN and SVM classifiers achieved optimal convergence performance on all datasets. Hence, the convergence behavior of RBNRODE with kNN and SVM classifiers proves its ability to reach the optimum results in good time while striking an effective balance between exploration and exploitation.
Wilcoxon’s rank-sum test
The effectiveness of the proposed RBNRODE is recognized by executing the Wilcoxon test as a pairwise test to evaluate whether there is a statistically significant deviation between the fitness values achieved via the proposed approach and its peers [72]. According to the results shown in Tables 24 and 25, it is evident that the proposed RBNRODE with kNN and SVM classifiers exceeds all other algorithms in all datasets. All p-values listed in Tables 24 and 25 are less than 0.05 (5% significance level), which provides strong evidence against the null hypothesis and shows that the results achieved by the proposed method are statistically better and did not occur by chance.
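The pairwise test above compares the per-run fitness samples of two optimizers. A self-contained sketch of the two-sided rank-sum p-value under the normal approximation (ties are ignored for brevity, and the fitness samples below are illustrative, not the paper's data; `scipy.stats.ranksums` computes the same statistic):

```python
import math

def ranksum_pvalue(a, b):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation.
    Assumes no tied values; a minimal sketch of the test used in the paper."""
    combined = sorted(a + b)
    rank = {v: i + 1 for i, v in enumerate(combined)}
    n1, n2 = len(a), len(b)
    r1 = sum(rank[v] for v in a)                      # rank sum of sample a
    mu = n1 * (n1 + n2 + 1) / 2                       # mean under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # std under H0
    z = (r1 - mu) / sigma
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Illustrative fitness samples from repeated runs of two optimizers:
a = [0.010, 0.011, 0.012, 0.009, 0.013, 0.008]
b = [0.020, 0.021, 0.019, 0.022, 0.024, 0.023]
p = ranksum_pvalue(a, b)   # p < 0.05: the difference is significant
```

Because every value in `a` is below every value in `b`, the rank sums separate completely and the p-value falls well below the 5% significance level used in Tables 24 and 25.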
Computational complexity of the RBNRODE and other state-of-the-art metaheuristic algorithms
Time computational complexity of the RBNRODE algorithm
To define the computational complexity of the proposed RBNRODE algorithm, we can analyze each of its five fundamental stages individually. These stages are feature filtration, population initialization, position improvement and adjustment, fitness function estimation, and the DE technique. The comprehensive computational complexity of the proposed RBNRODE algorithm, denoted \(O_{time}(RBNRODE)\), can then be calculated in big-O notation through the following equations.
Let N be the size of the population, \({G}_{max}\) the maximum number of generations, and D the dimension size of the problem. The following can be acquired:
\(O_{time}(\text {Features filtration}) = O_{time}(D)\).
\(O_{time}(\text {Population initialization}) = O_{time}(N)\).
\(O_{time}(\text {Position improvement and adjustment}) = O_{time}({G}_{max} \times N \times D).\)
\(O_{time}(\text {Fitness function estimation}) = O_{time}({G}_{max} \times N).\)
\(O_{time}(\text {DE technique}) = O_{time}(N \times D).\) Therefore, \(O_{time}(RBNRODE) = O_{time}(D) + O_{time}(N) + O_{time}({G}_{max} \times N \times D) + O_{time}({G}_{max} \times N) + O_{time}(N \times D) = O_{time}({G}_{max} \times N \times D).\)
Space computational complexity of the RBNRODE algorithm
The amount of memory or storage space needed for an algorithm to solve a problem as the size of the input increases is referred to as space computational complexity. It is often stated as the amount of additional memory that the algorithm requires in addition to the input. It comprises the following two main components:

1.
Input values space: It is the memory space needed to save the input data required for the algorithm to operate. As exhibited in Algorithm 2, which provides the pseudocode of the proposed RBNRODE algorithm, there are nine input variables: N, \({G}_{max}\), D, \(P_{Fi}\), \(P_\beta\), LB, UB, \(C_R\), and \(W_M\). Each variable represents just numerical values, so each uses 4 bytes of memory space. Therefore, the total memory space complexity of these nine input variables is 36 bytes (\(9 \times 4\) bytes \(= 36\) bytes). The input values space complexity is constant.

2.
Auxiliary space: It indicates the additional space that the algorithm uses, apart from the input. It comprises the memory needed for the algorithm’s internal variables, data structures, and other parts. A certain amount of additional memory is used by the RBNRODE algorithm, regardless of the input size. This involves the following variables:

The positions’ vector \(X_{initial}\), whose size is \((N \times D)\) proportional to the initial population of N positions with dimension size D, and each position takes 4 bytes of memory space, so the memory space complexity taken by \(X_{initial}\) is \((4 \times N \times D)\) bytes. Its space complexity is linear since the memory required increases linearly with \((N \times D)\).

The variables \(\sigma _1\), \(\sigma _2\), g, \(P_{ne}^s\), \(P_{ne}^e\), \(Ne_i\), \(Pa_{i}\), \(Pc_{i}\), \(fit({X}_{i}^{Fi})\), \(fit({X}_{i}^{Ion})\), \(fit({X}_{i}^{Fu})\), \(fit(u_{i})\), \(fit({X}_{i})\), \(\sigma _{\mu }\), \(\sigma _{\nu }\), \(fit({X}_{opt})\). Each of these 16 variables represents just numerical values, so each one takes 4 bytes of memory space. Therefore, these sixteen variables’ total memory space complexity is 64 bytes (\(16 \times 4\) bytes \(= 64\) bytes). Its space complexity is of constant space.

The positions’ vectors \({X}_{i}^{Fi}\), \({X}_{i}^{Ion}\), \({X}_{i}^{Fu}\), \({X}_{i}\), \({X}_{r}\), \({X}_{j}\), \(X_{r1}^{Fi}\), \(X_{r1}^{Ion}\), \(X_{r2}^{Fi}\), \(X_{r2}^{Ion}\), \(X_{best}^{Fi}\), \(X_{best}^{Ion}\), \(X_{worst}^{Fi}\), \({X}_{best}\), \(Levy(\beta )\), \(\mu\), \(\nu\), \({X}_{i}^{adjust}\), \(X_i^{bin}\), \({u}_{i}\), \({\upsilon }_{i}\), \({X}_{r_{1}}\), \({X}_{r_{2}}\), \({X}_{r_{3}}\), \({X}_{opt}\). The size of each of these 25 position vectors is D, proportional to the dimension size of the obtained positions, and each position takes 4 bytes of memory space. Therefore, the total memory space complexity for these 25 position vectors is \((100 \times D)\) bytes (\(25 \times 4 \times D\) bytes). Its space complexity is linear since the memory required increases linearly with D.
Consequently, the total memory space complexity for all mentionedabove auxiliary variables is: \((4 \times N \times D) + 64 + (100 \times D)\) bytes.

Finally, the total memory space computational complexity for the proposed RBNRODE algorithm is the sum of the input values space and the auxiliary space: \(36 + (4 \times N \times D) + 64 + (100 \times D)\) bytes.
Note that the constant bytes will not be considered. For that, the total RBNRODE space computational complexity can be expressed in big-O notation as \(O_{space}(RBNRODE)\), and computed after removing all constants as \(O_{space}(RBNRODE) = O_{space}(N \times D)\).
Comparison results between the RBNRODE and other state-of-the-art metaheuristic algorithms based on the computational complexity
Creating a comprehensive comparison of the time complexity and space complexity of multiple metaheuristic optimization algorithms can be challenging because these complexities can vary depending on the specific implementation, problem size, and other factors. Additionally, detailed time and space complexity analyses may not be available for all of the mentioned algorithms, and they may have different characteristics when applied to different problems. However, we try to provide a simplified comparison of these algorithms in terms of their general characteristics with respect to time and space complexity, as illustrated in Table 26.
Comparison results of the proposed RBNRODE versus various recent algorithms from the published literature
As previously clarified, no metaheuristic algorithm has ever been applied to RNASeq gene expression data. Therefore, the RBNRODE algorithm is considered the first metaheuristic algorithm to be proposed for solving GS problems of RNASeq gene expression data. This subsection presents the empirical results of comparisons based on the average classification accuracy values, fitness values, and selected features values in tackling the GS issue between the proposed RBNRODE and other recent metaheuristic optimization techniques from the published literature, including Binary meerkat optimization algorithm (BMOA) [73], Binary Brownbear Optimization (BBBO) algorithm [74], Binary Aquila Optimization (BAO) algorithm [75], and Binary African Vultures Optimization (BAVO) algorithm [76].
Comparisons based on the suggested kNN classifier
The accuracy values of the proposed RBNRODE optimizer and other recent optimizers based on the kNN classifier are compared in Table 27 under the same implementation conditions. Based on the empirical results, RBNRODE yielded the best results in four datasets. In the remaining 18 datasets, RBNRODE with kNN and the other competitive recent algorithms produced identical results.
Table 28 compares the performance of the proposed RBNRODE algorithm with other algorithms based on kNN using the same implementation requirements. The results indicate that the proposed algorithm outperforms its competitors in producing higher-quality fitness values with lower standard deviation across all datasets used in the experiments. It is worth noting that all these datasets are large-scale, which demonstrates the proposed algorithm’s ability to perform consistently regardless of the dataset size. Additionally, the RBNRODE algorithm has shown remarkable performance in balancing exploitation and exploration to avoid getting trapped in local optima. Overall, these results suggest that the proposed RBNRODE algorithm is promising and has the potential to outperform the other recent algorithms.
Table 29 shows the number of extracted features using the suggested RBNRODE and other recent optimization algorithms for training the kNN classifier. The proposed RBNRODE exceeded the other recent algorithms in all datasets regarding the number of the selected features. Also, the RBNRODE’s ability to determine the most instructive features is attributable to the capability to explore the feasible regions while maintaining enhanced classification accuracy.
Comparisons based on the suggested SVM classifier
The mean accuracy results of the suggested RBNRODE optimizer and other recent optimization methods regarding the SVM classifier are shown in Table 30 under identical implementation conditions. The proposed RBNRODE produced the best results in four datasets. In the remaining 18 datasets, the proposed RBNRODE with SVM and the other recent competitive algorithms yielded equivalent results.
Table 31 shows the fitness values of the suggested RBNRODE and other recent optimization algorithms regarding the SVM classifier. The outcomes show that the proposed technique exceeds its peers by producing the smallest fitness values with lower standard deviation across all benchmarks employed in the experimentations. It is worth noting that all these datasets are largescale, demonstrating the suggested algorithm’s capability to perform consistently regardless of the size of the dataset. Also, the proposed RBNRODE has shown promising performance in balancing exploitation and exploration to avoid getting trapped in local optima.
Table 32 displays the number of extracted features chosen by the suggested RBNRODE and other recent optimization algorithms for training the SVM classifier. The proposed RBNRODE exceeded the other recent algorithms in all datasets regarding the number of selected features. Also, the RBNRODE’s ability to determine the most instructive features is attributable to the capability to explore the feasible regions while maintaining enhanced classification accuracy.
Comparison results of the proposed RBNRODE versus different filter and embedded methods
This subsection presents the experimental results of the proposed RBNRODE and various filter and embedded methods.
Comparisons based on the suggested kNN classifier
Table 33 shows the results of the proposed RBNRODE compared with other filter and embedded methods based on the kNN classifier regarding the classification accuracy values that are fairly evaluated under the same implementation conditions. The empirical results show that the proposed RBNRODE is ranked first by achieving the best results in 4 out of 22 datasets. Lasso regularization is ranked second by yielding identical results in 18 datasets as the proposed RBNRODE. It should also be noted that variance threshold, linear regression, and ridge regularization methods ranked third by yielding identical results in 17 datasets as the proposed RBNRODE.
The average fitness values of the proposed RBNRODE algorithm based on kNN and various filter and embedded methods are shown in Table 34. From the results presented in Table 34, it can be observed that RBNRODE with kNN produces the least fitness values for all datasets. Additionally, it is noteworthy that the proposed RBNRODE can perform consistently on all datasets, irrespective of their size, as all the datasets used in this study are largescale.
Table 35 shows the number of selected features using the proposed RBNRODE with the kNN classifier and different filter and embedded methods. The proposed RBNRODE surpassed the other algorithms in 21 out of 22 datasets regarding the number of extracted features. Correlation ranked second by achieving the best results in one dataset.
Comparisons based on the suggested SVM classifier
Table 36 shows the results of the proposed RBNRODE compared with other filter and embedded methods based on the SVM classifier regarding the classification accuracy values that are fairly evaluated under the same implementation conditions. The empirical results show that the proposed RBNRODE is ranked first by achieving the best results in 4 out of 22 datasets. Lasso regularization is ranked second by yielding identical results in 18 datasets as the proposed RBNRODE. It should also be noted that variance threshold, linear regression, and ridge regularization methods ranked third by yielding identical results in 17 datasets as the proposed RBNRODE.
The average fitness values of the proposed RBNRODE algorithm based on SVM and various filter and embedded methods are shown in Table 37. From the results presented in Table 37, it can be observed that RBNRODE with SVM produces the least fitness values for all datasets.
Table 38 shows the number of selected features using the proposed RBNRODE with the SVM classifier and different filter and embedded methods. The proposed RBNRODE surpassed the other algorithms in 21 out of 22 datasets regarding the number of selected features. Correlation ranked second by achieving the best results in one dataset.
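The filter baselines compared above score each gene independently of any classifier. A minimal sketch of two of them, with the variance threshold and the number of retained genes as illustrative assumptions:

```python
import numpy as np

def variance_filter(X, threshold=1.0):
    """Keep genes whose variance exceeds a threshold (threshold is assumed)."""
    return np.where(X.var(axis=0) > threshold)[0]

def correlation_filter(X, y, top_k=5):
    """Keep the top_k genes with the largest absolute Pearson correlation
    with the class label (top_k is an illustrative assumption)."""
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    return np.argsort(np.abs(corr))[::-1][:top_k]

# Toy data in which gene 0 is made strongly class-correlated:
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 30))
y = rng.integers(0, 2, size=50)
X[:, 0] += 3 * y
sel_var = variance_filter(X)
sel_cor = correlation_filter(X, y)
```

Unlike the wrapper search, these scores never consult a classifier, which is why such filters are fast but can miss gene interactions that RBNRODE's wrapper fitness captures.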
Discussion
Based on the empirical analysis, it can be demonstrated that the proposed RBNRODE with kNN and SVM classifiers yielded more reliable results than other recent algorithms for handling the GS strategy on (rnaseqv2 illuminahiseq rnaseqv2 un edu Level 3 RSEM genes normalized) with more than 20,000 genes, picking the best informative genes and assessing them through 22 cancer datasets. Binary versions of the most common metaheuristic algorithms have been compared with the proposed RBNRODE algorithm. In most of the 22 cancer datasets, the RBNRODE algorithm based on kNN and SVM classifiers achieved optimal convergence and classification accuracy up to 100%, integrated with a feature-size reduction of up to 98%, which is very evident when compared with its counterparts according to Wilcoxon’s rank-sum test (5% significance level). Moreover, the RBNRODE optimizer showed more significant exploration and exploitation behaviour than its peers, as verified by the following underlying causes.
Firstly, a preprocessing phase uses the Relief algorithm to identify the relevant features by computing a weight for every feature to describe its relevance to the class label and then ignoring the irrelevant features with the lowest weights. The second phase applies the binary NRO algorithm combined with the DE technique to determine the most relevant and non-redundant features. When solving large-scale problems, the NRO algorithm is susceptible to the local-optimum trap; to prevent this, the DE technique is incorporated into the NRO algorithm.
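The DE step grafted onto NRO can be illustrated with the classic DE/rand/1/bin scheme (an assumption about the exact variant; the control parameters F and CR below are illustrative). Greedy selection only accepts a trial vector that does not worsen fitness, so the population's best fitness can never degrade while mutation keeps injecting diversity:

```python
import numpy as np

def de_rand_1_bin(pop, fit, F=0.5, CR=0.9, rng=None):
    """One DE/rand/1/bin generation: mutate with three distinct random peers,
    apply binomial crossover, and keep the trial only if its fitness improves.
    `fit` maps a vector to a scalar (lower is better); F and CR are assumed."""
    rng = np.random.default_rng(rng)
    n, d = pop.shape
    new_pop = pop.copy()
    for i in range(n):
        r1, r2, r3 = rng.choice([j for j in range(n) if j != i], size=3,
                                replace=False)
        mutant = pop[r1] + F * (pop[r2] - pop[r3])
        cross = rng.random(d) < CR
        cross[rng.integers(d)] = True          # guarantee one mutant gene
        trial = np.where(cross, mutant, pop[i])
        if fit(trial) <= fit(pop[i]):          # greedy selection
            new_pop[i] = trial
    return new_pop

def sphere(x):
    return float((x ** 2).sum())

pop = np.random.default_rng(3).normal(size=(10, 4))
best0 = min(sphere(x) for x in pop)
pop = de_rand_1_bin(pop, sphere, rng=3)
best1 = min(sphere(x) for x in pop)   # never worse than best0
```

Because each individual is replaced only when its trial is at least as fit, this step can pull NRO out of stagnation without ever discarding the best solution found so far.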
Moreover, the suggested RBNRODE based on the kNN and SVM classifiers emphasizes its ability to obtain the optimal solution in good time, ensuring an effective equilibrium between exploration and exploitation capabilities. Eventually, because optimization outcomes are not exactly repeatable, separate runs of an optimizer can generate different subsets of attributes, which may confuse the user. Therefore, on different occasions or in different applications, RBNRODE or the other optimizers implemented here can select multiple subsets of features.
Conclusion and future work
In this study, we applied the metaheuristic RBNRODE algorithm for the first time to solve FS problems on RNASeq gene expression data and to identify possible biomarkers for various tumour types. The results were satisfactory, demonstrating that the algorithm’s capability and effectiveness were significantly increased. kNN and SVM, two well-known classifiers, were used to assess the usefulness of each subset of the chosen features. The performance of the proposed RBNRODE algorithm was compared to binary versions of 12 well-known metaheuristic algorithms to validate it on various tumour types with multiple samples. The evaluation was conducted using a variety of metrics, such as the \(AVG_{Fit}\), \(AVG_{Acc}\), and \(AVG_{Feat}\) values. The suggested algorithm in this research, RBNRODE based on kNN and SVM classifiers, performed better than the other algorithms in dealing with FS problems. Future research could examine how the RBNRODE algorithm integrates with various optimization algorithms. To further explore the effectiveness of the RBNRODE algorithm for FS in supervised classification, other classifiers (such as DTs, artificial neural networks, etc.) could be used.
Availability of data and materials
For transparency and reproducibility, the developed software and the relevant Python code of this paper are publicly available and obtainable in [77].
References
Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10(1):57–63.
Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010;11(1):31–46.
Lyu B, Haque A. Deep learning based tumor type classification using gene expression data. In: Proceedings of the 2018 ACM international conference on bioinformatics, computational biology, and health informatics; 2018. p. 89–96.
Kim YW, Oh IS. Classifier ensemble selection using hybrid genetic algorithms. Pattern Recogn Lett. 2008;29(6):796–802.
Li Y, Wang G, Chen H, Shi L, Qin L. An ant colony optimization based dimension reduction method for high-dimensional datasets. J Bionic Eng. 2013;10(2):231–41.
Tabakhi S, Moradi P, Akhlaghian F. An unsupervised feature selection algorithm based on ant colony optimization. Eng Appl Artif Intell. 2014;32:112–23.
Jafari P, Azuaje F. An assessment of recently published gene expression data analyses: reporting experimental design and statistical factors. BMC Med Inf Decis Making. 2006;6(1):1–8.
Gu Q, Li Z, Han J. Generalized fisher score for feature selection; 2012 arXiv preprint arXiv:1202.3725.
Mishra D, Sahu B. Feature selection for cancer classification: a signaltonoise ratio approach. Int J Sci Eng Res. 2011;2(4):1–7.
Vergara JR, Estévez PA. A review of feature selection methods based on mutual information. Neural Comput Appl. 2014;24(1):175–86.
Shreem SS, Abdullah S, Nazri MZA, Alzaqebah M. Hybridizing relief, MRMR filters and ga wrapper approaches for gene selection. J Theor Appl Inf Technol. 2012;46(2):1034–9.
Abdel-Basset M, Sallam KM, Mohamed R, Elgendi I, Munasinghe K, Elkomy OM. An improved binary grey wolf optimizer with simulated annealing for feature selection. IEEE Access. 2021;9:139792–822.
Tang J, Duan H, Lao S. Swarm intelligence algorithms for multiple unmanned aerial vehicles collaboration: a comprehensive review. Artif Intell Rev. 2023;56(5):4295–327. https://doi.org/10.1007/s10462022102817.
Wang D, Tan D, Liu L. Particle swarm optimization algorithm: an overview. Soft Comput. 2018;22:387–408.
Karaboga D, Basturk B. A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J Glob Opt. 2007;39(3):459–71. https://doi.org/10.1007/s108980079149x.
Xue J, Shen B. A novel swarm intelligence optimization approach: sparrow search algorithm. Syst Sci Control Eng. 2020;8(1):22–34.
Emary E, Zawbaa HM, Hassanien AE. Binary grey wolf optimization approaches for feature selection. Neurocomputing. 2016;172:371–81.
Yang XS. A new metaheuristic bat-inspired algorithm. In: Nature inspired cooperative strategies for optimization (NICSO 2010). Springer; 2010. p. 65–74.
Mirjalili S, Lewis A. The whale optimization algorithm. Adv Eng Softw. 2016;95:51–67.
Hichem H, Elkamel M, Rafik M, Mesaaoud MT, Ouahiba C. A new binary grasshopper optimization algorithm for feature selection problem. J King Saud Univ Comput Inf Sci. 2022;34(2):316–28.
Shadravan S, Naji HR, Bardsiri VK. The sailfish optimizer: a novel natureinspired metaheuristic algorithm for solving constrained engineering optimization problems. Eng Appl ArtifIntell. 2019;80:20–34.
Meng XB, Gao XZ, Lu L, Liu Y, Zhang H. A new bioinspired optimisation algorithm: Bird swarm algorithm. J Exp Theor Artif Intell. 2016;28(4):673–87.
Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H. Harris hawks optimization: algorithm and applications. Future Generat Comput Syst. 2019;97:849–72.
Holland JH. Genetic algorithms. Sci Am. 1992;267(1):66–73.
Storn R, Price K. Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. J Glob Opt. 1997;11(4):341–59.
Khalid AM, Hosny KM, Mirjalili S. COVIDOA: a novel evolutionary optimization algorithm based on coronavirus disease replication lifecycle. Neural Comput Appl. 2022. https://doi.org/10.1007/s0052102207639x.
Tang D, Dong S, Jiang Y, Li H, Huang Y. ITGO: invasive tumor growth optimization algorithm. Appl Soft Comput. 2015;36:670–98.
Simon D. Biogeographybased optimization. IEEE Trans Evol Comput. 2008;12(6):702–13.
Van Laarhoven PJ, Aarts EH. Simulated annealing. In: Simulated annealing: theory and applications. Springer; 1987. p. 7–15.
Rashedi E, Nezamabadi-Pour H, Saryazdi S. GSA: a gravitational search algorithm. Inf Sci. 2009;179(13):2232–48.
Zhao W, Wang L, Zhang Z. Atom search optimization and its application to solve a hydrogeologic parameter estimation problem. Knowl Based Syst. 2019;163:283–304.
Hashim FA, Houssein EH, Mabrouk MS, Al-Atabany W, Mirjalili S. Henry gas solubility optimization: a novel physics-based algorithm. Future Gener Comput Syst. 2019;101:646–67.
Ma S, Huang J. Penalized feature selection and classification in bioinformatics. Brief Bioinform. 2008;9(5):392–403.
Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol. 2005;67(2):301–20.
Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46(1):389–422.
Díaz-Uriarte R, De Andres SA. Gene selection and classification of microarray data using random forest. BMC Bioinform. 2006;7(1):1–13.
Oh IS, Lee JS, Moon BR. Hybrid genetic algorithms for feature selection. IEEE Trans Pattern Anal Mach Intell. 2004;26(11):1424–37.
Cadenas JM, Garrido MC, Martínez R. Feature subset selection filter-wrapper based on low quality data. Expert Syst Appl. 2013;40(16):6241–52.
Sarafrazi S, Nezamabadi-Pour H. Facing the classification of binary problems with a GSA-SVM hybrid system. Math Comput Model. 2013;57(1–2):270–8.
Wei Z, Huang C, Wang X, Han T, Li Y. Nuclear reaction optimization: a novel and powerful physicsbased algorithm for global optimization. IEEE Access. 2019;7:66084–109. https://doi.org/10.1109/ACCESS.2019.2918406.
Li Y, Kang K, Krahn JM, Croutwater N, Lee K, Umbach DM, Li L. A comprehensive genomic pancancer classification using the cancer genome atlas gene expression data. BMC Genomics. 2017;18(1):1–13.
Khalifa NEM, Taha MHN, Ali DE, Slowik A, Hassanien AE. Artificial intelligence technique for gene expression by tumor RNA-Seq data: a novel optimized deep learning approach. IEEE Access. 2020;8:22874–83.
Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J, et al. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform. 2013;14(6):671–83.
Xiao Y, Wu J, Lin Z, Zhao X. A deep learning-based multi-model ensemble method for cancer prediction. Comput Methods Programs Biomed. 2018;153:1–9.
Liu M, Xu L, Yi J, Huang J. A feature gene selection method based on ReliefF and PSO. In: 2018 10th international conference on measuring technology and mechatronics automation (ICMTMA). IEEE; 2018. p. 298–301.
Danaee P, Ghaeini R, Hendrix DA. A deep learning approach for cancer detection and relevant gene identification. In: Pacific symposium on biocomputing. World Scientific; 2017. p. 219–29.
Kira K, Rendell LA. The feature selection problem: traditional methods and a new algorithm. In: AAAI, vol. 2; 1992. p. 129–34.
Kononenko I. Estimating attributes: analysis and extensions of relief. In: European conference on machine learning. Springer; 1994. p. 171–82.
Fergusson JE. The history of the discovery of nuclear fission. Found Chem. 2011;13(2):145–66.
Salimi H. Stochastic fractal search: a powerful metaheuristic algorithm. Knowl Based Syst. 2015;75:1–18.
Zhuoran Z, Changqiang H, Hanqiao H, Shangqin T, Kangsheng D. An optimization method: hummingbirds optimization algorithm. J Syst Eng Electron. 2018;29(2):386–404.
Alpaydin E. Introduction to machine learning. MIT Press; 2020.
Cunningham P, Delany SJ. k-Nearest neighbour classifiers: a tutorial. ACM Comput Surv (CSUR). 2021;54(6):1–25.
Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans Inf Theory. 1967;13(1):21–7.
Thaher T, Heidari AA, Mafarja M, Dong JS, Mirjalili S. Binary Harris hawks optimizer for high-dimensional, low sample size feature selection. In: Evolutionary machine learning techniques. Springer; 2020. p. 251–72.
Mafarja M, Mirjalili S. Whale optimization approaches for wrapper feature selection. Appl Soft Comput. 2018;62:441–53.
Tharwat A, Hassanien AE, Elnaghi BE. A BA-based algorithm for parameter optimization of support vector machine. Pattern Recogn Lett. 2017;93:13–22.
Schölkopf B, Smola AJ, Bach F, et al. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press; 2002.
Gupta R, Alam MA, Agarwal P. Modified support vector machine for detecting stress level using EEG signals. Comput Intell Neurosci. 2020;2020:1–14.
Li S. Global face pose detection based on an improved PSO-SVM method. In: Proceedings of the 2020 international conference on aviation safety and information technology; 2020. p. 549–53.
Mastromichalakis S, Chountasis S. An MR image classification scheme based on Fourier moment analysis and linear support vector machine. J Inf Optim Sci. 2020;42:1–19.
Gopi AP, Jyothi RNS, Narayana VL, Sandeep KS. Classification of tweets data based on polarity using improved RBF kernel of SVM. Int J Inf Technol. 2020;15:1–16.
Faris H, Mafarja MM, Heidari AA, Aljarah I, Ala’M AZ, Mirjalili S, Fujita H. An efficient binary salp swarm algorithm with crossover scheme for feature selection problems. Knowl Based Syst. 2018;154:43–67.
Abdel-Basset M, Ding W, El-Shahat D. A hybrid Harris hawks optimization algorithm with simulated annealing for feature selection. Artif Intell Rev. 2020;54:1–45.
Sallam KM, Elsayed SM, Sarker RA, Essam DL. Improved united multi-operator algorithm for solving optimization problems. In: 2018 IEEE congress on evolutionary computation (CEC). IEEE; 2018. p. 1–8.
Normalized level 3 RNA-Seq gene expression dataset. https://gdac.broadinstitute.org/.
Mirjalili S, Gandomi AH, Mirjalili SZ, Saremi S, Faris H, Mirjalili SM. Salp swarm algorithm: a bioinspired optimizer for engineering design problems. Adv Eng Softw. 2017;114:163–91. https://doi.org/10.1016/j.advengsoft.2017.07.002.
Mirjalili S, Mirjalili SM, Yang XS. Binary bat algorithm. Neural Comput Appl. 2014;25:663–81.
Mirjalili S, Lewis A. S-shaped versus V-shaped transfer functions for binary particle swarm optimization. Swarm Evol Comput. 2013;9:1–14.
Ghosh KK, Ahmed S, Singh PK, Geem ZW, Sarkar R. Improved binary sailfish optimizer based on adaptive \(\beta\)hill climbing for feature selection. IEEE Access. 2020;8:83548–60.
Derrac J, García S, Molina D, Herrera F. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput. 2011;1(1):3–18.
Xian S, Feng X. Meerkat optimization algorithm: a new metaheuristic optimization algorithm for solving constrained engineering problems. Expert Syst Appl. 2023;231: 120482. https://doi.org/10.1016/j.eswa.2023.120482.
Prakash T, Singh PP, Singh VP, Singh SN. A novel brownbear optimization algorithm for solving economic dispatch problem. In: Advanced control & optimization paradigms for energy system operation and management. River Publishers; 2023. p. 137–64.
Abualigah L, Yousri D, Abd Elaziz M, Ewees AA, Al-Qaness MA, Gandomi AH. Aquila optimizer: a novel meta-heuristic optimization algorithm. Comput Ind Eng. 2021;157:107250.
Abdollahzadeh B, Gharehchopogh FS, Mirjalili S. African vultures optimization algorithm: a new nature-inspired metaheuristic algorithm for global optimization problems. Comput Ind Eng. 2021;158:107408.
Python code for gene selection via relief binary nuclear reaction optimization algorithm based on differential evolution. https://github.com/DAmrAtef/Gene_Selection_RBNRO_Algorithm.git.
Acknowledgements
Not applicable.
Funding
Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).
Author information
Contributions
Amr A. Abd El-Mageed: Conceptualization, methodology, software, formal analysis, investigation, data curation, validation, writing—original draft, writing—review and editing. Ahmed E. Elkhouli: investigation, visualization, data curation, validation, writing—original draft, writing—review and editing. Amr A. Abohany: resources, formal analysis, data curation, validation, writing—original draft, writing—review and editing. Mona Gafar: resources, validation, writing—review and editing. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Abd El-Mageed, A.A., Elkhouli, A.E., Abohany, A.A. et al. Gene selection via improved nuclear reaction optimization algorithm for cancer classification in high-dimensional data. J Big Data 11, 46 (2024). https://doi.org/10.1186/s40537-024-00902-z
DOI: https://doi.org/10.1186/s40537-024-00902-z