
Gene selection via improved nuclear reaction optimization algorithm for cancer classification in high-dimensional data

Abstract

RNA Sequencing (RNA-Seq) is considered a revolutionary technique for gene profiling and quantification. It offers a comprehensive view of the transcriptome, making it more expansive than micro-arrays. Genes that discriminate between malignant and normal tissue can be deduced from quantitative gene expression. However, these data form a high-dimensional dense matrix; each sample spans more than 20,000 genes, which poses challenges. This paper proposes RBNRO-DE (Relief Binary NRO based on Differential Evolution) as a gene selection strategy on (rnaseqv2 illuminahiseq rnaseqv2 un edu Level 3 RSEM genes normalized) data with more than 20,000 genes to pick the most informative genes and assess them across 22 cancer datasets. The k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM) classifiers are applied to assess the quality of the selected genes. Binary versions of the most common meta-heuristic algorithms are compared with the proposed RBNRO-DE algorithm. In most of the 22 cancer datasets, the RBNRO-DE algorithm based on the k-NN and SVM classifiers achieved the best convergence and classification accuracy of up to 100% while reducing the feature set by up to 98%, a clear advantage over its counterparts according to Wilcoxon’s rank-sum test (5% significance level).

Introduction

DNA contains our recipe, “our genetic code”. Although the DNA of every cell is the same, each tissue has a distinct structure and a unique function, because which genes in a cell are active and which are not is expressed through a mechanism called RNA transcription. This RNA is then translated into proteins responsible for cell structure and function. Therefore, analyzing a transcriptome profile is our method for determining the genetic changes in each cell, from which disease biomarkers can be evaluated. Differential expression analysis aims to discover quantitative changes in expression levels through statistical analysis and to identify genes whose expression levels vary under different conditions, which helps us understand and control diseases. Gene expression profiling technologies have developed significantly in this direction. There are two leading technologies: the older hybridization-based technique, “micro-array”, and the next-generation sequencing-based “RNA-Seq” [1]. Both techniques quantify gene expression for statistical analysis and classification. Quantification data from the next-generation sequencing-based RNA-Seq technique is chosen in this paper because it detects RNA quantification levels more accurately than micro-array data. Accuracy is not the only advantage of RNA-Seq; the older technique has many limitations that next-generation sequencing overcomes [2]. One primary obstacle of the micro-array was its reliance upon existing sequence knowledge, which limited its detection range; this is no longer a problem in RNA-Seq, as it requires no previous knowledge and offers a wide dynamic detection range. That choice improves the accuracy of our results and the set of genes we obtain, and it gives a closer understanding of the disease’s true biomarkers.

Lyu et al. [3] examined the determination of cancer genetic biomarkers from RNA-Seq gene expression data; they worked on the normalized level-3 RNA-Seq gene expression data of 33 tumor types in the Pan-Cancer Atlas, which we have also worked on in this paper. However, the researchers in [3] treated mixed samples as if they were all tumors, even though some were non-tumor samples. Therefore, we wrote code to separate samples based on their type to obtain binary classification labels and more accurate tumor data. Notably, every data record comprises a set of 20531 gene “features”, which includes an abundance of extraneous genes and extra information.

The curse of dimensionality [4] is a well-known challenge arising from the era of abundant data, which has driven progress in Feature Selection (FS) algorithms and techniques. Generally, FS techniques follow four approaches: the filter approach, the wrapper approach, the embedded approach, and the hybrid approach [5, 6]. All these approaches aim to select the best features to distinguish the classes, which are, in our case, the informative genes related to their tumor.

The filter approach evaluates each gene individually using statistical scores that represent its discriminative strength, which achieves high accuracy and selects a good group of genes. However, scoring each gene separately ignores the interrelationships between genes, and the approach can be trapped in a local optimum. It is also worth mentioning that the filter approach includes univariate and multivariate sub-types; the main difference is that the multivariate sub-type considers correlation in its ranking. Examples of the filter approach are the t-test [7], Fisher score [8], signal-to-noise ratio [9], information gain [10], and Relief [11].

The wrapper approach can be seen as an exploration of possible subsets; the principle is to create and test subsets of genes. A particular classifier evaluates each subset, so the classification algorithm is run many times, once per evaluation. This approach achieves higher performance than the filter approach because the classifier guides the learning process. However, the repeated classifier training incurs a high computational cost and slows the process, especially with our high-dimensional data.

A metaheuristic is a higher-level procedure or heuristic used in computer science and mathematical optimization to find, generate, or select a heuristic (partial search algorithm) that may offer a good enough solution to an optimization problem, particularly when information is incomplete or imperfect or computing power is limited. Metaheuristics sample a subset of solutions that would otherwise be too numerous to enumerate fully. Because they make only a few assumptions about the optimization problem, they are applicable to a wide variety of issues. Unlike exact optimization algorithms and iterative techniques, metaheuristics do not guarantee that a globally optimal solution will be found. Many metaheuristics employ stochastic optimization, meaning that the outcome depends on the collection of generated random variables. In combinatorial optimization, metaheuristics are often more effective than exact optimization algorithms, iterative techniques, or basic heuristics because they search a much more extensive range of feasible solutions, making them advantageous strategies for optimization issues. Numerous publications and research papers have addressed this topic. Among wrapper solutions, meta-heuristic approaches can successfully address the FS problem. Stochastic techniques may produce optimal (or nearly optimal) answers quickly, so researchers have begun to adopt them. These techniques offer many benefits, such as flexibility with respect to dynamic changes, the ability to self-organize without requiring specific mathematical properties, and the capacity to evaluate multiple solutions simultaneously. For these reasons, meta-heuristic algorithms have attracted researchers’ attention for tackling optimization problems, and several meta-heuristic-based algorithms for solving the FS issue have recently been developed [12]. These algorithms yield trustworthy (near-optimal) solutions at a drastically reduced computational cost.

Evolutionary Approaches (EA), Swarm Intelligence (SI) approaches, and Physics-based Approaches (PHA) are the main classes of metaheuristic approaches. SI approaches are inspired by the behavior of swarms and animals [13]. Multiple SI methods have been proposed in the literature and have obtained reliable outcomes in a broad range of optimization issues, such as Particle Swarm Optimization (PSO) [14], Artificial Bee Colony (ABC) [15], Sparrow Search Algorithm (SSA) [16], Grey Wolf Optimization (GWO) [17], Bat Algorithm (BA) [18], Whale Optimization Algorithm (WOA) [19], Grasshopper Optimization Algorithm (GOA) [20], Sailfish Optimizer (SFO) [21], Bird Swarm Algorithm (BSA) [22], and Harris Hawks Optimization (HHO) [23]. The EA approaches are designed by simulating biological evolutionary patterns such as mutation, crossover, and selection. The Genetic Algorithm (GA) [24], Differential Evolution (DE) [25], COVIDOA [26], the invasive tumor growth optimizer [27], and the biogeography-based optimizer [28] are significant EA-based metaheuristic methods that have demonstrated their effectiveness in multiple optimization areas. PHA methods are created using the rules of physics found in nature, including Simulated Annealing (SA) [29], the Gravitational Search Algorithm (GSA) [30], Atom Search Optimization (ASO) [31], and Henry Gas Solubility Optimization (HGSO) [32].

The embedded approach uses a learning algorithm to choose the relevant genes, directly interacting with the classification; the FS algorithm is integrated as part of the learning algorithm. The learning model is trained using an initial feature set to establish a criterion for measuring the rank values of features. The main objective is to reduce the computation time that wrapper methods spend on reclassifying different subsets by incorporating the FS into the training process. The most common embedded techniques are tree algorithms like Random Forest (RF). Some embedded methods perform feature weighting based on regularization models with objective functions that minimize fitting errors while forcing the feature coefficients to be small or precisely zero. These methods include the LASSO [33] with the L1 penalty, Ridge with the L2 penalty for constructing a linear model, and the Elastic Net [34]. Examples of the embedded approach are SVM based on Recursive Feature Elimination (SVM-RFE) [35], RF [36], and the First Order Inductive Learner (FOIL) rule-based feature subset selection algorithm.

The hybrid approach combines the filter and wrapper approaches to maximize the benefits of each. The dimension of the feature space is first reduced using a filter approach, which may produce numerous candidate subsets with moderate complexity. Then, a wrapper is used as a learning strategy to determine the best candidate subset. The high efficiency of filters and the high accuracy of wrappers are typically achieved via hybrid approaches. Many intriguing methodologies have recently been proposed, including hybrid genetic algorithms [37], hybrid ant colony optimization [38], and a mixed gravitational search algorithm [39]. Practically any combination of filter and wrapper can be used to create a hybrid methodology.

Motivation and contributions

Nuclear Reaction Optimization (NRO) [40] is a recent meta-heuristic algorithm for global optimization that mimics the nuclear reaction process. The NRO algorithm can be divided into two phases, nuclear fission (NFi) and nuclear fusion (NFu), in accordance with the characteristics of nuclear reactions. The NFi phase mimics the fission mechanism: the Gaussian walk and differential operators between the nucleus and neutron are used for exploitation and exploration, depending on the types of nuclei and the probability of decay following bombardment. The NFu phase mimics nuclear fusion and includes the ionization and fusion processes.

In order to address the Gene Selection (GS) problem, this paper suggests an improved binary version of the NRO algorithm, known as the RBNRO-DE algorithm, which is a promising method with precise performance. The suggested algorithm is designed to avoid local optima and to achieve sufficient search accuracy, rapid convergence, and enhanced stability. It achieves improved efficacy by obtaining optimal or nearly optimal outcomes for many of the investigated problems, in contrast to state-of-the-art meta-heuristic algorithms. Furthermore, RBNRO-DE uses a transfer function to convert continuous data into discrete values, and it incorporates the Relief algorithm and the DE technique to boost exploration capacity and improve the best outcomes found in the solution space across iterations. The rationale for applying the RBNRO-DE approach to FS is that it is easy to understand and implement, can handle a wide range of optimization problems, achieves worthwhile outcomes in a reasonable amount of time with low computational cost, and uses few control parameters. The fundamental contributions of this paper are as follows:

  • RNA-seq next-generation sequencing-based level 3 data is pre-processed.

  • The NRO algorithm is a novel type of metaheuristic algorithm that has not been applied before to RNA-Seq gene expression data. Thus, its ability to resolve this issue has not been examined.

  • NRO is modified and then re-created to develop a binary version called the RBNRO-DE algorithm.

  • For improving the feature space exploration capacity and enhancing the acquired optimal outcomes, the proposed RBNRO-DE algorithm embeds a Relief algorithm and a DE technique with the binary version of the NRO algorithm. This embedding enhances the algorithm’s performance by producing a new population that maintains the fundamental structure but has more appropriate positions.

  • As GS has a broad search space, it frequently leads most current algorithms into the issue of being trapped in local optima. The RBNRO-DE can efficiently explore large spaces to locate optimal or near-optimal solutions while avoiding falling into local optima.

  • The final results are estimated based on various performance metrics, including the mean fitness rate, the mean accuracy rate, and the mean number of selected features.

  • The influence of the proposed RBNRO-DE algorithm using the two suggested classifiers (k-NN and SVM) is compared with its peers from the literature.

  • The proposed RBNRO-DE algorithm is evaluated on 22 different types of cancer datasets, and the results are displayed.

  • The selected genes are linked to cancer-type biomarkers.

Structure

The rest of the paper consists of five sections as follows: the “Related work” section discusses the literature of FS with genome data; the “Background details” section analyzes and elaborates the base concepts of the presented methodology; the “Proposed relief binary NRO based on DE (RBNRO-DE) for gene selection” section provides a detailed explanation of the proposed RBNRO-DE algorithm, the improved version of NRO, and its parameters for handling GS; the “Experimental results and discussion” section presents the experimental results and comparisons with some competitive algorithms; and finally the “Conclusion and future work” section contains the conclusions and suggestions for future research.

Related work

This section reviews the literature on the techniques researchers have used to handle the high dimensionality of genome data for accurate classification. Deleting irrelevant genes plays an essential role in the performance of classification algorithms, so selecting genes is a necessary step before using any Machine Learning (ML), Deep Learning (DL), or other classification methods. For this reason, we have studied related work in this scope to reach the goal of RNA-Seq classification for cancer detection.

Li et al. [41] were interested in finding tumor biomarkers; they worked on the pan-cancer public dataset of 31 different types. GA/k-NN was the method they used to extract the genes: they carried out multiple iterations over subsets of genes and then assessed the accuracy with the k-NN algorithm. Using the resulting accuracy, they chose the best set of features. This method, achieving about 90% success, was applied across the 31 types of cancer.

Lyu et al. [3] presented work to find specific cancer biomarkers; they relied on the importance of genes as measured by their contribution to the classification. They followed these steps: pre-processing the data and applying tumor-type classification using a convolutional neural network; generating heat maps for each class to pick out the genes corresponding to the pixels with the top intensities in the heat maps; and finally validating the pathways of the selected genes. In pre-processing, as the GS step, they used a variance threshold of 1.19 to delete the gene expression levels that had not changed, which reduced the number of genes to 10381 of 19531; this is a filtering approach. The final accuracy they obtained was 95.59%. Although the accuracy was good, it can still be much better, which can be achieved using a better FS approach to reduce that dimensionality.

Khalifa et al. [42] followed the paper mentioned above [3]; however, they focused on five cancer types: lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), breast invasive carcinoma (BRCA), kidney renal clear cell carcinoma (KIRC), and uterine corpus endometrial carcinoma (UCEC). The total dataset is 2086 rows and 972 columns; each row contains a specific sample with the RPKM RNA-Seq values of particular genes [43]. They used a hybrid approach for pre-processing the data, proposing a binary particle swarm optimization with decision trees (BPSO-DT) algorithm; 615 features out of 971 were chosen as the best features of RNA-Seq. The presented results and performance metrics showed that the proposed approach achieved an overall testing accuracy of 96.90%. Comparative results were introduced, and the testing accuracy achieved in that work outperforms other related work for the five classes of tumors. Moreover, the proposed approach is less complex and requires less training time.

Xiao et al. [44] evaluated their method on three RNA-Seq gene expression datasets: lung adenocarcinoma, Stomach Adenocarcinoma (STAD), and breast invasive carcinoma. They relied on a DL technique: five different classification models were followed by a DL model that ensembles the results of the five models, which improved all prediction evaluations as follows: LUAD \(99.20\%\), BRCA \(98.4\%\), and STAD \(98.78\%\).

Liu et al. [45] also investigated genetic data, though not RNA-Seq: they used microarray data and followed the hybrid approach as well. Unlike the papers mentioned above, they worked on each type independently. They used four gene datasets of colon cancer, small-round-blue-cell tumors, leukemia, and lung cancer to evaluate the algorithm’s performance. The algorithm uses Relief as the feature pre-filter to remove the genes with low relevancy to the cancer type, PSO as the search algorithm, and finally the classification accuracy of SVM as the evaluation function of the feature subset to obtain the final optimal gene subset for each cancer.

Danaee et al. [46] worked on gene expression data using the power of the encoders and decoders of neural networks, employing a Stacked Denoising Autoencoder (SDAE) as the FS method. The effectiveness of the extracted representation was then assessed using supervised classification models to confirm the usefulness of the learned features in cancer detection. Finally, by studying the SDAE connection matrices, they discovered a collection of highly interacting genes. They used RNA-Seq expression data for both tumor and healthy breast samples from The Cancer Genome Atlas (TCGA) database, comprising 113 healthy samples and 1097 breast cancer samples. The findings and analyses show that the highly interacting genes may serve as breast cancer indicators that merit further investigation. After training the SDAE, they chose a layer with a low dimension and low validation error compared to other encoder stacks; the encoder has four layers of 15,000, 10,000, 2000, and 500 dimensions, respectively. The chosen layer’s features are fed into the classification algorithms. Deep learning models can, therefore, easily handle vast amounts of input data, and the authors anticipate the model will perform better and highlight more insightful patterns if additional gene expression data becomes available.

According to the related work, most research with genetic data is still in its early stages, and the existing work consists of trials applying these concepts in this promising field. The research literature is filled with experiments on different methods, such as FS and state-of-the-art deep learning techniques. However, due to the very high dimensionality of genetic data, there is no perfect technique. FS of genome data detects the link between a gene and its class, which is a critical preprocessing task for overcoming the curse of dimensionality and verifying gene biomarkers of cancer. Because of this, the objective of this study is to use a new wrapper approach, the RBNRO-DE algorithm, apply it for the first time to RNA-Seq data, and compare its influence with other FS methods.

Background details

Relief algorithm

The Relief algorithm [47, 48] is a highly effective, simple, and rapid filtering method for weighting features by their relevance. The essential idea of this algorithm is to identify features whose values are close for nearby samples of the same class and clearly different for samples of other classes. Therefore, the algorithm relies on a weighted ordering of features: the higher a feature’s weight, the better the feature is for classification, and vice versa.

The Relief algorithm begins by selecting a sample at random, after which it investigates two types of nearest samples: one associated with comparable class samples called Near-Hit and the other related to different class samples called Near-Miss. Each feature’s weight can be assessed from the values of both Near-Hit and Near-Miss. The features are arranged according to their weights. The features with the highest weights will be chosen in the end. The weight W for the feature A can be measured using the following equation:

$$\begin{aligned} W_A = \sum _{j=1}^{N}\Big (x_A^j - NM(x^j)_A \Big )^2 - \Big (x_A^j - NH(x^j)_A\Big )^2. \end{aligned}$$
(1)

where \(W_A\) is the weight of feature A, \(x_A^j\) is the value of feature A for data point \(x^j\), and N represents the number of samples. \(NH(x^j)\) and \(NM(x^j)\) are the closest data points to \(x^j\) that belong to the same and different classes, respectively.
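As an illustration of Eq. (1), the following Python sketch accumulates Relief weights over randomly picked samples; the toy matrix X (samples by genes) and binary labels y are placeholders, not data from the paper:

```python
import numpy as np

def relief_weights(X, y, n_iterations=100, seed=0):
    """Minimal Relief sketch following Eq. (1); X is samples x genes, y holds binary labels."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iterations):
        j = rng.integers(n_samples)
        dist = np.linalg.norm(X - X[j], axis=1)
        dist[j] = np.inf                                   # exclude the picked sample itself
        same = (y == y[j])
        near_hit = X[np.argmin(np.where(same, dist, np.inf))]    # nearest same-class sample
        near_miss = X[np.argmin(np.where(~same, dist, np.inf))]  # nearest other-class sample
        w += (X[j] - near_miss) ** 2 - (X[j] - near_hit) ** 2    # Eq. (1) update
    return w

# toy usage with a synthetic 6-sample, 4-gene matrix and binary labels
X = np.array([[1.0, 0.0, 5.0, 2.0], [1.1, 0.2, 5.2, 2.1], [0.9, 0.1, 4.8, 1.9],
              [6.0, 3.0, 0.0, 7.0], [6.2, 2.9, 0.1, 7.2], [5.8, 3.1, 0.2, 6.8]])
y = np.array([0, 0, 0, 1, 1, 1])
print(relief_weights(X, y, n_iterations=50))
```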

NRO algorithm

The idea of the nuclear reaction arose after neutrons were found to be emitted from boron and nitrogen, a result of research into the interaction of uranium with neutrons [49]. Nuclear fission and nuclear fusion are the two processes that make up a nuclear reaction [50]. As shown in Fig. 1, nuclear fission occurs when a heated neutron shells a weighty nucleus, which transforms into lighter nuclei as fission products along with other particles. When heated neutrons shell weighty nuclei, new neutrons are produced to shell other weighty nuclei; this is the nuclear fission chain reaction. As a result, a significant amount of energy is released, proportional to the difference between the mass of the atom and the total mass of the fission fragments.

Fig. 1
figure 1

The nuclear fission process

Nuclear fusion, on the other hand, occurs when a nucleus is warmed until it is in a condition of plasma, where the strong nuclear force causes nuclear particles to get close enough to join together and overcome the Coulomb repulsion force, as seen in Fig. 2.

Fig. 2
figure 2

The nuclear fusion process

The nuclear fission process is used first in the presented approach, in which nuclei fragments absorb hot neutrons and then form odd or even-even nuclei. Subaltern fission products, which can be utilized for exploitation, and essential fission products, which can be used for exploration, are the two types derived from odd nuclei. The even-even nuclei, which do not undergo fission, can be sought near the existing positions (current optimal solution). After that, the presented approach utilizes the process of nuclear fusion, whereby the energy generated during nuclear fission is used to heat the nuclei, causing atomic fusion. Some nuclei constrained by the Coulomb repulsion force slow down their approach velocity for exploitation or repel one another for exploration. Other nuclei can explore by overcoming the Coulomb repulsion and bonding together through strong nuclear forces. The heated neutron or energy generated in the nuclear interaction gives each nucleus kinetic energy.

According to the above illustration, a physics-based optimization algorithm known as the NRO algorithm [50] has been developed to mimic the two nuclear reaction processes, namely the fission and fusion processes. The nuclear fission process involves nuclear fission operators comprising two cases: essential and subaltern fission of odd nuclei, and nearby searching around the solution of an even-even nucleus. As for the nuclear fusion process, its nuclear fusion operators are made up of the ionization and fusion phases. Since the NRO algorithm might slip into the local optima trap, the fusion process incorporates a Levy flight methodology to jump out of local optimal values.

Base processes of NRO algorithm

According to the NRO algorithm, the cycle generated by fission energy and fusion neutrons might be employed to find the most stable nucleus (optimal fitness value). Hence, nuclear fusion can arise from heating lighter nuclei with the energy emitted by nuclear fission, while nuclear fission can result from shelling the weighty nuclei with thermal neutrons from nuclear fusion. For exploitation and exploration of the search solution area, the NRO algorithm considers nuclear fission and nuclear fusion processes to occur in a closed container where all nuclei interact. The NRO algorithm treats each nucleus, characterized by elements such as position, potential energy, nucleus mass number, and charge property, as a solution in the search solution area. The specific binding energy of each nucleus is assessed as the energy per unit mass, which describes the nucleus’ stability. The essential processes of the NRO algorithm are depicted below.

  1. 1.

    Nuclear fission process: According to the cycle between nuclear fission and nuclear fusion, it is assumed that the hot neutrons shelling a weighty nucleus for nuclear fission may be created by the nuclear fusion of two separate arbitrary nuclei. In order to model nuclear fission mathematically, the Gaussian walk [51] is utilized to mimic the various fission elements with diverse cases. In general, two cases can be used to distinguish the attributes of the various products. The first case is associated with forming subaltern fission products for exploitation and essential fission products for exploration. These products are created when nuclear fission is applied to odd nuclei. The odd nuclei from which the subaltern fission products are generated are activated for fission utilizing energy emitted by heated neutrons and can be highly steady through \(\beta\) decay. In this case, the existing solution uses the information of the neutron and the present best solution to find a more satisfactory solution depending on the Gaussian walk. As for the odd nuclei from which the essential fission products are produced, they may not be steady following the absorption of a hot neutron because the fission fragment may not undergo \(\beta\) decay. In the first case, \(rand \le P_{Fi}\) holds, where rand signifies an arbitrary number distributed uniformly within the range [0, 1], and \(P_{Fi}\) is the probability of nucleus fission. For the subaltern fission products of odd nuclei, \({rand} \le P_\beta\) holds, where \(P_\beta\) is the likelihood of \(\beta\) decay. \({rand} > P_\beta\) applies to the essential fission products of odd nuclei. The composition process of subaltern and essential fission products of odd nuclei can be expressed as follows:

    $$\begin{aligned}{} & {} {X}_{i}^{Fi}= \left\{ \begin{array}{ll} Gaussian(X_{best},\sigma _1)+(randn \cdot X_{best}-P_{ne}^s \cdot Ne_i), &{} \,\,\, \text{if}\,{rand} \le P_\beta ,\\ Gaussian(X_{i},\sigma _2)+(randn \cdot X_{best}-P_{ne}^e \cdot Ne_i), &{}\,\,\, \text{if}\,{rand} > P_\beta , \end{array} \right\} \text{if}\,\,\,{rand} \le P_{Fi}, \end{aligned}$$
    (2)
    $$\begin{aligned}{} & {} \sigma _1=\Big (\frac{log(g)}{g}\Big ) \cdot |{X}_{i} - {X}_{best}|, \end{aligned}$$
    (3)
    $$\begin{aligned}{} & {} \sigma _2=\Big (\frac{log(g)}{g}\Big ) \cdot |{X}_{r} - {X}_{best}|, \end{aligned}$$
    (4)
    $$\begin{aligned}{} & {} P_{ne}^s=round(rand+1), \end{aligned}$$
    (5)
    $$\begin{aligned}{} & {} P_{ne}^e=round(rand+2), \end{aligned}$$
    (6)
    $$\begin{aligned}{} & {} Ne_i = \frac{(X_i + X_j)}{2}. \end{aligned}$$
    (7)

    where \({X}_{i}^{Fi}\) means the \(i{\text{th}}\) fission product nucleus, randn means a normally distributed arbitrary number, and \(X_{best}\) denotes the present most suitable nucleus. The Gaussian distribution’s parameters for subaltern fission products are \(X_{best}\) and \(\sigma _1\), while the parameters of the Gaussian distribution for essential fission products are \(X_{i}\) and \(\sigma _2\); \(\sigma _1\) and \(\sigma _2\) signify the step sizes, g represents the present generation number, and \({X}_{r}\) means the \(r{\text{th}}\) nucleus whose index r is picked randomly from the population of nuclei. Additionally, \(P_{ne}^s\) represents a mutation factor, indicating that the subaltern fission product can exploit the smaller searching range, whereas \(P_{ne}^e\) indicates that the essential fission product can search the larger range, in which round denotes rounding to the closest integer and rand is an arbitrary number distributed uniformly within the range [0, 1]. \(Ne_i\) is the \(i{\text{th}}\) heated neutron, and \(X_i\) and \(X_j\) represent two different random nuclei, the \(i{\text{th}}\) and \(j{\text{th}}\), respectively. The second case is related to an even-even nucleus, which cannot be activated for fission. The status of the nucleus is altered even if there is no fission: the present nucleus’ information might be kept, and the update comes from the Gaussian walk. In the second case, \(rand > P_{Fi}\) holds, where \(P_{Fi}\) is the probability of nucleus fission. It is expressed as follows:

    $$\begin{aligned} {X}_{i}^{Fi}= \left\{ \begin{array}{ll} Gaussian(X_{i},\sigma _2),&\,\,\, \text{if}\,{rand} > P_{Fi}. \end{array} \right. \end{aligned}$$
    (8)
  2. 2.

    Nuclear fusion process: Whenever nuclei are heated to a plasma shape, they can merge to form nuclei heavier than the initial light nuclei, known as hot nuclear fusion. The nuclear fusion process includes two steps: ionization and fusion steps.

    • The ionization step: It supposes that nuclear fission causes the emission of thermal ionization energy, which yields the motion of a nucleus. Differential operators can be involved in the ionization step. Firstly, each nucleus is rated given its fitness function level, starting with the biggest and ending with the smallest. For exploitation, the nucleus with a higher fitness function value is kept for guiding, whereas the nucleus with a lower fitness function value is utilized for exploration.

      In the ionization step, when \(rand > Pa_{i}\), where \(Pa_{i}\) is the probability value of the nucleus’s ionization (a higher probability value indicates a better nucleus), the ionization step can be described mathematically, to enhance the exploration quality, as follows:

      $$\begin{aligned}{} & {} {X}_{i,d}^{Ion}= \left\{ \begin{array}{ll} X_{r1,d}^{Fi} + rand \cdot (X_{r2,d}^{Fi} - X_{i,d}^{Fi}), &{}\,\,\, \text{if}\,{rand} \le 0.5,\\ X_{r1,d}^{Fi} - rand \cdot (X_{r2,d}^{Fi} - X_{i,d}^{Fi}), &{}\,\,\, \text{if}\,{rand}> 0.5, \end{array} \right\} \,\,\,\text{if}\,rand > Pa_{i}, \end{aligned}$$
      (9)
      $$\begin{aligned}{} & {} Pa_{i}=\frac{rank(fit({X}_{i}^{Fi}))}{N}. \end{aligned}$$
      (10)

      where \({X}_{i,d}^{Ion}\) is the \(d{\text{th}}\) variable of \(i{\text{th}}\) ion after ionization. The \(d{\text{th}}\) variables of \({r1}{\text{th}}\), \({r2}{\text{th}}\) and \(i{\text{th}}\) fission nuclei are represented by \(X_{r1,d}^{Fi}\), \(X_{r2,d}^{Fi}\) and \(X_{i,d}^{Fi}\), respectively, and rand implies an arbitrary number between 0 and 1. \(Pa_{i}\) denotes a probability value of nucleus’s ionization, \(fit({X}_{i}^{Fi})\) is the fitness function value of \({X}_{i}^{Fi}\), \(rank(fit({X}_{i}^{Fi}))\) means the rank of \({X}_{i}^{Fi}\) in the population, and N is the overall number of nuclei. In contrast, when \(rand \le Pa_{i}\), the thermal fission’s energy can’t ionize the more stable nucleus. As a result, \(X_{i,d}^{Fi}\) is adjusted to improve the exploitation’s performance using the following formula:

      $$\begin{aligned} {X}_{i,d}^{Ion}= \left\{ \begin{array}{ll} X_{i,d}^{Fi} + round(rand) \cdot rand \cdot (X_{worst,d}^{Fi} - X_{best,d}^{Fi}),&\,\,\, \text{if}\,rand \le Pa_{i}. \end{array} \right. \end{aligned}$$
      (11)

      where \(X_{worst,d}^{Fi}\) and \(X_{best,d}^{Fi}\) mean the \(d{\text{th}}\) variable for the worst and best fission product nucleus, respectively. The algorithm is sometimes susceptible to falling into the trap of local optima, where two solutions are almost identical and the difference term might be zero. In this case, the search strategy is considered the most challenging part. Therefore, an approach that supports the current solution in leaping out of a local optimum and investigating the global optimum is critical; this approach is the Levy flight distribution [52] (a minimal code sketch of this Levy step appears after this list). Regarding Eq. (9), which was formed to improve exploration in the ionization step, the equation can be applied appropriately when \(X_{r2,d}^{Fi}\) is not equal to \(X_{i,d}^{Fi}\). However, when \(X_{r2,d}^{Fi}\) is equal to \(X_{i,d}^{Fi}\), the Levy flight distribution approach should be employed to avoid a locally optimal solution as follows:

      $$\begin{aligned}{} & {} {X}_{i,d}^{Ion}=X_{i,d}^{Fi} + \Big (\alpha \otimes Levy(\beta )\Big )_{d} \cdot \Big (X_{i,d}^{Fi} - X_{best,d}^{Fi}\Big ), \end{aligned}$$
      (12)
      $$\begin{aligned}{} & {} Levy(\beta )=\frac{\mu }{|\nu |^{1/\beta }}, \end{aligned}$$
      (13)
      $$\begin{aligned}{} & {} \mu =N(0,\sigma _{\mu }^2),\,\,\,\,\,\, \nu =N(0,\sigma _{\nu }^2), \end{aligned}$$
      (14)
      $$\begin{aligned}{} & {} \sigma _{\mu }=\Big (\frac{\Gamma (1+\beta )\sin (\Pi \beta /2)}{\Gamma [(1+\beta )/2]\beta 2^{(\beta -1)/2}}\Big )^{1/\beta },\,\,\,\,\,\,\, \sigma _{\nu }=1. \end{aligned}$$
      (15)

      where \(\alpha\) is a scale factor whose value is determined by the problem’s scales (\(\alpha = 0.01\)), and \(Levy(\beta )\) denotes the Levy flight step size. \(\mu\) and \(\nu\) are calculated from the normal distribution \(N(0,\sigma _{\mu }^2)\), and \(N(0,\sigma _{\nu }^2)\) respectively, and \(\beta = 1.5\). As for Eq. (11), which was formed for improving the exploitation in the ionization step, this equation can be applied appropriately when \(X_{worst,d}^{Fi}\) is not equal to the value of \(X_{best,d}^{Fi}\). However, in case the value of \(X_{worst,d}^{Fi}\) is equal to the value of \(X_{best,d}^{Fi}\), then the Levy flight distribution approach should be utilized as follows:

      $$\begin{aligned} {X}_{i,d}^{Ion}=X_{i,d}^{Fi} + \Big (\alpha \otimes Levy(\beta )\Big )_{d} \cdot \Big (UB_{d}-LB_{d}\Big ). \end{aligned}$$
      (16)
    • The fusion step: It attempts to combine an ion with information from different ions and modify the status of the ions. Initially, all ions acquired from the ionization are ranked given their fitness function levels, starting with the largest and ending with the lowest. In the fusion step, if \(rand > Pc_{i}\), where \(Pc_{i}\) is a probability value of the \(i{\text{th}}\) ion, the ions of two light nuclei defeat the Coulomb repelling force and are fused through a robust nuclear force. Additional differential operators are used in the fusion stage to simulate the collision and fusion and to boost the diversity of the nuclei population, allowing more effective exploration. This situation can be depicted mathematically through the following equation:

      $$\begin{aligned}{} & {} {X}_{i}^{Fu}= \left\{ \begin{array}{ll} X_{i}^{Ion} + rand \cdot (X_{r1}^{Ion} - X_{best}^{Ion}) + rand \cdot (X_{r2}^{Ion} - X_{best}^{Ion}) \\ \,\,\,\,\,\,\,\,\, - e^{-norm(X_{r1}^{Ion} - X_{r2}^{Ion})} \cdot (X_{r1}^{Ion} - X_{r2}^{Ion}), &{}\,\,\, \text{if}\,rand>Pc_{i}, \end{array} \right. \end{aligned}$$
      (17)
      $$\begin{aligned}{} & {} Pc_{i}=\frac{rank(fit({X}_{i}^{Ion}))}{N}. \end{aligned}$$
      (18)

      where \({X}_{i}^{Fu}\) is the \(i{\text{th}}\) product of fusion, \(X_{i}^{Ion}\) represents the current ion, and \(X_{r1}^{Ion}\) and \(X_{r2}^{Ion}\) denote the \(r1{\text{th}}\) and \(r2{\text{th}}\) ions, respectively, in which r1 and r2 are distinct. The difference expression \((X_{r1}^{Ion} - X_{best}^{Ion})\) describes one portion of the fusion process, the expression \((X_{r2}^{Ion} - X_{best}^{Ion})\) uses the difference to clarify another part of the fusion information, and the final expression \((X_{r1}^{Ion} - X_{r2}^{Ion})\) means that ions defeat the Coulomb repelling force. The exponential coefficient seeks to accomplish an equilibrium between exploration and exploitation. \(Pc_{i}\) stands for the probability value of the nucleus’s fusion, \(fit({X}_{i}^{Ion})\) is the fitness function value of \({X}_{i}^{Ion}\), and \(rank(fit({X}_{i}^{Ion}))\) stands for the rank of \({X}_{i}^{Ion}\) in the population. On the other hand, when \(rand \le Pc_{i}\), ions cannot defeat the Coulomb force and fail to be fused by the nuclear force. The Coulomb force may lessen the approach speed or repel the opposing motion if fusion does not occur. The mathematical formula is recommended as follows:

      $$\begin{aligned} {{X}_{i}^{Fu}= \left\{ \begin{array}{ll} X_{i}^{Ion} - 0.5 \cdot \Big (\sin (2 \Pi \cdot freq \cdot g + \pi ) \cdot \frac{G_{max} - g}{G_{max}} + 1 \Big ) \cdot (X_{r1}^{Ion} - X_{r2}^{Ion}), &{}\\ \text{if}\,{rand} > 0.5,\\ X_{i}^{Ion} - 0.5 \cdot \Big (\sin (2 \Pi \cdot freq \cdot g + \pi ) \cdot \frac{g}{G_{max}} + 1 \Big ) \cdot (X_{r1}^{Ion} - X_{r2}^{Ion}), &{}\\ \text{if}\,{rand} \le 0.5, \end{array} \right\} \text{if}\,rand \le Pc_{i}.} \end{aligned}$$
      (19)

      where freq denotes the sine function’s frequency, g represents the present generation number, \(G_{max}\) is the permissible maximum generation number, \(X_{r1}^{Ion}\) and \(X_{r2}^{Ion}\) represent the \(r1{\text{th}}\) and \(r2{\text{th}}\) ions, respectively, with distinct indexes. In the first row of Eq. (19), the state where the Coulomb force might lower the approach speed used the non-adaptive sine adjustment to exploit the solution space and converge to the optimal solution. The case in which the two ions repulse and are far from each other to explore is in the second row of Eq. (19). The Levy flight distribution approach is applied to enhance the algorithm’s capability to avoid getting stuck into a local optimum in the fusion step. In case of the value of \(X_{r1}^{Ion} = X_{r2}^{Ion}\) in the fusion step, then the Levy flight distribution approach should be utilized for avoiding a locally optimal solution as follows:

      $$\begin{aligned} {X}_{i}^{Fu}=X_{i}^{Ion} + \alpha \otimes Levy(\beta ) \otimes (X_{i}^{Ion} - X_{best}^{Ion}). \end{aligned}$$
      (20)

      The fission nucleus with the best fitness function value in the present generation should be saved as guiding information for the following process, while the fusion nucleus with the best fitness function value should be kept as the globally acquired best solution. The individuals outside the search boundary are reformed using the boundary control approach.
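To make the Levy-flight escape steps concrete, the following Python sketch implements Eqs. (13)-(15) via Mantegna's method and applies the move of Eq. (12); the example vectors are purely illustrative and not tied to any dataset in the paper:

```python
import numpy as np
from math import gamma, pi, sin

def levy_flight(beta=1.5, size=1):
    # Mantegna's algorithm for Levy-stable step sizes, Eqs. (13)-(15)
    sigma_mu = ((gamma(1 + beta) * sin(pi * beta / 2)) /
                (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    mu = np.random.normal(0.0, sigma_mu, size)
    nu = np.random.normal(0.0, 1.0, size)
    return mu / np.abs(nu) ** (1 / beta)

# escape move of Eq. (12): X_ion = X_fi + (alpha * Levy(beta)) * (X_fi - X_best_fi)
alpha = 0.01                                   # scale factor stated in the text
x_fi = np.array([0.3, -0.7, 0.1])              # illustrative fission-product position
x_best_fi = np.array([0.25, -0.6, 0.05])       # illustrative best fission-product position
x_ion = x_fi + alpha * levy_flight(size=x_fi.size) * (x_fi - x_best_fi)
print(x_ion)
```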

Suggested classifiers

k-NN classifier

The k-NN [53, 54] is a pattern classification algorithm used to predict which class a new sample instance belongs to based on the classes of the cases closest to it [55]. Within the wrapper approach, k-NN generates classification rules from the training samples. By computing the distances between a new unclassified instance and its closest k training neighbours, it locates the cases in the training set most comparable to the new instances in the test set. Finally, the new instance is assigned to the class with the greatest likelihood among those neighbours.

However, when training k-NN, the choice of k is fundamental and the sole factor to consider when categorizing a novel test set; therefore, it is picked after a series of trial-and-error runs. The k-NN classifier (k = 5 [56, 57]) with the Euclidean distance metric was utilized to assess the feature subsets in the literature experiments.
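A minimal scikit-learn sketch of this k-NN setup (k = 5, Euclidean distance) on synthetic stand-in data, not the paper's gene expression matrices, might look like:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# synthetic placeholder for a reduced samples-by-genes matrix with binary labels
X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")  # k = 5, Euclidean distance
knn.fit(X_tr, y_tr)
print("k-NN accuracy:", accuracy_score(y_te, knn.predict(X_te)))
```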

SVM classifier

The maximum-margin hyper-planes in the space can be found using the SVM [58] to accurately classify training instances into different classes. SVM can analyze high-dimensional data with a fast training period and minimal computational resources, even with few training examples.

SVM employs a margin maximization strategy to avoid assessing the distributions linked to the statistics of distinct classes in the hyper-dimensional space. It creates hyper-planes to produce resolution boundaries for linear or nonlinear classification. Since the classes cannot be divided along a straight line in the nonlinear classification, SVM makes the data linearly separable by using the so-called kernel function [59] as a scalar product. SVM is used in a variety of industries, including bioinformatics [60], face detection [61], image classification [62], and text categorization [63].
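Analogously, a hedged sketch of an SVM classifier is shown below; the RBF kernel and C value are illustrative assumptions rather than the paper's exact configuration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# synthetic placeholder for a reduced samples-by-genes matrix with binary labels
X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

svm = SVC(kernel="rbf", C=1.0)   # kernel makes nonlinearly separable data linearly separable
svm.fit(X_tr, y_tr)
print("SVM accuracy:", accuracy_score(y_te, svm.predict(X_te)))
```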

Proposed relief binary NRO based on DE (RBNRO-DE) for gene selection

Disease classification is one of the most valuable uses of RNA-Seq gene expression data, but ML algorithms may be misled by the high dimensionality of these data. Therefore, an enhanced version of NRO called RBNRO-DE, which denotes a Relief Binary NRO based on DE, is proposed to ignore irrelevant genes and identify the smallest subset of relevant genes for the classification process.

The main characteristic of RBNRO-DE is that it achieves the best accuracy with the smallest subset of features. Two main phases constitute the proposed RBNRO-DE. Firstly, a pre-processing phase uses the Relief algorithm to identify the relevant features by computing a weight for every feature to describe its relevance and then ignoring the irrelevant features with the lowest weights. The second phase applies the binary NRO algorithm combined with the DE technique to determine the most relevant and non-redundant features. When solving large-scale problems, the NRO algorithm is susceptible to the local-optimum trap; to prevent this, the DE technique is included in the NRO algorithm.

The stages required for the proposed RBNRO-DE to be able to handle the GS strategy include filtration, initialization, position improvement depending on the NRO algorithm, binary conversion, fitness estimation, and hybridization with DE. The following subsections describe these stages.

Filtration of features

As illustrated in subsection “Relief algorithm”, the Relief algorithm is used to pre-process the population by filtering the features and choosing the relevant features. The weight of each feature is first evaluated by Eq. (1), and then the weights are ordered from the largest to the smallest weights to determine relevance for the classification process. By concentrating only on the relevant features and minimizing the initial search space, the Relief algorithm supports the NRO algorithm to obtain better features faster.

Initialization of nuclei population

The suggested binary NRO starts by randomly producing a population of N nuclei. Each nucleus represents a potential solution within its restricted lower and upper limits, depicted by a D-dimensional vector, where D equals the original dataset’s feature count. The randomly generated position of each nucleus is used in this initialization step and is confined within the \([-1, 1]\) range at each variable of the position vector.
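A minimal sketch of this initialization step, assuming NumPy, a population of 10 nuclei, and the 500 Relief-filtered genes as the dimension D, is:

```python
import numpy as np

def init_population(n_nuclei, n_features, lb=-1.0, ub=1.0, seed=0):
    # each row is one nucleus: a real-valued position vector over the candidate genes
    rng = np.random.default_rng(seed)
    return rng.uniform(lb, ub, size=(n_nuclei, n_features))

population = init_population(n_nuclei=10, n_features=500)
print(population.shape)  # (10, 500)
```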

Improvement and adjustment of position

Positions are improved using equations linked to the NRO algorithm presented in Subsection “NRO algorithm”. These equations are repeated until a certain stopping condition is fulfilled. This paper’s acceptable stopping condition for suitably assessing the proposed algorithm’s quality is the maximum number of generations \({G}_{max}\).

Some nuclei may fall outside the search space’s boundaries when their positions are updated by the NRO algorithm. This paper offers a procedure for repairing these invalid nuclei by adjusting them to an arbitrary position inside the permitted boundaries. By randomly varying the position, this procedure also improves the exploitation of the NRO algorithm. It can be expressed as follows:

$$\begin{aligned} {X}_{i,d}^{adjust}= \left\{ \begin{array}{ll} {X}_{i,d}, &{} \text{if}\,{X}_{d}^{LB}\le {X}_{i,d}\le {X}_{d}^{UB}\\ rand({X}_{d}^{LB},{X}_{d}^{UB}), &{} \text{if}\,{X}_{d}^{LB}>{X}_{i,d} \,\,\, or \,\,\, {X}_{i,d}>{X}_{d}^{UB}.\\ \end{array} \right. \end{aligned}$$
(21)

where \({X}_{i,d}^{adjust}\) refers to the proper product nucleus, \({X}_{i,d}\) is the value that surpasses the variable’s boundaries, \({X}_{d}^{LB}\) denotes the lower boundary of product nuclei, \({X}_{d}^{UB}\) denotes the upper boundary of product nuclei. An arbitrary value between \({X}_{d}^{LB}\) and \({X}_{d}^{UB}\) is returned through \(rand({X}_{d}^{LB},{X}_{d}^{UB})\) with regular distribution.
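Eq. (21) can be sketched in a few lines of Python; the bounds -1 and 1 follow the initialization range described earlier and are otherwise assumed:

```python
import numpy as np

def adjust_position(x, lb=-1.0, ub=1.0):
    # Eq. (21): keep in-range components, re-sample out-of-range ones uniformly in [lb, ub]
    x = np.asarray(x, dtype=float).copy()
    out_of_bounds = (x < lb) | (x > ub)
    x[out_of_bounds] = np.random.uniform(lb, ub, size=out_of_bounds.sum())
    return x

print(adjust_position([0.2, -1.7, 0.9, 1.3]))  # the 2nd and 4th entries are re-sampled
```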

Continuous to binary conversion

The nuclei positions are represented as continuous (real) values in the NRO. Therefore, they can’t be utilized directly for the GS binary problem. To fit in with the binary character of GS, a binary conversion strategy for transforming the continuous (real) values of the nucleus’ positions into binary values is required. At the same time, the original algorithm’s structure is preserved.

In the binary vector, the continuous (real) values of the relevant selected features are expressed by ones, whereas zeros express the continuous values of the irrelevant unselected features. The mathematical formulation to transform the continuous nucleus position \({X}_{i}^{g}\) to a binary position \(({X}_{i}^{g})_{bin}\), at each generation g, is as follows:

$$\begin{aligned} ({X}_{i}^{g})_{bin}= \left\{ \begin{array}{ll} 1 &{} \text{if}\,{\textbf{X}}_{i}^{g} > \delta ,\\ 0 &{} \text{otherwise}. \end{array} \right. \end{aligned}$$
(22)

where \(\delta\) represents a random threshold value within the range [0, 1]. This binary conversion strategy implies that if the continuous value \({X}_{i}^{g}\) is bigger than \(\delta\), it is mapped to the binary “one” (the feature is selected for the classification process). In contrast, the value is mapped to the binary “zero” if it is less than or equal to \(\delta\) (the feature is not selected for the classification process).
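A small sketch of the conversion in Eq. (22) follows; drawing a fresh random threshold per variable is one illustrative reading of the random \(\delta\), not a detail fixed by the text:

```python
import numpy as np

def to_binary(x_continuous):
    # Eq. (22): a gene is selected (1) when its continuous value exceeds the random threshold
    x_continuous = np.asarray(x_continuous, dtype=float)
    delta = np.random.rand(*x_continuous.shape)
    return (x_continuous > delta).astype(int)

print(to_binary([0.8, -0.3, 0.95, 0.1]))  # e.g. array([1, 0, 1, 0])
```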

Estimation of fitness function

Two conflicting goals should be considered to estimate the goodness of a solution and reach the optimal solution: maximizing the classification accuracy of the classifiers (k-NN and SVM) while searching for the smallest set of elected features, which enhances the algorithm’s predictive capacity. The fitness function is used to balance the size of the selected features and the accuracy of the (k-NN and SVM) classifiers, since accuracy may be impaired if the size of the selected features is reduced more than desired. To cast both goals as minimization, the fitness function focuses on reducing the classification error rate instead of the accuracy, as follows:

$$\begin{aligned} {fit} = {w}_{1}\times {Err}_{rate}+{w}_{2}\times \frac{\vert {feat}_{elected} \vert }{\vert {D} \vert },\,\,\,\,\, w_{1} \in [0, 1], \,\, w_{2} = 1-w_{1}. \end{aligned}$$
(23)

where \({Err}_{rate}\) reflects the classification error rate of the (k-NN and SVM) classifiers, \({feat}_{elected}\) signifies the number of selected features, and D indicates the dataset’s overall feature count. The weight parameters \(w_{1}\) and \(w_{2}\) refer to the significance of classification accuracy and the length of the elected features, respectively. Based on the comprehensive trials executed in prior research [64, 65], \(w_1\) is assigned to 0.99 and \(w_2\) equals 0.01. Minimizing the classification error rate \({Err}_{rate}\) (maximizing classification accuracy) is given more preference than shortening the length of the elected features \({feat}_{elected}\), which is why \(w_1\) is given more weight than \(w_2\).
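Eq. (23) reduces to a one-line weighted sum. A minimal sketch, using the stated weights \(w_1 = 0.99\) and \(w_2 = 0.01\) with an illustrative error rate and feature count, is:

```python
def fitness(err_rate, n_selected, n_total, w1=0.99):
    # Eq. (23): weighted sum of the classification error and the selected-feature ratio
    w2 = 1.0 - w1
    return w1 * err_rate + w2 * (n_selected / n_total)

# e.g. a 2% error rate with 400 of 20531 genes selected (illustrative numbers)
print(fitness(err_rate=0.02, n_selected=400, n_total=20531))
```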

Embedding of the DE approach

One of the most influential and straightforward stochastic, population-based trial-and-error approaches for acquiring the preferable solution to complicated optimization problems is DE [25]. The DE approach requires few control parameters, is simple to learn and use, and can handle a variety of optimization problems while producing valuable results quickly and at a reduced computational cost. DE depends on three primary stages: mutation, crossover, and selection, as follows:

  • Mutation stage: It is also known as differential mutation. In each iteration, this stage creates a mutated vector \(\upsilon _i\) for each solution vector. To create the mutated vector \(\upsilon _i\), three distinct nominee vectors \({X}_{r_{1}},{X}_{r_{2}}, {X}_{r_{3}}\) are randomly selected from the population (indices drawn from the range [1, population size]). The difference between two of the nominee vectors \({X}_{r_{2}}, {X}_{r_{3}}\) is then estimated, multiplied by a mutation weighting factor (\(W_M\)) within the range [0, 1] [66], and added to the third nominee vector \({X}_{r_{1}}\). The following is a mathematical representation of \(\upsilon _i\):

    $$\begin{aligned} \overrightarrow{\upsilon }_{i}=\overrightarrow{X}_{r_{1}} +W_M(\overrightarrow{X}_{r_{2}}-\overrightarrow{X}_{r_{3}}) \end{aligned}$$
    (24)
  • Crossover stage: DE uses the crossover stage to enhance population diversity after the differential mutation stage. Combining values from the target vector \(X_i\) and the mutated vector \(\upsilon _i\) yields an offspring vector \(u_i\). The binary crossover is the most popular and straightforward crossover search operator in DE, which is mathematically expressed as:

    $$\begin{aligned} {u}_{i,d}= \left\{ \begin{array}{ll} \upsilon _{i,d}, &{} \text{if}\,{rand} \le C_R\,\,\,\,\, {or}\,\,\,\, \,d=j_{rand},\\ X_{i,d}, &{} \text{otherwise}. \end{array} \right. \end{aligned}$$
    (25)

    where \(j_{rand}\in [1,\,2,\,\ldots , D_{X}]\) is a uniformly distributed arbitrary index that guarantees the offspring vector inherits at least one dimension from the mutated vector. The crossover rate \(C_R\) is employed to determine the likelihood of each element being crossed; it is often set to a high value (\(C_R\) = 0.9). It is evident from Eq. (25) that \(C_R\) and rand are compared: \(u_i\) is derived from \(\upsilon _i\) if the value of rand is less than or equal to \(C_R\); otherwise, \(u_i\) is taken from \(X_i\).

  • Selection stage: Eventually, the selection stage is performed, as illustrated in Eq. (26), in which the target vector’s fitness value \(fit(X_i)\) and the corresponding offspring vector’s fitness value \(fit(u_i)\) are compared, and the vector with the lower fitness value is retained as the solution for the next generation.

    $$\begin{aligned} {X}_{i}= \left\{ \begin{array}{ll} u_{i}, &{} \text{if}\,fit(u_{i}) < fit(X_{i})\\ X_{i}, &{} \text{otherwise}. \end{array} \right. \end{aligned}$$
    (26)

    \(X_{i}\) is replaced by \(u_{i}\) if \(fit(u_{i})\) yields a value smaller than \(fit(X_{i})\). Otherwise, the previous target vector \(X_{i}\) remains in place.

After illustrating the main stages of DE, the pseudo-code for these stages is presented in Algorithm 1.

Algorithm 1
figure a

The main stages of DE
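As a complement to Algorithm 1, the following Python sketch performs one DE generation over a real-valued population; the mutation factor \(W_M = 0.5\), the population layout, and the toy objective are illustrative assumptions, while \(C_R = 0.9\) follows the text above:

```python
import numpy as np

def de_generation(pop, fitness_fn, w_m=0.5, c_r=0.9, seed=None):
    # One DE generation: mutation (Eq. 24), binomial crossover (Eq. 25), greedy selection (Eq. 26)
    rng = np.random.default_rng(seed)
    n, d = pop.shape
    new_pop = pop.copy()
    for i in range(n):
        candidates = [j for j in range(n) if j != i]
        r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
        v = pop[r1] + w_m * (pop[r2] - pop[r3])       # mutated vector, Eq. (24)
        mask = rng.random(d) <= c_r
        mask[rng.integers(d)] = True                  # inherit at least one mutated dimension
        u = np.where(mask, v, pop[i])                 # offspring vector, Eq. (25)
        if fitness_fn(u) < fitness_fn(pop[i]):        # keep the better of parent and offspring
            new_pop[i] = u
    return new_pop

# toy usage: minimize the sum of squares over a population of 10 vectors of length 5
pop = np.random.default_rng(0).uniform(-1, 1, size=(10, 5))
pop = de_generation(pop, fitness_fn=lambda x: float(np.sum(x ** 2)), seed=1)
```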

The exhaustive RBNRO-DE algorithm

Finally, after describing the steps of the suggested RBNRO-DE algorithm in the preceding Subsections to handle the GS strategy, Algorithm 2 provides the pseudo-code for the proposed RBNRO-DE algorithm. In addition, Fig. 3 includes a flowchart of the proposed RBNRO-DE algorithm to show its essential steps.

Algorithm 2
figure b

The proposed RBNRO-DE algorithm

Fig. 3
figure 3

Flowchart of the proposed RBNRO-DE algorithm

Experimental results and discussion

The experimental results for the proposed RBNRO-DE algorithm and its peers are presented in this section. The models are evaluated using training and testing datasets. The final findings are derived using the evaluation metrics’ average values. The datasets used to verify the efficacy of the proposed model are described in subsection “Dataset description”, the parameters utilized in the working environments are presented in subsection “Parameters setting”, the evaluation criteria are shown in subsection “Evaluation criteria”, and the experimental results analysis is explained in the “Comparison results of the proposed RBNRO-DE against popular ML classifiers” subsection.

Dataset description

Extensive experiments with the proposed technique and other wrapper algorithms are conducted on 22 gene expression datasets. The data used is the normalized level-3 RNA-Seq gene expression data of 22 tumor types from the Broad Institute, publicly available at [67]. We followed the whole process applied in paper [3] and noticed a discrepancy between the data that paper used from GitHub and the numbers it reported, which were copied from the site. The data from the site was a mixture of tumor and normal samples, while it was treated as entirely tumor in the mentioned paper. Therefore, we investigated the data closely. The site contains different forms of the same data; after choosing the form to work on, we explored the data and found these challenges:

  • Some genes are named with an ID but without a symbol.

  • Some genes are not found in the annotation file.

  • Samples are mixed between normal, tumor, and other sample types.

As a result, we needed some pre-processing to separate and identify samples to get normal samples versus tumor samples that could be used in binary classification and to facilitate the process of FS. We faced the mentioned challenges as follows:

  • We searched the annotation file by gene ID to retrieve the gene symbol.

  • We compared the genes against the annotation file, and more than 100 genes were removed.

  • Depending on the samples report, we separated each row by sample type into an Excel sheet for binary classification.

Furthermore, the Relief algorithm, described in subsection “Relief algorithm”, is employed for pre-processing by computing the weight of each feature in the dataset; the weights are then sorted from largest to smallest, and the features with small weights are eliminated. After applying the Relief algorithm to the 22 gene expression datasets, only the 500 features with the largest weights were retained; the remaining features with small weights were treated as irrelevant and ignored, and only these 500 relevant features were used in the FS process. The Relief algorithm can thereby eliminate features that are irrelevant to classification.
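For illustration only, retaining the 500 highest-weighted genes could be written as follows; the random weight vector is a placeholder standing in for the Relief output, and X would be the samples-by-genes expression matrix:

```python
import numpy as np

weights = np.random.rand(20531)              # placeholder for Relief weights over all genes
top500 = np.argsort(weights)[::-1][:500]     # indices of the 500 largest weights
# X_reduced = X[:, top500]                   # reduced matrix handed to the wrapper search
print(top500[:10])
```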

After pre-processing, the resulting file became clean enough for use in the FS process. Still, unlike paper [3], which provided multi-classification of all cancer types, we worked on each type separately to be more specific. Table 1 shows a detailed list of all 22 tumour types and the corresponding number of samples.

Table 1 Description of the datasets used in this study

Parameters setting

The proposed RBNRO-DE algorithm has been compared with binary conversions of distinct meta-heuristic algorithms, which include Binary SSA (BSSA) [68], Binary ABC (BABC) [15], Binary BA (BBA) [69], Binary PSO (BPSO) [70], Binary WOA (BWOA) [57], Binary GWO (BGWO) [17], Binary GOA (BGOA) [20], Binary SFO (BSFO) [71], Binary BSA (BBSA) [22], Binary ASO (BASO) [31], Binary HHO (BHHO) [23], and Binary HGSO (BHGSO) [32]. The main parameters of the ML classifiers suggested in this paper are depicted in Table 2.

Table 2 The main parameters of the ML classifiers

To ensure a fair comparison between the different meta-heuristic algorithms, each method was subjected to thirty separate experiments on each dataset owing to their stochastic nature. The resulting performance measures, which include accuracy, fitness, number of selected features, and standard deviation, were based on the average results of these experiments. To maintain consistency across all methods, the population size and maximum number of iterations of each experiment were set to 10 and 100, respectively. Furthermore, the problem size was given by the number of attributes in each dataset. To enable individuals to search within a continuous search space, the domain was set to [− 1, 1], allowing them to explore a relatively wide but constrained search range.

In the presented framework, the optimality of the outcomes is confirmed using a 10-fold cross-validation method to assure the reliability of the obtained values. This involves a data-splitting strategy that employs random sampling without replacement to distribute the training and testing groups. Each benchmark is divided into two separate subsets through this method. Specifically, 80% of the benchmark data is randomly selected without replacement for training, ensuring that each data point is unique and not duplicated in the training set. The remaining 20% of the data is also uniquely chosen for testing. This approach ensures that the training subset is utilized to learn the ML classifier through optimization while the testing subset is employed to assess the performance of the chosen features. By using random sampling without replacement, we ensure that there is no overlap between training and testing data, thus maintaining the integrity of the evaluation process. The remaining parameters of each method are set according to the original variants and the data presented in their first publications. Standard configurations for all techniques and parameter settings for each method are shown in Table 3. Python is utilized in the computing environment to execute the runs with an Intel Core i7 processor, 16 GB of RAM, and an NVIDIA GTX 1050i GPU.
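A minimal sketch of the hold-out split described above is shown below, assuming scikit-learn is available; the random seed and placeholder arrays are illustrative, not the authors' exact setup.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the Relief-filtered matrix (samples x 500 genes) and binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))
y = rng.integers(0, 2, size=200)

# 80% of the samples are drawn without replacement for training, the remaining 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, shuffle=True, random_state=42)
```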

Table 3 Configurations of parameter for all algorithms

Evaluation criteria

To assess the performance of the proposed RBNRO-DE algorithm compared to others, each approach is independently verified 30 times in each dataset to validate the results statistically. To this end, the following standard performance measures for the FS problem are utilized.

  • Average accuracy \((AVG_{Acc})\): this metric is the rate of correct data classification and is obtained by executing the algorithm independently 30 times, and is computed as follows:

    $$\begin{aligned} AVG_{Acc} = \frac{1}{30} \sum _{k=1}^{30} \frac{1}{m} \sum _{r=1}^{m}match(PL_r,AL_r) \end{aligned}$$
    (27)

    where m represents the size of the samples in the testing dataset, \(PL_r\) and \(AL_r\) are the classifier output labels of the predicted and actual class labels for sample r, respectively, and \(match(PL_r, AL_r)\) represents a comparison discriminant function. If \(PL_r == AL_r\), then \(match(PL_r,AL_r) = 1\); otherwise, \(match(PL_r,AL_r) = 0\).

  • Average fitness value \((AVG_{Fit})\): this metric measures the average fitness value obtained by executing the proposed algorithm independently 30 times, which defines the synergy between minimizing the error rate of classification and reducing the number of selected features. The lower value represents the better solution, which is evaluated in terms of fitness as follows:

    $$\begin{aligned} AVG_{Fit} = \frac{1}{30} \sum _{k=1}^{30}f_*^k \end{aligned}$$
    (28)

    where \(f_{*}^{k}\) represents the optimal fitness value obtained in the \(k{\text{th}}\) run.

  • Average size of selected features \((AVG_{Feat})\): this metric represents the average size (or FS ratio) of the number of features selected by executing the algorithm independently 30 times and is determined as:

    $$\begin{aligned} AVG_{Feat} = \frac{1}{30} \sum _{k=1}^{30}\frac{|d_*^k|}{|D|} \end{aligned}$$
    (29)

    where \(|d_*^k|\) is the number of selected features in the best solution of the \(k\text{th}\) run, and \(|D|\) is the total number of features in the original dataset.

  • Standard deviation (STD): corresponding to the above results, the final average results obtained from the 30 independent runs of each algorithm on every dataset are evaluated in terms of stability as follows:

    $$\begin{aligned} STD = \sqrt{{\frac{1}{29}} \sum _{k=1}^{30} (Y_*^k-AVG_Y)^2} \end{aligned}$$
    (30)

    where \(Y\) denotes the metric to be measured, \(Y_*^k\) represents the value of the metric \(Y\) in the \(k\text{th}\) run, and \(AVG_Y\) is the average of that metric over the 30 independent runs.

The results presented in the following tables are the average values over 30 independent runs in terms of the fitness value \((AVG_{Fit})\), classification accuracy \((AVG_{Acc})\), and the number of selected features \((AVG_{Feat})\). The experimental results are closely analyzed and discussed in the subsequent subsections, where bold numbers indicate the best-required results.
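For illustration, a minimal computational sketch of these four summary statistics, following Eqs. (27)-(30), is given below; the array names and shapes are assumptions rather than the authors' implementation.

```python
import numpy as np

def summarize_runs(pred_labels, true_labels, best_fitness, n_selected, n_total):
    """pred_labels/true_labels: (30, m) arrays of per-run test predictions and ground truth.
    best_fitness: (30,) best fitness per run; n_selected: (30,) selected-feature counts;
    n_total: total number of features |D| in the original dataset."""
    avg_acc = np.mean(pred_labels == true_labels)                         # Eq. (27), averaged over runs and samples
    avg_fit = np.mean(best_fitness)                                       # Eq. (28)
    avg_feat = np.mean(np.asarray(n_selected) / n_total)                  # Eq. (29)
    std_acc = np.std((pred_labels == true_labels).mean(axis=1), ddof=1)   # Eq. (30) applied to accuracy
    return avg_acc, avg_fit, avg_feat, std_acc
```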

Comparison results of the proposed RBNRO-DE against popular ML classifiers

This section introduces a comparison between the proposed RBNRO-DE with k-NN and SVM classifiers and the most popular ML classifiers [k-NN, SVM, Decision Tree (DT), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost)] in terms of classification accuracy, fitness values, precision, recall, and F1-score.

Table 4 shows the results of the proposed RBNRO-DE with k-NN and SVM classifiers compared with other popular ML classifiers regarding the classification accuracy values. The empirical results show that the proposed RBNRO-DE with SVM is ranked first by achieving the best results in 7 out of 22 datasets and identical results in 14 datasets as other techniques. The proposed RBNRO-DE with k-NN is ranked second by yielding identical results in 15 datasets as other techniques. It should also be noted that SVM ranked third by yielding identical results in 11 datasets as others, while k-NN ranked fourth by yielding identical results in 10 datasets as other techniques. Finally, DT is ranked last by yielding identical results in 8 datasets as other techniques.

Table 4 Classification accuracy values of the proposed RBNRO-DE against popular ML classifiers

Table 5 shows the results of the proposed RBNRO-DE with k-NN and SVM classifiers compared with other popular ML classifiers regarding fitness values. The empirical results show that the proposed RBNRO-DE with SVM is ranked first by achieving the best results in 9 out of 22 datasets and identical results in 11 datasets as the proposed RBNRO-DE with k-NN. The proposed RBNRO-DE with k-NN is ranked second by achieving the best results in 2 out of 22 datasets and identical results in 11 datasets as the proposed RBNRO-DE with SVM. Finally, it should also be noted that other methods did not achieve optimal results on any of the datasets used regarding fitness values.

Table 5 Fitness values of the proposed RBNRO-DE against popular ML classifiers

Table 6 shows the results of the proposed RBNRO-DE with k-NN and SVM classifiers compared with other popular ML classifiers regarding the precision values. The empirical results show that the proposed RBNRO-DE with SVM is ranked first by achieving the best results in 4 out of 22 datasets and identical results in 17 datasets as other techniques. The proposed RBNRO-DE with k-NN is ranked second by achieving the best results in 1 out of 22 datasets and identical results in 17 datasets as other techniques. It should also be noted that SVM and k-NN ranked third by yielding identical results in 14 datasets as others, while RF and XGBoost ranked fourth by yielding identical results in 12 datasets as other techniques. Finally, DT is ranked last by yielding identical results in 10 datasets as other techniques.

Table 6 Precision values of the proposed RBNRO-DE against popular ML classifiers

Table 7 shows the results of the proposed RBNRO-DE with k-NN and SVM classifiers compared with other popular ML classifiers regarding the recall values. The empirical results show that the proposed RBNRO-DE with SVM is ranked first by achieving the best results in 2 out of 22 datasets and identical results in 20 datasets as other techniques. The proposed RBNRO-DE with k-NN is ranked second by achieving identical results in 19 datasets as other techniques. It should also be noted that RF ranked third by yielding identical results in 18 datasets as others, while k-NN ranked fourth by yielding identical results in 17 datasets as other techniques. Finally, DT is ranked last by yielding identical results in 13 datasets as other techniques.

Table 7 Recall values of the proposed RBNRO-DE against popular ML classifiers

Table 8 shows the results of the proposed RBNRO-DE with k-NN and SVM classifiers compared with other popular ML classifiers regarding the F1-score values. The empirical results show that the proposed RBNRO-DE with SVM is ranked first by achieving the best results in 7 out of 22 datasets and identical results in 14 datasets as other techniques. The proposed RBNRO-DE with k-NN is ranked second by achieving the best results in 1 out of 22 datasets and identical results in 14 datasets as other techniques. It should also be noted that SVM ranked third by yielding identical results in 11 datasets as others, while k-NN and RF ranked fourth by yielding identical results in 10 datasets as other techniques. Finally, DT is ranked last by yielding identical results in 8 datasets as other techniques.

Table 8 F1-score values of the proposed RBNRO-DE against popular ML classifiers

Comparison results of different versions of the proposed RBNRO-DE

This section introduces a comparison between different versions of the proposed RBNRO-DE (RBNRO-DE with k-NN, RBNRO-DE with SVM, and RBNRO-DE with XGBoost) in terms of classification accuracy, fitness values, and the number of selected features.

Table 9 shows the results of different versions of the proposed RBNRO-DE (RBNRO-DE with k-NN, RBNRO-DE with SVM, and RBNRO-DE with XGBoost) regarding classification accuracy values. The empirical results show that the proposed RBNRO-DE with SVM is ranked first by achieving the best results in 4 out of 22 datasets and identical results in 14 datasets as other versions. The proposed RBNRO-DE with XGBoost is ranked second by achieving the best results in 2 out of 22 datasets and identical results in 13 datasets as other versions. Finally, RBNRO-DE with k-NN is ranked third by achieving the best results in 1 out of 22 datasets and identical results in 14 datasets as other versions.

Table 9 Classification accuracy values of the proposed RBNRO-DE based on k-NN, SVM and XGBoost classifiers

Table 10 shows the results of different versions of the proposed RBNRO-DE (RBNRO-DE with k-NN, RBNRO-DE with SVM, and RBNRO-DE with XGBoost) regarding the fitness values. The empirical results show that the proposed RBNRO-DE with SVM is ranked first by achieving the best results in 8 out of 22 datasets and identical results in 11 datasets as other versions. The proposed RBNRO-DE with k-NN is ranked second by achieving the best results in 1 out of 22 datasets and identical results in 11 datasets as other versions. Finally, RBNRO-DE with XGBoost is ranked third by achieving the best results in 2 out of 22 datasets.

Table 10 Fitness values of the proposed RBNRO-DE based on k-NN, SVM and XGBoost classifiers

Table 11 shows the results of different versions of the proposed RBNRO-DE (RBNRO-DE with k-NN, RBNRO-DE with SVM, and RBNRO-DE with XGBoost) regarding the number of selected features. The empirical results show that the proposed RBNRO-DE with SVM is ranked first by achieving the best results in 8 out of 22 datasets and identical results in 10 datasets as other versions. The proposed RBNRO-DE with k-NN is ranked second by achieving the best results in 4 out of 22 datasets and identical results in 10 datasets as other versions. Finally, RBNRO-DE with XGBoost did not achieve the best results in any of the utilized datasets. Therefore, the remaining experiments in this research are conducted using the k-NN and SVM classifiers due to their superiority and efficiency, as described in the following subsections.

Table 11 The number of extracted features by the proposed RBNRO-DE based on k-NN, SVM, and XGBoost classifiers

Comparison results of the proposed RBNRO-DE against other state-of-the-art meta-heuristic algorithms

To demonstrate the dominance of RBNRO-DE over its counterparts in the literature, the best performing RBNRO-DE algorithm with the two suggested classifiers, k-NN and SVM, is compared with other state-of-the-art meta-heuristic algorithms executed in identical situations. The comparison with RBNRO-DE incorporates binary versions of some optimization algorithms, such as BSSA, BABC, BBA, BPSO, BWOA, BGWO, BGOA, BSFO, BBSA, BASO, BHHO, and BHGSO. Note that the 22 original gene expression datasets are first subjected to the Relief algorithm, and only the 500 relevant features with the largest weights are chosen for use in the FS process. Subsequently, the suggested RBNRO-DE and the other state-of-the-art meta-heuristic algorithms are implemented only on these 500 pertinent features.

Comparisons based on the suggested k-NN classifier

Table 12 reveals the results of the proposed RBNRO-DE compared with other optimizers based on the k-NN classifier regarding the classification accuracy values evaluated under the same implementation conditions. The empirical results show that the proposed RBNRO-DE and BSFO each scored the best in only one dataset. It should also be noted that all competitive algorithms yielded results identical to RBNRO-DE with k-NN in the remaining 20 datasets.

Table 12 The proposed RBNRO-DE scores with k-NN and its peers in terms of mean values of classification accuracy

Table 13 reveals the average fitness and STD values of the proposed RBNRO-DE algorithm and its peers based on k-NN under identical implementation requirements. The proposed RBNRO-DE with the k-NN classifier demonstrates higher quality than the other algorithms. By investigating Table 13, the results reveal that k-NN-based RBNRO-DE produced the lowest fitness values and competitive STD over all datasets. Furthermore, all the used datasets are large-scale, which verifies that the proposed RBNRO-DE can execute consistently on all datasets regardless of their size. It can therefore be positively inferred that the proposed RBNRO-DE is promising, with a demonstrated ability to balance exploitation and exploration of the search space over the iterations and to escape from local optima, whereas the standard algorithms may become trapped in them as they evolve.

Table 13 The proposed RBNRO-DE scores with k-NN and its peers in terms of mean values of fitness

Table 14 shows the number of extracted features using the proposed RBNRO-DE and its other counterparts for training the k-NN classifier. The proposed RBNRO-DE surpassed the other algorithms in all datasets regarding the number of extracted features. Furthermore, the RBNRO-DE’s capability to identify the most informative features is attributable to the ability to search within feasible regions while considering improved classification accuracy.

Table 14 The number of extracted features by the proposed RBNRO-DE and its peers for training the k-NN

Table 15 displays the average precision values of the proposed RBNRO-DE algorithm with k-NN and its counterparts. Out of 22 datasets, the proposed RBNRO-DE performed better than other methods in terms of mean precision values for 3 datasets. Alternatively, BABC, BPSO, BGWO, BGOA, BSFO, and BHHO achieved identical results as the proposed RBNRO-DE in 19 datasets, while BWOA performed similarly in 18 datasets. BASO and BHGSO ranked fourth by achieving identical results as the proposed RBNRO-DE in 17 datasets. Finally, BBA yielded identical results as the proposed RBNRO-DE in 15 datasets, ranking it last among all methods.

Table 15 The proposed RBNRO-DE scores with k-NN and its peers in terms of mean precision values

Table 16 displays the average recall values of the proposed RBNRO-DE algorithm with k-NN and its counterparts. Out of 22 datasets, the proposed RBNRO-DE performed better than other methods in terms of mean recall values for 3 datasets. Alternatively, BSSA, BABC, BPSO, BGWO, BWOA, BGOA, BSFO, and BBSA achieved identical results as the proposed RBNRO-DE in 19 datasets, while BHHO performed similarly in 18 datasets. BHGSO ranked fourth by achieving identical results as the proposed RBNRO-DE in 17 datasets. Finally, BASO and BBA yielded identical results as the proposed RBNRO-DE in 16 datasets, ranking them last among all methods.

Table 16 The proposed RBNRO-DE scores with k-NN and its peers in terms of mean recall values

Table 17 displays the average F1-score values of the proposed RBNRO-DE algorithm with k-NN and its counterparts. Out of 22 datasets, the proposed RBNRO-DE performed better than other methods in terms of mean F1-score values for 4 datasets. Alternatively, BSSA, BABC, BPSO, BGWO, BGOA, BSFO, and BBSA achieved identical results as the proposed RBNRO-DE in 18 datasets, while BWOA and BHHO performed similarly in 17 datasets. BASO and BHGSO ranked fourth by achieving identical results as the proposed RBNRO-DE in 16 datasets. Finally, BBA yielded identical results as the proposed RBNRO-DE in 14 datasets, ranking it last among all methods.

Table 17 The proposed RBNRO-DE scores with k-NN and its peers in terms of mean F1-score values

Comparisons based on the suggested SVM classifier

Table 18 shows the results of the proposed RBNRO-DE compared with other optimizers based on the SVM classifier regarding the classification accuracy values, fairly evaluated under the same implementation conditions. The empirical results show that the proposed RBNRO-DE is ranked first by achieving the best results in 2 out of 22 datasets. BSFO is ranked second with the best results in only one dataset. It should also be noted that all competitive algorithms yielded identical results in 19 datasets as the proposed RBNRO-DE with SVM.

Table 18 The proposed RBNRO-DE scores with SVM and its peers in terms of mean values of classification accuracy

Table 19 reveals the average fitness and STD values of RBNRO-DE and its peers, based on SVM, under identical implementation requirements. Notably, the proposed RBNRO-DE with the SVM classifier demonstrates higher quality than the other algorithms. By investigating Table 19, the results reveal that SVM-based RBNRO-DE produced the lowest fitness values along with competitive STD in 21 out of 22 datasets, accounting for 95% of all datasets. Furthermore, all the used datasets are large-scale, which verifies that the proposed RBNRO-DE is capable of executing consistently on all datasets regardless of their size. For the only dataset that BSFO won, its mean fitness value is very close to that of RBNRO-DE. None of the other algorithms compared with RBNRO-DE ranked first on any of the 22 datasets. It can therefore be positively inferred that RBNRO-DE is promising, with a demonstrated ability to balance exploitation and exploration of the search space over the iterations and to escape from local optima, whereas common algorithms may become trapped in them as they evolve.

Table 19 The proposed RBNRO-DE scores with SVM and its peers in terms of mean values of fitness

Based on the number of extracted features, the outcomes of the proposed RBNRO-DE and other counterparts for training the SVM classifier are revealed in Table 20. By investigating the scores, an attractive observation is made for the proposed RBNRO-DE based on SVM, which performed better than the other algorithms on all 22 datasets used in this paper. Furthermore, the excellence of the proposed RBNRO-DE with SVM in this context confirms its ability to identify the most significant regions of the search space and to avoid searching through non-feasible regions.

Table 20 The number of extracted features by the proposed RBNRO-DE and its peers for training the SVM

Table 21 displays the average precision values of the proposed RBNRO-DE algorithm with SVM and its counterparts. Out of 22 datasets, the proposed RBNRO-DE performed better than other methods in terms of mean precision values for 3 datasets. Alternatively, BSFO achieved identical results as the proposed RBNRO-DE in 19 datasets, while BABC, BGWO, BWOA, BBSA, and BASO performed similarly in 18 datasets. BSSA, BPSO, BGOA, and BHHO ranked fourth by achieving identical results as the proposed RBNRO-DE in 17 datasets. Finally, BBA yielded identical results as the proposed RBNRO-DE in 12 datasets, ranking it last among all methods.

Table 21 The proposed RBNRO-DE scores with SVM and its peers in terms of mean precision values

Table 22 displays the average recall values of the proposed RBNRO-DE algorithm with SVM and its counterparts. Out of 22 datasets, the proposed RBNRO-DE performed better than other methods in terms of mean recall values for 3 datasets. Alternatively, BSSA, BABC, BGWO, BWOA, BGOA, BSFO, and BBSA achieved identical results as the proposed RBNRO-DE in 19 datasets, while BPSO and BHHO performed similarly in 18 datasets. BHGSO ranked fourth by achieving identical results as the proposed RBNRO-DE in 17 datasets. Finally, BASO and BBA yielded identical results as the proposed RBNRO-DE in 16 datasets, ranking them last among all methods.

Table 22 The proposed RBNRO-DE scores with SVM and its peers in terms of mean recall values

Table 23 displays the average F1-score values of the proposed RBNRO-DE algorithm with SVM and its counterparts. Out of 22 datasets, the proposed RBNRO-DE performed better than other methods in terms of mean F1-score values for 5 datasets. Alternatively, BWOA and BSFO achieved identical results as the proposed RBNRO-DE in 17 datasets, while BSSA, BABC, BPSO, BGWO, BGOA, and BBSA performed similarly in 16 datasets. BHHO ranked fourth by achieving identical results as the proposed RBNRO-DE in 15 datasets. Finally, BBA yielded identical results as the proposed RBNRO-DE in 12 datasets, ranking it last among all methods.

Table 23 The proposed RBNRO-DE scores with SVM and its peers in terms of mean F1-score values

Convergence analysis

Figures 4, 5, 6, 7, 8, 9, 10 and 11 show the convergence performance of the proposed RBNRO-DE with the k-NN and SVM classifiers in comparison with its counterparts, all implemented under identical conditions of iteration number and population size. From these figures, it is obvious that the proposed RBNRO-DE with the k-NN and SVM classifiers achieved optimal convergence performance on all datasets. Hence, the convergence behavior of RBNRO-DE with the k-NN and SVM classifiers proves its ability to achieve optimal results in a reasonable time while striking an effective balance between exploration and exploitation.

Fig. 4 Convergence performance of the proposed RBNRO-DE vs. the comparative algorithms based on k-NN classifier over all datasets

Fig. 5 Convergence performance of the proposed RBNRO-DE vs. the comparative algorithms based on k-NN classifier over all datasets (Cont.)

Fig. 6 Convergence performance of the proposed RBNRO-DE vs. the comparative algorithms based on k-NN classifier over all datasets (Cont.)

Fig. 7 Convergence performance of the proposed RBNRO-DE vs. the comparative algorithms based on k-NN classifier over all datasets (Cont.)

Fig. 8 Convergence performance of the proposed RBNRO-DE vs. the comparative algorithms based on SVM classifier over all datasets

Fig. 9 Convergence performance of the proposed RBNRO-DE vs. the comparative algorithms based on SVM classifier over all datasets (Cont.)

Fig. 10 Convergence performance of the proposed RBNRO-DE vs. the comparative algorithms based on SVM classifier over all datasets (Cont.)

Fig. 11 Convergence performance of the proposed RBNRO-DE vs. the comparative algorithms based on SVM classifier over all datasets (Cont.)

Wilcoxon’s rank-sum test

The effectiveness of the proposed RBNRO-DE is recognized by executing the Wilcoxon rank-sum test as a pair-wise test to evaluate whether there is a statistically significant difference between the fitness values achieved by the proposed approach and those of its peers [72]. According to the results shown in Tables 24 and 25, the proposed RBNRO-DE with k-NN and SVM classifiers exceeds all other algorithms in all datasets. All P-values listed in Tables 24 and 25 are less than 0.05 (5% significance level), which demonstrates robust evidence against the null hypothesis and shows that the results achieved by the proposed method are statistically better and did not occur by chance.

Table 24 Results extracted by Wilcoxon’s rank-sum test of the proposed RBNRO-DE vs. the comparative algorithms based on k-NN classifier
Table 25 Results extracted by Wilcoxon’s rank-sum test of the proposed RBNRO-DE vs. the comparative algorithms based on the SVM classifier
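As a hedged illustration of this test, the sketch below applies the two-sided Wilcoxon rank-sum test from SciPy to two sets of 30 per-run fitness values; the arrays are placeholders, not the paper's actual results.

```python
import numpy as np
from scipy.stats import ranksums

# Placeholder arrays standing in for the 30 best-fitness values per algorithm on one dataset.
fitness_rbnro_de = np.random.default_rng(1).normal(0.02, 0.005, 30)
fitness_peer = np.random.default_rng(2).normal(0.05, 0.010, 30)

stat, p_value = ranksums(fitness_rbnro_de, fitness_peer)

# Reject the null hypothesis (identical distributions) at the 5% significance level.
print(f"p-value = {p_value:.4f}, significant: {p_value < 0.05}")
```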

Computational complexity of the RBNRO-DE and other state-of-the-art meta-heuristic algorithms

Time computational complexity of the RBNRO-DE algorithm

To determine the computational complexity of the proposed RBNRO-DE algorithm, each of its five fundamental stages can be analyzed individually. These stages are feature filtration, population initialization, position improvement and adjustment, fitness function estimation, and the DE technique. The overall time complexity of the proposed RBNRO-DE algorithm, denoted in big-O notation as \(O_{time}(RBNRO-DE)\), can then be calculated through the following equations:

$$\begin{aligned} \begin{aligned} O_{time}(RBNRO-DE) = {}&O_{time}(\text {Features filtration}) + O_{time}(\text {Population initialization}) + \\&O_{time}(\text {Position improvement and adjustment}) + \\&O_{time}(\text {Fitness function estimation}) + O_{time}(\text {DE technique}). \end{aligned} \end{aligned}$$
(31)

Let \(N\) be the population size, \({G}_{max}\) the maximum number of generations, and \(D\) the problem dimension. The following can then be obtained:

\(O_{time}(\text {Features filtration}) = O_{time}(D)\).

\(O_{time}(\text {Population initialization}) = O_{time}(N)\).

\(O_{time}(\text {Position improvement and adjustment}) = O_{time}({G}_{max} \times N \times D).\)

\(O_{time}(\text {Fitness function estimation}) = O_{time}({G}_{max} \times N).\)

\(O_{time}(\text {DE technique}) = O_{time}(N \times D).\) Therefore,

$$\begin{aligned}{} & {} O_{time}(RBNRO-DE) = O_{time}(D) + O_{time}(N) + O_{time}({G}_{max} \times N \times D) + \\{} & {} \quad O_{time}({G}_{max} \times N) + O_{time}(N \times D) = O_{time}({G}_{max} \times N \times D). \end{aligned}$$

Space computational complexity of the RBNRO-DE algorithm

The amount of memory or storage space needed for an algorithm to solve a problem as the size of the input increases is referred to as space computational complexity. It is often stated as the amount of additional memory that the algorithm requires in addition to the input. It consists of combining the following two main components:

  1. Input values space: It is the memory space needed to save the input data needed for the algorithm to operate. As exhibited in Algorithm 2 that provides the pseudocode of the proposed RBNRO-DE algorithm, there are nine input variables, which are: N, \({G}_{max}\), D, \(P_{Fi}\), \(P_\beta\), LB, UB, \(C_R\), and \(W_M\). Each variable represents just numerical values, so each uses 4 bytes of memory space. Therefore, these nine input variables’ total memory space complexity is 36 bytes (\(9 \times 4\) bytes \(= 36\) bytes). The input values space complexity is of constant space.

  2. Auxiliary space: It indicates the additional space that the algorithm uses, apart from the input. It comprises the memory needed for the algorithm’s internal variables, data structures, and other parts. A certain amount of additional memory is used by the RBNRO-DE algorithm, regardless of the input size. This involves the following variables:

    • The positions’ vector \(X_{initial}\), whose size is \((N \times D)\), proportional to the initial population of N positions with dimension size D, and each position takes 4 bytes of memory space, so the memory space complexity taken by \(X_{initial}\) is \((4 \times N \times D)\) bytes. Its space complexity is linear since the memory required increases linearly with \((N \times D)\).

    • The variables \(\sigma _1\), \(\sigma _2\), g, \(P_{ne}^s\), \(P_{ne}^e\), \(Ne_i\), \(Pa_{i}\), \(Pc_{i}\), \(fit({X}_{i}^{Fi})\), \(fit({X}_{i}^{Ion})\), \(fit({X}_{i}^{Fu})\), \(fit(u_{i})\), \(fit({X}_{i})\), \(\sigma _{\mu }\), \(\sigma _{\nu }\), \(fit({X}_{opt})\). Each of these 16 variables represents just numerical values, so each one takes 4 bytes of memory space. Therefore, these 16 variables’ total memory space complexity is 64 bytes (\(16 \times 4\) bytes \(= 64\) bytes). Their space complexity is constant.

    • The positions’ vectors \({X}_{i}^{Fi}\), \({X}_{i}^{Ion}\), \({X}_{i}^{Fu}\), \({X}_{i}\), \({X}_{r}\), \({X}_{j}\), \(X_{r1}^{Fi}\), \(X_{r1}^{Ion}\), \(X_{r2}^{Fi}\), \(X_{r2}^{Ion}\), \(X_{best}^{Fi}\), \(X_{best}^{Ion}\), \(X_{worst}^{Fi}\), \({X}_{best}\), \(Levy(\beta )\), \(\mu\), \(\nu\), \({X}_{i}^{adjust}\), \(X_i^{bin}\), \({u}_{i}\), \({\upsilon }_{i}\), \({X}_{r_{1}}\), \({X}_{r_{2}}\), \({X}_{r_{3}}\), \({X}_{opt}\). The size of each of these 25 position vectors is D, proportional to the dimension size of the obtained positions, and each position takes 4 bytes of memory space. Therefore, the total memory space complexity for these 25 position vectors is \((100 \times D)\) bytes (\(25 \times 4 \times D\) bytes). Its space complexity is linear since the memory required increases linearly with D.

    Consequently, the total memory space complexity for all mentioned-above auxiliary variables is: \((4 \times N \times D) + 64 + (100 \times D)\) bytes.

Finally, the total memory space computational complexity for the proposed RBNRO-DE algorithm can be calculated as follows:

$$\begin{aligned}{} & {} \text {Space complexity(RBNRO-DE)} = \text {Input values space} + \text {Auxiliary space} = \\{} & {} \quad 36 + \big ((4 \times N \times D) + 64 + (100 \times D)\big ) \text { bytes}. \end{aligned}$$

Note that constant terms are not considered. Therefore, after removing all constants, the total RBNRO-DE space computational complexity can be expressed in big-O notation as \(O_{space}(RBNRO-DE)\) and computed as follows:

$$\begin{aligned}{} & {} O_{space}(RBNRO-DE) = O_{space}(\text {Input values space}) + O_{space}(\text {Auxiliary space}) = \\{} & {} \quad O_{space}(1) + \big (O_{space}(N \times D) + O_{space}(1) + O_{space}(D)\big ) = O_{space}(N \times D). \end{aligned}$$

Comparison results between the RBNRO-DE and other state-of-the-art meta-heuristic algorithms based on the computational complexity

Creating a comprehensive comparison of the time complexity and space complexity of multiple meta-heuristic optimization algorithms can be challenging because these complexities can vary depending on the specific implementation, problem size, and other factors. Additionally, detailed time and space complexity analyses may not be available for all of the mentioned algorithms, and they may have different characteristics when applied to different problems. However, we try to provide a simplified comparison of these algorithms in terms of their general characteristics with respect to time and space complexity, as illustrated in Table 26.

Table 26 The proposed RBNRO-DE and its peers based on the computational complexity

Comparison results of the proposed RBNRO-DE versus various recent algorithms from the published literature

As previously clarified, no meta-heuristic algorithm has ever been applied to RNA-Seq gene expression data. Therefore, the RBNRO-DE algorithm is considered the first meta-heuristic algorithm to be proposed for solving GS problems of RNA-Seq gene expression data. This subsection presents the empirical results of comparisons based on the average classification accuracy values, fitness values, and selected features values in tackling the GS issue between the proposed RBNRO-DE and other recent meta-heuristic optimization techniques from the published literature, including Binary meerkat optimization algorithm (BMOA) [73], Binary Brown-bear Optimization (BBBO) algorithm [74], Binary Aquila Optimization (BAO) algorithm [75], and Binary African Vultures Optimization (BAVO) algorithm [76].

Comparisons based on the suggested k-NN classifier

The accuracy values of the proposed RBNRO-DE optimizer and other recent optimizers based on the k-NN classifier are compared in Table 27 under the same implementation conditions. Based on the empirical results, RBNRO-DE yielded the best results in four datasets. In the remaining 18 datasets, RBNRO-DE with k-NN and the other competitive recent algorithms produced identical results.

Table 27 Mean classification accuracy values of the proposed RBNRO-DE with various recent algorithms based on k-NN

Table 28 compares the performance of the proposed RBNRO-DE algorithm with other algorithms based on k-NN using the same implementation requirements. The results indicate that the proposed algorithm outperforms its competitors in producing higher-quality fitness values with lower standard deviation across all datasets used in the experiments. It is worth noting that all these datasets are large-scale, which demonstrates the proposed algorithm’s ability to perform consistently regardless of the dataset size. Additionally, the RBNRO-DE algorithm has shown remarkable performance in balancing exploitation and exploration to avoid getting trapped in local optima. Overall, these results suggest that the proposed RBNRO-DE algorithm is promising and has the potential to evolve beyond the other recent algorithms.

Table 28 Mean fitness values of the proposed RBNRO-DE with various recent algorithms based on k-NN

Table 29 shows the number of extracted features using the suggested RBNRO-DE and other recent optimization algorithms for training the k-NN classifier. The proposed RBNRO-DE exceeded the other recent algorithms in all datasets regarding the number of the selected features. Also, the RBNRO-DE’s ability to determine the most instructive features is attributable to the capability to explore the feasible regions while maintaining enhanced classification accuracy.

Table 29 The number of extracted features by the proposed RBNRO-DE and various recent algorithms for training the k-NN

Comparisons based on the suggested SVM classifier

The mean accuracy results of the suggested RBNRO-DE optimizer and other recent optimization methods regarding the SVM classifier are shown in Table 30 under identical implementation conditions. The proposed RBNRO-DE produced the most promising results in four datasets. In the remaining 18 datasets, the proposed RBNRO-DE with SVM and the other recent competitive algorithms yielded equivalent results.

Table 30 Mean classification accuracy values of the proposed RBNRO-DE with various recent algorithms based on SVM

Table 31 shows the fitness values of the suggested RBNRO-DE and other recent optimization algorithms regarding the SVM classifier. The outcomes show that the proposed technique exceeds its peers by producing the smallest fitness values with lower standard deviation across all benchmarks employed in the experimentations. It is worth noting that all these datasets are large-scale, demonstrating the suggested algorithm’s capability to perform consistently regardless of the size of the dataset. Also, the proposed RBNRO-DE has shown promising performance in balancing exploitation and exploration to avoid getting trapped in local optima.

Table 31 Mean fitness values of the proposed RBNRO-DE with various recent algorithms based on SVM

Table 32 displays the number of extracted features chosen by the suggested RBNRO-DE and other recent optimization algorithms for training the SVM classifier. The proposed RBNRO-DE exceeded the other recent algorithms in all datasets regarding the number of selected features. Also, the RBNRO-DE’s ability to determine the most instructive features is attributable to the capability to explore the feasible regions while maintaining enhanced classification accuracy.

Table 32 The number of extracted features by the proposed RBNRO-DE and various recent algorithms for training the SVM

Comparison results of the proposed RBNRO-DE versus different filter and embedded methods

This subsection presents the experimental results of the proposed RBNRO-DE and various filter and embedded methods.

Comparisons based on the suggested k-NN classifier

Table 33 shows the results of the proposed RBNRO-DE compared with other filter and embedded methods based on the k-NN classifier regarding the classification accuracy values that are fairly evaluated under the same implementation conditions. The empirical results show that the proposed RBNRO-DE is ranked first by achieving the best results in 4 out of 22 datasets. Lasso regularization is ranked second by yielding identical results in 18 datasets as the proposed RBNRO-DE. It should also be noted that variance threshold, linear regression, and ridge regularization methods ranked third by yielding identical results in 17 datasets as the proposed RBNRO-DE.

Table 33 Classification accuracy values of the proposed RBNRO-DE with k-NN and different filter and embedded methods

The average fitness values of the proposed RBNRO-DE algorithm based on k-NN and various filter and embedded methods are shown in Table 34. From the results presented in Table 34, it can be observed that RBNRO-DE with k-NN produces the least fitness values for all datasets. Additionally, it is noteworthy that the proposed RBNRO-DE can perform consistently on all datasets, irrespective of their size, as all the datasets used in this study are large-scale.

Table 34 Fitness values of the proposed RBNRO-DE with k-NN and different filter and embedded methods

Table 35 shows the number of selected features using the proposed RBNRO-DE with the k-NN classifier and different filter and embedded methods. The proposed RBNRO-DE surpassed the other algorithms in 21 out of 22 datasets regarding the number of extracted features. Correlation ranked second by achieving the best results in one dataset.

Table 35 The number of extracted features by the proposed RBNRO-DE with different filter and embedded methods for training the k-NN
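As a hedged illustration of how filter and embedded baselines of the kind compared above (variance threshold, correlation, linear regression, ridge, and lasso) could be set up, the sketch below applies a variance-threshold filter and Lasso-based embedded selection with scikit-learn; the threshold, regularization strength, and placeholder data are assumptions, not the authors' exact configuration.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))      # placeholder expression matrix (samples x genes)
y = rng.integers(0, 2, size=100)     # placeholder binary labels

# Filter method: drop genes whose variance falls below an assumed threshold.
X_filtered = VarianceThreshold(threshold=0.1).fit_transform(X)

# Embedded method: keep genes with non-zero Lasso coefficients (assumed alpha).
lasso = Lasso(alpha=0.01).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
X_lasso = X[:, selected]
```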

Comparisons based on the suggested SVM classifier

Table 36 shows the results of the proposed RBNRO-DE compared with other filter and embedded methods based on the SVM classifier regarding the classification accuracy values that are fairly evaluated under the same implementation conditions. The empirical results show that the proposed RBNRO-DE is ranked first by achieving the best results in 4 out of 22 datasets. Lasso regularization is ranked second by yielding identical results in 18 datasets as the proposed RBNRO-DE. It should also be noted that variance threshold, linear regression, and ridge regularization methods ranked third by yielding identical results in 17 datasets as the proposed RBNRO-DE.

Table 36 Classification accuracy values of the proposed RBNRO-DE with SVM and different filter and embedded methods

The average fitness values of the proposed RBNRO-DE algorithm based on SVM and various filter and embedded methods are shown in Table 37. From the results presented in Table 37, it can be observed that RBNRO-DE with SVM produces the least fitness values for all datasets.

Table 37 Fitness values of the proposed RBNRO-DE with SVM and different filter and embedded methods

Table 38 shows the number of selected features using the proposed RBNRO-DE with the SVM classifier and different filter and embedded methods. The proposed RBNRO-DE surpassed the other algorithms in 21 out of 22 datasets regarding the number of selected features. Correlation ranked second by achieving the best results in one dataset.

Table 38 The number of extracted features by the proposed RBNRO-DE with different filter and embedded methods for training the SVM

Discussion

Based on the empirical analysis, it can be demonstrated that the proposed RBNRO-DE with k-NN and SVM classifiers yielded more reliable results than other recent algorithms for handling the GS strategy on (rnaseqv2 illuminahiseq rnaseqv2 un edu Level 3 RSEM genes normalized) with more than 20,000 genes, picking the most informative genes and assessing them through 22 cancer datasets. Binary versions of the most common meta-heuristic algorithms have been compared with the proposed RBNRO-DE algorithm. In most of the 22 cancer datasets, the RBNRO-DE algorithm based on k-NN and SVM classifiers achieved optimal convergence and classification accuracy up to 100% integrated with a feature reduction size down to 98%, which is very evident when compared to its counterparts, according to Wilcoxon’s rank-sum test (5% significance level). Moreover, the RBNRO-DE optimizer showed more significant exploration and exploitation behaviour than its peers, as verified by the following underlying causes.

Firstly, a pre-processing phase uses the Relief algorithm to identify the relevant features by computing a weight for every feature that describes its relevance to the class label and then ignoring the irrelevant features with the lowest weights. The second phase applies the binary NRO algorithm combined with the DE technique to determine the most relevant and non-redundant features. When solving large-scale problems, the NRO algorithm is susceptible to being trapped in local optima; to prevent this, the DE technique is incorporated into the NRO algorithm.
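As an illustrative sketch only (not the authors' exact operator), the snippet below shows how a DE/rand/1/bin step could be embedded to perturb a candidate gene subset and keep the perturbation only when it improves fitness; the sigmoid transfer function, crossover rate, and scale factor are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def de_step(pop, fitness, i, fitness_fn, F=0.5, CR=0.9):
    """One DE/rand/1/bin step applied to binary solution i of the population.
    pop: (N, D) binary matrix of selected-gene masks; fitness: (N,) current fitness values."""
    N, D = pop.shape
    r1, r2, r3 = rng.choice([k for k in range(N) if k != i], size=3, replace=False)
    mutant = pop[r1] + F * (pop[r2] - pop[r3])       # continuous mutant vector
    prob = 1.0 / (1.0 + np.exp(-mutant))             # sigmoid transfer to [0, 1]
    trial = np.where(rng.random(D) < prob, 1, 0)     # binarize the mutant
    cross = rng.random(D) < CR                       # binomial crossover mask
    cross[rng.integers(D)] = True                    # ensure at least one gene comes from the trial
    candidate = np.where(cross, trial, pop[i])
    f_cand = fitness_fn(candidate)
    if f_cand < fitness[i]:                          # greedy selection (fitness is minimized)
        pop[i], fitness[i] = candidate, f_cand
```

Under this sketch, the DE-style perturbation is what provides the extra diversity that helps the search escape local optima, in line with the role the paper assigns to the DE technique within NRO.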

Moreover, the suggested RBNRO-DE based on the k-NN and SVM classifiers demonstrates its ability to obtain the optimal solution in a reasonable time while maintaining an effective equilibrium between exploration and exploitation capabilities. Finally, because optimization outcomes are not exactly repeatable, separate runs of an optimizer can generate different subsets of attributes, which may confuse the user. Therefore, on different occasions or applications, RBNRO-DE or the other optimizers implemented here may select different subsets of features.

Conclusion and future work

In this study, we applied the meta-heuristic RBNRO-DE algorithm for the first time to solve FS problems on RNA-Seq gene expression data and to identify possible biomarkers for various tumour types. The results were satisfactory, demonstrating that the algorithm’s capability and effectiveness were significantly increased. Two well-known classifiers, k-NN and SVM, were used to assess the usefulness of each subset of chosen features. The performance of the proposed RBNRO-DE algorithm was compared to binary versions of 12 well-known meta-heuristic algorithms to validate it on various tumour types with multiple samples. The evaluation was conducted using a variety of metrics, including the \(AVG_{Fit}\), \(AVG_{Acc}\), and \(AVG_{Feat}\) values. The suggested algorithm, RBNRO-DE based on the k-NN and SVM classifiers, outperformed the other algorithms in dealing with FS problems. Future research could examine how the RBNRO-DE algorithm can be integrated with other optimization algorithms. To further explore its effectiveness for FS in supervised classification, other classifiers (such as DTs, artificial neural networks, etc.) could be used.

Availability of data and materials

For transparency and reproducibility, the developed software and the relevant Python code of this paper are publicly available and obtainable in [77].

References

  1. Wang Z, Gerstein M, Snyder M. Rna-seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10(1):57–63.

  2. Metzker ML. Sequencing technologies-the next generation. Nat Rev Genet. 2010;11(1):31–46.

  3. Lyu B, Haque A. Deep learning based tumor type classification using gene expression data. In: Proceedings of the 2018 ACM international conference on bioinformatics, computational biology, and health informatics; 2018. p. 89–96.

  4. Kim Y-W, Oh I-S. Classifier ensemble selection using hybrid genetic algorithms. Pattern Recogn Lett. 2008;29(6):796–802.

  5. Li Y, Wang G, Chen H, Shi L, Qin L. An ant colony optimization based dimension reduction method for high-dimensional datasets. J Bionic Eng. 2013;10(2):231–41.

  6. Tabakhi S, Moradi P, Akhlaghian F. An unsupervised feature selection algorithm based on ant colony optimization. Eng Appl Artif Intell. 2014;32:112–23.

  7. Jafari P, Azuaje F. An assessment of recently published gene expression data analyses: reporting experimental design and statistical factors. BMC Med Inf Decis Making. 2006;6(1):1–8.

  8. Gu Q, Li Z, Han J. Generalized fisher score for feature selection; 2012 arXiv preprint arXiv:1202.3725.

  9. Mishra D, Sahu B. Feature selection for cancer classification: a signal-to-noise ratio approach. Int J Sci Eng Res. 2011;2(4):1–7.

  10. Vergara JR, Estévez PA. A review of feature selection methods based on mutual information. Neural Comput Appl. 2014;24(1):175–86.

  11. Shreem SS, Abdullah S, Nazri MZA, Alzaqebah M. Hybridizing relief, MRMR filters and ga wrapper approaches for gene selection. J Theor Appl Inf Technol. 2012;46(2):1034–9.

  12. Abdel-Basset M, Sallam KM, Mohamed R, Elgendi I, Munasinghe K, Elkomy OM. An improved binary grey-wolf optimizer with simulated annealing for feature selection. IEEE Access. 2021;9:139792–822.

  13. Tang J, Duan H, Lao S. Swarm intelligence algorithms for multiple unmanned aerial vehicles collaboration: a comprehensive review. Artif Intell Rev. 2023;56(5):4295–327. https://doi.org/10.1007/s10462-022-10281-7.

  14. Wang D, Tan D, Liu L. Particle swarm optimization algorithm: an overview. Soft Comput. 2018;22:387–408.

  15. Karaboga D, Basturk B. A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J Glob Opt. 2007;39(3):459–71. https://doi.org/10.1007/s10898-007-9149-x.

  16. Xue J, Shen B. A novel swarm intelligence optimization approach: sparrow search algorithm. Syst Sci Control Eng. 2020;8(1):22–34.

  17. Emary E, Zawbaa HM, Hassanien AE. Binary grey wolf optimization approaches for feature selection. Neurocomputing. 2016;172:371–81.

  18. Yang X-S. A new metaheuristic bat-inspired algorithm. In: Nature inspired cooperative strategies for optimization (NICSO 2010). Springer; 2010. p. 65–74.

  19. Mirjalili S, Lewis A. The whale optimization algorithm. Adv Eng Softw. 2016;95:51–67.

  20. Hichem H, Elkamel M, Rafik M, Mesaaoud MT, Ouahiba C. A new binary grasshopper optimization algorithm for feature selection problem. J King Saud Univ Comput Inf Sci. 2022;34(2):316–28.

  21. Shadravan S, Naji HR, Bardsiri VK. The sailfish optimizer: a novel nature-inspired metaheuristic algorithm for solving constrained engineering optimization problems. Eng Appl ArtifIntell. 2019;80:20–34.

  22. Meng X-B, Gao XZ, Lu L, Liu Y, Zhang H. A new bio-inspired optimisation algorithm: Bird swarm algorithm. J Exp Theor Artif Intell. 2016;28(4):673–87.

  23. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H. Harris hawks optimization: algorithm and applications. Future Generat Comput Syst. 2019;97:849–72.

  24. Holland JH. Genetic algorithms. Sci Am. 1992;267(1):66–73.

  25. Storn R, Price K. Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J Glob Opt. 1997;11(4):341–59.

  26. Khalid AM, Hosny KM, Mirjalili S. COVIDOA: a novel evolutionary optimization algorithm based on coronavirus disease replication lifecycle. Neural Comput Appl. 2022. https://doi.org/10.1007/s00521-022-07639-x.

  27. Tang D, Dong S, Jiang Y, Li H, Huang Y. Itgo: invasive tumor growth optimization algorithm. Appl Soft Comput. 2015;36:670–98.

  28. Simon D. Biogeography-based optimization. IEEE Trans Evol Comput. 2008;12(6):702–13.

  29. Van Laarhoven PJ, Aarts EH. Simulated annealing. In: Simulated annealing: theory and applications. Springer; 1987. p. 7–15.

  30. Rashedi E, Nezamabadi-Pour H, Saryazdi S. Gsa: a gravitational search algorithm. Inf Sci. 2009;179(13):2232–48.

  31. Zhao W, Wang L, Zhang Z. Atom search optimization and its application to solve a hydrogeologic parameter estimation problem. Knowl Based Syst. 2019;163:283–304.

  32. Hashim FA, Houssein EH, Mabrouk MS, Al-Atabany W, Mirjalili S. Henry gas solubility optimization: a novel physics-based algorithm. Future Generat Comput Syst. 2019;101:646–67.

  33. Ma S, Huang J. Penalized feature selection and classification in bioinformatics. Brief Bioinform. 2008;9(5):392–403.

  34. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol. 2005;67(2):301–20.

  35. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46(1):389–422.

  36. Díaz-Uriarte R, De Andres SA. Gene selection and classification of microarray data using random forest. BMC Bioinform. 2006;7(1):1–13.

  37. Oh I-S, Lee J-S, Moon B-R. Hybrid genetic algorithms for feature selection. IEEE Trans Pattern Anal Mach Intell. 2004;26(11):1424–37.

  38. Cadenas JM, Garrido MC, MartíNez R. Feature subset selection filter-wrapper based on low quality data. Expert Syst Appl. 2013;40(16):6241–52.

  39. Sarafrazi S, Nezamabadi-Pour H. Facing the classification of binary problems with a GSA-SVM hybrid system. Math Comput Model. 2013;57(1–2):270–8.

  40. Wei Z, Huang C, Wang X, Han T, Li Y. Nuclear reaction optimization: a novel and powerful physics-based algorithm for global optimization. IEEE Access. 2019;7:66084–109. https://doi.org/10.1109/ACCESS.2019.2918406.

  41. Li Y, Kang K, Krahn JM, Croutwater N, Lee K, Umbach DM, Li L. A comprehensive genomic pan-cancer classification using the cancer genome atlas gene expression data. BMC Genomics. 2017;18(1):1–13.

  42. Khalifa NEM, Taha MHN, Ali DE, Slowik A, Hassanien AE. Artificial intelligence technique for gene expression by tumor RNA-seq data: a novel optimized deep learning approach. IEEE Access. 2020;8:22874–83.

  43. Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J, et al. A comprehensive evaluation of normalization methods for illumina high-throughput RNA sequencing data analysis. Brief Bioinform. 2013;14(6):671–83.

  44. Xiao Y, Wu J, Lin Z, Zhao X. A deep learning-based multi-model ensemble method for cancer prediction. Comput Methods Progr Biomed. 2018;153:1–9.

  45. Liu M, Xu L, Yi J, Huang J. A feature gene selection method based on relieff and PSO. In: 2018 10th international conference on measuring technology and mechatronics automation (ICMTMA), IEEE; 2018. p. 298–301.

  46. Danaee P, Ghaeini R, Hendrix DA. A deep learning approach for cancer detection and relevant gene identification. In: Pacific symposium on biocomputing. World Scientific; 2017. p. 219–29.

  47. Kira K, Rendell LA, et al. The feature selection problem: traditional methods and a new algorithm. In: AAAI; 1992. p. 129–34:2.

  48. Kononenko I. Estimating attributes: analysis and extensions of relief. In: European conference on machine learning. Springer; 1994. p. 171–82.

  49. Fergusson JE. The history of the discovery of nuclear fission. Found Chem. 2011;13(2):145–66.

  50. Wei Z, Huang C, Wang X, Han T, Li Y. Nuclear reaction optimization: a novel and powerful physics-based algorithm for global optimization. IEEE Access. 2019;7:66084–109.

  51. Salimi H. Stochastic fractal search: a powerful metaheuristic algorithm. Knowl Based Syst. 2015;75:1–18.

  52. Zhuoran Z, Changqiang H, Hanqiao H, Shangqin T, Kangsheng D. An optimization method: hummingbirds optimization algorithm. J Syst Eng Electr. 2018;29(2):386–404.

  53. Alpaydin E. Introduction to machine learning. MIT press; 2020.

  54. Cunningham P, Delany SJ. k-nearest neighbour classifiers-a tutorial. ACM Comput Surv (CSUR). 2021;54(6):1–25.

  55. Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans Inf Theory. 1967;13(1):21–7.

  56. Thaher T, Heidari AA, Mafarja M, Dong JS, Mirjalili S. Binary harris hawks optimizer for high-dimensional, low sample size feature selection. In: Evolutionary machine learning techniques. Springer; 2020. p. 251–72.

  57. Mafarja M, Mirjalili S. Whale optimization approaches for wrapper feature selection. Appl Soft Comput. 2018;62:441–53.

  58. Tharwat A, Hassanien AE, Elnaghi BE. A ba-based algorithm for parameter optimization of support vector machine. Pattern Recogn Lett. 2017;93:13–22.

  59. Schölkopf B, Smola AJ, Bach F, et al. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press; 2002.

  60. Gupta R, Alam MA, Agarwal P. Modified support vector machine for detecting stress level using EEG signals. Comput Intell Neurosci. 2020;2020:1–14.

  61. Li S. Global face pose detection based on an improved PSO-SVM method. In: Proceedings of the 2020 international conference on aviation safety and information technology; 2020. p. 549–53.

  62. Mastromichalakis S, Chountasis S. An MR image classification scheme based on fourier moment analysis and linear support vector machine. J Inf Opt Sci. 2020;42:1–19.

  63. Gopi AP, Jyothi RNS, Narayana VL, Sandeep KS. Classification of tweets data based on polarity using improved RBF kernel of SVM. Int J Inf Technol. 2020;15:1–16.

  64. Faris H, Mafarja MM, Heidari AA, Aljarah I, Ala’M A-Z, Mirjalili S, Fujita H. An efficient binary salp swarm algorithm with crossover scheme for feature selection problems. Knowl Based Syst. 2018;154:43–67.

  65. Abdel-Basset M, Ding W, El-Shahat D. A hybrid harris hawks optimization algorithm with simulated annealing for feature selection. Artif Intell Rev. 2020;54:1–45.

  66. Sallam KM, Elsayed SM, Sarker RA, Essam DL, Improved united multi-operator algorithm for solving optimization problems. In: IEEE congress on evolutionary computation (CEC). IEEE. 2018;2018. p. 1–8.

  67. Normalized-level3 RNA-seq gene expression dataset. https://gdac.broadinstitute.org/.

  68. Mirjalili S, Gandomi AH, Mirjalili SZ, Saremi S, Faris H, Mirjalili SM. Salp swarm algorithm: a bio-inspired optimizer for engineering design problems. Adv Eng Softw. 2017;114:163–91. https://doi.org/10.1016/j.advengsoft.2017.07.002.

  69. Mirjalili S, Mirjalili SM, Yang X-S. Binary bat algorithm. Neural Comput Appl. 2014;25:663–81.

  70. Mirjalili S, Lewis A. S-shaped versus v-shaped transfer functions for binary particle swarm optimization. Swarm Evol Comput. 2013;9:1–14.

  71. Ghosh KK, Ahmed S, Singh PK, Geem ZW, Sarkar R. Improved binary sailfish optimizer based on adaptive β-hill climbing for feature selection. IEEE Access. 2020;8:83548–60.

  72. Derrac J, García S, Molina D, Herrera F. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput. 2011;1(1):3–18.

  73. Xian S, Feng X. Meerkat optimization algorithm: a new meta-heuristic optimization algorithm for solving constrained engineering problems. Expert Syst Appl. 2023;231: 120482. https://doi.org/10.1016/j.eswa.2023.120482.

  74. Prakash T, Singh PP, Singh VP, Singh SN. A novel brown-bear optimization algorithm for solving economic dispatch problem. In: Advanced control & optimization paradigms for energy system operation and management. River Publishers; 2023. p. 137–64.

  75. Abualigah L, Yousri D, Abd Elaziz M, Ewees AA, Al-Qaness MA, Gandomi AH. Aquila optimizer: a novel meta-heuristic optimization algorithm. Comput Ind Eng. 2021;157:107250.

  76. Abdollahzadeh B, Gharehchopogh FS, Mirjalili S. African vultures optimization algorithm: a new nature-inspired metaheuristic algorithm for global optimization problems. Comput Ind Eng. 2021;158: 107408.

  77. Python code for gene selection via relief binary nuclear reaction optimization algorithm based on differential evolution. https://github.com/D-Amr-Atef/Gene_Selection_RBNRO_Algorithm.git.

Acknowledgements

Not applicable.

Funding

Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Author information

Contributions

Amr A. Abd El-Mageed: Conceptualization, methodology, software, formal analysis, investigation, data curation, validation, writing—original draft, writing—review and editing. Ahmed E. Elkhouli: Investigation, visualization, data curation, validation, writing—original draft, writing—review and editing. Amr A. Abohany: Resources, formal analysis, data curation, validation, writing—original draft, writing—review and editing. Mona Gafar: Resources, validation, writing—review and editing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Amr A. Abd El-Mageed.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

El-Mageed, A.A.A., Elkhouli, A.E., Abohany, A.A. et al. Gene selection via improved nuclear reaction optimization algorithm for cancer classification in high-dimensional data. J Big Data 11, 46 (2024). https://doi.org/10.1186/s40537-024-00902-z


Keywords