An adaptive hybrid African vultures-Aquila optimizer with XgbTree algorithm for fake news detection
Journal of Big Data volume 11, Article number: 41 (2024)
Abstract
Online platforms and social networking have grown considerably in recent years. They are now a major news source worldwide, which has led to the online proliferation of Fake News (FNs). These FNs are alarming because they can fundamentally reshape public opinion, drive users away from online platforms, and threaten the reputations of organizations and industries. This rapid dissemination of FNs makes automated detection systems imperative, encouraging many researchers to propose systems that classify news articles and detect FNs automatically. In this paper, a Fake News Detection (FND) methodology is presented based on an effective IBAVOAO algorithm, a hybridization of the African Vultures Optimization (AVO) and Aquila Optimization (AO) algorithms, combined with an extreme gradient boosting Tree (XgbTree) classifier. The suggested methodology involves three main phases: initially, the unstructured FNs dataset is analyzed, and the essential features are extracted by tokenizing, encoding, and padding the input news words into sequences of integers utilizing the GLOVE approach. Then, the extracted features are filtered using the effective Relief algorithm to select only the appropriate ones. Finally, the retained features are used to classify the news items using the suggested IBAVOAO algorithm based on the XgbTree classifier. Hence, the suggested methodology is distinguished from prior models in that it performs automatic data preprocessing, optimization, and classification. The proposed methodology is carried out on the ISOTFNs dataset, containing more than 44 thousand news articles divided into truthful and fake. We validated the proposed methodology’s reliability by examining numerous evaluation metrics, including accuracy, fitness values, the number of selected features, Kappa, Precision, Recall, F1-score, Specificity, Sensitivity, ROC_AUC, and MCC.
Then, the proposed methodology is compared against the most common metaheuristic optimization algorithms on the ISOTFNs dataset. The experimental results reveal that the suggested methodology achieved optimal classification accuracy and F1-score and successfully categorized more than 92.5% of news articles compared to its peers. This study will assist researchers in expanding their understanding of applications of metaheuristic optimization algorithms for FND.
Introduction
FND techniques have received growing attention as the circulation of disinformation on the Internet has increased, becoming a concern for modern society [1]. Generally, the concept of FNs has been around for a while; the problem existed before the growth of the Internet, and many publishers have utilized misinformation to promote their interests [2]. Publishers spread FNs through both conventional print media and online platforms. Online platforms play an essential role in disseminating FNs in the community; platforms such as online newspapers and social media give users access to various publications in one session, providing greater ease and speed than printed news media. In addition, the nature of social networks offers an accessible platform for the fast dissemination of information in real-time; regardless of the reliability of this information, this has caused severe information credibility problems [3].
FNs not only negatively affect individuals but can devastate the community as a whole over time. For example, FNs went viral on Facebook during the 2016 US presidential election, outpacing the more popular and trusted traditional news sources [4], revealing that readers may pay more attention to FNs than to truthful news. Social media users who participate in spreading disinformation can have many motivations, such as manipulation, political agendas, and influence; while many of these users are genuine, those spreading disinformation may or may not be [5]. Because social media profiles are inexpensive and uncomplicated to create, many people have created profiles for malicious tasks. A social media profile managed by a computer algorithm is known as a social bot [4]. These social bots can interact with individuals via social media and automatically produce and publish content online, making it significantly challenging for individuals to recognize such manipulated content [6].
Therefore, it is not easy to validate online content using manual methods, because a large amount of online content has been created and published in recent years. Moreover, many researchers have emphasized that automated and computerized FND methods are necessary [7]. FND systems are generally divided into “news content” and “social context” classes according to their information sources. The first class, “news content” techniques, attempts to validate news content and utilizes attributes such as body text, title, and other metadata to recognize FNs; these techniques are called “content-driven” techniques [8]. The second category, social context techniques, focuses on social attributes such as users’ interactions with specific news on social media (liking it or sharing it on Facebook, retweeting it on Twitter); these techniques are referred to as “social-driven” techniques [8].
Deep Learning (DL) and machine learning methods have been employed in different domains [9,10,11] and have recently been used to tackle FND issues effectively and efficiently [12]. The leading cause of effective outcomes with DL techniques is the large volume and high dimensionality of FNs data. Today’s scenario is the fast and large-scale growth of social media, where people view the latest updates. Thus, social media platforms such as WhatsApp, Twitter, Facebook, and YouTube struggle to detect FNs among the many user posts, and there is a potential danger of publishing and disseminating such FNs via these platforms [13]. Many challenges must be considered when working in this area, including selecting the most appropriate attributes, high-dimensional data, heterogeneity, and choosing the most appropriate DL technique [14].
[15] proposed a DL method based on an automated detector via a three-level hierarchical focus network for fast and accurate FND. [16] proposed deep Convolutional Neural Networks (CNNs) for detecting FNs. [14] presented a learning model based on linguistic features to detect FNs. [17] presented a method for FND using a hybrid neural network structure, integrating the power of Long Short-Term Memory (LSTM) and CNNs. [13] presented several attribute-oriented methods for the automated detection of FNs on social media employing DL. [18] presented three DL-based models intended to classify and detect FNs. [19] presented a method for FND employing geometric DL. [20] introduced a neural network method to accurately forecast the stance between a given pair of a headline and the text of an article. [21] introduced several methods for FND based on the relationship between the headlines and the bodies of articles; their methods are primarily based on Bidirectional LSTM, CNN, and LSTM.
Due to their effective performance in addressing many optimization problems, metaheuristic algorithms (MHAs) have attracted much attention recently, and they offer an efficient solution-finding approach for detecting FNs on social media. [22] formulated FND as an optimization problem, proposing two metaheuristic algorithms, Grey Wolf Optimization (GWO) and Salp Swarm Optimization (SSO), for tackling the FND issue. The proposed FND approach starts with a preprocessing phase and then utilizes GWO and SSO to handle the FND issue. The suggested approach was verified on three real-world FNs datasets, and the experimental outcomes show that GWO achieved better results across different performance metrics than SSO and other metaheuristic algorithms. [23] extended this study by proposing a new method that integrates MHAs and text mining to discover FNs on online social media; modified variants of the GWO and SSO algorithms based on a nonlinear decreasing coefficient and oscillating inertia weight are used for the FND issue. The evaluation measures of the suggested approaches were verified on different datasets, and the empirical outcomes revealed that the new approaches exceeded other approaches on real-world FNs datasets. [24] introduced a new method for identifying FNs articles using the WOA-XgbTree technique and content-driven attributes; the suggested model can be applied in several news-classification scenarios. The proposed model has two phases: first, the necessary attributes are identified and investigated; then, the XgbTree classifier, tuned by the Whale Optimization Algorithm (WOA), classifies the news articles using the specified attributes. In their empirical results, they considered the F1-score and classification accuracy as the basis of their investigations.
They then compared the results of their proposed system against various modern classification techniques using a recently collected dataset of more than 40,000 news articles. The empirical outcomes reveal that the suggested system obtained a reliable F1-score and efficiently classified more than 91 percent of the articles.
Motivations
This paper presents a framework relying on the IBAVOAO algorithm to tackle the issue of FND. The proposed IBAVOAO is a hybrid AVOAO optimizer combined with an XgbTree classifier. The primary stages of the suggested methodology are as follows. Firstly, the collected unstructured data is converted into structured data for use in the classification process, known as data preprocessing. In this stage, beneficial features are extracted by removing superfluous words and unnecessary special symbols; stemming, i.e., altering words into root words; tokenizing the resulting data into a bag of words; and finally encoding and padding words into sequence vectors of numerical values using Global Vectors (GLOVE) [25, 26], a count-based pretraining approach that relies on term vectors built from co-occurrence data. After that, the extracted features are filtered using an efficient Relief algorithm to determine only the associated features and provide the final classification dataset; using the Relief algorithm aims to enhance the ability to explore the best outcomes discovered inside the solution space. In the final stage, the classification process utilizes the IBAVOAO algorithm based on the XgbTree classifier with high detection performance. The effectiveness of the suggested methodology is assessed by employing a variety of evaluation metrics and applying them to the ISOTFNs dataset, which includes more than 44 thousand news articles. After the suggested methodology was evaluated and compared with state-of-the-art optimization techniques [27, 28], the results indicate that the presented methodology produces high classification accuracy, and it is advised for use on the FND problem.
Contributions
This paper offers an FND methodology based on the IBAVOAO algorithm with the XgbTree classifier; its fundamental contributions can be clarified in the following points:

Preprocess the FNs data to extract the necessary features.

To improve the initial search space exploration capacity, reduce the search space, and enhance the acquired optimal outcomes, the proposed IBAVOAO algorithm embeds a Relief algorithm within the hybridization of the AVO and AO algorithms. This embedding enhances the algorithm’s performance by producing a new population that maintains the fundamental structure but has more appropriate positions.

Filter and determine only the most appropriate features for predictive modeling using the Relief algorithm.

Classify the news items utilizing the IBAVOAO with the XgbTree classifier.

Assess the proposed methodology against state-of-the-art optimization algorithms using a variety of evaluation metrics, including accuracy, fitness values, the number of selected features, Kappa, Precision, Recall, F1-score, Specificity, Sensitivity, ROC_AUC, and MCC, on the ISOTFNs dataset.

The proposed methodology achieves high classification accuracy and a positive impact compared to its peers.
Structure
The remainder of the paper is organized as follows. Section “Literature review” presents the related work and literature review. The proposed methodology and its components for FND are presented in Sect. “The proposed IBAVOAO algorithm for FND”. Section “Experimental results and analysis” shows the numerical results and comparisons. Finally, conclusions and future work are drawn in Sect. “Conclusion and future work”.
Literature review
The primary purpose of fact-checking is to use new technologies to recognize unreliable and manipulated news content on the Internet; it is a major topic within information and library science [29]. As a result, many researchers are trying to address the issues of FNs in different areas, especially online news. This section surveys the various methods utilized to discover FNs on online platforms and briefly mentions their results and advantages.
[30] proposed a new model for detecting real and fake stories, using linguistic attributes like special characters, emoticon symbols, negative/positive words, and hashtags to categorize news stories. [31] suggested a system for detecting information sequences in a Twitter OP; within their work, pattern analysis methods were implemented, allowing them to classify original news and FNs. [32] proposed a graph kernel-based Support Vector Machine (SVM) method that learns high-order distribution patterns to detect FNs. [33] proposed a novel model that uses a Recurrent Neural Network (RNN) to identify FN articles utilizing linguistic attributes accessed from a collection of user comments.
[34] introduced a novel system to identify authentic news articles, utilizing inevitable connections among conversation parts to identify trustworthy news stories. In another system, [35] analyzed the same user features on the social platform Sina Weibo, the most popular Chinese microblogging site. [36] proposed an approach to automatically identify sarcastic tweets and product reviews. They used generic attributes based on baseline and lexicon features: features such as character n-grams, word n-grams, and word skip-grams are extracted and integrated with lexicon properties. Then, they categorized these features utilizing different methods, such as ensemble classifiers, Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF).
[37] studied the value of various attributes in identifying and categorizing sarcastic and ironic reviews on different types of products. Firstly, they elicited attributes utilizing lexicon-based features and Bag-of-Words; then, they fed these elicited attributes to various Machine Learning (ML) classifiers such as LR, SVM, DT, and RF. [38] suggested a novel system to recognize truthful news, integrating different user, linguistic, structural, and temporal features to categorize FNs.
[39] proposed a method to address the problem of FNs based on DL. The suggested method contains three stages: text encoding, feature extraction, and classification. The text encoding phase is performed on the entered news words utilizing GLOVE to represent the words; the encoded words, adjusted to a given word length, are then fed to the suggested DL methods. The suggested DL methods include both automated feature extraction and classification capabilities. Moreover, this work presents four DL methods, comprising CNNs, Concatenated CNNs, Gated Recurrent Units, and LSTM, to obtain an optimal method for the FNs problem that exceeds previous studies. The suggested DL methods were implemented on the FNC and FNs datasets hosted by Kaggle. The proposed Concatenated CNNs method achieved a classification accuracy of 99.6% and trained faster than the others.
Table 1 presents the state-of-the-art papers on FND, including the dataset, model description, limitations, advantages, and outcomes. According to the outcomes of Table 1, many issues remain open in this area, which can be summarized in the following points:

Limited availability of suitable quality-labeled benchmark datasets.

Few studies have been conducted on regional languages.

Lack of a comprehensive, standardized dataset on FNs.

Very few studies have addressed FND as a multi-category classification problem.

DL algorithms have poor reusability and transfer learning capability.

The classification accuracy achieved in many studies is not effective.

The performance of many of these studies in detecting FNs is still insufficient.
These issues encouraged us to introduce a novel approach for classifying news articles automatically, utilizing content-based attributes, useful linguistic features, and the proposed hybrid algorithm based on AVO and AO with the XgbTree algorithm, to detect FNs with high performance.
The proposed IBAVOAO algorithm for FND
Data preprocessing
During this stage, the unstructured data gathered from the suggested FNs dataset (see Sect. “Dataset description”) is transformed into structured data for classification. The two primary methods of this stage are to extract features from the FNs dataset, then filter the resulting features and select only the pertinent ones. These methods are covered in the subsequent subsections.
Feature extraction
This method is carried out as follows: first, extraneous words and unneeded special symbols, such as digits, stopwords, single-letter words, commas, hashtags, and punctuation marks, are eliminated from the unstructured FNs dataset. Then, the necessary words are reduced to root words, called stemming, by removing suffixes, affixes, inserts, and mixes of beginnings and endings on derived words. After correcting misspellings and abbreviations, the remaining words are converted to lowercase to obtain a unified form.
The resultant data are tokenized by segregating them into a bag of words (small tokens) to obtain the words that have value in the created matrix for use in the classification process. Additionally, the data is encoded into sequence vectors using the GLOVE method [25] for word representation, which turns tokens into sequences of integers. Since the encoded sequences are of varying lengths, each sequence vector is padded so that all sequences have identical lengths; this is done by padding zeros at the start of each sequence until it reaches the maximum length specified for each padded vector, which is set to 1000. Labels are also encoded, with positive labels encoded as one and negative labels as zero.
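As an illustration of the encoding and padding just described, the following Python sketch maps tokens to integer ids and left-pads each sequence with zeros. The helper names (`build_vocab`, `encode_and_pad`), the toy documents, and the shortened maximum length are illustrative assumptions; the paper pads every sequence to length 1000.

```python
MAX_LEN = 10  # the paper uses 1000; shortened here for illustration

def build_vocab(token_lists):
    """Map each distinct token to a positive integer (0 is reserved for padding)."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab

def encode_and_pad(tokens, vocab, max_len=MAX_LEN):
    """Turn tokens into integer ids, then left-pad with zeros to max_len."""
    seq = [vocab[t] for t in tokens if t in vocab][:max_len]
    return [0] * (max_len - len(seq)) + seq

docs = [["fake", "news", "spreads", "fast"], ["real", "news"]]
vocab = build_vocab(docs)
padded = [encode_and_pad(d, vocab) for d in docs]
```

Every padded vector then has the same fixed length, so the whole corpus can be stacked into one matrix for the classifier.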
Feature filtration
The sheer number of features is one of the significant challenges in the data preprocessing stage. Processing time and computing effort often increase when dealing with numerous features, and they can hurt classification performance. Thus, an effective method is needed for filtering the features and picking appropriate ones. This paper uses a straightforward and quick filtering method, the Relief algorithm [48, 49], to identify related features.
This method focuses only on pertinent features and reduces the initial search space by locating features that take comparable values for nearby samples of the same class and clearly different values for samples of different classes. The algorithm works according to the features’ weighted ranking as follows: first, it distinguishes between NearHit samples, belonging to the same class, and NearMiss samples, belonging to a different class. The weight of each feature is then evaluated based on the NearHit and NearMiss values to assess its suitability for the classification process. The features are then ranked according to their weights from largest to smallest. The following equation can be employed to evaluate the feature weight \(\mathcal {W}_A\):
$$\begin{aligned} \mathcal {W}_A = \frac{1}{N} \sum _{j=1}^{N} \left[ \left( {X}_{A}^{j} - NM({X}^{j})_{A}\right) ^2 - \left( {X}_{A}^{j} - NH({X}^{j})_{A}\right) ^2 \right] , \end{aligned}$$(1)
where \(\mathcal {W}_A\) denotes the weight of the feature, N is the sample number, and \({X}_{A}^{j}\) means the feature value A of data \({X}^{j}\). \(NH({X}^{j})\) and \(NM({X}^{j})\) indicate the nearest data points to \({X}^{j}\) related to the identical and the distinct class, respectively. The Relief algorithm holds significance in feature filtration and is suitable for this problem due to the following key reasons:

Robustness with noisy data: The Relief algorithm is known for its robustness in handling noisy data. If the presented datasets contain noise or outliers, Relief can perform well despite these challenges. It evaluates feature importance by considering the proximity between instances, which helps mitigate the impact of noise or irrelevant features.

Capability to identify relevant features: The Relief algorithm is designed to identify the most relevant features by considering their contribution.

Balancing feature relevance: The Relief algorithm considers feature relevance and redundancy factors. It helps identify a subset of features that contribute most substantially to the model’s performance by differentiating between those that may be redundant and those most relevant to the target variable.

Bias-free feature selection: the Relief algorithm is less prone to bias in feature selection since it makes no assumptions about any specific data distribution. This guarantees a more impartial assessment of feature significance.

Efficiency in handling highdimensional data: The Relief algorithm performs well and is appropriate for datasets with substantial features, handling highdimensional data efficiently. Compared to certain feature selection methods, Relief could be preferred as it tends to perform well without suffering from the curse of dimensionality.

Handling diverse data types: The Relief algorithm is adaptable to a wide range of dataset types since it can handle continuous and categorical features.

Simple implementation and interpretability: The Relief algorithm is comparatively simple and easy to implement and comprehend, and its outcomes are frequently interpretable. Its usefulness is improved by its simplicity, particularly when the interpretability of feature selection is important.

Previous success or familiarity: The choice of the Relief algorithm might also stem from its success or prevalence in similar studies or datasets. It has been successfully applied in various problem domains, including healthcare, finance, and bioinformatics.
Overall, the Relief algorithm’s significance lies in its ability to efficiently and rapidly filter out unnecessary or redundant features, leading to improved model performance and interpretability even on noisy and varied datasets.
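The Relief weighting described above can be sketched in a few lines of Python: for each sample, find its nearest hit (same class) and nearest miss (other class), and score each feature by how much better it separates misses than hits. The toy feature matrix, labels, and the Manhattan distance used to find neighbors are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

def relief_weights(X, y):
    """Relief-style feature weights: higher means more class-relevant."""
    n, d = X.shape
    W = np.zeros(d)
    for j in range(n):
        dists = np.abs(X - X[j]).sum(axis=1)  # distance of sample j to all samples
        dists[j] = np.inf                     # never pick the sample itself
        same = np.where(y == y[j])[0]
        diff = np.where(y != y[j])[0]
        nh = same[np.argmin(dists[same])]     # NearHit: closest same-class sample
        nm = diff[np.argmin(dists[diff])]     # NearMiss: closest other-class sample
        W += (X[j] - X[nm]) ** 2 - (X[j] - X[nh]) ** 2
    return W / n

# Toy data: feature 0 tracks the class label, feature 1 is noise.
X = np.array([[0.0, 1.0], [0.1, 0.0], [0.9, 1.0], [1.0, 0.0]])
y = np.array([0, 0, 1, 1])
W = relief_weights(X, y)
```

With this toy data the class-relevant feature receives a clearly higher weight than the noise feature, which is exactly the ranking the filtration step relies on.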
Position improvement via proposed hybrid AVOAO optimization algorithm
The proposed AVO algorithm
An efficient nature-inspired metaheuristic optimization algorithm termed the AVO algorithm [50] is employed in this paper; it models and imitates the natural living and feeding behaviors of African vultures. The algorithm is built on basic conceptions related to vultures, as follows. Initially, the AVO algorithm assumes that the population consists of N vultures, a size that varies depending on the problem being tackled. After that, the fitness function value is computed for all solutions of the initial population, allowing the population to be split into three sets: the first set comprises the best solution, a vulture stronger than all others; the second set contains the second-best solution, a vulture weaker than the first; and the last set holds the remaining weakest vultures. These three sets formulate the most significant natural function of vultures, and each set has a distinct capability to obtain and consume food. Further, the strengths and weaknesses of vultures are reflected in the fitness value of the solution: the two best solutions characterize the best and strongest vultures, whereas the worst solution represents the weakest and most starved vultures. In general, the vultures attempt to keep a safe distance from the worst vultures while attempting to get close to the best ones.
According to the conceptions mentioned above, the proposed AVO algorithm can be formulated into four essential steps to model the behavior of various vultures. These steps are depicted in the next few subsections.
Population splitting step
This step divides the initial population into sets by evaluating the fitness function of their solutions. The first set includes the best solution as its best vulture, and the second-best solution is chosen as the second set’s best vulture; the residual solutions form the final set. The population is reformulated at each iteration because the solutions always try to come as near as possible to the best and second-best solutions, as follows:
$$\begin{aligned} R^{g}= & {} \left\{ \begin{array}{ll} BestVulture_{1}^{g}, &{} {{if}}\,\,{pr}^{g}=L_{1},\\ SecondBestVulture_{2}^{g}, &{} {{if}}\,\,{pr}^{g}=L_{2},\\ \end{array} \right. \end{aligned}$$(2)$$\begin{aligned} {pr}^{g}= & {} \frac{F_{i}}{\sum _{i=1}^{N} F_{i}}, \end{aligned}$$(3)
where \(BestVulture_{1}^{g}\) and \(SecondBestVulture_{2}^{g}\) denote the first set’s best vulture and the secondbest vulture in the second set at the \(g^{th}\) iteration, respectively. \({pr}^{g}\) is the likelihood of choosing the most suitable solution for each set at the \(g^{th}\) iteration, which is defined using the Roulette wheel procedure illustrated in Eq. (3). \(L_1\) and \(L_2\) are two random parameters within the range [0, 1].
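To make the leader-selection rule concrete, here is a minimal Python sketch of a roulette-wheel pick between the two best vultures, as described above. The function names, the toy fitness values, and the fixed seed are illustrative assumptions, not the paper's implementation.

```python
import random

def roulette_pick(fitness, rng):
    """Roulette-wheel selection: pick an index with probability proportional to fitness."""
    total = sum(fitness)
    r = rng.random() * total
    acc = 0.0
    for i, f in enumerate(fitness):
        acc += f
        if r <= acc:
            return i
    return len(fitness) - 1

def select_leader(best1, best2, fit1, fit2, rng=None):
    """Each vulture follows either the best or the second-best vulture."""
    rng = rng or random.Random(0)  # fixed seed for a reproducible illustration
    idx = roulette_pick([fit1, fit2], rng)
    return best1 if idx == 0 else best2

# Toy leaders: the fitter leader is chosen more often on average.
leader = select_leader("vulture_A", "vulture_B", fit1=0.9, fit2=0.1)
```

Because the wheel is proportional to fitness, a strongly dominant best vulture attracts most of the population while the second-best still receives occasional followers, preserving diversity.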
Vultures’ starvation level step
In this step, the starvation level of the vultures is measured, which is also used to mathematically model the processes of exploration, exploitation, and the transition between them. The vultures can fly farther and have more capacity to look for food when they are not starving; on the other hand, vultures cannot fly long distances for food and may turn hostile when starving. The starvation level (\(F_{i}^{g}\)) of the \(i^{th}\) vulture at the \(g^{th}\) iteration can be expressed as follows:
$$\begin{aligned} F_{i}^{g}= & {} (2\times rand + 1)\times z\times \left( 1-\frac{g_{i}}{G_{max}}\right) + t, \end{aligned}$$(4)$$\begin{aligned} t= & {} h\times \left( \sin ^{w}\left( \frac{\pi }{2}\times \frac{g_{i}}{G_{max}}\right) + \cos \left( \frac{\pi }{2}\times \frac{g_{i}}{G_{max}}\right) - 1\right) . \end{aligned}$$(5)
The variable \(F_{i}^{g}\) governs the vulture’s transition from exploration to exploitation; small values imply that the vultures are full. rand is an arbitrary number between 0 and 1, z implies an arbitrary value within the range \([-1, 1]\), \(g_{i}\) signifies the present iteration number, and \(G_{max}\) indicates the maximum number of iterations. The t value is computed by Eq. (5) to improve the effectiveness in tackling complex optimization problems and to avoid falling into a local optimum. h is an arbitrary value within the range \([-2, 2]\). The predetermined constant parameter w determines the likelihood of performing the exploration process; the likelihood of exploration increases as its value rises and decreases as its value drops.
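The starvation-level computation can be sketched as follows in Python. The setting `w = 2.5` and the random seed are assumed for illustration; the paper's exact parameter values are not reproduced here.

```python
import math
import random

def starvation(g, G_max, w=2.5, rng=None):
    """Starvation level F of a vulture at iteration g (sketch of Eqs. 4-5)."""
    rng = rng or random.Random(1)       # fixed seed for reproducibility
    z = rng.uniform(-1, 1)              # z in [-1, 1]
    h = rng.uniform(-2, 2)              # h in [-2, 2]
    ratio = g / G_max
    # t perturbs F to help escape local optima (Eq. 5)
    t = h * (math.sin(math.pi / 2 * ratio) ** w
             + math.cos(math.pi / 2 * ratio) - 1)
    # F shrinks toward 0 as g approaches G_max, shifting the swarm
    # from exploration to exploitation (Eq. 4)
    return (2 * rng.random() + 1) * z * (1 - ratio) + t

F_early = starvation(1, 100)
F_final = starvation(100, 100)
```

At the final iteration both terms vanish, so F tends to zero and the algorithm behaves purely exploitatively.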
According to Eq. (4), the value of \(F_{i}^{g}\) progressively reduces with the increasing number of iterations. Therefore, the next step can be defined in the proposed AVO algorithm as follows:
AVO’s exploration step
During the exploration step, the vultures are distinguished by their high capacity and optical ability to seek suitable food. Vultures may fly long distances for extended periods and inspect various random sites for food. Hence, the exploration step utilizes two distinct techniques; a predefined parameter \(P_1\) and a random value \(rand_{P_1}\), both in the range [0, 1], are employed to pick one of them. Notice that the starvation level \(F_{i}^{g}\) in the exploration step is greater than or equal to 1. The exploration techniques can be expressed as follows:
$$\begin{aligned} {X}_{i}^{g+1}= & {} R^{g} - D_{i}^{g}\times F_{i}^{g}, \,\,\,{{if}}\,\,rand_{P_1}\le P_1, \end{aligned}$$(6)$$\begin{aligned} {X}_{i}^{g+1}= & {} R^{g} - F_{i}^{g} + rand\times \left( (UB-LB)\times rand + LB\right) , \,\,\,{{if}}\,\,rand_{P_1}>P_1, \end{aligned}$$(7)$$\begin{aligned} D_{i}^{g}= & {} \left| 2\times rand\times R^{g} - {X}_{i}^{g}\right| . \end{aligned}$$(8)
where \({X}_{i}^{g+1}\) is the vulture’s updated position at the next \((g+1)^{th}\) iteration, \(R^{g}\) is the chosen best vulture in the present iteration g, specified through Eq. (2), \(D_{i}^{g}\) is calculated using Eq. (8), and \(F_{i}^{g}\) is the starvation level of the \(i^{th}\) vulture at the \(g^{th}\) iteration, estimated by Eq. (4). rand is a random value between zero and one; to keep food safe from other vultures and to provide a high arbitrary coefficient at the search environment scale, the vultures move randomly. UB and LB indicate the variables’ upper and lower limits, and \({X}_{i}^{g}\) is the present position at the \(g^{th}\) iteration.
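The two exploration moves can be sketched for a one-dimensional position as follows. The scalar positions, the choice of \(P_1\), and the random seed are illustrative assumptions; a real run would update a full position vector per vulture.

```python
import random

def explore(x, R, F, LB, UB, P1=0.6, rng=None):
    """One AVO exploration move for a scalar position (sketch of the two techniques)."""
    rng = rng or random.Random(2)  # fixed seed for a reproducible illustration
    if rng.random() <= P1:
        # Technique 1: move relative to the selected leader R
        D = abs(2 * rng.random() * R - x)
        return R - D * F
    # Technique 2: relocate to a random point scaled by the bounds [LB, UB]
    return R - F + rng.random() * ((UB - LB) * rng.random() + LB)

x_new = explore(x=0.3, R=0.8, F=1.2, LB=0.0, UB=1.0)
```

The first branch keeps the vulture tied to a leader, while the second scatters it across the search bounds, which is what gives this step its global search character.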
AVO’s exploitation step
In the AVO’s exploitation step, the value of \(F_{i}^{g}\) is smaller than 1. The exploitation step consists of two internal sub-steps, in which the effectiveness of the proposed AVO algorithm is assessed. Each of these sub-steps has two distinct techniques, and two predetermined parameters with values between 0 and 1 are utilized to specify the appropriate technique in each: \(P_2\) for the first sub-step and \(P_3\) for the second. These two internal sub-steps are explained below.

1.
First exploitation sub-step: This sub-step is executed when the \(F_{i}^{g}\) value is smaller than 1 and greater than or equal to 0.5, and it utilizes two distinct techniques. A predefined parameter \(P_2\) and a random value \(rand_{P_2}\), both in the range [0, 1], are employed to decide which technique is selected.
The first technique of this sub-step is known as siege-fight, in which the vultures have enough power and are moderately satiated. Because vultures gather around one specific food source, the stronger and healthier vultures attempt not to share food with others, while the weaker vultures attempt to steal food from the healthier ones by swarming close to them and starting small fights. The second technique, referred to as rotational-flight, models a spiral motion between one of the best vultures and the rest. The techniques of the first exploitation sub-step can be illustrated as follows:
$$\begin{aligned} {X}_{i}^{g+1}= & {} \left\{ \begin{array}{ll} D_{i}^{g} \times (F_{i}^{g} + rand) - d_{t}^{g}, &{} {{if}}\,\,rand_{P_2}\le P_2,\\ R^{g} - (S_{1}^{g} + S_{2}^{g}), &{} {{if}}\,\,rand_{P_2}>P_2,\\ \end{array} \right. \,\,\,{{if}}\,\, 1 > F_{i}^{g} \ge 0.5, \end{aligned}$$(9)$$\begin{aligned} d_{t}^{g}= & {} R^{g} - {X}_{i}^{g}, \end{aligned}$$(10)$$\begin{aligned} S_{1}^{g}= & {} R^{g} \times \left( \frac{rand \times {X}_{i}^{g}}{2 \pi } \right) \times \cos ({X}_{i}^{g}), \end{aligned}$$(11)$$\begin{aligned} S_{2}^{g}= & {} R^{g} \times \left( \frac{rand \times {X}_{i}^{g}}{2 \pi } \right) \times \sin ({X}_{i}^{g}). \end{aligned}$$(12)where \({X}_{i}^{g+1}\) denotes the vulture’s updated position at the following \((g+1)^{th}\) iteration, \(D_{i}^{g}\) is derived using Eq. (8), \(F_{i}^{g}\) indicates the degree of starvation for the \(i^{th}\) vulture at the \(g^{th}\) iteration as determined by Eq. (4), and rand is a random value between 0 and 1 to provide a high arbitrary coefficient. \(d_{t}^{g}\) is the distance between the vulture and one of the best two vultures, estimated by Eq. (10), \(R^{g}\) denotes the chosen best vulture in the present \(g^{th}\) iteration, set via Eq. (2), \(S_{1}^{g}\) and \(S_{2}^{g}\) are estimated utilizing Eqs. (11) and (12) respectively, and \({X}_{i}^{g}\) represents the present position at the \(g^{th}\) iteration.

2.
Second exploitation sub-step: when the value of \(F_{i}^{g}\) is less than 0.5, this sub-step is implemented, in which numerous sieges and violent fights are performed among the diverse species of vultures congregated around the food source. Two techniques are used in this sub-step; to determine which one to select, a predetermined parameter \(P_3\) and a random value \(rand_{P_3}\), both with values between 0 and 1, are used.
The first technique of this sub-step is called congregating vultures around the food source, as diverse species of hungry vultures are attracted to and compete over a single food supply. The second technique is termed the aggressive siege-fight: the vultures become more aggressive and attempt to scavenge the remaining food from the healthier vultures by flocking toward them from various directions, while the healthier vultures weaken and lose the power to resist. The techniques of the second exploitation sub-step can be depicted as follows:
$$\begin{aligned} {X}_{i}^{g+1}= & {} \left\{ \begin{array}{ll} \frac{A_1^{g} + A_2^{g}}{2}, &{} {{if}}\,\,rand_{P_3}\le P_3,\\ R^{g} - d_{t}^{g} \times F_{i}^{g} \times Levy_d, &{} {{if}}\,\,rand_{P_3}>P_3,\\ \end{array} \right. \,\,\,{{if}}\,\, F_{i}^{g} < 0.5, \end{aligned}$$(13)

$$\begin{aligned} A_{1}^{g}= & {} BestVulture_{1}^{g} - \frac{BestVulture_{1}^{g} \times {X}_{i}^{g}}{BestVulture_{1}^{g} - ({X}_{i}^{g})^2} \times F_{i}^{g}, \end{aligned}$$(14)

$$\begin{aligned} A_{2}^{g}= & {} SecondBestVulture_{2}^{g} - \frac{SecondBestVulture_{2}^{g} \times {X}_{i}^{g}}{SecondBestVulture_{2}^{g} - ({X}_{i}^{g})^2} \times F_{i}^{g}, \end{aligned}$$(15)

$$\begin{aligned} Levy_d= & {} {0.01} \times \frac{\mu \times \sigma }{\nu ^{\frac{1}{\beta }}}, \,\,\,\,\,\,\,\,\,\sigma = \left( \frac{\Gamma (1+\beta )\times \sin \left( \frac{\pi \beta }{2}\right) }{\Gamma \left( \frac{1+\beta }{2}\right) \times \beta \times 2^{\left( \frac{\beta -1}{2}\right) }}\right) ^ \frac{1}{\beta }. \end{aligned}$$(16)

where \({X}_{i}^{g+1}\) signifies the vulture’s next updated position at the following \((g+1)^{th}\) iteration, which reflects the congregation of vultures. \(A_{1}^{g}\) and \(A_{2}^{g}\) are assessed using Eq. (14) and (15) respectively, \(R^{g}\) is the selected best vulture at the present \(g^{th}\) iteration, defined via Eq. (2), \(d_{t}^{g}\) stands for the distance between the vulture and one of the best two vultures, estimated by Eq. (10), \(F_{i}^{g}\) indicates the degree of starvation for the \(i^{th}\) vulture at the \(g^{th}\) iteration, as computed by Eq. (4), and \(Levy_d\) is the Levy flight distribution function obtained by Eq. (16) to improve the efficiency of the AVO algorithm.
The best vulture in the first set and the second-best vulture in the second set at the present \(g^{th}\) iteration are denoted by \(BestVulture_{1}^{g}\) and \(SecondBestVulture_{2}^{g}\) respectively, while the present position at the \(g^{th}\) iteration is represented by \({X}_{i}^{g}\). d is the dimensional space, \(\mu\) and \(\nu\) are arbitrary values evenly distributed throughout the range [0, 1], and \(\sigma\) is specified by Eq. (16), where \(\beta = 1.5\) is a constant number.
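The Levy flight term of Eq. (16) can be sketched as follows. Note that the text above defines \(\mu\) and \(\nu\) as uniform on [0, 1], which this sketch follows.

```python
import numpy as np
from math import gamma, sin, pi

def levy_flight(d, beta=1.5, rng=None):
    """Levy flight step for a d-dimensional position, per Eq. (16)."""
    rng = rng or np.random.default_rng()
    # sigma from Eq. (16), with beta = 1.5 as in the paper
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    mu = rng.random(d)   # uniform on [0, 1], as stated in the text
    nu = rng.random(d)
    return 0.01 * mu * sigma / (np.abs(nu) ** (1 / beta))
```

The same function serves the AO algorithm below, which reuses \(Levy_d\) in its exploration and exploitation rules.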
Pseudocode of the proposed AVO algorithm
Having clarified the critical steps of the proposed AVO algorithm illustrated above and presented the techniques recommended for mimicking the natural living and feeding behaviors of African vultures, the pseudocode defining the proposed AVO algorithm is provided in Algorithm 1. Moreover, a flowchart of the AVO algorithm is shown in Fig. 1 to highlight its main steps.
The proposed AO algorithm
In this subsection, the prey-hunting behavior of the Aquila is simulated by introducing an efficacious nature-inspired metaheuristic optimization algorithm, dubbed the AO algorithm [51]. Due to its bravery, agility, and speed, relying on steady feet and sharp talons when hunting various animals, including badgers, squirrels, and rabbits, the Aquila is the most famous bird of prey [52]. The Aquila relies on four hunting techniques, which can be summed up mathematically in two crucial steps: exploration and exploitation. In the proposed AO algorithm, the appropriate step is chosen according to the present iteration number, \(g_{i}\), and the maximum number of iterations, \(G_{max}\): exploration is applied while \(g_{i} \le \left( \frac{2}{3}\right) \cdot G_{max}\), and exploitation afterward.
The following subsections illustrate these steps of the proposed AO algorithm.
AO’s exploration step
The exploration step includes two distinct techniques. The first, extensive exploration, has the Aquila fly high above the land in search of suitable prey. The Aquila begins a long, low-angled glide with growing speed as it approaches the optimal region for prey; it then extends its wings and tail and drops vertically toward the prey. The second technique is restricted exploration, in which the Aquila carefully inspects the chosen area of the prey from a high altitude, whether the prey is flying or running. The Aquila then spirals around the chosen prey and descends low over the ground to prepare to catch it. A random value rand, drawn from [0, 1], determines which of these two techniques is picked.
To improve the exploration’s efficiency, the exploration step is applied when \(g_{i}\) is smaller than or equal to \(\left( \frac{2}{3}\right) \cdot G_{max}\). The above-mentioned techniques of the AO’s exploration step can be represented as follows:
where \({X}_{i}^{g+1}\) denotes the Aquila’s next updated position at the subsequent \((g+1)^{th}\) iteration, and \(X_{Best}^{g}\) indicates the best position found during the search at the \(g^{th}\) iteration. \(g_{i}\) is the current iteration, while \(G_{max}\) is the maximum allowed number of iterations; the term \(\left( 1 - \frac{g_{i}}{G_{max}}\right)\) controls the extended exploration throughout the set of iterations. \({X}_{Mean}^{g}\) is the mean of the present positions at the \(g^{th}\) iteration, evaluated through Eq. (19). The number of permitted positions is N, and d is the problem’s dimension size. \(Levy_d\) is the Levy flight distribution function, obtained using Eq. (16). \({X}_{\tau }^{g}\) is a randomly chosen Aquila position. The spiral shape of the search is represented by \(y^{g}\) and \(\zeta ^{g}\), which are evaluated using Eq. (20) and (21), respectively. \({\mathfrak {r}}_{1}\) indicates the number of search rotations, ranging from 1 to 20, and \(U = 0.00565\). \({\mathfrak {D}}_{1}\) consists of integers from 1 to d, and \(\omega = 0.005\).
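Since the display equations of this step are not reproduced above, the following sketch reconstructs the two exploration rules from the term definitions just given. The spiral radius and angle formulas built from \(r_1\), U, \(D_1\), and \(\omega\) follow the standard AO formulation and should be treated as assumptions here, not the paper's exact equations.

```python
import numpy as np

def ao_exploration(X_i, X_best, X_mean, X_rand, levy, g, G_max, d, rng):
    """Sketch of AO's two exploration techniques (cf. Eqs. 17-21)."""
    if rng.random() <= 0.5:
        # extensive exploration: glide toward the best position, damped
        # by (1 - g/G_max) to dominate exploration over the iterations
        return X_best * (1 - g / G_max) + (X_mean - X_best * rng.random())
    # restricted exploration: Levy step plus a spiral around a random Aquila
    r1 = rng.integers(1, 21)            # number of search rotations, 1..20
    D1 = np.arange(1, d + 1)            # integers 1..d
    U, omega = 0.00565, 0.005
    r = r1 + U * D1                     # spiral radius (standard AO form)
    theta = -omega * D1 + 3 * np.pi / 2 # spiral angle (standard AO form)
    y, zeta = r * np.cos(theta), r * np.sin(theta)
    return X_best * levy + X_rand + (y - zeta) * rng.random()
```

`levy` is a d-dimensional Levy step from Eq. (16), and `X_rand` corresponds to \({X}_{\tau}^{g}\).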
AO’s exploitation step
Two diverse techniques are used in the exploitation step. The first, dubbed extensive exploitation, has the Aquila land on the ground after precisely locating the prey region and slowly approach the prey for the catch; this technique suits slow-moving prey or prey that lacks an escape response. The second technique, restricted exploitation, has the Aquila move along the ground as it nears its prey and attack it at its last location by following its random motions. A random number rand, with a value in [0, 1], is employed to choose between these two techniques.
Mathematically, in the exploitation step, when \(g_{i}\) is greater than \(\left( \frac{2}{3}\right) \cdot G_{max}\), \({X}_{i}\) is modified to enhance the exploitation’s performance. The aforementioned exploitation techniques of the AO can be illustrated as follows:
where \({X}_{i}^{g+1}\) is the Aquila’s next updated position at the following \((g+1)^{th}\) iteration, and \({X}_{Best}^{g}\) is the best position found during the search at the \(g^{th}\) iteration. \({X}_{Mean}^{g}\) denotes the mean of the present positions at the \(g^{th}\) iteration and can be assessed through Eq. (19). The exploitation step’s adjustment parameters, \(\alpha\) and \(\delta\), are both set to 0.1. UB and LB indicate the upper and lower limits of the search space, respectively. The search strategy is balanced using the quality function value \({QF}^{g}\), which is calculated using Eq. (23). The Aquila’s arbitrary motions while pursuing its prey are reflected in \(\mathcal {Q}_1^{g}\) via Eq. (24). The Aquila’s flying slope when tracking its prey is represented by \(\mathcal {Q}_2^{g}\), which decreases in value from 2 to 0 and is determined using Eq. (24). \({X}_{i}^{g}\) is the present position at the \(g^{th}\) iteration, \(Levy_d\) is the Levy flight distribution function defined using Eq. (16), \(g_{i}\) is the current iteration, and \(G_{max}\) represents the maximum allowed number of iterations.
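As with exploration, the exploitation rules can be sketched from the term definitions above. The concrete expressions for \(QF\), \(\mathcal{Q}_1\), and \(\mathcal{Q}_2\) follow the standard AO formulation and are assumptions here.

```python
import numpy as np

def ao_exploitation(X_i, X_best, X_mean, levy, g, G_max, LB, UB, rng,
                    alpha=0.1, delta=0.1):
    """Sketch of AO's two exploitation techniques (cf. Eqs. 22-24)."""
    if rng.random() <= 0.5:
        # extensive exploitation: slow descent toward the located prey region
        return ((X_best - X_mean) * alpha - rng.random()
                + ((UB - LB) * rng.random() + LB) * delta)
    # restricted exploitation: follow the prey's random ground motions
    QF = g ** ((2 * rng.random() - 1) / (1 - G_max) ** 2)  # quality function
    Q1 = 2 * rng.random() - 1          # random motions while pursuing prey
    Q2 = 2 * (1 - g / G_max)           # flying slope, decreasing from 2 to 0
    return QF * X_best - (Q1 * X_i * rng.random()) - Q2 * levy + rng.random() * Q1
```

The defaults \(\alpha = \delta = 0.1\) match the values stated in the text.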
Pseudocode of the proposed AO algorithm
After introducing the above-mentioned exploration and exploitation steps of the AO algorithm and showing the four techniques suggested to imitate the Aquila’s hunting behavior, the pseudocode of the proposed AO algorithm is presented in Algorithm 2. Additionally, Fig. 2 includes a flowchart of the AO algorithm to show its main steps.
Adjustment of bound-constraints
This paper presents a bound-constraint adjustment method for repositioning infeasible decision variables that fall outside the search space during position improvement by the above-mentioned metaheuristic optimization algorithms (AVO and AO). We recommend the random method, which replaces decision-variable values outside the permissible limits with randomly generated ones inside those limits. This method can be mathematically stated as follows:
Where \({X}_{i,d}^{adjust}\) represents the value of the appropriate decision variable, \({X}_{i,d}\) denotes the infeasible value that is beyond the variable’s limits, \({X}_{d}^{LB}\) and \({X}_{d}^{UB}\) depict the lower and upper limits, respectively, and rand(0, 1) is a random number falling within the range [0, 1].
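A minimal sketch of this random repair rule, applied vectorially over all decision variables of one solution:

```python
import numpy as np

def adjust_bounds(X, LB, UB, rng):
    """Replace out-of-range decision variables with uniform random
    values drawn inside [LB, UB] (the random repair method above).

    X, LB, UB : 1-D arrays of equal length.
    """
    out = (X < LB) | (X > UB)           # mask of infeasible variables
    repaired = X.copy()
    # X_adjust = LB + rand(0, 1) * (UB - LB), per variable
    repaired[out] = LB[out] + rng.random(out.sum()) * (UB[out] - LB[out])
    return repaired
```

Feasible variables are left untouched; only infeasible ones are resampled.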
XgbTree classification algorithm
The XgbTree [53] is an algorithm developed on the gradient boosting framework [54,55,56], which classifies sample instances into a specific class. It combines many weak learners into a powerful learner through ensembling and additive training procedures. The XgbTree algorithm’s core concept is to boost the gradient tree by producing DTs consecutively; building on complementary models from prior iterations, boosting decreases errors and enhances classification performance. The XgbTree’s objective function consists of a training loss part, which gauges how well the model performs on training data, and a regularization part, which handles overfitting and model complexity. The structure score of XgbTree is an objective function expressed as:
Where \(T_L\) stands for the total number of leaves on the tree, and \(\varphi _j\) is a vector of leaf scores.
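As a usage illustration, a gradient-boosted tree classifier of the kind XgbTree implements can be trained in a few lines. Scikit-learn's `GradientBoostingClassifier` stands in here for the xgboost implementation, and the synthetic data and parameter values are illustrative only, not the paper's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# synthetic stand-in for a 50-feature news representation
X, y = make_classification(n_samples=400, n_features=50, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# trees are grown sequentially; each new tree corrects the residual
# errors of the ensemble built so far (the boosting idea above)
gbt = GradientBoostingClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
gbt.fit(X_tr, y_tr)
acc = gbt.score(X_te, y_te)
```

Regularization of tree complexity (number of leaves, leaf scores) is what the structure score above penalizes.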
The proposed IBAVOAO algorithm
Since FNs are purposefully designed to provide false information, detecting them can be challenging. This paper suggests an effective IBAVOAO algorithm to identify FNs by combining the AVO and AO algorithms, leading to more accurate findings. In our proposed IBAVOAO algorithm, we solve the FND problem by hybridizing the natural processes of AVO and AO: the AVO algorithm creates solutions in the search space and tries to improve them; the AO algorithm then refines the solutions produced in the AVO solution space through its exploration and exploitation processes. The IBAVOAO algorithm combines the AVO and AO algorithms through the following steps:

Firstly, within the specified search space, the population of AVO solutions is initialized with random values. The AVO algorithm handles the exploration and exploitation of the search space: the exploration step permits the algorithm to search new areas of the search space, while the exploitation step concentrates on intensifying the search around promising solutions. The AVO algorithm simulates African vultures’ living and feeding behavior to improve solutions iteratively.

The vultures navigate the search space by adjusting their positions based on their current positions and a set of candidate solutions that have already been discovered.

Secondly, the AO algorithm is another optimization technique used in the IBAVOAO algorithm. It operates on a set of the candidate solutions obtained by the AVO algorithm and improves them iteratively.

The AO algorithm aims to improve the produced AVO solutions and strike a balance between exploration and exploitation capabilities. This balance facilitates the effective exploration of the search space, preventing the occurrence of local optima and enhancing the convergence towards optimum solutions.

A set of new candidate solutions is created by the AO algorithm and combined with the population’s pre-existing solutions. After that, the new candidate solutions are compared with the original solutions, and selection is performed based on the values of their objective functions.
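The hybrid loop described in the steps above can be sketched as follows. The AVO and AO update rules are passed in as callables (their bodies were sketched earlier), and the greedy objective-based selection keeps whichever of the old and refined solutions is better; minimization of the objective is assumed.

```python
import numpy as np

def ibavo_ao(fitness, N, d, LB, UB, G_max, avo_update, ao_update, rng):
    """Hedged sketch of the hybrid IBAVOAO loop.

    avo_update(x, X, fit, g) -> candidate proposed by AVO
    ao_update(x, X, fit, g)  -> AO refinement of that candidate
    """
    X = LB + rng.random((N, d)) * (UB - LB)      # random initial AVO population
    fit = np.apply_along_axis(fitness, 1, X)
    for g in range(G_max):
        for i in range(N):
            cand = avo_update(X[i], X, fit, g)   # AVO exploration/exploitation
            cand = ao_update(cand, X, fit, g)    # AO refines the AVO solution
            cand = np.clip(cand, LB, UB)         # keep within the search space
            f_new = fitness(cand)
            if f_new < fit[i]:                   # objective-based selection
                X[i], fit[i] = cand, f_new
    best = np.argmin(fit)
    return X[best], fit[best]
```

The per-solution greedy replacement is one plausible reading of the "selection based on objective values" step.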
Combining the AVO and AO algorithms involves creating a hybrid IBAVOAO algorithm that leverages the strengths of each algorithm to improve overall optimization performance, convergence speed, and solution quality. The main advantages offered by the hybrid IBAVOAO algorithm integrating the AVO and AO algorithms over using them separately are as follows:

Enhanced exploration and exploitation: the AVO and AO algorithms may excel in different aspects of exploration and exploitation. Combining them allows the hybrid IBAVOAO algorithm to explore a broader solution space effectively.

Diversity in search: AVO and AO algorithms have different search mechanisms, enabling the hybrid IBAVOAO algorithm to maintain a diverse population of solutions. This diversity can prevent premature convergence to suboptimal solutions.

Improved convergence: Leveraging the complementary strengths of AVO and AO algorithms, the hybrid IBAVOAO algorithm can converge faster toward better solutions than using each algorithm separately.

Robustness: The hybrid IBAVOAO algorithm enhances the robustness of the optimization process. It will be more resilient to getting stuck in local optima.
The proposed IBAVOAO algorithm divides news items into two class labels, Fake and Truthful, meaning that the FND issue is formulated as a binary classification. The suggested IBAVOAO algorithm’s flowchart is depicted in Fig. 3. The proposed methodology for FND includes the following steps: Initially, as described in Sect. “Data preprocessing”, the FNs dataset is preprocessed employing feature extraction and feature filtration methods. After that, the final classification dataset is generated utilizing the pertinent features. Eventually, the proposed IBAVOAO algorithm is applied to the developed dataset, updating the positions and determining the best values depending on the XgbTree classification algorithm.
Experimental results and analysis
This section details the experimental results to evaluate the suggested FND methodology based on the IBAVOAO algorithm with the XgbTree classifier, describes the evaluation measures, and discusses the classification results.
Dataset description
The ISOTFNs dataset [57] is an extensive collection comprising approximately 44,900 news articles. This dataset is divided into two primary categories: truthful news and FNs. The methodology employed in compiling this dataset is meticulous, involving a selection of news articles from various sources, each vetted for reliability. For truthful news, the dataset relies on articles from Reuters, a well-regarded international news organization known for its comprehensive and fact-based reporting. In contrast, the FNs articles are sourced from websites identified as unreliable by reputable fact-checking entities such as Politifact.com and Wikipedia.
While the ISOTFNs dataset offers a valuable resource for studying the characteristics and spread of fake versus truthful news, it’s essential to consider potential biases. The selection of sources, particularly for FNs, might reflect biases inherent in the criteria used by Politifact.com and Wikipedia. This could result in a dataset that may not fully represent the spectrum of FNs sources, especially those that are more subtle or sophisticated in their misinformation strategies.
Moreover, when comparing the ISOTFNs dataset to other popular datasets in the field, such as the Fake News Challenge (FNC-1) dataset or the Liar dataset, there are noticeable differences in size, source diversity, and categorization methodologies. For instance, the FNC-1 dataset focuses more on stance detection between a headline and body text, whereas the Liar dataset includes short statements and speeches labeled for truthfulness. These differences highlight the varying approaches in the field of FND and the importance of considering multiple datasets to gain a comprehensive understanding of the issue.
Experimental setup
FND is a complicated process, and an appropriate method must weigh various factors to identify manipulated news efficiently. This is the main reason for integrating the IBAVOAO optimization algorithm and the XgbTree classification algorithm into the suggested methodology. Moreover, in contrast to the other methods presented, our suggested approach employed the Relief algorithm, explained in Sect. “Feature filtration”, to pick only the relevant features and identify FNs articles in less time and with lower processing cost, by calculating the weight of each feature in the dataset and sorting the weights from largest to smallest; the features with small weights are then removed. Upon executing the Relief algorithm on the dataset, we discovered that the greatest weights were associated with only 50 features. For this reason, just these 50 important features were selected, while the remaining irrelevant features with minor weights were omitted.
This method focuses only on pertinent features and reduces the initial search space by locating features that take comparable values for similar nearby samples and that discriminate well between dissimilar samples.
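A compact sketch of the basic Relief weighting and top-k selection described above. The exact Relief variant, distance measure, and number of iterations used in the paper are not specified, so this is a generic formulation with illustrative choices.

```python
import numpy as np

def relief_weights(X, y, n_iter=200, rng=None):
    """Basic Relief: reward features whose values agree with the nearest
    sample of the same class (hit) and differ from the nearest sample of
    the other class (miss)."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0) + 1e-12   # per-feature normalization
    W = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)        # L1 distance to all samples
        dist[i] = np.inf                           # exclude the sample itself
        hit = np.argmin(np.where(y == y[i], dist, np.inf))
        miss = np.argmin(np.where(y != y[i], dist, np.inf))
        W += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / span
    return W / n_iter

def top_k_features(W, k=50):
    """Keep the k features with the largest weights (the paper keeps 50)."""
    return np.argsort(W)[::-1][:k]
```

Features with the largest weights best separate near misses from near hits, matching the filtration criterion above.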
Thus, to adequately assess the performance of the proposed system, two sets of experiments were carried out on the utilized ISOTFNs dataset. In the first part of the experiments, we conducted a detailed comparative analysis using the ISOTFNs dataset. To provide a robust benchmarking framework, we selected a variety of well-established classification algorithms, each known for its unique strengths in the domain of FND. These include:

Decision Tree (DT) [58]: A simple yet powerful algorithm valued for its interpretability and ease of use in various classification problems.

K-nearest Neighbors (k-NN) [59]: This algorithm is effective in handling multi-class classification tasks and is known for its simplicity and efficacy.

Gaussian Naive Bayes (GNB) [60]: Chosen for its proficiency in managing high-dimensional data, GNB applies a probabilistic approach to classification.

Support Vector Machine (SVM) [61]: Renowned for its robustness, especially in high-dimensional spaces, making it suitable for complex classification tasks.

Random Forest (RF) [62]: Selected for its high accuracy and efficiency, especially in large datasets, RF is a versatile and powerful ensemble method.

Multi-layer Perceptron (MLP) [63]: A feed-forward artificial neural network known for its ability to learn non-linear models and patterns in data. These algorithms were rigorously tested against our IBAVOAO with XgbTree classification on the same dataset, providing a comprehensive and balanced benchmarking environment.
Table 2 shows the significant parameters of the classification algorithms introduced in this paper.
Secondly, we performed a comprehensive comparative analysis between our proposed IBAVOAO combined with the XgbTree classification algorithm and a range of widely recognized metaheuristic optimization algorithms. These algorithms were meticulously chosen for their relevance and popularity in optimization tasks. They include:

Binary African Vulture Optimization (BAVO) [50]: An optimization algorithm inspired by the foraging behavior of vultures, known for its efficiency in binary search spaces.

Binary Aquila Optimizer (BAO) [51]: This algorithm mimics the hunting strategy of Aquila eagles and is notable for its precision and speed.

Binary Sparrow Search Algorithm (BSSA) [64]: A novel algorithm based on the social behavior of sparrows, appreciated for its ability to explore and exploit the solution space.

Binary Atom Search Optimization (BASO) [65]: Inspired by the laws of physics and molecular movement, known for its robustness in binary optimization problems.

Binary Henry Gas Solubility Optimization (BHGSO) [66]: This algorithm simulates the gas solubility process and is recognized for its adaptability in various optimization contexts.

Binary Harris Hawks Optimization (BHHO) [67]: Mimics the cooperative hunting strategy of Harris hawks, known for its effectiveness in complex optimization scenarios.

Binary Sailfish Optimizer (BSFO) [68]: Based on the predatory behavior of sailfish, this algorithm is praised for its swift convergence and flexibility.

Binary Bat Algorithm (BBA) [69]: Utilizes echolocation behavior of bats and is popular for its balance between exploration and exploitation.

Binary Grasshopper Optimization Algorithm (BGOA) [70]: Inspired by the swarming behavior of grasshoppers, it’s efficient in finding global optima in complex landscapes.

Binary Artificial Bee Colony (BABC) [71]: Mimics the foraging behavior of honey bees, well-regarded for its simplicity and effectiveness in binary domains.

Binary Particle Swarm Optimization (BPSO) [72]: Based on the social behavior of bird flocking, this algorithm is known for its efficiency and easy implementation.
These selected algorithms represent a diverse range of strategies in metaheuristic optimization, ensuring a robust and comprehensive benchmarking against our proposed IBAVOAO algorithm. The comparative study was conducted on the ISOTFNs dataset, and the specific parameters employed for each algorithm in the comparison are detailed in Table 3.
Python was used on a computing environment with a dual Intel^{®} Xeon^{®} Gold 5115 2.4 GHz CPU and 128 GB of RAM running the Microsoft Windows Server 2019 operating system to run all experiments in this study. For a reliable comparison, the population size was set to 10 and the maximum number of iterations to 100 for all methods. Also, after defining content-oriented attributes and creating a new dataset, the new dataset was split into learning and testing portions: 80% of the data was utilized for learning, while 20% was utilized for evaluating the proposed system. Finally, a 10-fold cross-validation method is employed to reduce model error for learning and testing purposes.
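The split-plus-cross-validation protocol can be sketched with scikit-learn; the classifier and synthetic data below are placeholders for the paper's actual pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=300, n_features=50, random_state=1)

# 80/20 learning/testing split, as in the experimental setup
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

clf = GradientBoostingClassifier(n_estimators=50)
# 10-fold cross-validation on the learning portion to reduce model error
scores = cross_val_score(clf, X_tr, y_tr, cv=10)
```

The held-out 20% is then used once for the final evaluation.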
Evaluation measures
In this study, the effectiveness of the suggested IBAVOAO with the XgbTree methodology must be assessed utilizing standard metrics to ensure that the empirical outcomes are statistically valuable. To that end, the primary evaluation metric employed was accuracy [73], which is the number of successful predictions divided by the total number of predictions.
Accuracy is expressed as in Eq. (27):
Where True Positive \((T_P)\) is the percentage of FNs that were successfully classified utilizing the proposed system, True Negative \((T_N)\) is the percentage of truthful news that was successfully classified utilizing the proposed system, False Positive \((F_P)\) is the percentage of truthful news classified as FNs, and the percentage of FNs items classified as truthful news is represented by False Negative \((F_N)\).
Kappa is calculated with the following formula:
\(P_o\) is the model’s overall accuracy, and \(P_e\) is the agreement between the model predictions and the actual class values.
Precision [74] is expressed as in Eq. (29):
Recall [75] is expressed as in Eq. (30):
FMeasure \((F_1)\) [76] is obtained as in Eq. (31):
Specificity [77] is expressed as in Eq. (32):
Sensitivity [77] is expressed as in Eq. (33):
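The confusion-matrix-based metrics of Eqs. (27)–(33) can be computed directly from the four counts; the toy labels below are illustrative only.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = fake, 0 = truthful (toy labels)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)                # Eq. (27)
precision   = tp / (tp + fp)                                 # Eq. (29)
recall      = tp / (tp + fn)                                 # Eq. (30); also sensitivity, Eq. (33)
f1          = 2 * precision * recall / (precision + recall)  # Eq. (31)
specificity = tn / (tn + fp)                                 # Eq. (32)
kappa       = cohen_kappa_score(y_true, y_pred)              # Eq. (28), (Po - Pe) / (1 - Pe)
```

Here \(P_o\) is the observed accuracy and \(P_e\) the chance agreement implied by the class distributions of `y_true` and `y_pred`.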
The fitness measure reports the mean fitness achieved by running the suggested method independently for 30 runs, reflecting the synergy between minimizing the number of selected features and reducing the classification error rate, as in Eq. (34). The minimum value represents the best result; fitness is assessed as:
where \(f_\cdot ^{k}\) is the optimum fitness result achieved in the kth run.
The features size measure reports the mean number of selected features over 30 independent runs of the method and is defined as:
where \(\vert d_\cdot ^{k} \vert\) is the size of features chosen in the optimal solution for the kth run, and \(\vert D \vert\) represents the complete size of features in the used benchmark.
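A common wrapper-style formulation of the fitness and features-size measures is sketched below; the weighting factor `lam` is a typical choice in wrapper feature selection, not necessarily the paper's exact Eq. (34).

```python
def fs_fitness(error_rate, n_selected, n_total, lam=0.99):
    """Weighted sum of classification error and fraction of features
    kept; lower is better (assumed weighting, cf. Eq. 34)."""
    return lam * error_rate + (1 - lam) * (n_selected / n_total)

def mean_over_runs(values):
    """Mean of a metric (fitness or selected-feature count) over the
    30 independent runs, as in Eqs. (34)-(35)."""
    return sum(values) / len(values)
```

With this form, shrinking the feature subset and lowering the error rate both reduce the fitness, matching the synergy described above.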
Standard Deviation (SD): For the measures mentioned above, the final results achieved over the 30 independent runs of each algorithm on every dataset are evaluated and analyzed in terms of stability as:
where Y denotes the metric to be measured, \(Y_*^k\) is the value of the metric Y in the kth run, and \(\mu _Y\) is the average of the metric over the 30 independent runs.
Effect of different components of the proposed IBAVOAO algorithm for FND
The proposed IBAVOAO algorithm is compared to the original versions of AVO and AO algorithms to show how this hybridization improves the performance of the IBAVOAO algorithm. Table 4 displays the results of the proposed IBAVOAO algorithm and its component algorithms on the utilized ISOTFNs dataset for FND, in which boldface numbers indicate the best results.
Results analysis of the proposed IBAVOAO algorithm versus diverse state-of-the-art ML methods and metaheuristic algorithms for FND
In the first part of the analysis of the results, we compared the empirical outcomes of the proposed IBAVOAO algorithm with some state-of-the-art ML methods on the used ISOTFNs dataset for FND. For a reliable comparison, the suggested system and selected methods are executed on a framework with identical parameters and tested on the same ISOTFNs dataset.
Table 5 shows the results of the proposed IBAVOAO algorithm and other state-of-the-art ML methods on the utilized ISOTFNs dataset for FND, where boldface numbers indicate the best results. The proposed IBAVOAO algorithm and the ML methods are compared and assessed regarding average accuracy, Kappa, Precision, Recall, and F1 score on the extracted attributes. The proposed IBAVOAO algorithm succeeded in categorizing 92.75% of the news articles. After the suggested system, GNB ranked second with a classification rate of 81.96%, a gap of more than 10% from the suggested system. As presented in Table 5, the DT and k-NN methods produced the lowest values in categorizing the various news articles.
In the second part of the analysis of the results, we compared the experimental results of the proposed IBAVOAO algorithm with some known metaheuristic optimization techniques on the used dataset for FND. As shown in Table 6, the proposed IBAVOAO algorithm and well-known metaheuristic optimization techniques are compared and evaluated in terms of average accuracy, fitness values, number of selected features, Kappa, Precision, Recall, F1score, Specificity, Sensitivity, ROC_AUC, and MCC; boldface values denote the best results. Regarding the classification accuracy values presented in Table 6, the proposed IBAVOAO algorithm succeeded in categorizing 92.7% of the news articles. After the proposed IBAVOAO algorithm, BAVO ranked second with an average rating of 92.62%. Also, judging by the SD values of the different algorithms, the stability of the proposed IBAVOAO algorithm is relatively strong. Based on the number of features chosen, the proposed IBAVOAO algorithm comes first by selecting the minimum mean number of attributes on the used ISOTFNs dataset, followed by the BSSA method. Additionally, the proposed IBAVOAO algorithm obtained the best exploration capability among the algorithms regarding the mean number of selected features, confirmed by its choosing the fewest features on the selected ISOTFNs dataset. This verifies the capability of the proposed IBAVOAO algorithm to neglect non-significant search regions and discover the most feasible ones. Therefore, the proposed IBAVOAO algorithm can shrink the feature search region by identifying the most relevant attributes while preserving the highest classification accuracy. Based on fitness values, the proposed IBAVOAO algorithm also comes first, obtaining the minimum mean fitness value on the used ISOTFNs dataset, followed by the BAVO and BAO methods. Finally, the proposed IBAVOAO algorithm likewise leads on the remaining evaluation measures.
Figure 4 reveals that the proposed approach on the selected attributes generated the highest performance compared to the other optimization algorithms. The mean accuracy and F-measure of the proposed IBAVOAO algorithm are 92.75% and 92.76%, respectively. Following the proposed IBAVOAO algorithm, BAVO outperforms the other known methods with an average accuracy and F-measure of 92.62% and 92.62%. The BAO method achieved nearly comparable outcomes to BAVO, producing an accuracy and F-measure of 92.61% and 92.62%. The BSSA algorithm obtained good results, successfully classifying 92.58% of news articles. The BHGSO method obtained the lowest results, classifying 92.11% of news articles. The comparative study shows that the proposed IBAVOAO algorithm obtained greater F-measure and accuracy than state-of-the-art optimization methods and was reliable in classifying various news articles.
Analysis of convergence
This section presents a convergence investigation of the proposed IBAVOAO algorithm for handling the FND task on the selected dataset to verify its convergence capability, as shown in Fig. 5. These convergence graphs show the convergence behavior of the proposed IBAVOAO algorithm against its peers, all evaluated and executed under identical settings for the number of iterations and population size. Figure 5 shows that the proposed IBAVOAO algorithm demonstrated fast yet optimal convergence behavior on the selected dataset. Hence, the proposed IBAVOAO algorithm shows its ability to acquire the optimal solution in a timely manner, ensuring an effective balance between exploration and exploitation capabilities.
Comparison results of the proposed IBAVOAO algorithm against different algorithms from existing studies for FND
Table 7 illustrates the experimental outcomes of comparisons in terms of Accuracy, Kappa, Precision, Recall, and F1-score between the suggested IBAVOAO algorithm and other algorithms from existing studies, including WOA-XgbTree [24], AB [12], Wang-CNN [78], Wang-BiLSTM [78], Ridor [79, 80], and IBk [80, 81], for the FND issue. Note that boldface values denote the best results. It can be noted that the proposed IBAVOAO exceeded the others in all performance measures except recall. Moreover, the WOA-XgbTree algorithm ranked first in recall and second in precision and F1-score. Finally, the IBk algorithm ranked last in all performance measures.
Results analysis of the proposed IBAVOAO algorithm versus various state-of-the-art ML methods and metaheuristic algorithms on new unseen datasets for FND
In this section, the proposed methodology is validated using different common datasets for detecting FNs. These datasets include FA-KES [82], BuzzFeed [83], UTK (Kaggle) [84], and Data (Kaggle) [85]. Two types of experiments are conducted on these datasets: first, a comparison with the most common ML models; second, a comparison with recent optimization techniques. Table 8 shows the results of the proposed IBAVOAO algorithm and other state-of-the-art ML methods on the new unseen datasets for FND regarding average accuracy, Kappa, Precision, Recall, and F1 score, where boldface values indicate the best results. According to the results obtained in Table 8, the proposed IBAVOAO algorithm outperformed all the utilized ML models in all performance measures. None of the ML models used in the comparison ranked first in any of the performance measures.
In the second part of the results analysis, we compared the experimental results of the proposed IBAVOAO algorithm with some well-known metaheuristic optimization algorithms on the new unseen datasets for FND, in which boldface values indicate the best results. Table 9 compares the proposed IBAVOAO algorithm with these metaheuristic optimization algorithms regarding average accuracy, fitness values, number of selected features, Kappa, Precision, Recall, F1-score, Specificity, Sensitivity, ROC_AUC, and MCC. According to the results in Table 9, the proposed IBAVOAO algorithm outperformed all utilized optimization techniques in all performance measures; none of the optimization techniques used in the comparison ranked first in any of them.
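The additional metrics reported in Table 9 also follow from the confusion matrix. A minimal sketch with toy counts; ROC_AUC is omitted here because it additionally requires the classifier's probability scores, not just the hard predictions:

```python
import math

def extra_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and Matthews correlation coefficient (MCC)
    from binary confusion-matrix counts (fake class treated as positive)."""
    sensitivity = tp / (tp + fn)        # true-positive rate (equals recall)
    specificity = tn / (tn + fp)        # true-negative rate
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sensitivity, specificity, mcc

sens, spec, mcc = extra_metrics(tp=3, fp=1, fn=1, tn=3)   # toy counts
```

MCC is often preferred alongside accuracy because it stays informative even when the truthful/fake classes are imbalanced.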
Conclusion and future work
Recently, FNs have become one of the most critical issues harming society and individual users, making FND a significant challenge. This study presented a new FNs classification and detection paradigm based on an effective IBAVOAO algorithm with the XgbTree classifier. The proposed methodology proceeds in three main stages: first, the ISOT FNs dataset is retrieved. Then, a preprocessing step transfers the unstructured data into structured data and analyzes and extracts the necessary attributes; this step extracts attributes from the ISOT FNs dataset by removing useless words, stemming, tokenizing, encoding, and padding the data into a sequence of integers using the GLOVE method, and the extracted attributes are then filtered using the effective Relief method to keep only suitable ones. Finally, the retrieved features are used to categorize the news items using the proposed IBAVOAO based on the XgbTree classifier. The results obtained by the suggested system have been analyzed and compared with state-of-the-art ML classifiers and optimization techniques concerning accuracy, fitness values, the number of selected features, Kappa, Precision, Recall, F1-score, Specificity, Sensitivity, ROC_AUC, and MCC on the same ISOT FNs dataset. Moreover, one focus point was extracting attributes from news articles to help the FND system achieve higher accuracy and shorter processing time. The results obtained from the proposed IBAVOAO algorithm showed that the extracted attributes positively affect the performance of the proposed FND system.
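The Relief filtering step summarized above can be sketched as follows. This is a minimal NumPy illustration of the classic Relief weight update (nearest hit and nearest miss per instance), not the exact implementation used in the paper; binary labels and [0, 1]-scaled features are assumed:

```python
import numpy as np

def relief_scores(X, y, rng=None):
    """Minimal Relief sketch: for each instance, find its nearest hit
    (same class) and nearest miss (other class), and reward features on
    which the miss is far away while the hit stays close."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)
    for i in rng.permutation(n):            # visit every instance once
        same = (y == y[i])
        same[i] = False                     # exclude the instance itself
        diff = (y != y[i])
        hit  = X[same][np.abs(X[same] - X[i]).sum(axis=1).argmin()]
        miss = X[diff][np.abs(X[diff] - X[i]).sum(axis=1).argmin()]
        w += np.abs(X[i] - miss) - np.abs(X[i] - hit)
    return w / n

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = np.array([[0.0, 0.5], [0.1, 0.2], [0.9, 0.6], [1.0, 0.1]])
y = np.array([0, 0, 1, 1])
w = relief_scores(X, y)
```

Features whose weight falls below a chosen threshold are then dropped before classification, which is how the filter keeps only the suitable attributes.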
Following this, it is important to state the limitations of the study to provide a balanced and realistic view of its scope and applicability:

Dataset Scope and Diversity: The ISOT FNs dataset, while comprehensive, may not fully encompass the broad spectrum of FNs sources, especially more subtle or complex misinformation strategies.

Single-Modality Focus: The study focused solely on text-based news articles, excluding multimedia elements such as images and videos, which are often integral to FNs.

Algorithmic Adaptability: The performance and adaptability of the IBAVOAO algorithm across various datasets and types of FNs content require further exploration.
In the future, we tend to analyze and investigate these topics:

Incorporating Multimodal Data: Future research will focus on processing news articles that include images and text, moving beyond the text-only approach to provide a more comprehensive analysis of FNs.

Exploring Diverse Classification Methods: Plans include applying other classification methods like neural networks, k-NN, and Random Forest (RF) to further assess the behavior and efficacy of the IBAVOAO algorithm in various classification tasks.

Broadening Input Features: We aim to analyze optimization methods with multiple input features, ranging from raw text to hand-crafted attributes. This approach could uncover new insights and enhance the system’s ability to detect FNs more accurately.
Availability of data and materials
The developed software and code in this study are available on request.
References
Bessi A, Coletto M, Davidescu GA, Scala A, Caldarelli G, Quattrociocchi W. Science vs conspiracy: collective narratives in the age of misinformation. PLoS ONE. 2015;10(2):e0118093.
Gravanis G, Vakali A, Diamantaras K, Karadais P. Behind the cues: a benchmarking study for fake news detection. Expert Syst Appl. 2019;128:201–13.
Alzanin SM, Azmi AM. Detecting rumors in social media: a survey. Procedia Comput Sci. 2018;142:294–300.
Shu K, Sliva A, Wang S, Tang J, Liu H. Fake news detection on social media: a data mining perspective. ACM SIGKDD Explorations Newsl. 2017;19(1):22–36.
Bondielli A, Marcelloni F. A survey on fake news and rumour detection techniques. Inf Sci. 2019;497:38–55.
Ferrara E, Varol O, Davis C, Menczer F, Flammini A. The rise of social bots. Commun ACM. 2016;59(7):96–104.
Zhang X, Ghorbani AA. An overview of online fake news: characterization, detection, and discussion. Inf Proc Manag. 2020;57(2):102025.
Della Vedova ML, Tacchini E, Moret S, Ballarin G, DiPierro M, de Alfaro L, Automatic online fake news detection combining content and social signals, In: 2018 22nd conference of open innovations association (FRUCT), IEEE, 2018;272–9.
Ghosh P, Azam S, Jonkman M, Karim A, Shamrat FJM, Ignatious E, Shultana S, Beeravolu AR, De Boer F. Efficient prediction of cardiovascular disease using machine learning algorithms with relief and lasso feature selection techniques. IEEE Access. 2021;9:19304–26.
Shamrat FJM, Tasnim Z, Ghosh P, Majumder A, Hasan MZ, Personalization of job circular announcement to applicants using decision tree classification algorithm, in: 2020 IEEE International Conference for Innovation in Technology (INOCON), IEEE, 2020;1–5.
Afrin S, Shamrat FJM, Nibir TI, Muntasim MF, Moharram MS, Imran M, Abdulla M. Supervised machine learning based liver disease prediction approach with lasso feature selection. Bull Electric Eng Inf. 2021;10(6):3369–76.
Nasir JA, Khan OS, Varlamis I. Fake news detection: a hybrid CNN-RNN based deep learning approach. Int J Inf Manag Data Insights. 2021;1(1):100007.
Sahoo SR, Gupta BB. Multiple features based approach for automatic fake news detection on social networks using deep learning. Appl Soft Comput. 2021;100:106983.
Choudhary A, Arora A. Linguistic feature based learning model for fake news detection and classification. Expert Syst Appl. 2021;169:114171.
Singhania S, Fernandez N, Rao S. 3HAN: a deep neural network for fake news detection, in: International conference on neural information processing, Springer, 2017;572–581.
Kaliyar RK, Goswami A, Narang P, Sinha S. FNDNet: a deep convolutional neural network for fake news detection. Cogn Syst Res. 2020;61:32–44.
Umer M, Imtiaz Z, Ullah S, Mehmood A, Choi GS, On BW. Fake news stance detection using deep learning architecture (CNN-LSTM). IEEE Access. 2020;8:156695–706.
Rodríguez ÁI, Iglesias LL. Fake news detection using deep learning, arXiv preprint arXiv:1910.03496.
Monti F, Frasca F, Eynard D, Mannion D, Bronstein MM. Fake news detection on social media using geometric deep learning, arXiv preprint arXiv:1902.06673.
Thota A, Tilak P, Ahluwalia S, Lohia N. Fake news detection: a deep learning approach. SMU Data Sci Rev. 2018;1(3):10.
Abedalla A, AlSadi A, Abdullah M. A closer look at fake news detection: a deep learning perspective, in: Proceedings of the 2019 3rd International Conference on Advances in Artificial Intelligence, 2019;24–28.
Ozbay FA, Alatas B. A novel approach for detection of fake news on social media using metaheuristic optimization algorithms. Elektronika ir Elektrotechnika. 2019;25(4):62–7.
Ozbay FA, Alatas B. Adaptive salp swarm optimization algorithms with inertia weights for novel fake news detection model in online social media. Multimedia Tools Appl. 2021;80(26):34333–57.
Sheikhi S. An effective fake news detection method using WOA-XgbTree algorithm and content-based features. Appl Soft Comput. 2021;109:107559.
Pennington J, Socher R, Manning CD. Glove: Global vectors for word representation, in: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014;1532–1543.
Jain V, Kaliyar RK, Goswami A, Narang P, Sharma Y. AENeT: an attention-enabled neural architecture for fake news detection using contextual features. Neural Comput Appl. 2021;1–12.
Abd ElMageed AA, Gad AG, Sallam KM, Munasinghe K, Abohany AA. Improved binary adaptive wind driven optimization algorithm-based dimensionality reduction for supervised classification. Comput Ind Eng. 2022;167:107904.
Abd ElMageed AA, Abohany AA, Saad HM, Sallam KM. Parameter extraction of solar photovoltaic models using queuing search optimization and differential evolution. Appl Soft Comput. 2023;110032.
Vishwakarma DK, Varshney D, Yadav A. Detection and veracity analysis of fake news via scrapping and authenticating the web search. Cogn Syst Res. 2019;58:217–29.
Castillo C, Mendoza M, Poblete B. Information credibility on twitter, in: Proceedings of the 20th international conference on World wide web, 2011;675–684.
Jin F, Dougherty E, Saraf P, Cao Y, Ramakrishnan N. Epidemiological modeling of news and rumors on twitter, in: Proceedings of the 7th workshop on social network mining and analysis, 2013;1–9.
Wu K, Yang S, Zhu KQ. False rumors detection on Sina Weibo by propagation structures, in: 2015 IEEE 31st international conference on data engineering, IEEE, 2015;651–62.
Ma J, Gao W, Mitra P, Kwon S, Jansen BJ, Wong KF, Cha M. Detecting rumors from microblogs with recurrent neural networks, in: Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), 2016;3818–3824.
Sampson J, Morstatter F, Wu L, Liu H. Leveraging the implicit structure within social media for emergent rumor detection, in: Proceedings of the 25th ACM international on conference on information and knowledge management, 2016;2377–2382.
Yang F, Liu Y, Yu X, Yang M. Automatic detection of rumor on Sina Weibo, in: Proceedings of the ACM SIGKDD workshop on mining data semantics, 2012;1–7.
Reganti AN, Maheshwari T, Kumar U, Das A, Bajpai R. Modeling satire in English text for automatic detection, in: 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), IEEE, 2016;970–977.
Buschmeier K, Cimiano P, Klinger R. An impact analysis of features in a classification approach to irony detection in product reviews, in: Proceedings of the 5th workshop on computational approaches to subjectivity, sentiment and social media analysis, 2014;42–49.
Kwon S, Cha M, Jung K. Rumor detection over varying time windows. PLoS ONE. 2017;12(1):e0168344.
Sedik A, Abohany AA, Sallam KM, Munasinghe K, Medhat T. Deep fake news detection system based on concatenated and recurrent modalities. Expert Syst Appl. 2022;208:117953.
Meel P, Vishwakarma DK. A temporal ensembling based semi-supervised ConvNet for the detection of fake news articles. Expert Syst Appl. 2021;177:115002.
Kumar S, Asthana R, Upadhyay S, Upreti N, Akbar M. Fake news detection using deep learning models: a novel approach. Trans Emerg Telecommun Technol. 2020;31(2):e3767.
Shim JS, Lee Y, Ahn H. A link2vec-based fake news detection model using web search results. Expert Syst Appl. 2021;184:115491.
Zervopoulos A, Alvanou AG, Bezas K, Papamichail A, Maragoudakis M, Kermanidis K, Deep learning for fake news detection on twitter regarding the 2019 Hong Kong protests. Neural Comput Appl. 2021;1–14.
Huang YF, Chen PH. Fake news detection using an ensemble learning model based on self-adaptive harmony search algorithms. Expert Syst Appl. 2020;159:113584.
Sansonetti G, Gasparetti F, D’aniello G, Micarelli A. Unreliable users detection in social media: Deep learning techniques for automatic detection. IEEE Access. 2020;8:213154–67.
Samadi M, Mousavian M, Momtazi S. Deep contextualized text representation and learning for fake news detection. Inf Proc Manag. 2021;58(6):102723.
Khan JY, Khondaker MTI, Afroz S, Uddin G, Iqbal A. A benchmark study of machine learning models for online fake news detection. Mach Learning Appl. 2021;4:100032.
Kira K, Rendell LA, et al. The feature selection problem: traditional methods and a new algorithm, in: AAAI, 1992;2:129–134.
Kononenko I. Estimating attributes: analysis and extensions of relief, in: European conference on machine learning, Springer, 1994;171–182.
Abdollahzadeh B, Gharehchopogh FS, Mirjalili S. African vultures optimization algorithm: a new nature-inspired metaheuristic algorithm for global optimization problems. Comput Ind Eng. 2021;158:107408.
Abualigah L, Yousri D, Abd Elaziz M, Ewees AA, AlQaness MA, Gandomi AH. Aquila optimizer: a novel metaheuristic optimization algorithm. Comput Ind Eng. 2021;157:107250.
Steenhof K, Kochert MN, Mcdonald TL. Interactive effects of prey and weather on golden eagle reproduction. J Animal Ecol. 1997;66(3):350–62.
Chen T, Guestrin C. XGBoost: a scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016;785–794.
Friedman J, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann Stat. 2000;28(2):337–407.
Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;1189–1232.
Friedman JH. Stochastic gradient boosting. Comput Stat Data Anal. 2002;38(4):367–78.
ISOT fake news dataset, https://www.uvic.ca/engineering/ece/isot/datasets/fake-news.
Rokach L, Maimon O. Data mining with decision trees: theory and applications. World Scientific; 2020.
Zhang P, Zhou D. Understanding the k-nearest neighbor: from an algebraic perspective. Pattern Recogn. 2020;100:107149.
Webb GI, Keogh E, Miikkulainen R. Naive Bayes: the good, the bad, and the ugly, in: Advances in Intelligent Data Analysis XVII, Springer, 2019;428–440.
Liu Q, Zhou ZH. Support vector machines: theory, algorithms, and extensions. CRC Press; 2019.
Belgiu M, Drăguţ L. Random forest in remote sensing: a review of applications and future directions. ISPRS J Photogramm Remote Sens. 2016;114:24–31.
Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press; 2016.
Xue J, Shen B. A novel swarm intelligence optimization approach: sparrow search algorithm. Syst Sci Control Eng. 2020;8(1):22–34.
Zhao W, Wang L, Zhang Z. Atom search optimization and its application to solve a hydrogeologic parameter estimation problem. Knowl-Based Syst. 2019;163:283–304.
Hashim FA, Houssein EH, Mabrouk MS, AlAtabany W, Mirjalili S. Henry gas solubility optimization: a novel physics-based algorithm. Futur Gener Comput Syst. 2019;101:646–67.
Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H. Harris hawks optimization: algorithm and applications. Futur Gener Comput Syst. 2019;97:849–72.
Shadravan S, Naji HR, Bardsiri VK. The sailfish optimizer: a novel natureinspired metaheuristic algorithm for solving constrained engineering optimization problems. Eng Appl Artif Intell. 2019;80:20–34.
Mirjalili S, Mirjalili SM, Yang XS. Binary bat algorithm. Neural Comput Appl. 2014;25(3):663–81.
Mirjalili SZ, Mirjalili S, Saremi S, Faris H, Aljarah I. Grasshopper optimization algorithm for multi-objective optimization problems. Appl Intell. 2018;48(4):805–20.
Karaboga D, Basturk B. On the performance of artificial bee colony (ABC) algorithm. Appl Soft Comput. 2008;8(1):687–97.
Poli R, Kennedy J, Blackwell T. Particle swarm optimization. Swarm Intell. 2007;1(1):33–57.
Yin M, Wortman Vaughan J, Wallach H. Understanding the effect of accuracy on trust in machine learning models, in: Proceedings of the 2019 CHI conference on human factors in computing systems, 2019;1–12.
De Medeiros AKA, Guzzo A, Greco G, Van Der Aalst WM, Weijters A, Van Dongen BF, Saccà D. Process mining based on clustering: A quest for precision, in: International Conference on Business Process Management, Springer, 2007;17–29.
Amigó E, Gonzalo J, Artiles J, Verdejo F. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf Retrieval. 2009;12(4):461–86.
Amigó E, Gonzalo J, Artiles J, Verdejo F. Combining evaluation metrics via the unanimous improvement ratio and its application to clustering tasks. J Artif Intel Res. 2011;42:689–718.
Parikh R, Mathai A, Parikh S, Sekhar GC, Thomas R. Understanding and using sensitivity, specificity and predictive values. Indian J Ophthalmol. 2008;56(1):45.
Ahmad I, Yousaf M, Yousaf S, Ahmad MO. Fake news detection using machine learning ensemble methods. Complexity. 2020;2020:1–11.
Lakmali K, Haddela PS. Effectiveness of rule-based classifiers in Sinhala text categorization, in: 2017 National Information Technology Conference (NITC). IEEE. 2017;153–8.
Ozbay FA, Alatas B. Fake news detection within online social media using supervised artificial intelligence algorithms. Physica A. 2020;540:123174.
Kaladhar D, Pottumuthu BK, Rao PVN, Vadlamudi V, Chaitanya AK, Reddy RH. The elements of statistical learning in colon cancer datasets: data mining, inference and prediction. Algorithms Res. 2013;2(1):8–17.
FA-KES fake news dataset, https://zenodo.org/records/2607278.
BuzzFeed fake news dataset, https://www.buzzfeednews.com/article/craigsilverman/viral-fake-election-news-outperformed-real-news-on-facebook.
UTK fake news dataset, https://www.kaggle.com/c/fake-news.
Data fake news dataset, https://www.kaggle.com/datasets/jruvika/fake-news-detection.
Acknowledgements
Not applicable.
Funding
Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).
Author information
Authors and Affiliations
Contributions
Conceptualization, methodology, software, statistical analysis, data analysis, literature review, discussion, writing—original draft preparation: ABE, AAA, and KMH; data downloading: AHA, ABE, and AAA; writing—review and editing: ABE, AAA, and KMH; visualization: AHA, AAA, and ABE; supervision: KMH. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Abd ElMageed, A.A., Abohany, A.A., Ali, A.H. et al. An adaptive hybrid african vultures-aquila optimizer with XgbTree algorithm for fake news detection. J Big Data 11, 41 (2024). https://doi.org/10.1186/s40537-024-00895-9