Unlocking the potential of Naive Bayes for spatio temporal classification: a novel approach to feature expansion

Prasetiyowati, Sri Suryani; Sibaroni, Yuliant

doi:10.1186/s40537-024-00958-x

Research
Open access
Published: 05 August 2024

Unlocking the potential of Naive Bayes for spatio temporal classification: a novel approach to feature expansion

Sri Suryani Prasetiyowati¹ &
Yuliant Sibaroni¹

Journal of Big Data volume 11, Article number: 106 (2024) Cite this article

328 Accesses
Metrics details

Abstract

Prediction processes in areas ranging from climate and disease spread to disasters and air pollution rely heavily on spatial–temporal data. Understanding and forecasting the distribution patterns of disease cases and climate change phenomena has become a focal point of researchers around the world. Machine learning models for prediction can generally be classified into 2: based on previous patterns such as LSTM and based on causal factors such as Naive Bayes and other classifiers. The main drawback of models such as Naive Bayes is that it does not have the ability to predict future trends because it only make predictionsin the present time. In this study, we propose a novel approach that makes the Naive Bayes classifier capable of predicting future classification. The process of expanding the dimension of the feature matrix based on historical data from several previous time periods is performed to obtain a long-term classification prediction model using Naive Bayes. The case studies used are the prediction of the distribution of the annual number of dengue fever cases in Bandung City and the distribution of monthly rainfall in Java Island, Indonesia. Through rigorous testing, we demonstrate the effectiveness of this Time-Based Feature Expansion approach in Naive Bayes in accurately predicting the distribution of annual dengue fever cases in 30 sub-districts in Bandung City and monthly rainfall in Java Island, Indonesia with with both accuracy and F1-score reaching more than 97%.

Graphical Abstract

Introduction

Prediction of future events has become an interesting topic by many researchers [1,2,3] because with the prediction results obtained, many parties will obtain information about future events, which is very important in preparing appropriate strategies or mitigation. Most of the prediction methods are found in statistical sciences, and usually predictions are made based on time and spatial. The predictive models are used in various fields such as health, business, climate, transportation. Predictive modeling is a statistical technique that is commonly used to predict future behavior. Predictive model development is one of the statistical techniques used to predict future behavior. The resulting solution is a data mining technique that analyzes historical and current data.

Prediction can also be implemented with a machine learning approach. In addition to prediction, machine learning can also be used to solve problems in regression, classification and clustering [4]. This method can be computationally intensive, as it involves large and complex data, so it can play an important role in solving spatial problems in various application areas, from multivariate prediction to image classification to spatial pattern detection [5,6,7]. But the predictions model obtained based on machine learning is limited in predicting cases at the present time, cannot be used for future predictions. This makes prediction models based on machine learning inappropriate for predicting future events. It is very interesting to develop a prediction model based on machine learning that can be used to predict future events.

Time feature -based learning that represents and analyzes the property of time elements, including mapping time series properties, such as trends, seasonal, and stationary [3, 5, 8,9,10,11] has implemented the engine learning method In various spatial time data, namely the fields of geology, epidemiology, health care, climate science, enviromental science, precision agriculture, neuroscience, social media, etc.

Several studies conducted by [2, 3, 12, 13] used machine learning methods that were applied to time spatial data, with the aim of making predictions for the future. But in all of these studies there is no entanglement and continuity between the machine learning model and the prediction process. Prediction system performance is measured based on classifier model accuracy, while the prediction process used linear regression with the independent variables is classification results, time, without involving its features. In these studies, the classification methods used are Artificial Neural Network, Naïve Bayes, K Nearst Network, Logistic Regression, Super Vector Machine, Decssion Tree, and Random Forest [2, 3]. Studied classification predictions on spatial-time data, using simple linear regression, with a predictive time period of 100 years and 6 days, respectively. In both studies, before predicting using regression, a classification study was carried out using machine learning and Random Forest. Classification model obtained by machine learning, using all the features in the data train, while in the predicting process of target data, the regression method is used with only involves the time feature. Predicted results with regression for the future do not contain features used in previous machine learning models. Of all the classification methods used, Naïve Bayes is the simplest classification method. Naïve Bayes is one of the most popular classification algorithms, simple but very practical. Its efficiency comes from the assumption of feature independence, although this may be met in some studies using big data [14, 15]. The Naïve Bayes classification method has the advantage of adjusting parameter freedom and is more robust. But Naïve Bayes is also still quite reliable for use in small data sets [16].

For prediction purposes, the Naïve Bayes classification method is adaptive for feature weighting and makes feature selection easier, simpler and more efficient [16,17,18]. Empirical results show that the Naive Bayes selection method shows high classification accuracy [15]. When compared to other feature selection approaches, Naïve Bayes obtains more competitive results regarding accuracy, sparity, and time for balanced data sets. But for the case of class-imbalanced data sets, Naive Bayes still works well, with different levels of classification for different classes still achievable [19]. Meanwhile [20] uses the Naïve Bayes Classifier with Maximal Time Series Motif to classify ECG abnormalities, with features such as record numbers and discrete sequences. The Time Series Motif Detection is proposed as Feature extraction and when combined with NB classifier it is superior with 98% accuracy, than feature selection with classifier. While in Electroencephalogram (EEG) classification research [12] and urban waterlogging classification [21], the use of feature extraction and the weighted Niave Bayes method on spatial time data has been shown to provide high accuracy. Another technique to increase the accuracy of Naive Bayes is to reduce the problem of interdependence between its features [22].

Based on previous studies, it can be concluded that the use of classifier methods such as Naive Bayes in future predictions has never been done fully based on the training model. In this study, the process of developing the feature expansion method is carried out and combined with the Naive Bayes classifier so that the obtained classifier training model can be used to predict future class events.

The main contributions of this research are as follows: (1) Produce a feature expansion method based on spatial time data that can be used to build a classification prediction model for some time in the future based on the Naive Bayes classifier, (2) Produce a prediction model for the number of annual dengue fever cases based on the Naive Bayes classifier, (3) Produce a monthly rainfall level prediction model based on the Naive Bayes classifier. The baseline used in this research is prediction model research using regression on the same data or similar data.The data used in this research are data on the number of dengue fever cases and rainfall, where both data are spatial-time data and have been used as data sets in previous studies Network [23,24,25]. Prediction of dengue case classes using Naive Bayes in [23] provided an accuracy of 74% while the use of a Voting-based Hybrid method for Naïve Bayes, K-Nearest Neighbor, and Artificial Neural Networks resulted in an accuracy of 90%. Meanwhile, for the rainfall dataset, there are 2 researches that have used the data. Prediction of rainfall levels using Logistic Regression [24] resulted in an accuracy of 72%, while the use of Naive Bayes and hybrid Naive Bayes-C4.5 methods [25] resulted in an accuracy of 52.98% and 64.95% respectively.

Related work

Feature expansion

In recent years, the idea of feature engineering has confirmed the outstanding performance of machine learning techniques, which can automate several applications. Feature engineering techniques such as feature extraction, feature selection and feature expansion are often applied to machine learning classification [26]. Feature extraction is the process of selecting the best subset of features from the overall feature set [27]. This approach is the creation of a new feature which is also known as feature construction. Whereas feature expansion is combining additional features from the input data, which combines the different relationships between the original features of the two objects. The goal is to extend the original vector or form a new feature, which is related to the distance from each data sample to the number of centroids found by the clustering algorithm. The use of this approach is usually applied to pattern recognition problems [28, 29], and in certain contexts, such as sentence retrieval, intrusion detection and sentiment analysis. Research [28] discusses classification with a combination of feature extraction and feature expansion, to perform a transformation from the original feature, which can increase the classification similarity score independently.

The feature expansion technique for the classification of one-dimensional time series data proposed by [30], extended features can include temporal, frequency, and statistical characteristics. The study stated that the value of classification accuracy obtained was higher than the results of conventional machine learning. Feature expansion techniques allow the classifier to consider multiple dimensions that are not feasible in low-dimensional data. Feature expansion works by taking features in the original data and doing something with or on those features, then adding additional dimensions, to see if there is an increase in the accuracy of the resulting hyperplane [31]. In feature expansion it is possible to use linear classifiers on some data by creating new features in new dimensions.

Regarding the possibility of irrelevant features appearing in the data set, feature engineering also requires substantial manual effort in designing and selecting features [32]. According to [33,34,35], feature selection provides an effective way, by removing irrelevant and redundant data. Feature selection is the process of selecting certain features that are considered the most influential in the classification process.

Impact of feature expansion

In some cases, [36] states that merging all features into one feature space does not guarantee optimal performance, because of dimensional problems. To overcome these problems, a variation of Bayesian approach to the multinomial probit model is used, for base expansion and kernel combinations. This model has a solid foundation in a hierarchical Bayesian framework and is capable of instructively combining available sources of information, for multinomial classification. Meanwhile [37] discusses adding features to the classification process with machine learning, to increase the accuracy or precision value of the original data classification results. The scenario used is to compare the accuracy of the addition of 2-dimensional and 4-dimensional features in the classification process. There are 5 classification methods used, namely CART, RF, Gradient Boosting, SVM, Logistics Regression, where the scenario used is with and without feature selection. The result obtained is a classification with the addition of feature dimensions can increase the value of F1-score. The other result is the classification using feature selection research apparently can not increase the value of f1-score, although in others research [33,34,35], feature selection can increase the value of accuracy, validity of extracted information and reduce computational costs (processing time). Adding features automatically causes additional dimensions, so it is necessary to pay attention to the balance between the problem of dimensions and the addition of new features. The simplest way to add a feature is to add the degree and logarithm function of the original feature. This process has several stages that can increase the degree of new features or the number of multipliers in new features [32]. Meanwhile, to reduce the dimensions, the main component analysis is implemented after the feature addition procedure.

The consequence of applying feature expansion to the classification process is the addition of feature dimensions. High-dimensional data analysis is a challenge in the fields of machine learning and data mining. Complex multidimensional data usually has four types of features [34] namely high-weighted features, moderately weighted features, less-weighted features, zero-weighted features. With regard to these types of features, feature engineering requires substantial manual effort in designing and selecting features [32].

Naive Bayes classifier

Naive Bayes learning refers to the Bayesian probabilistic model that determines the posterior class probability, namely $P\left({y}_{j}|{x}_{i}\right)$. The simple Naïve Bayes classifier uses these probabilities to assign a class sample [37,38,39]. The Bayes theorem obtained is.

$$P\left({y}_{j}|{x}_{i}\right)=\frac{P \left({x}_{i}|{y}_{j}\right) P\left({y}_{j}\right)}{P \left({x}_{i}\right)}$$

(1)

where ${x}_{i}$:feature at $i$; ${y}_{j}$:class at $j$; $P\left({y}_{j}|{x}_{i}\right)$:probability of even ${y}_{j}$ given ${x}_{i}$ has occurred; $P \left({x}_{i}|{y}_{j}\right)$:probability of even ${x}_{i}$ given ${y}_{j}$ has occurred; $P\left({y}_{j}\right)$:probability of event ${y}_{j}$; $P \left({x}_{i}\right)$:probability of event ${x}_{i}$.

It is known that the feature set are${x}_{i}={x}_{1}, {x}_{2}, {x}_{3}, \dots ., {x}_{n}$, and class ${y}_{j}={y}_{1}, {y}_{2}, {y}_{3}, \dots ., {y}_{m}$, then the relationship between class ${y}_{j}$ and attribute ${x}_{i}$ can be described [40], as follows:Based on the naive Bayes network structure in Fig. 1, equation (1) can be developed into

$$P\left({y}_{j}|{x}_{1},{x}_{2}, {x}_{3}, \dots , {x}_{n} \right)=\frac{P \left({x}_{1},{x}_{2}, {x}_{3}, \dots , {x}_{n}|{y}_{j}\right) P\left({y}_{j}\right)}{P \left({x}_{1},{x}_{2}, {x}_{3}, \dots , {x}_{n}\right)}=\frac{\prod_{i=1}^{n} P\left({x}_{i}|{y}_{j}\right) P\left({y}_{j}\right)}{P \left({x}_{1},{x}_{2}, {x}_{3}, \dots , {x}_{n}\right)}$$

(2)

The denominator in Eq. (2) does not depend on the target class, but acts as a scaling factor ensuring that the posterior probabilities $P\left({y}_{j}|{x}_{i}\right)$ are properly scaled. So that the maximum posterior rule can be used, namely assigning each instance to exactly one class, by simply calculating the value of the quantifier for each class, then selecting the class with the maximum value [38]. The resulting selected class is referred to as the Maximum A Posteriori (MAP) class with the following formula.

$$\widehat{y}=arg\underset{{y}_{j}}{\text{max}}\prod_{i=1}^{n} P\left({x}_{i}|{y}_{j}\right) P\left({y}_{j}\right)$$

(3)

Maximum A Posteriori (MAP) Estimation can also be used as an estimate of $P(y)$ and $P\left({x}_{i}|y\right)$ [37, 39].

The proposed methods

Basically, the machine learning methods that have been used in previous research are only limited to predicting classification at this time. Likewise, the feature expansion method used in previous research was only limited to feature expansion for text data. Research on developing the Naive Bayes classification model is currently only used to predict classification at this time. Several previous studies carried out a prediction process for some time in the future using a linear regression model based on time-independent variables, by first carrying out classification using machine learning methods. In research that has been carried out previously regarding the development of classification prediction models, to date there has been no research that has developed a prediction model for classification of spatial-time data for the future directly using the time-based Naive Bayes method algorithm.

Therefore, in this research, a Naive Bayes classification prediction model was developed for the future with the scenario of expanding time-based features on spatial-time data, taking into account the stationarity of feature data over time. The proposed procedure is the development of a classification prediction model for time $t+k$, with the identification of a classification prediction model for time $t-k$, namely predicting the target class ${Y}_{t}$ using a combination of previous time $t-k$ features. The most optimal model combination of the previous $t-k$ classification prediction models, namely the model with the best accuracy, is selected as a candidate classification prediction model for the future $t+k$ time. The data matrix development framework with extended time-based features, the Naive Bayes classification model with expanded time-based features and its architecture will be explained in the algorithm and Fig. 2.

Experiment

This section describes the development of a feature expansion model based on the previous $t-k$ time, the development of the Naive Bayes Time-Based Feature Expansion model, and experiments using Naive Bayes-Time-Based Feature Expansion to build a classification prediction model for some future time. This study used 2 data sets, namely the Dengue Hemorrhagic Fever and Rainfall which are secondary data from the Bandung City Health Office and the Meteorology, Climatology and Geophysics Agency (BMKG). These two data are used as an example of implementing the development of a time-based classification prediction model to improve the performance of Naïve Bayes classification, which so far has not been able to predict classification in the future. The use of these two data is based on the same characteristics to be tested using the proposed model. This data is research data that has been used in previous studies which meets the characteristics of time series and spatial factors, so that it can be used to build time-based classification prediction models.

Time-based feature expansion process

The feature expansion technique developed in this research makes the feature dimension (N) to be used in the Naïve Bayes model increase k times (which depends on the time characteristics of the data). This is of course very different from traditional feature selection techniques which at most only produce N-dimensional features. Feature expansion in this research is carried out to obtain a feature matrix as input to Naïve Bayes classification as a t + k classification prediction model. The feature selection technique in this study is still carried out but the selection process is on the expanded feature matrix, which is used when determining the combination of features that produce the best classification performance for the t + k prediction model.

In performing feature expansion, one must consider that the resulting output must appropriate with the input of matrix format in the classification process using machine learning methods. There are two stages carried out in this feature expansion process. The first stage is carried out to determine the shape of the feature matrix for the standard classifier model. The second stage is carried out to develop new features based on the feature matrix of the first stage. The development of the feature matrix is conducted by expanding the feature column partition based on time k from the previous target class.

Table 1 shows that to classify data in the same time, it takes n features based on the original dataset at the same time. There are k model classifiers for different times that can be obtained from this dataset. In the first scenario, all data from all time are merged and one classifier model for all time is obtained based on this dataset.

Table 1 Feature matrix for data classification at the equal time

Unlocking the potential of Naive Bayes for spatio temporal classification: a novel approach to feature expansion

Abstract

Graphical Abstract

Introduction

Related work

Feature expansion

Impact of feature expansion

Naive Bayes classifier

The proposed methods

Experiment

Time-based feature expansion process

Naïve Bayes-Time-Based Feature Expansion

Data set

Performance of Naïve Bayes-Time-Based Feature Expansion

Vizualization of classification prediction

Experiment result

The performance of the classification prediction model for the previous \({\varvec{t}}-{\varvec{k}}\) times

Evaluation of the result

Features characteristics of the classification prediction model for the previous \({\varvec{t}}+{\varvec{k}}\) times

Visualization of classification prediction

Discussion and conclusions

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords