 Research
 Open Access
 Published:
Distance variable improvement of time-series big data stream evaluation
Journal of Big Data volume 7, Article number: 85 (2020)
Abstract
Real-time information mining of big time-series datasets is a very challenging task. To address it, we propose using the mean distance and the standard deviation to enhance the accuracy of the existing fast incremental model tree with drift detection (FIMTDD) algorithm. The standard FIMTDD algorithm uses the Hoeffding bound as its splitting criterion. We additionally use the mean distance and the standard deviation so that the tree is split more accurately than with the standard method. We verify the proposed method on three large datasets: the Traffic Demand Dataset, which consists of more than 4,000,000 instances; Tennet’s wind power plant dataset, which consists of 435,268 instances; and a road weather dataset, which consists of more than 30,000,000 instances. The results show that the proposed FIMTDD variant is more accurate than both the standard method and the Chernoff bound approach. The measured errors demonstrate that our approach yields a lower Mean Absolute Percentage Error (MAPE) at every stage of learning, by approximately 2.49% compared with the Chernoff bound method and 19.65% compared with the standard method.
Introduction
Real-time information mining for regression problems involving huge time-series datasets is an increasingly challenging task in the data mining community. This has motivated researchers to develop incremental algorithms that run fast and adapt accurately to such problems. Adak and Akpinar [1] applied a hybrid approach that combines the artificial bee colony algorithm with multiple linear regression to process time-series datasets. Aghabozorgi and Wah [2] employed a multistep clustering approach to discover valuable information in big time-series datasets. They took this approach because classic data mining methods proved ineffective on big time-series datasets, which generally have high dimensionality, high feature correlation, and large amounts of noise [2].
Researchers have proposed a number of algorithms to address these issues. Ikonomovska et al. [3] introduced an incremental stream mining algorithm that learns model trees for prediction. This algorithm uses Standard Deviation Reduction (SDR) as its splitting criterion and the Hoeffding bound [4] to decide when a split is justified. It stores candidate splits in an Extended Binary Search Tree (E-BST) and fits a linear model in each leaf with a perceptron. Changes in the stream are detected online with the Page–Hinkley (PH) test. In earlier work [5], we improved the accuracy of this fast incremental model tree with drift detection (FIMTDD) algorithm by using tanh instead of a linear activation function.
Zhang [6] presented Bennett-type generalization bounds for learning processes with independent and identically distributed (i.i.d.) samples. Two types of Bennett deviation bounds are provided. The first gives a generalization bound using uniform entropy numbers, and the second uses the Rademacher complexity. The results showed that the alternative expression developed in that work yields a faster rate of convergence than traditional results. Beygelzimer et al. [7] proposed an online algorithm that builds a logarithmic-depth tree to predict the conditional probability of a label. The problem is naturally reduced to a set of binary regressions arranged in a tree, and a regret bound that depends on the depth of the tree is derived. Balsubramani and Ramdas [8] developed a new algorithmic framework for sequential nonparametric testing based on the law of the iterated logarithm. Their approach performs the test computations on the fly, in linear time and constant space.
Several researchers have used various techniques to analyze large datasets. Koesdwiady et al. [9] studied the relationship between traffic flow and weather parameters. They constructed a deep belief network architecture to predict weather and traffic flows. The results showed that weather data affect traffic flow prediction and that a data fusion technique can increase the accuracy of traffic flow prediction.
Soua et al. [10] introduced a technique to predict traffic flow using deep learning and Dempster–Shafer theory. The authors divided the data into two categories: event-based data and stream data. They applied deep belief networks to predict traffic flow from the stream and event-based data, and they used Dempster–Shafer theory to update the beliefs and integrate the results.
In previous work, we created a framework that uses the FIMTDD algorithm to visualize and predict very large traffic flows [11]. A prediction system trained on the datasets produces a detailed visualization of the traffic flow. The results showed that the measurement error of the FIMTDD algorithm follows a decreasing trend during the stream evaluation process [11]. We also proposed a traffic intelligent system architecture based on social media information from a verified police department account [12]. That work described the system architecture and an algorithm that classifies street status as low, medium, or high traffic flow. It used a standard neural network approach, Learning Vector Quantization, to train on the dataset and predict the traffic flow 30–60 min ahead of the current time.
Lv et al. [13] proposed the first deep architecture model of its kind, which uses stacked autoencoders to learn generic traffic flow features for a prediction system. The results showed that this method can represent latent traffic flow features. A greedy layer-wise unsupervised learning algorithm was used to train the deep network, and the model parameters were tuned to improve prediction performance. The proposed method outperformed the BP NN, RW, SVM, and RBF NN models.
Xia et al. [14] studied traffic flow prediction on a Hadoop architecture. They implemented a parallel k-nearest neighbors approach with MapReduce to predict traffic flows, and they also conducted correlation analysis on the MapReduce platform. A real-time prediction system was developed with two key modules: offline distributed training and online parallel prediction. The results showed that, with correlation analysis, the traffic flow prediction error was significantly lower than that of ARIMA, Naive Bayes, and the Multilayer Perceptron. In addition, the method can be scaled up because it is implemented on the Hadoop platform.
Hou and Li [15] presented a repeatability and similarity method to predict traffic flow from big traffic data in China. By exploiting the repeatability and similarity of traffic flow, the authors combined short-term and long-term traffic flow forecasting. The results showed that this approach can effectively observe and predict traffic flow from big data in China.
Bifet et al. [16] reviewed the online evaluation of big data stream classifiers and compared models using the prequential evaluation method. Vu et al. [17] developed and tested a distributed regression method that achieved a 4.7× speedup in execution time over its sequential version. Ta et al. [18] designed a big data stream architecture for a specific domain and recommended tools for processing big data, such as Kafka, Nimbus, ZooKeeper, Hadoop, and Storm.
In this paper, we propose an improved method for big data stream problems. It decreases the Mean Absolute Percentage Error (MAPE) by 3% compared with our previous improvement [19], which used the Chernoff bound approach, and by 12% compared with the standard method [3].
FIMTDD algorithm
Datasets keep increasing in size. It is impossible to store and process an entire dataset at once, so an incremental algorithm is needed to handle such large volumes of data. The FIMTDD algorithm works iteratively and updates the model as each instance arrives. It evaluates the best split for every attribute and splits the node when the splitting criterion is met. If local concept drift occurs, an adaptation strategy is then performed.
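FIMTDD detects local concept drift with the Page–Hinkley (PH) test [3]. The minimal Python sketch below illustrates that test and is not the FIMTDD or MOA implementation. The class name `PageHinkley` and the defaults `delta=0.005` and `lambda_=50.0` are illustrative assumptions. The detector accumulates how far each observation lies above the running mean, for example a node's absolute prediction error. It raises an alarm when that cumulative sum climbs more than `lambda_` above its historical minimum.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an increase in the mean of a stream.

    delta:   magnitude of change that is tolerated (illustrative default).
    lambda_: alarm threshold on the cumulative deviation (illustrative default).
    """

    def __init__(self, delta: float = 0.005, lambda_: float = 50.0) -> None:
        self.delta = delta
        self.lambda_ = lambda_
        self.reset()

    def reset(self) -> None:
        """Forget all history, e.g. after the model has been adapted."""
        self.n = 0
        self.mean = 0.0        # running mean of the observations
        self.cumulative = 0.0  # m_T = sum of (x_t - mean_t - delta)
        self.minimum = 0.0     # M_T = smallest m_t seen so far

    def update(self, x: float) -> bool:
        """Add one observation (e.g. an absolute error); return True on drift."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cumulative += x - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return self.cumulative - self.minimum > self.lambda_
```

A larger `lambda_` gives fewer false alarms but later detection. A larger `delta` makes the detector ignore smaller shifts in the mean.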
Attribute selection, which determines the best attribute on which to split, is conducted using the Hoeffding bound and the SDR. Consider a dataset \(S\) of size \(N\). A split on attribute A divides the data into two subsets \(S_{L}\) and \(S_{R}\) of sizes \(N_{L}\) and \(N_{R}\), respectively, where \(S = S_{L} \cup S_{R}\) and \(N = N_{L} + N_{R}\). The SDR of this split, \(h_{A}\), is calculated by Eq. (1).
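Eq. (1) is not reproduced in this text. The sketch below assumes the usual FIMTDD definition, \(SDR(h_{A}) = sd(S) - \frac{N_{L}}{N} sd(S_{L}) - \frac{N_{R}}{N} sd(S_{R})\). Under that definition, each subset only needs a count, the sum of \(y\), and the sum of \(y^{2}\), so no instances have to be stored. This is a minimal Python sketch, and the names `Stats` and `sdr` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics of the target y: count, sum of y, and sum of y^2."""
    n: int = 0
    s: float = 0.0
    ss: float = 0.0

    def add(self, y: float) -> None:
        self.n += 1
        self.s += y
        self.ss += y * y

    def sd(self) -> float:
        """Population standard deviation computed only from the running sums."""
        if self.n == 0:
            return 0.0
        var = (self.ss - self.s * self.s / self.n) / self.n
        return math.sqrt(max(var, 0.0))  # clamp tiny negative round-off


def sdr(left: Stats, right: Stats) -> float:
    """Standard Deviation Reduction obtained by splitting S into S_L and S_R."""
    total = Stats(left.n + right.n, left.s + right.s, left.ss + right.ss)
    if total.n == 0:
        return 0.0
    return (total.sd()
            - left.n / total.n * left.sd()
            - right.n / total.n * right.sd())
```

For example, splitting the targets [1, 1, 1, 1, 5, 5, 5, 5] into [1, 1, 1, 1] and [5, 5, 5, 5] removes all of the spread, because sd(S) = 2 and both subsets have sd = 0. The SDR is therefore 2.0.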
As Eq. (2) shows, the FIMTDD algorithm only needs to maintain the sums of the target values \(y\) and of their squares \(y^{2}\). The real random variable \(r\) is the ratio of the SDR of \(h_{B}\) to the SDR of \(h_{A}\). Here \(h_{A}\) is the best split (on attribute A) and \(h_{B}\) is the best split on another attribute B. Because \(h_{A}\) is the better of the two, \(r\) lies between 0 and 1.
The evaluation ratio is then obtained by Eq. (3). The ratios observed along the stream form a sequence of real numbers \(r_{1}, r_{2}, \ldots, r_{n}\). To obtain a high-confidence interval for the mean of these random variables, FIMTDD uses the Hoeffding bound. The bound states that, with probability \(1 - \delta\) (here \(\delta = 5\%\)), the average of a random sample of \(N\) i.i.d. variables with range \(R\) lies within a distance \(\varepsilon\) of the true mean.
Equation (4) can be used to calculate the value of ε.
As more values are observed, \(\varepsilon\) decreases and the sample mean approaches the true mean. The Hoeffding bound thus limits how far the average of the random variables is likely to deviate from its expected value. The FIMTDD algorithm calculates the lower and upper bounds of the estimate with Eq. (5).
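The sketch below shows the split test described above and assumes the common FIMTDD formulation. The ratio is \(r = SDR(h_{B})/SDR(h_{A})\), where \(h_{A}\) is the best candidate and \(h_{B}\) the second best. The bound is \(\varepsilon = \sqrt{R^{2}\ln(1/\delta)/(2N)}\) with \(R = 1\), because \(r\) lies in [0, 1]. A split is made when the upper bound \(r + \varepsilon\) is below 1. The function names are hypothetical, and any tie-breaking threshold is omitted.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)), half-width of the 1 - delta interval."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def ratio_bounds(sdr_best: float, sdr_second: float, n: int,
                 delta: float = 0.05) -> tuple[float, float]:
    """Lower and upper bounds on the true ratio r = SDR(h_B) / SDR(h_A)."""
    r = sdr_second / sdr_best             # in [0, 1] because h_A is the best candidate
    eps = hoeffding_bound(1.0, delta, n)  # R = 1 for a ratio bounded by [0, 1]
    return r - eps, r + eps


def should_split(sdr_best: float, sdr_second: float, n: int,
                 delta: float = 0.05) -> bool:
    """Split only when even the upper bound of r stays below 1."""
    if sdr_best <= 0.0:
        return False  # no candidate reduces the standard deviation
    _, upper = ratio_bounds(sdr_best, sdr_second, n, delta)
    return upper < 1.0
```

With \(\delta = 0.05\) and 100 observations, \(\varepsilon \approx 0.122\), so the second-best candidate must score below about 0.88 of the best one. After 10,000 observations, \(\varepsilon \approx 0.012\).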
Gradient descent is used to update the weights for every instance in the stream. A linear perceptron weights the relations among the attributes. Its weights are updated on the arrival of each new instance instead of being computed from the whole dataset at once. Each weight is updated using the normalized attribute value (\(x_{i}\)), the real value (\(y\)), the learning rate (\(\eta\)), and the output (\(o\)), as given in Eq. (6).
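Eq. (6) is not shown in this text. The minimal Python sketch below assumes the standard delta rule \(w_{i} \leftarrow w_{i} + \eta (y - o) x_{i}\) for a linear perceptron. The bias weight and the names `predict` and `update` are assumptions, and the tanh variant from [5] is not shown.

```python
def predict(weights: list[float], x: list[float]) -> float:
    """Linear perceptron output o = w_0 + sum_i w_i * x_i (w_0 is a bias weight)."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


def update(weights: list[float], x: list[float], y: float, eta: float = 0.05) -> float:
    """One online step of w_i <- w_i + eta * (y - o) * x_i; returns the prediction o."""
    o = predict(weights, x)
    err = y - o
    weights[0] += eta * err  # the bias input is fixed at 1
    for i, xi in enumerate(x, start=1):
        weights[i] += eta * err * xi
    return o
```

Each call touches only the current instance, which matches the incremental, one-pass nature of the stream.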
Before the learning process, categorical variables are converted into binary (numerical) variables. All attributes are then normalized so that each has the same effect on the learning process. The normalization is performed incrementally.
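The incremental normalization can be sketched with Welford's running mean and variance. This is a minimal Python sketch. The standard score \((x - \mu)/\sigma\) and the class name `RunningNormalizer` are assumptions, and the exact scaling used by FIMTDD in MOA may differ.

```python
import math


class RunningNormalizer:
    """Standardizes one numeric attribute using a running mean and variance (Welford)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new value into the running statistics in O(1) time and memory."""
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def std(self) -> float:
        """Population standard deviation of the values seen so far."""
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0

    def transform(self, x: float) -> float:
        """Standard score of x; 0.0 while the spread is still unknown."""
        s = self.std()
        return (x - self.mean) / s if s > 0.0 else 0.0
```

In a stream, each incoming value could be transformed with the statistics seen so far and then folded into them, so earlier instances never need to be revisited.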
Hoeffding bound
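The Hoeffding bound is used in the standard FIMTDD algorithm and is defined as follows. Let \(X_{i}\), where \(i = 1,2,3, \ldots ,N\), be independent random variables such that \(\Pr \left( X_{i} \in \left[ a_{i}, b_{i} \right] \right) = 1\). Then, for \(X = \sum_{i = 1}^{N} X_{i}\) and all \(\varepsilon > 0\), we have the inequality in Eq. (7): \(\Pr \left( X - E\left[ X \right] \ge N\varepsilon \right) \le \exp \left( - \frac{2N^{2}\varepsilon^{2}}{\sum_{i = 1}^{N} \left( b_{i} - a_{i} \right)^{2}} \right)\) (7).
Assume that every \(X_{i}\) has the same range \(R\), so that \(\sum_{i = 1}^{N} \left( b_{i} - a_{i} \right)^{2} = NR^{2}\). Then, we obtain \(\Pr \left( X - E\left[ X \right] \ge N\varepsilon \right) \le \exp \left( - \frac{2N\varepsilon^{2}}{R^{2}} \right)\) (8).
We require this probability to be at most \(\delta\), that is, \(\Pr \left( X - E\left[ X \right] \ge N\varepsilon \right) \le \delta\). Eq. (8) then simplifies to \(\delta = \exp \left( - \frac{2N\varepsilon^{2}}{R^{2}} \right)\).
Solving for \(\varepsilon\) by taking the logarithm of both sides gives the Hoeffding bound in Eq. (9): \(\varepsilon = \sqrt{\frac{R^{2}\ln \left( 1/\delta \right)}{2N}}\) (9).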
We propose adding two variables, \(k\) and \(m\), which represent a modified standard deviation and the mean distance, respectively. As Eq. (14) shows, \(k\) is the standard deviation \(d\) divided by the sum of the actual values \(y\) over \(n\) instances.
In Eq. (15), \(x_{i}\) is a feature of the dataset and \(S\) is the sum of the feature values in one instance. In Eq. (16), \(f\) is obtained by dividing the sum of the feature values \(S\) by \(i\), the number of features. Finally, \(m\) is the absolute difference between \(f\) and \(S\) divided by \(f\).
We therefore modify Eq. (5) into Eq. (19) by incorporating the variable \(k\) from Eq. (14) and \(m\) from Eq. (17). In Eq. (19), \(\varepsilon\) is the value of the Hoeffding bound.
Results and discussions
In this research, we assessed our approach on three datasets with large numbers of instances: a traffic demand dataset, a power system dataset, and a dataset of water absorption in Chicago. We compare our approach (Distance Improvement) with the standard FIMTDD algorithm [3] and with our previous Chernoff bound improvement of FIMTDD [19]. On all evaluation metrics (MAE, RMSE, and MAPE), our approach gives consistently lower errors than the previous methods.
The traffic demand data were obtained from the Grab challenge, whose goal was to predict order demand at a specific time and in a specific area. The features used to predict traffic demand are the location (in geocoded form), day, hour, and minute. All locations have been masked to protect user privacy, and the demand values have been normalized to the range 0 to 1 [22]. The dataset contains 4,206,332 instances.
The second dataset is the power system dataset provided by Open Power System Data (OPSD) [23]. We used Tennet’s wind power plant data from Germany, which contain wind power generation records gathered from 2005 to 2018. The dataset is 21.45 MB in size and contains 435,268 instances. Its 9 attributes describe the actual onshore and offshore wind power generation and the corresponding forecasts at 15-min intervals.
The third dataset was obtained from sensors installed by the City of Chicago to measure water absorption by roads and sidewalks [24]. These data can be used to monitor the development of green infrastructure against flooding in Chicago. The installation also includes sensors that collect weather information. Each row represents a sensor measurement for a given time, location, and type of measurement. The dataset contains 31,642,635 rows and is approximately 3.7 GB in size. We convert the timestamps to a day-of-week format and use the same period as for the road weather dataset and the Traffic Demand Dataset.
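The timestamp conversion can be sketched as follows. This minimal Python sketch assumes ISO-8601 timestamp strings. The actual timestamp format of the sensor data and the function name `time_features` are assumptions. `datetime.weekday()` returns 0 for Monday through 6 for Sunday.

```python
from datetime import datetime


def time_features(timestamp: str) -> dict[str, int]:
    """Turn an ISO-8601 timestamp into day-of-week, hour, and minute attributes."""
    t = datetime.fromisoformat(timestamp)
    return {"day_of_week": t.weekday(), "hour": t.hour, "minute": t.minute}
```

For example, `time_features("2019-04-05T13:45:00")` returns `{"day_of_week": 4, "hour": 13, "minute": 45}`, because 5 April 2019 was a Friday.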
We measured the errors every 100,000 instances for the traffic demand data, every 5000 instances for the power system data, and every 100,000 instances for the infrastructure monitoring data. These measurement intervals were chosen according to the number of instances in each dataset to give the clearest plots. The error metrics are the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE), and the Mean Absolute Percentage Error (MAPE), given by Eqs. (16), (17), and (18), respectively.
Here, \(f_{i}\) is the predicted value, \(y_{i}\) is the real value, and \(N\) is the number of instances in each stream.
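The three metrics can be written directly from their standard definitions: \(\mathrm{MAE} = \frac{1}{N}\sum_{i}|f_{i} - y_{i}|\), \(\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i}(f_{i} - y_{i})^{2}}\), and \(\mathrm{MAPE} = \frac{100}{N}\sum_{i}\left|\frac{y_{i} - f_{i}}{y_{i}}\right|\). The minimal Python sketch below uses hypothetical function names. `checkpoint_metrics` assumes that each checkpoint reports the error over all instances seen so far, which is one possible reading of "every 100,000 instances". MAPE is undefined when a real value \(y_{i}\) is zero and becomes very large when real values are close to zero.

```python
import math


def mae(y: list[float], f: list[float]) -> float:
    """Mean Absolute Error: average of |f_i - y_i|."""
    return sum(abs(fi - yi) for yi, fi in zip(y, f)) / len(y)


def rmse(y: list[float], f: list[float]) -> float:
    """Root Mean Square Error: square root of the average squared error."""
    return math.sqrt(sum((fi - yi) ** 2 for yi, fi in zip(y, f)) / len(y))


def mape(y: list[float], f: list[float]) -> float:
    """Mean Absolute Percentage Error in percent; raises ZeroDivisionError if any y_i == 0."""
    return 100.0 * sum(abs((yi - fi) / yi) for yi, fi in zip(y, f)) / len(y)


def checkpoint_metrics(y: list[float], f: list[float],
                       every: int) -> list[tuple[int, float, float, float]]:
    """(instances seen, MAE, RMSE, MAPE) over the prefix y[:end], every `every` instances."""
    rows = []
    for end in range(every, len(y) + 1, every):
        ys, fs = y[:end], f[:end]
        rows.append((end, mae(ys, fs), rmse(ys, fs), mape(ys, fs)))
    return rows
```

For real values [1, 2, 4] and predictions [2, 2, 2], the MAE is 1.0 and the MAPE is 50%.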
The simulations were run on a computer with an Intel(R) Core(TM) i7-6800K CPU @ 3.4 GHz, 32 GB of RAM, and a 2 TB hard disk drive. We modified the FIMTDD code of the Massive Online Analysis (MOA) application and ran the simulations on top of MOA [25]. The quantities measured in the simulations are the errors (MAE, RMSE, and MAPE).
The measured MAEs for the three datasets are shown in Fig. 1 (Traffic Demand Dataset), Fig. 2 (Power System Dataset), and Fig. 3 (Road Weather Dataset). STD denotes the results of the standard FIMTDD algorithm, IM denotes the Chernoff bound approach [19], and ED denotes our new approach (distance variable).
The experimental results show that our approach increases the accuracy and lowers the error of the FIMTDD algorithm. In the first simulation, on the Traffic Demand Dataset, the standard Hoeffding bound method gives \(MAE_{STD}\) = 0.069 at the 4,200,000th instance, whereas \(MAE_{IM}\) is 0.049 and \(MAE_{ED}\) is 0.046. In the second simulation, on Tennet’s wind power plant dataset, the maximum \(MAE_{STD}\) is 118.88 at the 435,000th instance, whereas \(MAE_{IM}\) and \(MAE_{ED}\) are 94.51 and 74.18, respectively. For the road weather dataset, \(MAE_{STD}\) is 10.90 at the 5,000,000th instance, whereas \(MAE_{IM}\) and \(MAE_{ED}\) are 2.38 and 1.12, respectively. For all three datasets, \(MAE_{ED}\) is lower than \(MAE_{STD}\) and \(MAE_{IM}\) at every stream evaluation point.
The measured RMSEs for the three datasets are shown in Fig. 4 (Traffic Demand Dataset), Fig. 5 (Power System Dataset), and Fig. 6 (Road Weather Dataset). In the Traffic Demand Dataset simulation, \(RMSE_{STD}\) is 0.12 at the 4,200,000th instance, whereas \(RMSE_{IM}\) is 0.088 and \(RMSE_{ED}\) is 0.085. The simulation on Tennet’s wind power plant dataset yields an \(RMSE_{STD}\) of 380.51 at the 435,000th instance, whereas \(RMSE_{IM}\) is 237.03 and \(RMSE_{ED}\) is 165.91. For the road weather dataset, \(RMSE_{STD}\) is 23.13 at the 31,600,000th instance, whereas \(RMSE_{IM}\) is 14.41 and \(RMSE_{ED}\) is 9.42. Our approach produces a lower RMSE than the STD and IM methods at every stream evaluation point.
The measured MAPEs for the three datasets are shown in Fig. 7 (Traffic Demand Dataset), Fig. 8 (Power System Dataset), and Fig. 9 (Road Weather Dataset). In the Traffic Demand Dataset experiment, \(MAPE_{STD}\) is 1332% at the 4,200,000th instance, whereas \(MAPE_{IM}\) is 761% and \(MAPE_{ED}\) is 714%. For Tennet’s wind power plant dataset, \(MAPE_{STD}\) is 6.62% at the 435,000th instance, whereas \(MAPE_{IM}\) is 5.29% and \(MAPE_{ED}\) is 4.13%. For the road weather dataset, \(MAPE_{STD}\) is 22.97% at the 31,600,000th instance, whereas \(MAPE_{IM}\) is 3.57% and \(MAPE_{ED}\) is 3.03%. \(MAPE_{ED}\) is lower than \(MAPE_{STD}\) and \(MAPE_{IM}\) at every stream evaluation point.
The differences between \(MAPE_{IM}\) and \(MAPE_{ED}\) are 47 percentage points for the traffic demand data, 1.16 for the power system data, and 0.54 for the road weather data. Compared with the standard method, the differences between \(MAPE_{STD}\) and \(MAPE_{ED}\) are 618 percentage points for the traffic demand data, 2.49 for the power system data, and 19.65 for the road weather data. We use the MAPE to express the error as a percentage. As Figs. 7, 8, and 9 show, the MAPE measurements on all three datasets indicate that our proposed method reduces the percentage error at every point of the stream evaluation process.
Table 1 summarizes the MAE, RMSE, and MAPE values of the standard method (STD), the Chernoff bound approach (IM), and our improvement (ED, distance variable). As Table 1 shows, the ED approach gives smaller errors than the IM and STD methods. In addition, an evaluation of the real values, predicted values, and measurement errors shows that the distance mean approach can accelerate learning, because it causes the tree to split more often in the early stage of the learning process.
Conclusion
The FIMTDD algorithm is a data mining method for data stream evaluation. The standard FIMTDD algorithm uses the Hoeffding bound as its split criterion. In this study, we evaluated and analyzed the distance mean and standard deviation approach for the FIMTDD algorithm. The evaluation used three big time-series datasets: the Traffic Demand Dataset, Tennet’s wind power generation dataset, and the Road Weather Dataset. In all simulations, the proposed approach consistently lowered the error at every step of the learning process compared with the standard method and the Chernoff bound approach. All measurement errors (MAE, RMSE, and MAPE) are lower for our approach than for the previous approach (Chernoff bound) and the standard method. Our approach (distance mean) lowers the MAPE at every stage of learning by approximately 2.49% compared with the Chernoff bound method and 19.65% compared with the standard method. In future work, we plan to determine which bound is most appropriate for particular data streams and to optimize it.
Availability of data and materials
Abbreviations
FIMTDD: Fast Incremental Model Tree with Drift Detection
MAE: Mean Absolute Error
MAPE: Mean Absolute Percentage Error
RMSE: Root Mean Square Error
OPSD: Open Power System Data
SDR: Standard Deviation Reduction
References
1. Adak MF, Akpinar M. A hybrid artificial bee colony algorithm using multiple linear regression on time-series datasets. 2018.
2. Aghabozorgi S, Wah TY. Clustering of large time series datasets. Intell Data Anal. 2014;18(5):793–817. https://doi.org/10.3233/ida-140669.
3. Ikonomovska E, Gama J, Džeroski S. Learning model trees from evolving data streams. Data Min Knowl Disc. 2011;23(1):128–68.
4. Hoeffding W. Probability inequalities for sums of bounded random variables. J Am Stat Assoc. 1963;58(301):13–30.
5. Wibisono A, Wisesa HA, Jatmiko W, Mursanto P, Sarwinda D. Perceptron rule improvement on FIMT-DD for large traffic data stream. In: Proceedings of the International Joint Conference on Neural Networks. 2016. p. 5161–7.
6. Zhang C. Bennett-type generalization bounds: large-deviation case and faster rate of convergence. In: Uncertainty in Artificial Intelligence: Proceedings of the 29th Conference (UAI 2013). 2013. p. 714–22.
7. Beygelzimer A, Langford J, Lifshits Y, Sorkin G, Strehl A. Conditional probability tree estimation analysis and algorithms. In: Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009). 2009. p. 51–8.
8. Balsubramani A, Ramdas A. Sequential nonparametric testing with the law of the iterated logarithm. In: 32nd Conference on Uncertainty in Artificial Intelligence (UAI 2016). 2016. p. 42–51.
9. Koesdwiady A, Soua R, Karray F. Improving traffic flow prediction with weather information in connected cars: a deep learning approach. IEEE Trans Veh Technol. 2016;65(12):9508–17.
10. Soua R, Koesdwiady A, Karray F. Big-data-generated traffic flow prediction using deep learning and Dempster-Shafer theory. In: Proceedings of the International Joint Conference on Neural Networks. 2016. p. 3195–202.
11. Wibisono A, Jatmiko W, Wisesa HA, Hardjono B, Mursanto P. Traffic big data prediction and visualization using Fast Incremental Model Trees-Drift Detection (FIMT-DD). Knowl-Based Syst. 2016;93:33–46.
12. Wibisono A, Sina I, Ihsannuddin MA, Hafizh A, Hardjono B, Nurhadiyatna A, Jatmiko W, Mursanto P. Traffic intelligent system architecture based on social media information. In: International Conference on Advanced Computer Science and Information Systems (ICACSIS). 2012. p. 25–30.
13. Lv Y, Duan Y, Kang W, Li Z, Wang FY. Traffic flow prediction with big data: a deep learning approach. IEEE Trans Intell Transp Syst. 2015;16(2):865–73.
14. Xia D, Li H, Wang B, Li Y, Zhang Z. A MapReduce-based nearest neighbor approach for big-data-driven traffic flow prediction. IEEE Access. 2016;4:2920–34.
15. Hou Z, Li X. Repeatability and similarity of freeway traffic flow and long-term prediction under big data. IEEE Trans Intell Transp Syst. 2016:1786–96.
16. Bifet A, et al. Efficient online evaluation of big data stream classifiers. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2015.
17. Vu AT, et al. Distributed adaptive model rules for mining big data streams. In: 2014 IEEE International Conference on Big Data (Big Data). IEEE; 2014.
18. Ta VD, Chuan-Ming L, Goodwill WN. Big data stream computing in healthcare real-time analytics. In: 2016 IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA). IEEE; 2016.
19. Wibisono A, Sarwinda D, Mursanto P. Tree stream mining algorithm with Chernoff-bound and standard deviation approach for big data stream. J Big Data. 2019;6:1.
20. Phillips JM. Chernoff-Hoeffding inequality and applications. arXiv preprint arXiv:1209.6396. 2012.
21. Lv Y, Duan Y, Kang W, Li Z, Wang FY. Traffic flow prediction with big data: a deep learning approach. IEEE Trans Intell Transp Syst. 2015;16(2):865–73.
22. Grab. Traffic Management, Grab AI. https://www.aiforsea.com/trafficmanagement.
23. Open Power System Data (OPSD). Data Platform: Renewable Power Plants. 2018. https://data.open-power-system-data.org/renewable_power_plants/.
24. Smart Green Infrastructure Monitoring Sensors - Historical. https://data.cityofchicago.org/Environment-Sustainable-Development/Smart-Green-Infrastructure-Monitoring-Sensors-Hist/ggws-77ih. US Department of Transportation-Seattle. Accessed 5 Apr 2019.
25. Bifet A, Holmes G, Kirkby R, Pfahringer B. MOA: massive online analysis. J Mach Learn Res. 2010;11:1601–4.
Acknowledgements
We would like to express our gratitude for the grant received from Universitas Indonesia.
Funding
Universitas Indonesia (2020)
Author information
Affiliations
Contributions
AW: Contributed the idea of the distance variable improvement, implemented the code, created the simulation scenarios, measured the dataset simulations, revised the introduction and methods, added datasets, and revised the results and discussion. PM: Verified the experimental process, the compiled data, and the consistency of the derived formulas; and revised the results, analysis, and discussion sections. JA: Evaluated the model and verified the algorithm. WDWTB: Prepared and cleansed the dataset. MIR, LMH, VFA: Wrote the introduction, related work, and results of this paper. All authors read and approved the final manuscript.
Authors’ information
Ari Wibisono was born in Jakarta on December 27, 1988. He currently works as a lecturer with the Faculty of Computer Science at Universitas Indonesia. He received his Bachelor’s degree in 2010 and his Master of Computer Science degree in 2012 from the Faculty of Computer Science, Universitas Indonesia. His specific fields of interest are System Programming, Intelligent Systems, and High-Performance Computing.
Petrus Mursanto was born in Surakarta, June 25, 1967. He has been working as a senior lecturer with the Faculty of Computer Science at Universitas Indonesia since 1992. He received his Doctoral degree from the Faculty of Computer Science at Universitas Indonesia. He obtained his Master’s degree in Computer Science from the University of Auckland in 1999 and his Bachelor’s degree in Electrical Engineering from Universitas Indonesia in 1992. The author’s fields of expertise are software engineering, reconfigurable computing, and digital technique design.
Jihan Adibah, Wendy D. W. T. Bayu, May Iffah Rizki, Lintang Matahari Hasani, and Valian Fil Ahli are students with the Faculty of Computer Science at Universitas Indonesia.
Corresponding author
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wibisono, A., Mursanto, P., Adibah, J. et al. Distance variable improvement of time-series big data stream evaluation. J Big Data 7, 85 (2020). https://doi.org/10.1186/s40537-020-00359-w
Keywords
 Intelligent Systems
 Data stream
 Distance improvement
 Big data regression