A novel hybrid model based on Hodrick–Prescott filter and support vector regression algorithm for optimizing stock market price prediction
 Meryem Ouahilal^{1},
 Mohammed El Mohajir^{2},
 Mohamed Chahhou^{2} and
 Badr Eddine El Mohajir^{1}
Received: 17 July 2017
Accepted: 20 September 2017
Published: 4 October 2017
Abstract
Predicting stock market prices is a challenging task in financial time series analysis and is of great interest to stock investors, stock traders and applied researchers. Many machine learning techniques have been used in this area, including regression algorithms, which can be useful tools for financial time series prediction. Support Vector Regression (SVR) is one of the most powerful algorithms in machine learning, and numerous studies have applied it successfully to stock market prediction. In this paper, we propose a novel hybrid approach based on machine learning and filtering techniques. Our approach combines Support Vector Regression and the Hodrick–Prescott filter in order to optimize the prediction of stock prices. To assess its performance, we have conducted several experiments using real-world datasets. The principal objective of this paper is to demonstrate the improvement in stock market predictive performance and to compare our proposed model with other optimized models. The experimental results confirm that the proposed algorithm constitutes a powerful model for predicting stock market prices.
Keywords
Stock price prediction; Financial time series forecasting; Business analytics; Support vector regression; Noise filtering techniques; Hodrick–Prescott filter; Decision support
Introduction
This paper addresses the issue of predicting stock market prices in financial time series. Specifically, we focus on the closing price, which is the most up-to-date valuation of a security until trading commences again on the next trading day. Closing prices provide a useful marker for investors to evaluate changes in stock market prices over time.
A financial time series consists of various components corresponding to short-term irregular and seasonal variations, a medium-term business cycle, and a long-term trend movement. Most macroeconomic analysis is concerned with the medium-term business cycle and the long-term trend movement. However, these fundamental movements are hidden in the original financial data because multiple irregular and seasonal variations dominate the data [1].
Consequently, it is often difficult to read the fundamental movement of the financial variable under study directly from the original data. Financial time series also include noise that may obscure the information in the dataset. To better understand and analyze the data, and to improve the accuracy of stock price prediction, noise filtering is necessary before the predictive model is applied.
Meanwhile, many studies have used machine learning techniques to predict stock market prices [2]. A large number of successful applications have shown that regression algorithms, in particular support vector regression models, can be very useful tools for financial time series modeling and prediction [3, 4].
Support Vector Regression is one of the most powerful algorithms in machine learning. The underlying theory has been developed over the last three decades by Vapnik, Chervonenkis and others [5–7]. SVR has been applied successfully to stock market prediction in numerous studies. To mention a few, the author in [8] predicts the future direction of a stock price index using an SVM model. In that study, he investigated the effect of the SVM parameters, with the goal of finding their optimal values in order to improve the prediction results. He also compared SVM with back-propagation neural networks (BPN) and case-based reasoning (CBR), and the experimental results showed that SVM outperformed both. The authors in [9] combined the support vector regression algorithm with independent component analysis (ICA), a statistical signal processing technique, to perform financial time series forecasting. Their experimental results showed that the proposed model outperformed the SVR model without ICA filtering. More recently, the researchers in [10] predicted stock market prices on real-world datasets using a hybrid model based on support vector regression and a modified harmony search algorithm. The method was tested on two reliable financial datasets, and the experimental results showed that the proposed model improved prediction accuracy compared to other optimization methods.
However, these researchers have all raised the problem of high noise in financial time series; moreover, some financial time series carry little information and have small sample sizes. Applying SVR directly to such series is likely to be sensitive to the noise and may lead to overfitting.
The significance and novelty of this paper are summarized as follows:
A novel efficient algorithm for stock market price prediction
We develop an efficient algorithm for predicting stock market price based on machine learning and noise filtering techniques. Our proposed algorithm is based on a hybrid approach which combines support vector regression algorithm and Hodrick–Prescott filter, in order to improve the performance of the stock market price predictions.
Empirical demonstration of the effectiveness of our approach
We use real-world datasets of different Moroccan financial time series to compare our method with other existing financial time series predictive methods. The experimental results show that the proposed framework is a powerful predictive tool for stock market prices.
The rest of this paper is organized as follows. Section II gives a short overview of financial time series. Section III presents different filtering techniques used in economics, namely the Hodrick–Prescott filter, the Christiano–Fitzgerald filter and the Baxter–King filter, and how to use them for noise filtering. Section IV provides a brief theoretical overview of some regression-based machine learning techniques: Decision Tree Regression, Multiple Linear Regression and Support Vector Regression. Section V presents our methodology. Section VI focuses on the prediction of stock market prices using our hybrid approach. Section VII evaluates the proposed model through additional experiments on eight financial time series of different sizes. Section VIII discusses the experimental results of the case study. Finally, Section IX concludes this work and presents some directions for future research.
Related work: financial time series

Constructing a model that represents a time series.

Using the model to predict future values.
The goal of building a time series model is the same as for other types of predictive models: to construct a model such that the error between the predicted value of the target variable and the observed value is as small as possible.
Time series prediction is one of the most basic predictive analytics needs of many businesses. Many data elements, such as product sales and stock market prices, are observed as time series. From a strategic perspective, managers and decision makers regularly need to predict the trends and seasonal patterns of these elements.

Qualitative techniques refer to a number of forecasting approaches based on subjective estimates from informed experts. Usually, no statistical data analysis is involved. Rather, estimates are based on a deliberative process of a group of experts, based on their past knowledge and experience. Examples are the Delphi technique and scenario writing. These approaches are useful when good data are not available, or we wish to gain general insights through the opinions of experts.

Quantitative techniques refer to forecasting based on the analysis of historical data using mathematical and statistical principles and concepts. The quantitative approach is further subdivided into two categories: causal techniques and time series techniques.

Causal techniques are based on regression analysis that examines the relationship between the variable to be forecasted and other explanatory variables.

Time series techniques usually use only the historical data of the variable of interest to forecast its future values (see Table 1).

Time series forecasting techniques
Categories  Application  Specific techniques 

Qualitative techniques  Useful when historical data are scarce or nonexistent  Delphi technique, scenario writing, visionary forecast, historic analogies
Causal techniques  Useful when historical data are available for both the dependent (forecast) and the independent variables  Regression models, econometric models, leading indicators, correlation methods
Time series techniques  Useful when historical data exist for the forecast variable and the data exhibit a pattern  Moving average, autoregression models, seasonal regression models, exponential smoothing, trend projection, cointegration models
Causal techniques are developed further in the “Predictive analytics techniques” section of this work.
An economic time series consists of several components corresponding to short-term irregular and seasonal variations, a medium-term business cycle, and a long-term trend movement. Most macroeconomic analysis is concerned with the medium-term business cycle and the long-term trend movement. However, these fundamental movements are hidden in the original economic data because multiple irregular and seasonal variations dominate the data [12, 13].
Therefore, it is often difficult to read the important movement of the economic variable under study directly from the original data. Financial time series contain noise that may obscure the information in the dataset. To better understand and analyze the trend, and to improve the accuracy of stock price prediction, noise filtering is necessary before the predictive model is applied.
Noise filtering techniques
A time series is expected to contain some noise that may affect the information in the whole dataset. In the stock market, trading volumes vary every day without showing clear signals useful for prediction, which makes the trend of the changes difficult to understand.
However, from a macroeconomic perspective of the stock market, the long-term trend should be predicted and analyzed. Although this long-term trend cannot clearly indicate which specific stock will rise tomorrow, it nonetheless reveals the performance of the whole investment environment and, to a certain extent, gives important hints that help with decision-making on the stock market [11, 14].
To better understand and analyze the trend, noise filtering is essential. In this research, we evaluate three different noise filtering techniques and compare their effectiveness in financial time series analysis.
Hodrick–Prescott filter
The Hodrick–Prescott filter is a mathematical tool used in macroeconomics, specifically in real business cycle theory, to eliminate the cyclical component of a time series from raw data.
It is used to obtain a smoothed-curve representation of a time series, one that is more sensitive to long-term than to short-term fluctuations. The sensitivity of the trend to short-term fluctuations is adjusted by modifying a multiplier λ.
The HP filter removes a smooth trend τ_t from a time series y_t by solving a minimization problem.
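In its standard form, with T observations, trend τ_t and smoothing parameter λ, the minimization problem solved by the HP filter is:

$$
\min_{\{\tau_t\}_{t=1}^{T}} \left( \sum_{t=1}^{T} (y_t - \tau_t)^2 + \lambda \sum_{t=2}^{T-1} \left[ (\tau_{t+1} - \tau_t) - (\tau_t - \tau_{t-1}) \right]^2 \right)
$$

The first term penalizes the distance between the series and its trend; the second penalizes the squared second differences (the curvature) of the trend. The cyclical component is then c_t = y_t - τ_t.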
The parameter λ penalizes fluctuations in the second differences of the trend τ_t, and must be specified by the user of the HP filter [15–17].
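The filter can be sketched in pure Python using the fact that the minimizing trend solves the linear system (I + λDᵀD)τ = y, where D is the (T-2)×T second-difference matrix. This is a minimal illustration, not the authors' implementation: the function name `hp_filter` is hypothetical, the dense solver is only suitable for short series, and the default λ = 1600 (the value conventionally used for quarterly data) is an assumption, since the text does not report the λ used. For real work, statsmodels provides `statsmodels.tsa.filters.hp_filter.hpfilter(x, lamb=1600)`, which returns the cycle and the trend, in that order.

```python
def hp_filter(y, lamb=1600.0):
    """Split y into (trend, cycle) with the Hodrick-Prescott filter.

    The trend solves (I + lamb * D^T D) trend = y, where D is the (n-2) x n
    second-difference matrix; this linear system is the first-order
    condition of the HP minimization problem. lamb=1600 is an assumed default.
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least three observations")
    # Build the dense system matrix A = I + lamb * D^T D.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for r in range(n - 2):
        # Row r of D holds the stencil (1, -2, 1) at columns r, r+1, r+2.
        stencil = ((r, 1.0), (r + 1, -2.0), (r + 2, 1.0))
        for ci, u in stencil:
            for cj, v in stencil:
                a[ci][cj] += lamb * u * v
    b = [float(v) for v in y]
    # Gaussian elimination without pivoting (A is symmetric positive definite).
    for k in range(n):
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            if f:
                for j in range(k, n):
                    a[i][j] -= f * a[k][j]
                b[i] -= f * b[k]
    # Back substitution yields the trend.
    trend = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * trend[j] for j in range(i + 1, n))
        trend[i] = s / a[i][i]
    cycle = [float(v) - t for v, t in zip(y, trend)]
    return trend, cycle
```

Because the second differences of a straight line are zero, a linear series passes through unchanged (its cycle is zero), and with λ = 0 the trend equals the raw series; larger λ values give smoother trends.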
Baxter–King filter
The Baxter–King band-pass filter is a method of smoothing a time series which, like the Hodrick–Prescott filter, separates the cyclical component from the trend, but offers wider options for isolating the cyclical component of a time series.
The method singles out the periodic component of a time series by specifying the range of oscillation periods to be retained. The Baxter–King filter is a band-pass filter that extracts the cyclical component of the time series using a weighted moving average with specified weights.
A generalized Baxter–King filter can be applied to non-stationary time series; in this generalized model, non-stationarity is accounted for by a matrix of weights that depend on the observation number [18, 19].
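The weighted moving average described above can be sketched in pure Python. This is a minimal sketch, assuming the conventional business-cycle band of 6 to 32 periods and K = 12 leads and lags (the defaults of statsmodels' `bkfilter(x, low=6, high=32, K=12)`); the text does not report the settings it used, and `bk_weights` and `bk_filter` are hypothetical names.

```python
import math


def bk_weights(low=6, high=32, K=12):
    """Symmetric Baxter-King weights a_{-K..K} keeping periods in [low, high]."""
    w1 = 2.0 * math.pi / high  # lower cutoff frequency (longest period kept)
    w2 = 2.0 * math.pi / low   # upper cutoff frequency (shortest period kept)
    b = [0.0] * (2 * K + 1)
    b[K] = (w2 - w1) / math.pi  # ideal band-pass weight at lag 0
    for j in range(1, K + 1):
        bj = (math.sin(w2 * j) - math.sin(w1 * j)) / (math.pi * j)
        b[K + j] = bj
        b[K - j] = bj
    # Shift every weight by the same constant so the weights sum to zero,
    # which removes a constant or linear trend from the output.
    theta = sum(b) / len(b)
    return [bj - theta for bj in b]


def bk_filter(x, low=6, high=32, K=12):
    """Cyclical component of x; K observations are lost at each end."""
    if len(x) <= 2 * K:
        raise ValueError("series too short for the chosen K")
    a = bk_weights(low, high, K)
    return [sum(a[K + j] * x[t - j] for j in range(-K, K + 1))
            for t in range(K, len(x) - K)]
```

The output is 2K observations shorter than the input, which is the trimming cost discussed for the BK filter. If a trend component is needed for regression, one option (an assumption, not stated in the text) is to take x_t minus the cycle over the trimmed range.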
Christiano–Fitzgerald filter
The Christiano–Fitzgerald (CF) random walk filter is a band-pass filter built on the same principles as the Baxter–King (BK) filter. Both filters formulate the detrending and smoothing problem in the frequency domain. If time series were continuous and infinitely long, frequency filtering could be an exact procedure; however, the granularity and finiteness of real-world time series do not permit perfect frequency filtering. Both the BK and CF filters therefore approximate the ideal infinite band-pass filter. The BK version is a symmetric approximation, with no phase shifts in the resulting filtered series, but symmetry and phase correctness come at the expense of trimming the series: depending on the trim factor, a certain number of values at each end of the series cannot be calculated. There is a trade-off between the trim factor and the precision with which the optimal filter can be approximated. The CF random walk filter, on the other hand, uses the whole time series to calculate each filtered data point. The advantage of the CF filter is that it is designed to work well on a larger class of time series than the BK filter, converges in the long run to the optimal filter, and outperforms the BK filter in real-time applications.
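For reference, the ideal band-pass filter that both filters approximate keeps fluctuations with periods between p_l and p_u and removes all others. In standard notation (the symbols B_j, a, b, p_l and p_u are ours, not the paper's), it is an infinite two-sided moving average:

$$
c_t = \sum_{j=-\infty}^{\infty} B_j\, x_{t-j}, \qquad
B_0 = \frac{b - a}{\pi}, \qquad
B_j = B_{-j} = \frac{\sin(jb) - \sin(ja)}{\pi j} \ (j \ge 1), \qquad
a = \frac{2\pi}{p_u}, \quad b = \frac{2\pi}{p_l}
$$

The BK filter truncates this sum at K leads and lags and shifts the weights so that they sum to zero; the CF filter instead uses all available observations, with asymmetric weights near the ends of the sample.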
Predictive analytics techniques
Predictive analytics determines what is likely to happen in the future. It is based on machine learning and statistical techniques, as well as other more recently developed techniques that fall under the general category of data mining. The objective of these techniques is to provide predictions and forecasts about the future of business activities.
There have been many studies using machine learning techniques to predict the stock market price. A large number of successful applications have shown that regression algorithms can be very useful tools for financial time series modeling and forecasting [2, 22, 23].
Regression is a data mining function that predicts a number. A regression task begins with a data set in which the target values are known. Regression models are tested by computing various statistics that calculate the difference between the predicted values and the expected values. The historical data for a regression project is typically divided into two data sets: one for building the model, the other for testing the model.
Regression modelling has many applications in trend analysis, business planning, marketing, financial forecasting, time series prediction, and environmental modelling. There are different families of regression algorithms and different ways of measuring the error [23, 24].
Decision tree regression
A decision tree builds regression or classification models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each representing values of the attribute tested. A leaf node (e.g., Hours Played) represents a decision on the numerical target. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
The core algorithm for building decision trees is ID3, developed by Quinlan, which employs a top-down, greedy search through the space of possible branches with no backtracking. ID3 can be adapted to build a regression tree by replacing Information Gain with Standard Deviation Reduction.
Standard deviation reduction is based on the decrease in standard deviation after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest standard deviation reduction (i.e., the most homogeneous branches) [24, 25].
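The split criterion described above can be written in a few lines. This is a minimal sketch on toy categorical data (the attribute names and values in the test are illustrative, not taken from the paper): SDR is the standard deviation of the target before the split minus the size-weighted standard deviation of each branch after it. `sdr` and `best_attribute` are hypothetical helper names.

```python
from statistics import pstdev


def sdr(target, attribute):
    """Standard deviation reduction when `target` is split by a categorical `attribute`."""
    n = len(target)
    branches = {}
    for value, y in zip(attribute, target):
        branches.setdefault(value, []).append(y)
    # Size-weighted standard deviation of the branches after the split.
    after = sum(len(ys) / n * pstdev(ys) for ys in branches.values())
    return pstdev(target) - after


def best_attribute(target, attributes):
    """Name of the attribute with the highest SDR, i.e. the split ID3 would choose."""
    return max(attributes, key=lambda name: sdr(target, attributes[name]))
```

A split that separates low and high target values into pure branches recovers the whole standard deviation, while a split whose branches mirror the full distribution yields an SDR of zero.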
Multiple linear regression
The linear regression models presented in this section are built as a sum of k terms from the complete linear model, written in a generic form with one coefficient per explanatory variable.
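In standard notation (the symbols are ours), the multiple linear regression model with k explanatory variables x_1, …, x_k, intercept β_0 and error term ε is:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon
$$

With the n training observations stacked into a design matrix X (a leading column of ones followed by the k attributes) and a target vector y, the ordinary least squares estimate is

$$
\hat{\boldsymbol{\beta}} = \left(X^\top X\right)^{-1} X^\top \mathbf{y},
$$

assuming X^T X is invertible.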
Support vector regression
In machine learning, support vector machines (SVM) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. The SVM is one of the most powerful algorithms in machine learning; its theory has been developed over the last three decades by Vapnik, Chervonenkis and others. When support vector machines are applied to regression problems, they are usually called support vector regression (SVR).
To take estimation errors into account, two non-negative slack variables ζ and ζ^{*} are introduced to represent the distance between the actual values and the corresponding boundary values.
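In the standard ε-insensitive formulation (the weight vector w, bias b, feature map φ and tube width ε follow the usual textbook notation and are not named in the text), the slack variables ζ_i and ζ_i^* enter the SVR training problem as follows, where C is the penalty parameter called c in the experiments:

$$
\begin{aligned}
\min_{w,\, b,\, \zeta,\, \zeta^*} \quad & \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \left(\zeta_i + \zeta_i^*\right) \\
\text{subject to} \quad & y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \zeta_i, \\
& \langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \zeta_i^*, \\
& \zeta_i,\ \zeta_i^* \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
$$

Points inside the ε-tube incur no loss; ζ_i and ζ_i^* measure how far a point lies above or below the tube, and C trades off flatness of the function against these deviations.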
Different kernel functions can be chosen, and each choice gives the support vector machine a different structure. The selection of the kernel function is therefore important to the effectiveness of Support Vector Regression; however, there is no mature theory for selecting the kernel function of SVR [24, 27].
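To make the kernel choice concrete, the three common kernels can be written directly. This sketch follows the conventions of scikit-learn's `SVR` (the library named later in this work), where g corresponds to `gamma` and d to `degree`; the function names themselves are illustrative.

```python
import math


def linear_kernel(x, z):
    """K(x, z) = <x, z>."""
    return sum(a * b for a, b in zip(x, z))


def poly_kernel(x, z, gamma=1.0, coef0=0.0, degree=3):
    """K(x, z) = (gamma * <x, z> + coef0) ** degree."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.1):
    """K(x, z) = exp(-gamma * ||x - z||^2); equals 1 when x == z."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)
```

One consequence for parameter tuning: in scikit-learn the `degree` parameter only affects the polynomial kernel and is ignored by the RBF kernel, so with an RBF kernel the degree d has no effect on the fitted model.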
Methodology
In this paper, we propose a novel hybrid approach based on the combination of the Hodrick–Prescott (HP) filter and the Support Vector Regression (SVR) algorithm. On this basis, we propose a new framework for financial time series prediction [28].
The objective of this approach is to improve and optimize the SVR model predictions with the help of the HP filter, which filters and normalizes our data to reduce the noise in our financial time series.
Stock market prediction
Dataset description
The closing prices provide a useful marker for investors to assess changes in stock prices over time—the closing price of one day can be compared to the previous closing price in order to measure market sentiment for a given security over a trading day [29].
The closing price is therefore selected as the prediction target in the original dataset. The IAM data were collected daily over the period from 2004 to 2016. The dataset has 6 attributes and 2840 samples, with a size of 112 KB. The attributes are Date, Open price, Close price, High price, Low price, and Volume. The goal is to predict the Close price at different time horizons.
Our hybrid approach
The regression analysis focuses on how the Close price on day t + 1 changes when the Open price, Close price, High price, Low price and Volume on day t vary.
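The day-t to day-(t + 1) setup above amounts to shifting the target by one row. A minimal sketch, assuming each trading day is a dict with the five attributes and that rows are sorted by date; `make_supervised` and the lowercase key names are hypothetical.

```python
FEATURES = ("open", "close", "high", "low", "volume")


def make_supervised(rows):
    """Pair day-t attributes with the day-(t+1) close.

    rows: list of dicts keyed by FEATURES, sorted by trading date.
    Returns (X, y) where X[t] holds day-t attributes and y[t] the next day's close.
    """
    X = [[row[name] for name in FEATURES] for row in rows[:-1]]
    y = [row["close"] for row in rows[1:]]
    return X, y
```

The last day has no next-day close and the first day's close is never a target, so n trading days yield n - 1 training pairs.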
Before performing the regression, we need to use Hodrick–Prescott filter to filter noise and normalize the data value on each attribute separately.
The goal of the Hodrick–Prescott filter is to decompose the time series into several series with common frequencies; here, we decompose the data into trend and cyclical components.
Experimental results
The regression performance varies with the selection of four important parameters: (1) the kernel function, (2) the penalty parameter c, (3) the kernel parameter g and (4) the degree d of the kernel function. A four-stage grid search is used to find the best combination of parameters for each filter.
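A parameter search of this kind can be sketched as an exhaustive loop over a grid. This is a single-stage simplification, not a reproduction of the paper's four-stage procedure, whose stages are not detailed. The candidate values below are taken from the reported kernel-parameter tables purely for illustration, and `score` stands for any error measure (for example MAPE on a validation period held out from the training data, so that the test folds are not used for tuning).

```python
from itertools import product


def grid_search(score, grid):
    """Exhaustively evaluate score(**params) over a parameter grid.

    score: callable returning an error (lower is better).
    grid:  dict mapping parameter name -> list of candidate values.
    Returns (best_params, best_score).
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = score(**params)
        if err < best_score:  # keep the first combination on ties
            best_params, best_score = params, err
    return best_params, best_score


# Candidate values drawn from the kernel-parameter tables (illustrative only).
GRID = {
    "kernel": ["rbf"],
    "C": [150, 175, 200, 250, 275],
    "gamma": [0.01, 0.1],
    "degree": [3],
}
```

With 1 × 5 × 2 × 1 candidates this grid requires ten model fits; a staged search narrows one parameter at a time to keep the number of fits manageable as grids grow.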
In the prediction experiments, the data are divided into two subsets. The data from December 2004 to December 2012 were used as the training set for all models.

The data from January 2013 to December 2013 are used as the first fold of the test set.

The data from January 2013 to December 2014 are used as the second fold of the test set.

The data from January 2013 to December 2015 are used as the third fold of the test set.

The data from January 2013 to July 2016 are used as the fourth fold of the test set.
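The training period and the four expanding test folds above can be expressed as date filters. A minimal sketch, assuming records are (date, close) pairs; since the text gives only months, the day-level boundaries (1 December 2004 and the last day of each closing month) are assumptions, and `split_folds` is a hypothetical helper.

```python
from datetime import date

# Boundaries from the text; exact days within each month are assumptions.
TRAIN_START, TRAIN_END = date(2004, 12, 1), date(2012, 12, 31)
TEST_START = date(2013, 1, 1)
FOLD_ENDS = (date(2013, 12, 31), date(2014, 12, 31),
             date(2015, 12, 31), date(2016, 7, 31))


def split_folds(records):
    """records: iterable of (date, value) pairs. Returns (train, [fold1..fold4])."""
    records = sorted(records)
    train = [r for r in records if TRAIN_START <= r[0] <= TRAIN_END]
    folds = [[r for r in records if TEST_START <= r[0] <= end] for end in FOLD_ENDS]
    return train, folds
```

Every fold starts in January 2013, so each later fold contains the earlier ones: the model is trained once on the 2004 to 2012 data and evaluated on increasingly long horizons.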
Kernel parameters setting
Regressive model  Kernel  C  G  D 

SVR  rbf  250  0.01  3 
SVR + HP  rbf  275  0.1  3 
SVR + CF  rbf  150  0.01  3 
SVR + BK  rbf  250  0.1  3 
The error rate is computed between the actual stock prices and the prices predicted in the experiments. To calculate it, the mean absolute percentage error (MAPE) is used in this study.
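For n test days with actual prices A_t and predicted prices F_t, MAPE is (1/n) Σ |(A_t - F_t) / A_t|. A minimal sketch follows; whether the tables report this value as a fraction or as a percentage is not stated, so the function returns a fraction (multiply by 100 for a percentage).

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (x100 for percent).

    Assumes every actual value is non-zero, which holds for stock prices.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

Because each error is divided by the actual price, MAPE is scale-free, which allows comparison across stocks with very different price levels.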
MAPE (Error rate) of the predictive models
Years  SVR  SVR + HP  SVR + CF  SVR + BK 

2013  0.27  0.10  0.15  0.21 
2013–2014  0.25  0.08  0.13  0.175 
2013–2015  0.24  0.065  0.12  0.17 
2013–2016  0.21  0.05  0.09  0.15 
Where HP is the Hodrick–Prescott filter, CF is the Christiano–Fitzgerald filter and BK is the Baxter–King filter.
Evaluation of our model
In this section, we evaluate the effectiveness of our model by conducting additional experiments based on our hybrid approach on different financial time series with different sizes.
Datasets description
Data sets description
Dataset  Period  Attributes  Samples  Size (KB) 

Addoha holding  10/2006–08/2017  6  2750  108 
Auto haul  11/2007–08/2017  6  2021  71 
BCP bank  07/2004–08/2017  6  3210  121 
BMCE bank  07/2000–08/2017  6  4126  149 
Delta holding  05/2008–08/2017  6  2227  82 
Total macro  05/2015–08/2017  6  510  20 
Wafa assurance  11/2000–08/2017  6  3250  122 
CIH bank  09/2000–08/2017  6  3808  141 
The time series data were collected daily for different companies over different periods. Our datasets have six attributes: Date, Open price, Close price, High price, Low price, and Volume. The goal is to predict the Close price using four different models: SVR, SVR + HP, SVR + CF and SVR + BK.
Methodology

Decomposing the time series into trend and cyclical components using three different filter techniques.

Implementing the regressive model using the support vector regression algorithm on the trend component of the time series.
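The two methodology steps above can be sketched with NumPy and scikit-learn, the library named in this work. This is a minimal sketch, not the authors' code: the HP smoothing parameter (lamb=1600), min-max scaling fitted on the training rows, and the choice of the next-day trend close as the target are assumptions; C=275 and gamma=0.1 are the SVR + HP values from the first kernel-parameter table. The function names `hp_trend` and `fit_svr_on_trend` are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR


def hp_trend(y, lamb=1600.0):
    """HP trend of a 1-D series: solve (I + lamb * D^T D) tau = y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    d = np.zeros((n - 2, n))  # second-difference operator
    for i in range(n - 2):
        d[i, i:i + 3] = (1.0, -2.0, 1.0)
    return np.linalg.solve(np.eye(n) + lamb * d.T @ d, y)


def fit_svr_on_trend(ohlcv, n_train, lamb=1600.0, C=275.0, gamma=0.1):
    """Step 1: HP-filter each attribute; step 2: fit SVR on the trend.

    ohlcv: array of shape (n_days, 5) with columns Open, Close, High, Low, Volume.
    n_train: number of leading days used for training.
    Returns (model, predictions, targets) for the remaining days.
    """
    # Step 1: keep only the trend component of every attribute.
    trend = np.column_stack([hp_trend(col, lamb) for col in ohlcv.T])
    # Min-max scale each attribute with training-period statistics (assumption).
    lo = trend[:n_train].min(axis=0)
    span = trend[:n_train].max(axis=0) - lo
    scaled = (trend - lo) / np.where(span == 0, 1.0, span)
    # Day-t attributes -> day-(t+1) trend close (column 1).
    X, y = scaled[:-1], trend[1:, 1]
    # Step 2: RBF-kernel SVR; targets up to day n_train - 1 are used for training.
    model = SVR(kernel="rbf", C=C, gamma=gamma)
    model.fit(X[:n_train - 1], y[:n_train - 1])
    return model, model.predict(X[n_train - 1:]), y[n_train - 1:]
```

This sketch filters the whole series at once. Because the HP filter is two-sided, test-period prices then influence the training-period trend; filtering only the data available at each forecast date avoids this look-ahead effect.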
The regression analysis focuses on how the Close price on day t + 1 changes when the Open price, Close price, High price, Low price and Volume on day t vary.
Experimental results
Before performing the regression, we apply a different filter technique in each experiment to filter the noise and normalize the data values of each attribute separately. These filter techniques are the Hodrick–Prescott filter, the Christiano–Fitzgerald filter and the Baxter–King filter.
The goal of these filters is to decompose the time series into several series with common frequencies; here, we decompose the data into trend and cyclical components.
We implemented our regressive models in Python using several libraries and packages, including scikit-learn and statsmodels. As in the first case study, the regression performance depends on four parameters:

The kernel function f

The penalty parameter c

The kernel parameter g and

The degree of the kernel function d.
Kernel parameters setting
Regressive model  Kernel  C  G  D 

SVR  rbf  175  0.01  3 
SVR + HP  rbf  275  0.01  3 
SVR + CF  rbf  250  0.1  3 
SVR + BK  rbf  200  0.01  3 
The error rate is computed between the actual stock prices and the prices predicted in the experiments. To calculate it, the mean absolute percentage error (MAPE) is used in this study.
MAPE (Error rate) of the predictive models
Dataset  SVR  SVR + HP  SVR + CF  SVR + BK 

Addoha holding  0.21  0.09  0.12  0.16 
Auto haul  0.26  0.12  0.15  0.18 
BCP bank  0.25  0.13  0.16  0.21 
BMCE bank  0.21  0.07  0.11  0.19 
Delta holding  0.22  0.12  0.14  0.17 
Total macro  0.19  0.13  0.15  0.16 
Wafa assurance  0.24  0.11  0.18  0.22 
CIH bank  0.19  0.08  0.13  0.15 
Where HP is the Hodrick–Prescott filter, CF is the Christiano–Fitzgerald filter and BK is the Baxter–King filter.
Discussion of the experimental results
To assess the performance of our proposed approach, we conducted several stock price prediction experiments with Support Vector Regression models combined with different kinds of filters.
The HP method is a two-sided filter that provides a smooth estimate of the long-term trend component of a series, as well as the corresponding cyclical component. It minimizes the variance of the series around its trend, subject to a penalty on the trend's second differences; as the smoothing parameter λ grows, the trend approaches a linear trend [16, 17]. The CF and BK methods are both band-pass (frequency) filters capable of isolating the cyclical component of a time series. These linear filters use a two-sided weighted moving average of the data in which the cycles within a chosen “band” are extracted and the remaining cycles are filtered out [19–21].
Our filtering analysis showed that the trend component produced by the Hodrick–Prescott filter preserves the shape of the time series curve, whereas the trend components produced by the CF and BK filters tend to slightly modify it. This explains the differences in performance we observed among these filters.
In addition, the prediction results change from one dataset to another because of the sample size and the structure of each time series. Consequently, we found different optimal parameters of the support vector regression model for each case study.
The objective was to verify that the combination of the Support Vector Regression model and the Hodrick–Prescott filter provides the best stock price predictions compared to the other filters.
Indeed, the combination of the Support Vector Regression model and the Hodrick–Prescott filter provides the best results: its MAPE is the lowest of all the models compared, in every test period and on every dataset.
According to our experimental results, we can conclude that the proposed framework, which combines the Support Vector Regression model and the Hodrick–Prescott filter, is a powerful predictive tool for stock market prices and financial time series.
Conclusion and direction for future research
Predicting stock market prices is a central problem in financial analysis and has received much attention. The application of regression models in the financial field is therefore a meaningful endeavor.
In this research work, we proposed a novel hybrid approach for predicting stock prices based on machine learning and filtering techniques, which combines the Support Vector Regression algorithm and the Hodrick–Prescott filter in order to optimize stock price prediction.
To assess the performance of the proposed approach, several experiments were conducted using real-world datasets. The objective was to verify that the combination of the Support Vector Regression model and the Hodrick–Prescott filter provides the best stock price predictions compared to the other filters. The experimental results show that, compared with the SVR, SVR + CF and SVR + BK models, the proposed model is an effective method for predicting stock prices that greatly improves forecasting accuracy. We can therefore confirm that the proposed model is a powerful predictive solution for stock market prices.
Our proposed model provides very good accuracy in stock market price prediction with a very short execution time. The proposed model, which combines the SVR model and the HP filter, outperforms the standard SVR model and the other optimized SVR models examined in this research work.
However, stock market prices depend not only on historical data but are also strongly influenced by macroeconomic factors and important world news. These limitations raise further problems to be solved.
Going forward, we have several additional avenues which we would like to explore. We plan to study the impact of the macroeconomic factors and some big news on the stock market performance in the Moroccan context. This study will allow us to identify the most relevant factors that can be incorporated into our predictive model in order to improve our financial prediction results. We also plan to test our methodology for different industries and check the results on different sets of real world data.
