Implementation of Long Short-Term Memory and Gated Recurrent Units on grouped time-series data to predict stock prices accurately

Stocks are an attractive investment option because they can generate large profits compared to other businesses. Stock price movements in the capital market are highly dynamic; therefore, accurate data modeling is needed to forecast stock prices with a low error rate. Forecasting models using Deep Learning are believed to be able to predict stock price movements accurately from time-series input, especially the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) algorithms. Unfortunately, several previous studies and investigations of LSTM/GRU implementations have not yielded convincing performance results. This paper proposes eight new architectural models for stock price forecasting by identifying joint movement patterns in the stock market. The technique combines the LSTM and GRU models with four neural network block architectures. The proposed architectural models are then evaluated using three accuracy measures obtained from the loss functions Mean Absolute Percentage Error (MAPE), Root Mean Square Percentage Error (RMSPE), and Rooted Mean Dimensional Percentage Error (RMDPE). The three accuracies obtained from MAPE, RMSPE, and RMDPE represent, respectively, the upper bound, a central estimate, and the lower bound of the accuracy achievable with the model.

Therefore, forecasting models that use technical factors must be careful, thorough, and accurate to reduce risk appropriately [3].
Many stock trading prediction models have been proposed, mostly using technical factors of daily stock trading as the data features, i.e., the high, low, open, close, volume, and change prices. The high and low prices are the highest and lowest prices reached in a day, respectively. The open and close prices are the opening and closing prices of the day, respectively. Volume is the number of shares traded, and change is the percentage of price movement over time [4,5].
Nowadays, computing technology that supports Deep Learning (DL) is developing very rapidly, one example being the Graphics Processing Unit (GPU), which accelerates data learning. The training process can be many times faster on a GPU than on a regular processor [6]. The Recurrent Neural Network (RNN) is one of the DL prediction models for time-series data such as stock price movements. The RNN is a neural network architecture whose cells are applied repeatedly to process input that is usually sequential; it is therefore very suitable for predicting stock price movements [6,7]. The two most widely used RNN architectures are the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU).
Several previous studies predicted stock prices with various approaches, including conventional statistics, heuristic algorithms, and Machine Learning. Predictions generally use four value features, i.e., the open, close, high, and low values; unfortunately, the highest accuracy achieved was only 73.78%, so the results were less realistic and did not match actual stock prices [8]. Meanwhile, another study used a Deep Learning LSTM neural network to estimate financial time series on returns data from three stock indices of different market sizes, i.e., the large NYSE S&P 500 market in the US, the emerging Bovespa 50 market in Brazil, and the small OMX 30 market in Sweden. The output of the LSTM neural network was very similar to the conventional ARMA(1,1)-GJRGARCH(1,1) time-series model with a regression approach. However, when trading strategies were implemented based on the direction of change, deep LSTM networks far outperformed the time-series models. This indicated that the weak form of the efficient market hypothesis does not apply to the Swedish market, while it does to the US and Brazilian markets; it also suggested that the American and Brazilian markets are more data-driven than the Swedish market [9]. As its main contribution, this paper proposes eight new architectural models for stock price forecasting by identifying joint movement patterns in the stock market. The technique combines the LSTM and GRU models with four neural network block architectures. The pattern of joint movement of stock prices on the stock exchange can be identified by first letting the LSTM/GRU learning model work independently to determine the predicted value of each company. Then, the output values from all companies are accepted as a flattened array input to a concatenation model, which produces an output shape according to the GRU/LSTM learning model used.
The output shape is then processed in the proposed LSTM/GRU models as usual before being distributed to the LSTM/GRU model of each company in parallel to predict the stock price. The proposed architectural models are evaluated using three accuracy measures obtained from the loss functions Mean Absolute Percentage Error (MAPE), Root Mean Square Percentage Error (RMSPE), and Rooted Mean Dimensional Percentage Error (RMDPE). The three accuracies obtained from MAPE, RMSPE, and RMDPE represent, respectively, the upper bound, a central estimate, and the lower bound of the accuracy achievable with the model.

Related works
Several studies based on traditional machine learning forecast stock price trends using cumulative ARIMA combined with the least-squares support vector machine model (ARIMA-LS-SVM) [10]. Another machine learning method applies dimensionality reduction to the data and uses kNNC to predict stock trends [11]. A development of machine learning combined with statistical methods, namely ARMA-GARCH-NN, is able to capture intraday patterns from the stock market [12]. In recent years, machine learning methods have been greatly developed for predicting stock prices [13,14]. Table 1 presents a summary of three machine learning methods combined with statistical methods for forecasting stock prices with time-series data.
Another investigation using LSTM to predict the stock market under different fixed conditions is given in [15]. The experimental results show that LSTM has high predictive accuracy [15,16]. In addition, several papers combine deep learning and denoising methods to improve the predictability of deep learning models. Other results presented an improved RNN using the efficient discrete wavelet transform (DWT) to predict high-frequency time series, concluding that the high-order B-spline wavelet (BSd-RNN) model performed well [17]. A stock market forecasting model based on deep learning that takes investor sentiment into account and is combined with LSTM to predict the stock market is given in [18,19]. Table 2 recapitulates the implementation of deep learning methods to predict stock prices.
A number of studies have also concentrated on transfer learning for stock prediction. Nguyen and Yoon presented a novel framework, deep transfer with related stock information (DTRSI), which took advantage of a deep neural network and transfer learning to solve the problem of insufficient training samples [20]. Other transfer learning methods were presented to address the poor performance of deep learning models on short time series [21]. Another paper proposed an algorithm to solve the problems of insufficient training data and differences in the distribution of new and old data [22]. Overall, we found that the majority of studies on transfer learning aimed to solve the problem of insufficient training data or differences in data distribution. Table 3 presents three models for transfer learning using RNN variants. As mentioned above, some studies use traditional machine learning methods and various hybrid models to predict stock prices, and some use deep learning models. However, these models are almost always trained on the stock data features only, without introducing useful external information through transfer learning. In particular, the interaction between stocks with upstream and downstream information is not considered. At the same time, most transfer learning work mainly aims to solve the problem of insufficient training data or data distribution differences, and there is no research on introducing external upstream and downstream information. Moreover, deep learning models are superior to traditional machine learning algorithms in many time-series prediction tasks. Therefore, this study proposes an appropriate method that can better predict the trend of stock prices.

Proposed method
In general, the proposed investigation method mainly consists of three stages, i.e., the pre-processing or data preparation, data processing or model building, and finally the post-processing or performance evaluation. The method workflow is depicted in Fig. 1 and its stages are explained in the following sub-sections.

Data source
The data source used in this experimental investigation is a collection of historical company stock prices obtained from the Yahoo Finance website https://finance.yahoo.com/, a provider of stock market financial data. We investigated the time-series stock data of four companies coded AMZN, GOOGL, BLL, and QCOM over 12 years, between January 4, 2010, and February 3, 2022. Each company has 15,220 data points consisting of four price features and one volume feature per trading day, giving 60,880 data points in total. Fig. 2 shows an example of AMZN stock price time-series data for two weeks, 08 Feb 2022 - 22 Feb 2022. It should be noted that the stock market does not trade on holidays, including Saturdays and Sundays; accordingly, the figure has no stock data for 11-12 Feb 2022 and 19-21 Feb 2022. Since the close, open, high, and low price positions in one trading day are almost the same, the data analysis focuses on the close price feature, i.e., the daily closing price of each stock. Moreover, the close price is the most important of the open, high, low, and close prices for technical analysis, as it reflects all information available to market participants at the end of stock trading.
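As a sketch of the data layout, the snippet below builds a synthetic OHLCV frame over the same two-week calendar as Fig. 2 and selects the close price; in practice the history would be pulled with a downloader such as the third-party yfinance package (an assumption, not the paper's stated tooling), and note that business-day indexing alone does not remove exchange holidays such as 21 Feb 2022.

```python
import numpy as np
import pandas as pd

# In practice, e.g. (assumed tooling):
#   import yfinance as yf
#   data = yf.download("AMZN", start="2010-01-04", end="2022-02-03")
# Here we build a synthetic OHLCV frame to show the trading-day structure:
# weekends carry no rows (real data would also omit market holidays).
days = pd.bdate_range("2022-02-08", "2022-02-22")  # business days only
prices = pd.DataFrame(
    {col: np.linspace(100.0, 110.0, len(days))
     for col in ["Open", "High", "Low", "Close", "Volume"]},
    index=days,
)
close = prices["Close"]  # the analysis focuses on the daily closing price
```

The values are placeholders; only the frame shape and the weekday-only index mirror the real dataset.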

LSTM and GRU algorithms
Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) are two variants of Recurrent Neural Networks (RNN) that enable long-term memory. The RNN learns by re-propagating the gradient when searching for the optimal value. However, the gradient may vanish or diverge as the sequence length t grows, because an ordinary RNN cannot adequately train the long-term memory that sequential data relies on. LSTM and GRU were proposed as algorithms to cope with this issue. An RNN has only one activation function in the intermediate layer, whereas LSTM and GRU have multiple activation functions with complex operations performed on various gates [23-26].
LSTM has a variable C_t for long-term information storage in its cells or blocks. Old information is removed from, or new information is updated to, C_t to activate the corresponding long-term memory. The arithmetic portion in the intermediate layer of the LSTM is called the cell or block [27]. The structure of the LSTM block and its gates is given in Fig. 3(a); a brief description of its gates and their respective computations follows.
1. Input Gate. The candidate long-term memory C̃_t of the current cell state and the storage rate i_t are calculated using Eqs. 1 and 2, respectively:
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)  (1)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)  (2)
2. Forgetting Gate. This gate controls which information is forgotten from long-term memory. The forgetting rate f_t is calculated using Eq. 3:
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)  (3)
3. Output Gate. The output values o_t and h_t are computed using Eqs. 4 and 5, respectively:
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)  (4)
h_t = o_t ⊙ tanh(C_t)  (5)
4. Memory Update. The latest long-term memory C_t is updated using Eq. 6:
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t  (6)
The GRU is another RNN variant that enables long-term memory with a simpler structure than LSTM. Fig. 3(b) depicts the GRU block structure and its two gates.
1. Reset Gate. The memory rate r_t is calculated using Eq. 7 to control which long-term memory is forgotten or retained:
r_t = σ(W_r · [h_{t−1}, x_t] + b_r)  (7)
where x_t and h_{t−1} are the current data and the previous memory, respectively.
2. Update Gate. The long-term memory h_t is updated using Eqs. 8, 9 and 10, then passed to the next state:
z_t = σ(W_z · [h_{t−1}, x_t] + b_z)  (8)
h̃_t = tanh(W_h · [r_t ⊙ h_{t−1}, x_t] + b_h)  (9)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t  (10)
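The gate computations referenced in Eqs. 1-10 can be sketched in NumPy as single-timestep cell functions. The per-gate weight matrices acting on the concatenated [h_{t−1}, x_t], and their dictionary keys, are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM timestep following Eqs. 1-6; W maps [h_prev, x_t] per gate."""
    concat = np.concatenate([h_prev, x_t])
    C_tilde = np.tanh(W["C"] @ concat + b["C"])  # Eq. 1: candidate memory
    i_t = sigmoid(W["i"] @ concat + b["i"])      # Eq. 2: input (storage) rate
    f_t = sigmoid(W["f"] @ concat + b["f"])      # Eq. 3: forgetting rate
    o_t = sigmoid(W["o"] @ concat + b["o"])      # Eq. 4: output rate
    C_t = f_t * C_prev + i_t * C_tilde           # Eq. 6: memory update
    h_t = o_t * np.tanh(C_t)                     # Eq. 5: hidden output
    return h_t, C_t

def gru_step(x_t, h_prev, W, b):
    """One GRU timestep following Eqs. 7-10."""
    concat = np.concatenate([h_prev, x_t])
    r_t = sigmoid(W["r"] @ concat + b["r"])      # Eq. 7: reset gate
    z_t = sigmoid(W["z"] @ concat + b["z"])      # Eq. 8: update gate
    h_tilde = np.tanh(                           # Eq. 9: candidate state
        W["h"] @ np.concatenate([r_t * h_prev, x_t]) + b["h"])
    return (1.0 - z_t) * h_prev + z_t * h_tilde  # Eq. 10: memory update
```

The GRU's single state vector and two gates, versus the LSTM's separate C_t and three gates, make its lighter structure visible directly in the code.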

Proposed LSTM and GRU architectural models
The joint-movement pattern of stock prices on the stock exchange can be identified by first allowing the LSTM/GRU learning model to work independently to determine the predicted value of each company. Then, the output values from all companies are accepted as a flattened array input to a concatenation model, which produces an output shape according to the GRU/LSTM learning model used. The output shape is then processed in the proposed LSTM/GRU models as usual before being distributed to the LSTM/GRU model of each company in parallel to predict the stock price.
We propose four LSTM/GRU model architectures, which lie between the post-concatenation stage and the individual stock price forecasting stage, i.e., architectural models that process the output shape of 40 blocks with an array of 640 values.
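The concatenation step can be illustrated with plain NumPy shapes, assuming (as a hypothetical reading of the 640-value array) that each of the four company branches emits 160 features per timestep:

```python
import numpy as np

# Illustrative shapes only: assume each of the four company branches emits a
# (timesteps=40, units=160) feature map. Concatenating along the feature axis
# yields the (40, 640) array processed by the four proposed models.
company_outputs = [np.random.rand(40, 160) for _ in range(4)]
joint = np.concatenate(company_outputs, axis=-1)  # shape (40, 640)
```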

Model-1: direct model
This model directly distributes the output shape of (None, 40, 640) to the four LSTM/GRU models to predict the stock price of each company. The architecture of Model-1 is depicted in Fig. 4.

Model-2: downsizing model
The model downsizes the output shape of (None, 40, 640) to 160 values and then distributes the downsized shape to the four LSTM/GRU models to predict the stock price of each company. The architecture of Model-2 is depicted in Fig. 5.
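A dense (fully connected) projection is one plausible realization of this downsizing step, sketched here with random, purely illustrative weights; the paper's figures, not this sketch, define the actual layer choice.

```python
import numpy as np

# Downsizing sketch: map each 640-value timestep vector down to 160 values
# with a single weight matrix (a dense projection applied per timestep).
joint = np.random.rand(40, 640)           # concatenated output, shape (40, 640)
W_down = np.random.rand(640, 160) * 0.01  # illustrative random weights
downsized = joint @ W_down                # shape (40, 160)
```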

Model-3: tuned downsizing model
The model downsizes the output shape of (None, 40, 640) to become 160 values, and then it tunes the parameters by applying dropout. After that, it distributes the shape to the four LSTM/GRU models to predict the stock price of each company. The architectural model of this Model-3 is depicted in Fig. 6.

Model-4: stabilized downsizing model
The model downsizes the output shape of (None, 40, 640) to become 160 values and then stabilizes its values by applying another LSTM/GRU. Finally, it distributes the shape to the four LSTM/GRU models to predict the stock price of each company. The architectural model of this Model-4 is depicted in Fig. 7.

Performance measurement
In the context of predictive model optimization, the function used to evaluate model performance is the loss (error) function, i.e., the difference between the actual and predicted response/label values. The loss functions in this paper are the Mean Absolute Percentage Error (MAPE), Root Mean Square Percentage Error (RMSPE), and Rooted Mean Dimensional Percentage Error (RMDPE). Equations 11 and 12 give the calculation formulas for the MAPE and RMSPE values, respectively [28-30]. We define another new loss function, RMDPE, based on the Minkowski distance, as given in Eq. 13.
where n, y_i and ŷ_i are the number of data and the actual and predicted values of the i-th data, respectively. The validation and accuracy metrics of the model are determined from the error values of MAPE, RMSPE, and RMDPE by subtracting them from 1. The proposed models are evaluated using the accuracy measures obtained from these loss functions. The three accuracies suggest the level of risk and opportunity in using the model. The accuracy obtained from MAPE represents the upper limit of the model's accuracy, i.e., the highest percentage of opportunity achievable with the forecasting model; MAPE results from the normalized absolute (Manhattan) distance, producing the distance closest to the actual value. The accuracy obtained from RMDPE represents the lower limit of the model's accuracy, i.e., the lowest-risk percentage achievable with the forecasting model; RMDPE is generated from the normalized Minkowski distance, resulting in the distance furthest from the actual value.
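The three loss functions can be sketched as follows. Since Eq. 13 is not reproduced above, the Minkowski order p of RMDPE is treated here as an assumed free parameter: p = 1 recovers MAPE, p = 2 recovers RMSPE, and any p > 2 yields an error at least as large as RMSPE (power-mean inequality), matching the upper/central/lower accuracy roles described above.

```python
import numpy as np

def mape(y, y_hat):
    """Mean Absolute Percentage Error (Eq. 11): normalized Manhattan distance."""
    return np.mean(np.abs((y - y_hat) / y)) * 100.0

def rmspe(y, y_hat):
    """Root Mean Square Percentage Error (Eq. 12)."""
    return np.sqrt(np.mean(((y - y_hat) / y) ** 2)) * 100.0

def rmdpe(y, y_hat, p=3):
    """Rooted Mean Dimensional Percentage Error (Eq. 13), sketched as a
    normalized Minkowski distance of assumed order p."""
    return np.mean(np.abs((y - y_hat) / y) ** p) ** (1.0 / p) * 100.0
```

Accuracy is then 100 minus the percentage error, so the MAPE-based accuracy is the largest and the RMDPE-based accuracy the smallest of the three.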

Preprocessing result
Preprocessing, or data preparation, is a very important stage that turns raw data into quality data ready to be processed, from model development through model evaluation. It is the initial processing of the data before it is used to train and validate the model, and finally to test it for performance evaluation. The following are the four sequential steps of the data preprocessing stage.

Company grouping
Four companies with complete time-series data were selected and grouped into two groups based on their stock prices, i.e., the higher and lower stock price groups. The four companies are considered to represent the same technical price behavior occurring in the NasdaqGS stock market, whose data are downloaded from the Yahoo! Finance website. The company codes in the higher stock price group are AMZN and GOOGL, whereas those in the lower stock price group are BLL and QCOM. Fig. 8 clearly shows the price differences between the two groups of companies.

Data normalization
Normalization rescales all data into the same specified range. The purpose of data normalization is to prevent the dominant price pattern of higher-valued stocks from overwhelming the features of lower-valued stocks. Using the same value range exposes the pattern of actual stock price behavior that generally applies in a stock exchange market [31]. This process scales the stock data values into the range 0 to 1. The Min-Max normalization method of Eq. 14 is applied so that the scaled stock prices follow the actual price pattern.
We use the Min-Max normalization method because it guarantees that all stock price values have exactly the same scale while preserving the shape of the original series. Fig. 9 visualizes the normalized stock prices of the four selected companies in the 0-to-1 interval using the Min-Max method:
x' = (x − x_min) / (x_max − x_min)  (14)
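Eq. 14 can be sketched as:

```python
import numpy as np

def min_max_scale(x):
    """Min-Max normalization (Eq. 14): rescale values into [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())
```

In practice x_min and x_max are kept from the training data so that model outputs can later be inverted back to the original price scale.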

Data segmentation
Segmentation is the process of separating and grouping data, from raw data into grouped data with a predicted value [6]. At this stage, the data is grouped into timestep windows of 40 historically ordered data points, with the 41st data point being the forecast target for the model. The timestep window always shifts one step to the right until it reaches the last timestep. An illustration of data segmentation is given in Fig. 10. The segmentation process works as follows: the input vector x of a timestep is 40 consecutive data points, and the output is the single value of the next, 41st data point. This step is iterated until the last timestep is reached. A segment of 40 consecutive data points covers two months of trading data, which is ideal for forecasting. Variations of 20 consecutive data points (one month of data), 60 (three months), or other segment lengths are interesting to investigate, and we leave them as future work.
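The sliding-window segmentation can be sketched as:

```python
import numpy as np

def segment(series, timestep=40):
    """Slide a window of `timestep` values over the series; the value right
    after each window (the 41st data point) is the forecasting target."""
    X = np.stack([series[i:i + timestep]
                  for i in range(len(series) - timestep)])
    y = series[timestep:]
    return X, y
```

With roughly 3,044 daily close prices per company (15,220 values across five features), a 40-step window yields 3,004 (input, target) samples.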

Data splitting
The segmented data were divided into training and testing data at a ratio of 4:1, i.e., 80% for training and 20% for testing of all available data. The training data cover the first 9 years and 9 months of the companies' time-series stock prices, and the testing data cover the last 2.5 years. Data segmentation produces 3,004 samples, divided into 2,403 training samples (which also serve as model validation data) from 2010-03-03 to 2019-09-17 and 601 testing samples, used to evaluate accuracy, from 2019-09-18 to 2022-02-03.
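The chronological split can be sketched as follows; shuffling is avoided so that the test set is strictly the most recent 20% of the samples.

```python
def train_test_split_chrono(X, y, train_frac=0.8):
    """Chronological 4:1 split: the first train_frac of samples train the
    model, and the remainder (the most recent data) tests it."""
    cut = int(len(X) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])
```

Applied to the 3,004 segmented samples, this yields the 2,403/601 split reported above.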

Building trained models
Both the designed LSTM and GRU models are constructed according to the four architectural models explained in the previous section. The validation results of the LSTM and GRU trained models on the 2,403 training data from March 3, 2010 to September 17, 2019 are given in Fig. 11, which visualizes the time series of the four selected companies using training data. Fig. 12 shows the accuracy/validation evaluation of each model on the training data using the percentage errors of MAPE, RMSPE, and RMDPE for all trained models (four models each for LSTM and GRU). The implementation of the eight architectural models using LSTM and GRU blocks is written in the Python programming language. All code has been committed to the GitHub repository at https://github.com/armin-lawi/ForcastingStockPricewith-Grouped-Dataset. Stock price data is taken directly from Yahoo! Finance.

Performance evaluation of the proposed models
The remaining 601 vectors of test data were used to evaluate the forecasting performance of the four models for both LSTM and GRU. As shown in Fig. 13, all trained models predict the test data accurately. Data forecasted using GRU is consistently superior to LSTM; moreover, the LSTM forecasts deviate slightly from the actual data. Fig. 14 describes the evaluation results of all three accuracy measures for the four companies for both LSTM and GRU. Fig. 15 shows the Box-and-Whisker plots of the accuracy distributions of MAPE, RMSPE, and RMDPE for each LSTM and GRU trained model architecture across all companies, and Fig. 16 shows the corresponding plots for each LSTM and GRU validation model architecture.