This paper uses a mixed model that handles each factor separately, integrates the forecasting results of the individual factors, and produces the final forecasting results.
The mixed model consists of two parts: a weekly price change forecasting model and a price change warning model. The weekly price change forecasting model includes a time factor forecasting model (4.1), a space factor forecasting model (4.2) and a time–space integrated model (4.3), which deal respectively with the seasonal factor, the space factor (the influence of price changes in other markets) and the integration of the sub-model outputs [25, 26]. The price change warning model deals with exogenous variables (4.4). Each sub-model uses its own data preprocessing method to obtain better forecasting results.
The framework of the overall model is shown in Fig. 2.
Time factor forecasting model
Most papers forecast agricultural commodity prices with time series models. These models require no data on other variables, and their feasibility has been demonstrated. Time series models therefore remain an important part of the mixed model in this paper.
Data preprocessing
Time series models are good at analyzing and forecasting long-term data with a clear trend and regular fluctuation. Therefore, this paper uses weekly prices in the time series models, computed as the average of the daily prices within 1 week [27]. The purpose is to raise forecasting accuracy by reducing the influence of short-term fluctuation and abnormal amplitude.
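The weekly averaging described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, a week is assumed to be 7 consecutive daily observations, and an incomplete trailing week is assumed to be dropped.

```python
def weekly_average(daily_prices, days_per_week=7):
    """Average daily prices over consecutive, non-overlapping weeks.

    Assumption: an incomplete week at the end of the series is dropped.
    """
    weeks = []
    last_start = len(daily_prices) - days_per_week
    for start in range(0, last_start + 1, days_per_week):
        week = daily_prices[start:start + days_per_week]
        weeks.append(sum(week) / days_per_week)
    return weeks


def weekly_changes(weekly_prices):
    """Week-over-week price changes (first difference of the weekly prices)."""
    return [b - a for a, b in zip(weekly_prices, weekly_prices[1:])]
```

For example, 14 daily prices yield two weekly prices and one weekly price change.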
ARIMA (p, d, q) model
This paper uses the ARIMA model, a classical and widely used model, as the time factor forecasting model for weekly agricultural commodity prices. The parameters p, d and q represent, respectively, the order of auto-regression, the order of differencing needed to make the time series stationary, and the order of the moving average.
The mathematical form of ARIMA model is:
$$\theta_{i}^{\left( 1 \right)} \left( t \right) = \mu + \rho \left( B \right)\varphi (B)^{ - 1} \varepsilon (t)$$
\(\theta_{i}^{(1)}(t)\) is a series of random variables; in this model it is the weekly price change at time t. μ is the mean value. B is the backward shift operator, B(W(t)) = W(t − 1). ρ(B) is the moving average operator, \(\rho(B) = 1 - \rho_{1} B - \cdots - \rho_{q} B^{q}\). φ(B) is the auto-regression operator, \(\varphi(B) = 1 - \varphi_{1} B - \cdots - \varphi_{p} B^{p}\). ɛ(t) is the independent disturbance, or random error.
In this model, we first apply the ADF (augmented Dickey–Fuller) stationarity test to the data set. If the data set fails the test, we difference it until it passes, or abandon this group of data [28]. In practice, most agricultural commodity price series pass the ADF test after one order of differencing, so we set d = 1. The values of p and q are chosen by the AIC (Akaike information criterion): we let p and q range from 1 to 10, fit the model on the training set for each pair, and select the p and q with the lowest AIC value. Because this search takes a long time for each agricultural commodity, an alternative is to take p = 10, q = 8 directly; the forecasting results remain accurate with this setting.
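The order search described above can be sketched as a plain grid search. The paper does not name a software library, so the fitting step is left as a caller-supplied function: `aic_of` is a hypothetical callable that fits an ARIMA model of the given order and returns its AIC, and could wrap any ARIMA implementation. The `difference` helper only illustrates what d = 1 means.

```python
from itertools import product


def difference(series, d=1):
    """Apply first differencing d times (d = 1 gives week-over-week changes)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def select_order(series, aic_of, d=1, max_p=10, max_q=10):
    """Return the (p, q) pair in 1..max_p x 1..max_q with the lowest AIC.

    aic_of(series, (p, d, q)) is a hypothetical callable supplied by the
    caller; it is assumed to fit an ARIMA(p, d, q) model and return its AIC.
    """
    best = None
    for p, q in product(range(1, max_p + 1), range(1, max_q + 1)):
        aic = aic_of(series, (p, d, q))
        if best is None or aic < best[0]:
            best = (aic, p, q)
    return best[1], best[2]
```

With the ranges used in the paper, the search fits 100 models per commodity, which is consistent with the remark that the search is slow and motivates the fixed choice p = 10, q = 8.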
Space factor forecasting model
This paper forecasts prices for all agricultural markets in one city. This part mainly considers the influence of price changes in other agricultural markets. The factor is considered because price changes in one market affect consumer behavior across the same city.
Data preprocessing
Consumers will not react to price changes within the same day, so the influence of price changes in other markets comes with a time lag. This paper takes the weekly average of the price differences, which retains the trend of price changes and leaves enough time for the reaction lag.
Besides the time lag, the correlation between different wholesale markets is another difficulty in model design, as most regression methods require the variables to be mutually independent. The model in this part also aims to evaluate the intensity of influence between agricultural markets. Therefore, this paper uses the partial least squares (PLS) method to forecast prices based on the space factor [29].
PLS model
The partial least squares method includes a procedure similar to principal component analysis (PCA), so it can be used on variables that are correlated with one another. For an agricultural commodity in market i, we want to forecast the weekly price change \(\theta_{i}(t)\) at time t. The independent variables are the price changes of the other markets \((\theta_{1}(t-1), \ldots, \theta_{i-1}(t-1), \theta_{i+1}(t-1), \ldots, \theta_{n}(t-1))\) at time t − 1. We preprocess the training set as described above, feed it into the PLS model, and obtain the regression relation between the price change of the target market and the price changes of the other markets at the previous time point. Finally, this regression relation gives the forecasting value \(\theta_{i}^{(2)}(t)\) of the space model.
Through the PLS model, we can obtain regression coefficients between each pair of agricultural markets, which to some extent reflect the influence relationships between the markets.
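The data layout fed to the PLS model can be sketched as follows: for target market i, each training row pairs the previous week's price changes in all other markets with the current week's change in market i. This is only a sketch of the lagged design matrix; the function name and the nested-list layout `changes[m][t]` are assumptions, and the PLS fit itself is not shown.

```python
def lagged_design(changes, target):
    """Build (X, y) for the space model of one target market.

    changes[m][t] is the weekly price change of market m in week t
    (hypothetical layout). Row k of X holds the changes of all other
    markets in week k, and y[k] is the target market's change in week k + 1.
    """
    others = [m for m in range(len(changes)) if m != target]
    n_weeks = len(changes[target])
    X = [[changes[m][t - 1] for m in others] for t in range(1, n_weeks)]
    y = [changes[target][t] for t in range(1, n_weeks)]
    return X, y
```

The resulting X and y can then be passed to any PLS regression routine, and the fitted coefficients read off per source market.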
Furthermore, we use PLS rather than a multivariate ARIMA model because the multivariate ARIMA model requires the variables to be co-integrated. However, the price change series of different agricultural markets in China differ in stationarity, so the markets fail the co-integration test. We therefore use PLS, a more general method that can deal with all types of multivariate series.
Mixed forecasting model of weekly prices
After running the two models above, we obtain two groups of data: the price changes forecast by the time model and by the space model for the next week (the week after the last week in the training set). From the analysis above, we know that weekly price changes are influenced by both the seasonal factor and the space factor, but we do not know in detail how the two factors work together. There are two ways to determine the relationship between the two factors: one is economic analysis; the other is to test several candidate models on historical data and choose the best one.
The integration method in this paper is to simulate the forecasts of the two models on the training set, and then regress the real weekly price changes linearly on these forecasts, which serve as the independent variables. A linear model is effective and relatively simple, and it reveals the weight of each factor in the relationship. It is also appropriate because no relevant research on the weights of the two factors has been published so far and our sample is small. We thus obtain the regression relation in which the two factors affect price changes with different weights:
$$\theta_{i} \left( t \right) = \alpha_{1} \theta_{i}^{\left( 1 \right)} \left( t \right) + \alpha_{2} \theta_{i}^{\left( 2 \right)} \left( t \right)$$
\(\theta_{i}^{(1)}(t)\) is the forecasting value of the ARIMA model for market i, and \(\theta_{i}^{(2)}(t)\) is the forecasting value of the PLS regression model for market i. We obtain \(\alpha_{1}\) and \(\alpha_{2}\) by regression on historical data. Substituting the forecasts of the two sub-models into the regression equation then gives the final forecasting value of the weekly price change.
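The estimation of the two weights can be sketched with ordinary least squares. Following the equation above, the sketch assumes a regression without an intercept and solves the 2 × 2 normal equations directly; the function names are hypothetical.

```python
def fit_weights(f1, f2, actual):
    """Least-squares weights (a1, a2) for actual ~ a1 * f1 + a2 * f2.

    f1 and f2 are the sub-model forecasts on the training set, and
    actual holds the real weekly price changes. No intercept is fitted.
    """
    s11 = sum(a * a for a in f1)
    s22 = sum(b * b for b in f2)
    s12 = sum(a * b for a, b in zip(f1, f2))
    b1 = sum(a * y for a, y in zip(f1, actual))
    b2 = sum(b * y for b, y in zip(f2, actual))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("the two forecast series are collinear")
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (b2 * s11 - b1 * s12) / det
    return a1, a2


def combine(a1, a2, forecast_time, forecast_space):
    """Final forecast of the weekly price change for one market."""
    return a1 * forecast_time + a2 * forecast_space
```

The fitted weights show directly how much each sub-model contributes, which is the interpretability argument given for the linear form.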
Warning model
As mentioned above, agricultural commodity prices tend to change after staying constant for a while. No apparent rule is observed, so the exact moment of a price change is hard to predict. The forecasting model above handles this by preprocessing the data into weekly prices. However, it is also important for consumers to know the possible price changes on each single day [30]. Therefore, this paper proposes a price change warning model whose output values quantify the intensity of possible price changes.
Hypothesis of price fluctuation
First, this model hypothesizes that, apart from fluctuation around the mean value, all price changes are caused by changes in exogenous variables.
The agricultural commodity is a component of the market economy. Its price is irreversibly influenced by other economic variables and exogenous variables including weather and price changes. This kind of change is definitely not a fluctuation around the mean value [31–33]. Therefore, the hypothesis is reasonable.
The influence of exogenous variables accumulates as time goes by. Because this influence is uncertain, analyzing it at a single moment carries a large error. The next section therefore proposes several methods for processing exogenous variables so that the price changes are synchronized with the accumulation of exogenous variables.
Definition of urgency and sample calculation
The preprocessing of price data in the warning model follows the hypothesis raised above. At the same time, we want daily data that maintains the trend, that is, data that keeps the relevant information of every single day. We therefore process the data as follows:
1. Smoothing. We smooth the daily price data to keep the price trend and eliminate meaningless fluctuation, using a moving average over the historical data with a window of 15 days. Near the start of the series, the window shrinks symmetrically to fit the available data. Writing \(\tilde{\theta}_{i}(t)\) for the smoothed price of an agricultural commodity in market i at moment t and \(\theta_{i}(s)\) for the raw daily price:
$$\tilde{\theta}_{i} \left( 1 \right) = \theta_{i} \left( 1 \right)$$
$$\tilde{\theta}_{i} \left( 2 \right) = \frac{{\theta_{i} \left( 1 \right) + \theta_{i} \left( 2 \right) + \theta_{i} \left( 3 \right)}}{3}$$
$$\tilde{\theta}_{i} \left( 8 \right) = \frac{{\sum\nolimits_{s = 1}^{15} \theta_{i} \left( s \right)}}{15}$$
$$\tilde{\theta}_{i} \left( t \right) = \frac{{\sum\nolimits_{s = t - 7}^{t + 7} \theta_{i} \left( s \right)}}{15}$$
2. Clustering. To synchronize price changes with the accumulated changes of exogenous variables while ignoring slight fluctuation, this paper uses cluster analysis in the data preprocessing. Here we use one-dimensional K-means clustering. Let c be the number of clusters [34]:
$$c = \min \{ {\text{the number of different prices}}, 7\}$$
We set c to no more than 7 so that one cluster represents the median level, with three clusters above it and three below, to reflect both the stability and the large fluctuations of the price, as shown in Fig. 3. After clustering, the agricultural commodity price takes no more than 7 distinct values.
3. Raising the dimension. For the price \(\theta_{i}(t)\) of an agricultural commodity in market i at moment t, let \(\delta_{i}(t)\) be the nearest future price change, which happens \(N_{i}(t)\) days from now. This gives a new daily data point \((\delta_{i} \left( t \right), N_{i} (t))\).
4. Obtaining new variables. After the previous three steps, we have \((\delta_{i} \left( t \right), N_{i} (t))\). To quantify the range of possible price changes from the values of \(\delta_{i}(t)\) and \(N_{i}(t)\), this paper defines an urgency variable \(U_{i}(t)\). Suppose that the price \(\theta_{i}(t)\) lasts for time \(T_{i}(t)\). Based on experimental results and the quantification purpose, we define \(U_{i}(t)\) as:
$${\text{U}}_{i} (t) = \frac{{(T_{i} \left( t \right) - N_{i} (t))\cdot\delta_{i} \left( t \right)}}{{T_{i} \left( t \right)\cdot N_{i} (t)^{{\frac{1}{4}}} }}$$
If \(N_{i}(t) < 3\), we take \(N_{i}(t) = 3\), to prevent sudden changes in the urgency that would make training and forecasting difficult.
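The smoothing and clustering steps (steps 1 and 2 above) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the window is assumed to shrink symmetrically at the end of the series as it does at the start, and the K-means centers are initialized by spreading them evenly over the sorted distinct prices, which the paper does not specify.

```python
def smooth(prices, window=15):
    """Centered moving average whose window shrinks symmetrically at the edges."""
    half = window // 2
    n = len(prices)
    out = []
    for t in range(n):
        h = min(half, t, n - 1 - t)  # largest symmetric half-width that fits
        segment = prices[t - h:t + h + 1]
        out.append(sum(segment) / len(segment))
    return out


def nearest(value, centers):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda j: abs(value - centers[j]))


def cluster_prices(prices, max_clusters=7, max_iter=100):
    """One-dimensional K-means with c = min(number of distinct prices, 7).

    Each price is replaced by the center of its cluster.
    """
    distinct = sorted(set(prices))
    c = min(len(distinct), max_clusters)
    if c == 1:
        centers = [distinct[0]]
    else:
        # Assumed initialization: evenly spaced picks from the sorted values.
        centers = [distinct[j * (len(distinct) - 1) // (c - 1)] for j in range(c)]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in prices:
            groups[nearest(p, centers)].append(p)
        updated = [sum(g) / len(g) if g else centers[k]
                   for k, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return [centers[nearest(p, centers)] for p in prices]
```

Applying `cluster_prices(smooth(daily_prices))` produces a step-like series with at most 7 levels, which is the form the later steps expect.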
From the definition of \(U_{i}(t)\), we can see that the bigger the price rise or the sooner the change happens, the stronger the urgency. \(U_{i}(t)\) can therefore quantify the urgency degree of price changes and be used to send warning messages. The urgency change of the honeydew price in one market is shown in Fig. 4.
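The computation of the urgency from a clustered daily price series can be sketched as below. Two points are assumptions rather than statements from the paper: \(T_{i}(t)\) is taken to be the total length in days of the constant-price run that contains day t, and the floor of 3 on \(N_{i}(t)\) is applied to every occurrence of \(N_{i}(t)\) in the formula. All function names are hypothetical.

```python
def next_changes(prices):
    """For each day, (delta, n): the next price change and the days until it.

    Days after the final price change have no future change and get None.
    """
    out = [None] * len(prices)
    delta, change_day = None, None
    for t in range(len(prices) - 2, -1, -1):
        if prices[t + 1] != prices[t]:
            delta, change_day = prices[t + 1] - prices[t], t + 1
        if change_day is not None:
            out[t] = (delta, change_day - t)
    return out


def run_lengths(prices):
    """Length of the constant-price run containing each day (assumed T)."""
    lengths = [0] * len(prices)
    start = 0
    for t in range(1, len(prices) + 1):
        if t == len(prices) or prices[t] != prices[start]:
            for s in range(start, t):
                lengths[s] = t - start
            start = t
    return lengths


def urgency(T, N, delta, min_n=3):
    """U = (T - N) * delta / (T * N ** (1/4)), with N floored at min_n."""
    n = max(N, min_n)
    return (T - n) * delta / (T * n ** 0.25)
```

For instance, a price that has held for most of a 10-day run and is about to rise by 2 has a larger urgency than the same rise seen from the first day of the run.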
The transformation of exogenous variables and sample calculation
Some exogenous variables have their own trends and therefore show no conspicuous relationship with the urgency trends of agricultural commodities. These variables are also random. It is therefore inappropriate to use their daily data directly in fitting and forecasting, because we cannot avoid their random volatility or the influence of their own features.
We therefore need derived factors that better reflect how price changes are influenced. Based on this consideration, this paper processes exogenous variables in the following four ways:
1. Averaging. We take the average values over the last 2 months.
2. Accumulating. We take the accumulated values since the last price change. When the price changes, all these variables are set to 0.
3. Taking the value of that day. We directly assign the real values to the variables.
4. Recording the maximum/minimum values. We assign the maximum/minimum values since the last price change to the variables.
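The derived series in items 1, 2 and 4 can be sketched as follows (item 3 is the raw series itself). The function names are hypothetical, "the last 2 months" is assumed to mean a trailing window of 60 daily observations, and the accumulated value is assumed to be exactly 0 on the day of a price change.

```python
def trailing_mean(series, window=60):
    """Average over the last `window` days (assumed: 2 months = 60 days)."""
    out = []
    for t in range(len(series)):
        segment = series[max(0, t - window + 1):t + 1]
        out.append(sum(segment) / len(segment))
    return out


def accumulate_since_change(series, prices):
    """Running sum of an exogenous series, reset to 0 when the price changes."""
    out, total = [], 0.0
    for t, x in enumerate(series):
        if t > 0 and prices[t] != prices[t - 1]:
            total = 0.0  # price changed today: reset the accumulator
        else:
            total += x
        out.append(total)
    return out


def extreme_since_change(series, prices, pick=max):
    """Running maximum (pick=max) or minimum (pick=min) since the last price change."""
    out, current = [], None
    for t, x in enumerate(series):
        if t == 0 or prices[t] != prices[t - 1]:
            current = x  # start a new run at the price change
        else:
            current = pick(current, x)
        out.append(current)
    return out
```

Each function returns a series of the same length as its input, so the derived variables line up day by day with the urgency values.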
This paper takes urgency as the dependent variable. As inputs, it takes the accumulated value of temperature change, indicators of whether the weather is snowy, foggy or stormy, and the accumulated maximum values, average values and daily values of crude oil prices and the exchange rate, which yields 14 derivative exogenous variables.
Warning model based on neural networks
This paper builds a BP (back-propagation) neural network model [35] to study the relationship between exogenous variables and urgency.
The choice of BP neural networks is based on two considerations:
1. The relationship between the 14 exogenous variables remains unknown, and no research has quantified the exact relationship between exogenous variables and the price changes of agricultural commodities. Neural networks have a flexible functional form that covers both linear and non-linear relationships, and thus have a unique advantage for the forecasting required in this paper. Multi-factor analysis based on neural networks has proved effective in some applications [25].
2. The relationship between exogenous variables and agricultural commodity prices may fluctuate as time goes by. A neural network model can be updated with up-to-date historical data.
We set the number of hidden layers to 1, choose the number of hidden-layer nodes by mean square error (MSE), and use the LM (Levenberg–Marquardt) method as the training algorithm. Once these parameters are determined, we train the neural network on the training set [36, 37].
In fact, the purpose of urgency is to reflect the accumulated effect of exogenous variables. From the definitions of 14 derivative variables, we can see that some of them are monotone as time goes by, and some of them are accumulating.
Trained neural networks can adjust the urgency every day. By definition, urgency measures the trend of price changes. High urgency does not mean that the price will certainly change; it indicates a wider range of price change if the price does change.
Considering the asynchronism between price changes and the accumulation of exogenous variables, this paper is conservative with the forecasting value of urgency and also takes into account the urgency values forecast over the last week. For the forecasting value \(U_{i}(t)\) at time t, the adjusted value \(U_{i}^{'}(t)\) is defined as:
$$U_{i}^{'} \left( t \right) = \frac{{med\left\{ {{\text{U}}_{i} \left( s \right), s = t - 6, \ldots t - 1,t} \right\} + {\text{U}}_{i} \left( t \right) }}{2}$$
Here, \(med\left\{ {{\text{U}}_{i} \left( s \right), s = t - 6, \ldots t - 1,t} \right\}\) is the median of the urgency values from day t − 6 to day t.
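The adjustment can be sketched in a few lines. The function name is hypothetical, and for the first six days of a series the sketch assumes the median is taken over however many values are available.

```python
from statistics import median


def adjusted_urgency(u, t):
    """Average of U(t) and the median of U over days t - 6, ..., t.

    u is the list of daily urgency forecasts and t is a 0-based day index.
    Assumption: near the start of the series the window is truncated.
    """
    window = u[max(0, t - 6):t + 1]
    return (median(window) + u[t]) / 2
```

Because the median damps isolated spikes, a single-day jump in the forecast urgency is only half reflected in the adjusted value.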