Short-term photovoltaic power production forecasting based on novel hybrid data-driven models

Alrashidi, Musaed; Rahman, Saifur

doi:10.1186/s40537-023-00706-7

Research
Open access
Published: 02 March 2023


Journal of Big Data volume 10, Article number: 26 (2023)


Abstract

The uncertainty associated with photovoltaic (PV) systems is one of the core obstacles that hinder their seamless integration into power systems. The fluctuation, which is influenced by the weather conditions, poses significant challenges to local energy management systems. Hence, the accuracy of PV power forecasting is very important, particularly in regions with high PV penetrations. This study addresses this issue by presenting a framework of novel forecasting methodologies based on hybrid data-driven models. The proposed forecasting models hybridize Support Vector Regression (SVR) and Artificial Neural Network (ANN) with different Metaheuristic Optimization Algorithms, namely Social Spider Optimization, Particle Swarm Optimization, Cuckoo Search Optimization, and Neural Network Algorithm. These optimization algorithms are utilized to improve the predictive efficacy of SVR and ANN, where the optimal selection of their hyperparameters and architectures plays a significant role in yielding precise forecasting outcomes. In addition, the proposed methodology aims to reduce the burden of random or manual estimation of such parameters and to improve the robustness of the models, which are subject to under- and overfitting without proper tuning. The results of this study exhibit the superiority of the proposed models. The proposed SVR models show improvements compared to the default SVR models, with reductions in Root Mean Square Error of between 12.001 and 50.079%. Therefore, the outcomes of this research work can uphold and support the ongoing efforts in developing accurate data-driven models for PV forecasting.

Introduction

The tendency toward embracing emission-free energy from different renewable energy technologies, such as solar photovoltaic (PV), has resulted in necessary changes in the distribution system operation. These operational obstacles are due to the intermittent nature of the power coming from the sun, which requires additional ancillary services to control the variability in the PV system generation [1]. However, these services are economically unfeasible, and adopting them may discourage installing PV systems in the distribution networks [2]. Therefore, an accurate prediction of the amount of energy from the PV system would help mitigate the technical issues of these PV systems [3]. There are various forecasting objectives in electrical power systems, such as electrical load consumption [4,5,6], wind power [7,8,9], solar irradiance [10,11,12], and electricity market forecasts [3]. This study focuses on the solar PV power forecast.

PV power output is highly correlated with meteorological variables, such as solar irradiance, wind speed, humidity, and temperature. These variables depend mainly on the geographical location and the climate condition at the site in question. In terms of the PV power forecasting horizon, four main categories are considered: very short-term forecasting (1 s–< 1 h), short-term forecasting (1 h–24 h), medium-term forecasting (1 week–1 month), and long-term forecasting (1 month–1 year). According to [13, 14], the PV output prediction horizon should be identified before choosing the forecasting technique because the forecasting accuracy decreases as the forecasting time increases. Furthermore, the choice of forecasting time depends on the desired application. For instance, very short-term forecasting can be applied for power smoothing, real-time dispatch and control, and regulation services, while short-term forecasting is primarily focused on load-following and zone-control purposes [15]. Medium-term forecasting is useful for preserving the power system planning and maintenance schedule, whereas long-term forecasting assists in generation planning, energy bidding, and security operation [13].

Concerning the forecasting techniques, physical, statistical, and hybrid-based prediction models can be employed for PV power production. Physical approaches are mathematical models that use weather forecast data attained from numerical weather prediction (NWP), while statistical methods utilize historical data to predict future behavior without prior knowledge about the system state [16]. The hybrid method combines two independent forecasting methods to overcome each other's drawbacks and strengthen their advantages by adding some optimization algorithms [3]. The statistical methods are divided into $(i)$ time series models, i.e., autoregressive, autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA), and $(ii)$ machine learning methods, i.e., artificial neural network (ANN), support vector regression (SVR), and extreme learning machine.

A systematic literature review of PV power production forecasting can be found in [17]. The authors in [18] compare statistical approaches, namely ARMA, ARIMA, and seasonal ARIMA, with six different ANNs to forecast the output power of a PV plant. Eight time delays in power production from a PV plant are used as the input variables to generate the forecasting results. The analysis shows that ANN performs better than time series models with less computation time. The paper in [19] uses ANN and NWP data to predict the power output of a PV system located in Puglia, Italy. They use temperature and solar irradiation as predictors of the forecasting algorithm. Results show that the proposed model provides good prediction results with a 10% error value.

The authors in [20] present a methodology for PV power forecasts using machine learning algorithms and statistical post-processing. They use an ANN together with a linear regression correction to enhance the accuracy of forecasting. Results show that the proposed model has good accuracy, as the Mean Absolute Percentage Error ($MAPE$) was 4.7% using the historical dataset. The study in [21] examines the performance of different machine learning algorithms to forecast the hourly production of a PV system, including k-nearest neighbor (kNN), multiple regression (MLR), and decision tree regression (DTR). They employ weather data, such as solar irradiance and temperature, as input variables to the forecasting models. Results show that the kNN has superior performance compared to MLR and DTR with a Root Mean Square Error ($RMSE$) of 18.68%, Mean Absolute Error ($MAE$) of 80.6%, and a normalized $RMSE$ ($nRMSE$) of 13.2%. The recent study in [22] compares 24 machine learning algorithms for a day-ahead power forecast using numerical weather predictions (NWP). The study concludes that the selection of input variables and hyperparameter tuning is more important than the model selection. In their study, the model that considers the sun position angles and irradiance readings after statistical processing results in a 13.1% decrease in $RMSE$ compared to the basic case (Global Horizontal Irradiance (GHI), temperature, and wind speeds).

Machine learning algorithms have proved their effectiveness in different forecasting objectives as they can capture and deal with the nonlinearity in forecasting problems compared to other forecasting methods. The statistical approaches have the advantage of handling high data volume [23]. The SVR, on the other hand, has superior forecasting performance with small data samples [24]. In addition, ANN can conduct any non-linear mapping using the learning process [25]. This makes SVR and ANN favorable to be employed. However, the main drawback of applying SVR and ANN algorithms is that they are sensitive to specific parameters. For instance, SVR depends highly on Kernel function hyperparameters, namely the error penalty parameter $\left(C\right)$ and the width $(\gamma )$. Also, ANN performance is greatly influenced by the number of hidden layers and neurons at each hidden layer. Therefore, hybrid models have recently been investigated in the literature to overcome the aforementioned disadvantages of SVR and ANN.

Metaheuristic optimization algorithms (MOA), such as Simulated Annealing (SA), Particle Swarm Optimization (PSO), Genetic Algorithm (GA), and Grasshopper Optimization Algorithm (GOA), have been used to select the appropriate parameters. For example, authors in [26] use GA to tune the SVR parameters to forecast the price of electricity in Australia. For the PV power generation forecast, a hybrid model is created in [27] between GA and SVR (GASVR) to optimize different Kernel function parameters. Study results demonstrate that GASVR is more accurate than the conventional SVR, with improvements in $RMSE$ value of 669.624 and 98.7648% in the $MAPE$. In addition, the study by Netsanet et al. [28] proposes a hybrid PV power forecasting model using variation mode decomposition with ANN and Ant Colony Optimization (ACO). The role of ACO is to improve the performance of ANN by optimizing its weight and biases during the training phase. The proposed model shows high-accuracy outcomes with the coefficient of determination, ${R}^{2},$ of 0.9768.

Motivation and contributions of the study

From the above discussion, the hybrid forecasting methods have shown a good performance compared to other methods. In addition, different MOA methods have been applied in the literature to improve the SVR and ANN prediction performance. The primary objective of such optimization algorithms is to determine the optimal parameters of SVR and ANN. However, there is no clear consensus on which algorithm should be used to estimate these parameters. Therefore, in this study, hybrid PV forecasting models are proposed based on machine learning algorithms, which utilize SVR and ANN optimized with four MOA, namely Social Spider Optimization (SSO), PSO, Cuckoo Search Optimization (CSO), and Neural Network Algorithm (NNA). These algorithms are used to improve the predictive efficacy of the selected algorithms, where the optimal selection of their hyperparameters and architectures plays a significant role in yielding precise forecasting outcomes. Hence, the following are the primary contributions of this study to the field of PV power forecasting:

1. The SVR and ANN machine learning algorithms are used in this study to exploit the underlying big data patterns and forecast future values of PV power outputs.
2. As the prediction performance of SVR and ANN depends highly on their hyperparameters and architectures, respectively, an intelligent framework is proposed in this study to ease the burden of manual parameter setting and expedite the forecasting process.
3. As the optimal selection of their hyperparameters and architectures plays a significant role in yielding precise forecasting outcomes, this paper uses four MOA, namely SSO, PSO, CSO, and NNA, to improve the predictive efficacy of the selected algorithms.
4. This paper uses different independent combinations of variables as inputs to identify the suitable variables that give the best PV power forecasting outcomes. This will help overcome the computational burden and complexity that may exist in the input features. These variables are time, weather, and historical data of the PV power generation.
5. Although this work aims to forecast the output power of a PV system located in Riyadh city, Saudi Arabia, the proposed framework is useful for determining the best forecasting models in various locations.

The rest of the paper is organized as follows: In “Methodology” section, the study framework is described together with SVR and ANN algorithms and the MOA methods, including SSO, PSO, CSO, and NNA. The main findings and the comparison outcomes among the prediction models and MOA approaches are in “Results and discussion” section. Finally, “Conclusion and future work” section contains the conclusion of this study.

Methodology

This section presents the proposed hybrid forecasting techniques and other forecasting algorithms used in this study to predict the PV power output. The proposed forecasting methods hybridize SVR and the backpropagation neural network (BPNN) with four MOA: SSO, PSO, CSO, and NNA. Initially, the forecasting framework is highlighted. After that, the fundamentals of the BPNN and SVR are explained together with the MOA. Finally, the criteria to evaluate the forecasting models’ accuracy are described.

Framework of the proposed forecasting models

Sixteen hybrid and three default models have been developed to enhance the accuracy of the prediction. In this study, therefore, the forecasting approaches used are as follows:

SVR based on RB function with SSO, PSO, CSO, and NNA: $SSO-SV{R}_{RB}$, $PSO-SV{R}_{RB}$, $CSO-SV{R}_{RB}$, and $NNA-SV{R}_{RB}$.
SVR based on linear function with SSO, PSO, CSO, and NNA: $SSO-SV{R}_{linear}$, $PSO-SV{R}_{linear}$, $CSO-SV{R}_{linear}$, and $NNA-SV{R}_{linear}$.
BPNN model with one hidden layer with SSO, PSO, CSO, and NNA: $SSO-BPN{N}^{1}$, $PSO-BPN{N}^{1}$, $CSO-BPN{N}^{1}$, and $NNA-BPN{N}^{1}$.
BPNN model with two hidden layers with SSO, PSO, CSO, and NNA: $SSO-BPN{N}^{2}$, $PSO-BPN{N}^{2}$, $CSO-BPN{N}^{2}$, and $NNA-BPN{N}^{2}$.
Default SVR model based on RB function $\left(SV{R}_{RB}^{D}\right).$
Default SVR model based on linear function $\left(SV{R}_{linear}^{D}\right).$
Default BPNN model $\left(BPN{N}^{D}\right).$

This study implements SVR-based kernel functions and BPNN by employing MATLAB R2020a and LIBSVM tools [29]. The framework that explains the proposed PV power output forecast is depicted in Fig. 1. This framework can be used for any forecasting objective in other countries. The process is described as follows:

Step 1: Data preparation: input data are initially collected, checked, cleaned, and normalized to reduce the numerical burden on the forecasting algorithms during the training phase and on the parameter search.
Step 2: Correlation values: the importance of each data feature is investigated against the output feature (PV power). The Pearson Correlation Coefficient is used in this study; see “Feature combinations” section.
Step 3: Data splitting: the input data are divided into training and testing datasets. The training data are used to train the forecasting algorithms, while testing data are used to test the forecasting models' performance. To validate the stability of the forecasting model, tenfold cross-validation is used. The cross-validation process is described in “Cross-validation” section.
Step 4: Parameters tuning: SSO, PSO, CSO, and NNA algorithms are applied to determine the best SVR hyperparameters and the best BPNN network configurations for all the considered feature combinations. SVR parameters are $C$ and $\gamma $ for the RB function and $C$ for the linear function. BPNN parameters are the number of neurons at each hidden layer. In this study, one and two hidden layers are assumed.
Step 5: Building the forecasting models: by using the best parameters obtained in Step 4, sixteen hybrid models are generated for each of the considered feature combinations, namely $SSO-SV{R}_{RB}$, $PSO-SV{R}_{RB}$, $CSO-SV{R}_{RB}$, $NNA-SV{R}_{RB}$, $SSO-SV{R}_{linear}$, $PSO-SV{R}_{linear}$, $CSO-SV{R}_{linear}$, $NNA-SV{R}_{linear}$, $SSO-BPN{N}^{1}$, $PSO-BPN{N}^{1}$, $CSO-BPN{N}^{1}$, $NNA-BPN{N}^{1}$, $SSO-BPN{N}^{2}$, $PSO-BPN{N}^{2}$, $CSO-BPN{N}^{2}$, and $NNA-BPN{N}^{2}$.
Step 6: Generating results: the hybrid forecasting models created in Step 5 are tested on the testing dataset determined in Step 3. Their output is the prediction results.
Step 7: Results comparison: the forecasts are then compared with the actual values of the PV power output utilizing $RMSE$, $nRMSE$, $MAE$, and normalized $MAE$ $(nMAE)$. The results are compared and then analyzed.

A detailed description of each step is given in the subsections below.

Study site and dataset

The datasets of the PV power output are collected from a rooftop PV system placed on a mosque in Riyadh city, Saudi Arabia. Five PV inverters are installed on this site, giving the PV system a capacity of 120 kWp. This location is operated by both King Abdelaziz City for Science and Technology and Saudi Electricity Company. The PV output data gathered from the unit are in 1-h intervals for the period between June 03rd, 2017, and August 31st, 2018. The maximum power production from the system was found to be on March 25th, 2018, with a total active power production of 105.09285 kW at 11:00 A.M. The hourly data show numerical readings from the PV system at night hours when no irradiance is expected. To deal with such data, all readings below 100 W are set to zero, implying that there is no output power from the PV system after sunset and before the sun rises. The meteorological weather data used in this study are recorded hourly at the same location as the PV system. They are collected from a solar station operated by the King Abdullah City for Atomic and Renewable Energy (K.A.CARE). Figure 2 shows the solar map of Saudi Arabia and the site.

Data preparation

The weather and PV power data are required to be prepared. Two main steps are necessary for data preparation. These steps are data cleaning and data normalization. Each of these steps is described below:

Data cleaning

Data cleaning is a very significant step in creating a successful forecasting model. Since we are dealing with historical data from different sources, these data could be imprecise, impacting the performance of the forecasting models. This step removes all the missing PV power data with the associated time and weather variables.
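The cleaning rules described so far (dropping records with missing PV power and zeroing night-time readings below 100 W) can be sketched as follows. This is only an illustration under stated assumptions: the record layout and the field name `pv_kw` are hypothetical, power is assumed to be stored in kW (so 100 W becomes 0.1 kW), and the study itself was implemented in MATLAB rather than Python.

```python
def clean_records(records, night_threshold_kw=0.1):
    """Drop records with missing PV power and zero out sub-threshold readings.

    `records` is a list of dicts with a hypothetical "pv_kw" field (power in kW).
    The 0.1 kW default corresponds to the 100 W threshold mentioned in the text.
    """
    cleaned = []
    for record in records:
        power = record.get("pv_kw")
        if power is None:
            # Missing PV power: remove the whole record, including its
            # associated time and weather variables.
            continue
        fixed = dict(record)  # copy so the input list is left untouched
        if power < night_threshold_kw:
            # Treat tiny readings as "no output" (after sunset, before sunrise).
            fixed["pv_kw"] = 0.0
        cleaned.append(fixed)
    return cleaned


# Hypothetical sample, not taken from the study dataset.
sample = [
    {"hour": 0, "pv_kw": 0.04},
    {"hour": 1, "pv_kw": None},
    {"hour": 12, "pv_kw": 95.2},
]
print(clean_records(sample))
```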

Data normalization

Input data normalization is critical for preparing the data before investigating the performance of forecasting models. This step aims to reduce the likelihood that features with high numerical values will dominate those with lower ones [31]. The input data listed in Table 1 are normalized between 0 and 1 using Eq. (1).

$${v}_{i}^{n}=\frac{{v}_{i}-{v}_{min} }{{v}_{max}- {v}_{min}}$$

(1)

where ${v}_{i}$ is the observed value; ${v}_{i}^{n}$ is the value after normalization, while ${v}_{max}$ and ${v}_{min}$ are the maximum and minimum values of the observed dataset, respectively.
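As a minimal sketch of the min-max scaling in Eq. (1), the function below maps one feature column to the range [0, 1]. The handling of a constant column (where the denominator is zero) is an assumption added here, since the text does not cover that case, and the sample temperatures are made up.

```python
def min_max_normalize(values):
    """Scale a sequence of readings to [0, 1] following Eq. (1)."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Constant feature: Eq. (1) is undefined here, so every value is
        # mapped to 0.0 (an assumed convention, not stated in the paper).
        return [0.0 for _ in values]
    return [(v - v_min) / span for v in values]


# Hypothetical hourly air temperatures in degrees Celsius.
temperatures = [18.0, 24.0, 30.0, 42.0]
print(min_max_normalize(temperatures))
```

In practice the minimum and maximum would usually be taken from the training data and reused for the test data, although the text does not say which convention was followed.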

Table 1 List of input variables used to forecast PV power output


Forecasting models input variables

In this work, the forecasting objective is a one-hour ahead forecasting of PV power generation from a PV panel located in Riyadh city, Saudi Arabia. Therefore, to obtain the best forecasting model of PV power output, the proposed models are trained and tested using three types of variables considered at the study location. As mentioned in the literature, the PV output is greatly influenced by time, weather, and historical data of PV power generation. Table 1 lists the input variables used in this study.

The variable $({v}_{i}^{4})$, for example, is the air temperature $(^\circ \mathrm{C})$, where $i$ indexes the hourly readings, so each day contributes 24 values of air temperature. After that, the dataset ($V$) is split into two groups, namely the training dataset, ${v}_{train}$, and the test dataset, ${v}_{test}$, such that $V={v}_{train}\cup {v}_{test}$. In this paper, 80% of the data are used for the training phase, while 20% are used in the testing phase. The Cross-Validation technique is utilized to tune the hyperparameters of the SVR models and the BPNN network configuration.

Feature combinations

To forecast the PV output (${P}_{out})$, various independent combinations of variables are used as inputs. As more input variables do not always indicate good forecasting outcomes [32], the primary goal of combining different sets of input variables is to identify the suitable variables that give the best forecasting results at this site. In this study, the Pearson Correlation Coefficient is used to measure the importance of each variable with respect to the observed values of the PV power. The Pearson correlation formula is given in Eq. (2), where $cov$ is the covariance, and ${\sigma }_{features}$ and ${\sigma }_{PV-power}$ are the standard deviations of the input variables, ${x}_{features}$, and the PV power readings, ${x}_{PV-power}$, respectively. Figure 3 displays the correlation results between the input variables and the PV power readings at the study site.

$${\rho }_{{x}_{features}, {x}_{PV-power}}= \frac{cov({x}_{features}, {x}_{PV-power})}{{\sigma }_{features}\, {\sigma }_{PV-power}}$$

(2)
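Eq. (2) can be illustrated with a short standard-library sketch. The function name `pearson` and the sample values are illustrative assumptions; the factor 1/n cancels between the covariance and the two standard deviations, so plain sums are used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unnormalized covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical irradiance (Wh/m^2) and PV power (kW) readings.
ghi = [0.0, 200.0, 600.0, 900.0]
pv_power = [0.0, 20.0, 65.0, 98.0]
print(round(pearson(ghi, pv_power), 4))
```

A value close to 1 or -1 marks a feature that varies almost linearly with the PV power, while a value near 0 indicates little linear relationship.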

Table 2 contains the variables for each feature combination. Considering the feature combination (12), for example, the ${P}_{out} ({v}_{i}^{1}, {v}_{i}^{2},{v}_{i}^{3},{v}_{i}^{8},{v}_{i}^{10})$ is a function of Month, Day, Hour, global horizontal irradiance, GHI, (Wh/m²), and PV power output at the same hour on the previous day (kW).

Table 2 The input variables associated with each feature combination


Cross-validation

To evaluate the performance of the proposed forecasting models, it is not ideal to base the evaluation on a single test set. Therefore, to examine the forecasting model performance over different test data, $k$-fold Cross-Validation should be employed. Cross-validation is a procedure in which the data are split into $k$ subsets [33]; see Fig. 4. In each round, a single subset is used as the validation set, while the remaining $k-1$ subsets are used for training. This is repeated $k$ times until every subset has served as the validation set. Hence, the overall result does not depend on a single training set, a dependence that may affect the robustness of the forecasting models [34].

It is worth mentioning that the $k$-fold Cross-Validation procedure is conducted in the absence of the testing dataset. The primary goal of Cross-Validation is to examine the generalization of a forecasting model. In this study, tenfold cross-validation is used. In other words, the training dataset is divided into 10-subsets. One subset is considered the test set, while the remaining nine subsets are utilized for training the forecasting model. This process is repeated ten times resulting in ten training and testing folds, where the $nRMSE$ is recorded for each of them. The average of the tenfold $nRMSE$ results is then reported.
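The tenfold procedure can be sketched as below. Two things are assumptions rather than details from the paper: the folds are contiguous index blocks without shuffling, and `score_fold` is a hypothetical callback that would train a model on the training indices and return its $nRMSE$ on the held-out fold.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


def cross_validated_score(n_samples, score_fold, k=10):
    """Average the per-fold score (for example nRMSE) over all k folds."""
    scores = [score_fold(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Show the fold sizes for a hypothetical training set of 25 samples.
for train, val in k_fold_indices(25, k=10):
    print(len(train), len(val))
```

Each sample appears in exactly one validation fold, which is what makes the averaged score independent of any single split.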

Backpropagation neural network

The artificial neural network (ANN) has been used in various forecasting applications. ANN is an information computing system that mimics the way the human brain analyzes information [35]. Like the human brain, an ANN consists of a large number of interconnected neuron nodes that work together to solve a problem, which is the distinguishing feature of this kind of network. Backpropagation is one of the most widely used ANN methods in the learning process. Figure 5 depicts a multilayer feed-forward neural network.

An ANN is built from three kinds of layers, namely the input layer ${\left[{x}_{1}, {x}_{2}, \dots ,{x}_{N}\right]}^{T}$, the hidden layer ${\left[{h}_{1}, {h}_{2}, \dots ,{h}_{N}\right]}^{T}$, and the output layer ${\left[{y}_{1}, {y}_{2}, \dots ,{y}_{N}\right]}^{T}$. The model output, therefore, can be calculated by Eq. (3) [36]:

$$ y_{t} = \alpha_{0} + \sum_{j=1}^{n} \alpha_{j}\, f\left( \sum_{i=1}^{m} \beta_{ij}\, y_{t-i} + \beta_{0j} \right) + \varepsilon_{t} $$

(3)

where $m$ is the number of nodes at the input layer, while $n$ is the number of nodes at the hidden layer. $f$ is a sigmoid transfer function, which will be the logistic function in this study, $f\left(x\right)=\frac{1}{1+\mathrm{exp}(-x)}$. $\{{\alpha }_{j}, j = 0, 1, \dots, n\}$ is the weights vector that links the hidden layer and output layer, and $\{{\beta }_{ij}, i = 0, 1, \dots, m; j = 1, 2, \dots, n\}$ are the weights that link the input nodes with the hidden nodes. ${\alpha }_{0}$ and ${\beta }_{0j}$ are the weights of the arcs leading from the bias terms, which have values equal to 1.

The number of nodes in each hidden layer is optimized using SSO, PSO, CSO, and NNA. This study uses the multilayer perceptron (MLP) for the BPNN model, while the Levenberg–Marquardt method is chosen as the training function.
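To make the structure of Eq. (3) concrete, the sketch below evaluates its deterministic part (without the error term) for a network with one hidden layer and a logistic transfer function. The argument names and the nested-list weight layout are illustrative assumptions; training the weights, for example with Levenberg–Marquardt, is outside the scope of this example.

```python
import math


def logistic(x):
    """Logistic transfer function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, beta, beta0, alpha, alpha0):
    """Single-hidden-layer forward pass.

    beta[j][i] : weight from input node i to hidden node j
    beta0[j]   : bias weight of hidden node j
    alpha[j]   : weight from hidden node j to the output node
    alpha0     : bias weight of the output node
    """
    hidden = [
        logistic(sum(w * x for w, x in zip(beta[j], inputs)) + beta0[j])
        for j in range(len(beta))
    ]
    return alpha0 + sum(a * h for a, h in zip(alpha, hidden))


# Hypothetical network with 2 inputs and 2 hidden nodes; weights are made up.
output = forward(
    inputs=[0.3, 0.8],
    beta=[[0.5, -0.2], [0.1, 0.4]],
    beta0=[0.0, 0.1],
    alpha=[0.7, 0.3],
    alpha0=0.05,
)
print(output)
```

The number of hidden nodes corresponds to `len(beta)`, which is the quantity the metaheuristic algorithms search over.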

Support vector regression

Support vector machine (SVM) is a supervised learning approach utilized for classification, regression, or outlier detection. When two classes cannot be separated, a kernel function is employed to map the input space to another high-dimensional space [37]. In that new space, the data can be separated linearly. There are three well-known kernel functions to conduct the separation: linear, polynomial, and radial kernel functions [38]. SVR inherently employs some of the SVM properties. However, unlike SVM, SVR fits a regression function and penalizes only the errors that exceed a predefined threshold; see Fig. 6 [39].

The leading optimization can be formulated in Eq. (4), while the kernels used with the SVR are provided in Eqs. (5) and (6).

The SVR requires solving the following optimization problem:

$$ \begin{aligned} & \text{minimize} \quad \frac{1}{2}{\Vert w\Vert }^{2}+C\sum_{i=1}^{\ell}\left({\xi }_{i}+{\xi }_{i}^{*}\right) \\ & \text{subject to} \quad \left\{\begin{array}{l} {y}_{i}-\left({w}^{T}\phi \left({x}_{i}\right)+b\right)\le \varepsilon +{\xi }_{i}^{*}\\ \left({w}^{T}\phi \left({x}_{i}\right)+b\right)-{y}_{i}\le \varepsilon +{\xi }_{i}\\ {\xi }_{i},{\xi }_{i}^{*} \ge 0\end{array}\right. \end{aligned}$$

(4)

where $C > 0$ is a constant that determines the trade-off between the flatness of $f$ and the amount up to which deviations larger than $\varepsilon $ are tolerated.
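One way to read the slack variables in Eq. (4) is through the $\varepsilon$-insensitive loss: a residual smaller than $\varepsilon$ costs nothing, and a larger one is charged only for the part that exceeds $\varepsilon$. The sketch below assumes this standard interpretation and uses illustrative names and made-up values; it does not solve the optimization problem itself.

```python
def epsilon_insensitive_loss(observed, predicted, epsilon):
    """Slack needed for one sample: max(0, |observed - predicted| - epsilon)."""
    return max(0.0, abs(observed - predicted) - epsilon)


def svr_penalty(observations, predictions, epsilon, c):
    """Penalty term of Eq. (4): C times the sum of slacks over all samples."""
    return c * sum(
        epsilon_insensitive_loss(y, f, epsilon)
        for y, f in zip(observations, predictions)
    )


# Made-up normalized PV readings and predictions.
observed = [0.20, 0.55, 0.90]
predicted = [0.2005, 0.50, 0.95]
print(svr_penalty(observed, predicted, epsilon=0.001, c=1.0))
```

A larger $C$ makes every unit of slack more expensive, which pushes the fit toward the training data at the expense of flatness.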

As mentioned above, the input space, represented by the input features of the training dataset, is mapped into a new high-dimensional space by the function $\phi $. This is known as the kernel trick, $K\left({x}_{i},{x}_{j}\right)=\phi {\left({x}_{i}\right)}^{T}\phi \left({x}_{j}\right)$. This research work uses two kernel functions, namely the radial basis ($RB$) and the linear ($linear$) functions. They can be written as [40]:

$$\left(RB\right){:} \; K\left({x}_{i},{x}_{j}\right)= {e}^{-\gamma \left({\Vert {x}_{i}-{x}_{j}\Vert }^{2}\right)}$$

(5)

$$\left(linear\right){:} \; K\left({x}_{i},{x}_{j}\right)= ({x}_{i}^{T}{x}_{j})$$

(6)

where, $\gamma \left(Gamma\right)$ is the kernel parameter and is estimated by the study optimization algorithms.
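The two kernels in Eqs. (5) and (6) translate directly into code. The sketch below is a plain-Python illustration with assumed function names; in the study itself the kernels are provided by LIBSVM.

```python
import math


def linear_kernel(x_i, x_j):
    """Eq. (6): inner product of the two feature vectors."""
    return sum(a * b for a, b in zip(x_i, x_j))


def rb_kernel(x_i, x_j, gamma):
    """Eq. (5): exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)


# Two made-up normalized feature vectors.
a = [0.2, 0.7, 0.1]
b = [0.3, 0.5, 0.4]
print(linear_kernel(a, b))
print(rb_kernel(a, b, gamma=0.5))
```

With the RB kernel, a larger $\gamma$ makes the similarity fall off faster with distance, which is why this parameter needs tuning alongside $C$.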

The choice of the two hyperparameters, $C$ and $\gamma $, is critical in enhancing the accuracy of the forecasting models. The parameter $C$ governs the empirical risk of SVR, while parameter $\gamma $ controls the width of the radial basis function [41]. Researchers are accustomed to determining these parameters either by their insights, prior knowledge from other studies [42], or by using approaches such as grid search [39]. Hence, $C$ and $\gamma $ are optimally selected by utilizing SSO, PSO, CSO, and NNA to build the hybrid models. This is described in the next Sections.

Metaheuristic optimization algorithms

The setup of metaheuristic optimization algorithms, including SSO, PSO, CSO, and NNA, is described in this section. The evaluation function of these algorithms tries to minimize the $nRMSE$; see Eq. (8).

SSO, PSO, CSO, and NNA are explained in [43,44,45,46], respectively. The considered optimization algorithms are run for a maximum of 50 iterations. For the linear function, the lower and upper bounds for $C$ are $[1, 10000]$, and for the RB function, the bounds are $[1, 10000]$ and $[0.01, 3]$ for $C$ and $\gamma $, respectively. For the BPNN models, the lower and upper bounds of the number of neurons at each hidden layer are set to $[1, 50]$. For CSO, the following parameters are set: $h$ = 20 and $p$ = 0.25.

Figure 7 depicts the hybrid forecasting algorithm that consists of ANN with PSO. During the search, candidate numbers of nodes at each hidden layer are generated and evaluated until the lowest $nRMSE$ is attained. For parameter tuning, tenfold cross-validation is used. Similar steps are used with other optimization algorithms and SVR.
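The search loop can be illustrated with a generic, textbook-style PSO. This is a sketch under several assumptions and not the authors' implementation: the inertia and acceleration coefficients, the swarm size, and the `toy_objective` function are all hypothetical. In the real workflow the objective would be the tenfold cross-validated $nRMSE$ of an SVR or BPNN model; here a cheap stand-in is used so the example runs on its own. Only the 50-iteration limit and the search bounds for $C$ and $\gamma$ come from the text.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iterations=50,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` inside box `bounds` with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    best_index = min(range(n_particles), key=lambda i: personal_score[i])
    global_best = personal_best[best_index][:]
    global_score = personal_score[best_index]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                # Clip the new position so it stays inside the search bounds.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i] = positions[i][:]
                personal_score[i] = score
                if score < global_score:
                    global_best = positions[i][:]
                    global_score = score
    return global_best, global_score


def toy_objective(params):
    """Hypothetical stand-in for the cross-validated nRMSE of an SVR model."""
    c, gamma = params
    return ((c - 100.0) / 10000.0) ** 2 + (gamma - 0.5) ** 2


# Bounds for C and gamma as given in the text for the RB function.
best_params, best_score = pso_minimize(toy_objective, [(1.0, 10000.0), (0.01, 3.0)])
print(best_params, best_score)
```

For the BPNN case the decision variables are neuron counts, so a position would additionally have to be rounded to an integer in [1, 50] before evaluation; that rounding step is an assumption, since the text does not describe how integer parameters are handled.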

Model accuracy criteria

The forecasting methods under consideration are evaluated for accuracy and efficiency using the following statistical indicators: $RMSE$, $nRMSE$, $MAE$, and $nMAE$. These metrics show how close the measured values are to the predicted PV power output produced by the proposed models. These metrics are defined in the following equations [19]:

$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{({y}_{i}-{f}_{i})}^{2}}$$

(7)

$$nRMSE= \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}{({y}_{i}-{f}_{i})}^{2}}}{{y}_{i, max}}$$

(8)

$$MAE=\frac{1}{n}\sum_{i=1}^{n}|{y}_{i}-{f}_{i}|$$

(9)

$$nMAE= \frac{\frac{1}{n}\sum_{i=1}^{n}|{y}_{i}-{f}_{i}|}{{y}_{i, max}}$$

(10)

where $n$ is the number of samples in the testing dataset; ${y}_{i}$ is the observed value of the PV power; ${y}_{i, max}$ is the maximum value in the testing dataset and ${f}_{i}$ is the forecasted value generated by the forecasting models. $RMSE$ measures the deviation between observed PV power readings and predicted values [47], while the $MAE$ is the mean of the absolute values of the residuals (forecasting errors).
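The four indicators can be computed as in the sketch below. It follows the textual definitions, with $MAE$ taken as the mean of the absolute residuals and both normalized metrics divided by the maximum observed value in the test set. The function names and sample values are illustrative, and the normalized metrics are returned as fractions, so they would be multiplied by 100 to express them in percent.

```python
import math


def rmse(observed, forecast):
    """Root mean square error between observations and forecasts."""
    n = len(observed)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(observed, forecast)) / n)


def mae(observed, forecast):
    """Mean of the absolute residuals."""
    n = len(observed)
    return sum(abs(y - f) for y, f in zip(observed, forecast)) / n


def nrmse(observed, forecast):
    """RMSE normalized by the maximum observed value."""
    return rmse(observed, forecast) / max(observed)


def nmae(observed, forecast):
    """MAE normalized by the maximum observed value."""
    return mae(observed, forecast) / max(observed)


# Made-up PV power readings and forecasts in kW.
observed = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 32.0, 38.0]
print(rmse(observed, forecast), mae(observed, forecast))
print(nrmse(observed, forecast), nmae(observed, forecast))
```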

Results and discussion

The BPNN model was built using a multilayer perceptron (MLP) and the backpropagation algorithm, with the Levenberg–Marquardt method as the training function. Regarding the number of layers, this study assumes three cases for BPNN:

Case 1: $BPN{N}^{1}$—one input layer, one hidden layer, and one output layer.
Case 2: $BPN{N}^{2}$—one input layer, two hidden layers, and one output layer.
Case 3: $BPN{N}^{D}$—default BPNN.

The numbers of neurons (nodes) in the hidden layers in Case 1 ($BPN{N}^{1})$ and Case 2 ($BPN{N}^{2}$) are the optimal numbers of nodes generated by SSO, PSO, CSO, and NNA. Case 3 ($BPN{N}^{D})$ is assumed to have one input layer, one hidden layer, and one output layer. In $BPN{N}^{D}$, the number of neurons selected equals the number of input features listed in Table 2. For example, the number of nodes is one with the input feature combination (F1), which consists of one input feature. Similarly, five nodes are set for the feature combination (F12), which has five input features. The input data in the $BPNN$ are the same as those used in $SVR$ models. Table 4 summarizes the configurations of the best $BPNN$ models obtained with the different algorithms.

For SVR models, the default parameters are selected based on the default values used in the LIBSVM tool. The default value of the parameter $C$ is set to 1 for the radial basis and linear functions, while parameter $\gamma $ is equal to (1/number of features). After that, the $SVR$ and $BPNN$ models with the optimal parameters and network configurations were employed to forecast the PV power generation. To evaluate the level of agreement between the predicted data and measured data, the models are examined based on $RMSE$, $nRMSE$, $MAE$, and $nMAE$. Table 3 compares the performance indices of the SVR and BPNN models.

Table 3 Statistical errors of the best-proposed forecasting models using $SSO$, $PSO$, $CSO$ and NNA


Analysis of the forecasting models

In this section, the forecasting models are compared according to some criteria to examine their performance in predicting the PV power output from the solar system. Results of the best forecasting models, the corresponding optimized parameters of the $SV{R}_{RB}$, $SV{R}_{Linear}$, $BPN{N}^{1}$, and $BPN{N}^{2}$ models, and the statistical errors of the forecasting models are shown in Tables 3, 4, and 5, respectively. Figures 8 and 9 display graphical representations of the goodness of fit tests of $RMSE$ (in Fig. 8) and $MAE$ (in Fig. 9) in the form of heat maps. These figures compare all the 323 models considered at the study site. These are four forecasting models, $SV{R}_{RB}$, $SV{R}_{Linear}$, $BPN{N}^{1}$, and $BPN{N}^{2}$, which obtain their parameters for each of the 17 feature combinations using four optimization approaches, SSO, PSO, CSO, and NNA, in addition to the three default models $SV{R}_{RB}^{D}$, $SV{R}_{Linear}^{D}$, and $BPN{N}^{D}$.
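The figure of 323 models follows from the counts given above: 4 model types times 4 optimizers times 17 feature combinations gives 272 hybrid models, and 3 default models times 17 feature combinations gives 51 more. The short enumeration below checks this arithmetic; the labels `F1` to `F17` and the naming pattern of the runs are assumptions made for illustration.

```python
from itertools import product

hybrid_models = ["SVR_RB", "SVR_linear", "BPNN_1", "BPNN_2"]
optimizers = ["SSO", "PSO", "CSO", "NNA"]
default_models = ["SVR_RB_D", "SVR_linear_D", "BPNN_D"]
# Assumed labels for the 17 feature combinations.
feature_combinations = [f"F{i}" for i in range(1, 18)]

# One hybrid run per (optimizer, model, feature combination) triple.
hybrid_runs = [
    f"{opt}-{model}@{feat}"
    for opt, model, feat in product(optimizers, hybrid_models, feature_combinations)
]
# One default run per (default model, feature combination) pair.
default_runs = [
    f"{model}@{feat}" for model, feat in product(default_models, feature_combinations)
]

total_runs = len(hybrid_runs) + len(default_runs)
print(len(hybrid_runs), len(default_runs), total_runs)
```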

Table 4 Models’ parameters for SVR models and BPNN network configuration (SVR models: $\varepsilon =0.001$)

Table 5 Best proposed forecasting models vs. default models based on SVR and ANN

The statistical errors are reported for the best feature combination, which is F14 in Table 2.

The SVR parameters and BPNN network configurations are reported for the best feature combination, which is F14 in Table 2.

Hybrid forecasting models vs. default forecasting models

Tables 3 and 5 and Figs. 8 and 9 show that, overall, the proposed forecasting models optimized by SSO, PSO, CSO, and NNA outperform the default forecasting models in predicting the PV power output, with lower $RMSE$ and $MAE$ values. Regarding model fitting accuracy with the $SV{R}_{RB}$ models, the proposed models with the optimized hyperparameters show improvements compared to the default models, where $RMSE$ improved between 12.001 and 50.079% and $MAE$ improved between 1.80291 and 59.8847%. Similarly, the $BPN{N}^{1}$ and $BPN{N}^{2}$ prediction models with the optimal network configurations perform better than the $BPN{N}^{D}$ models, with improvements of 1.883–46.964% and 2.0576–47.007% in the $RMSE$ and $MAE$ values, respectively.

Using $SV{R}_{RB}$ models with different optimization algorithms and different feature schemes, Table 6 and Fig. 8 show that $RMSE$ values are $\le 23.12 \; \mathrm{kW}$, while $RMSE$ values with the default models are $\le 28.24 \; \mathrm{kW}$. With the feature combination (F12), for example, the best model ($SSO- SV{R}_{RB}$) gives an $RMSE$ of 4.7500 kW and an $MAE$ of 2.7617 kW, whereas $SV{R}_{RB}^{D}$ gives an $RMSE$ of 9.206 kW and an $MAE$ of 5.269 kW. The proposed $BPNN$ models achieve a similar reduction in the error metrics. For instance, with the feature combination (F9), the $nRMSE$ values of $BPN{N}^{1}$ and $BPN{N}^{2}$ are 5.767% and 5.804%, respectively, while $BPN{N}^{D}$ produces an error of 7.25%. Nevertheless, the degree of improvement is somewhat low for the $SV{R}_{Linear}$ models with the optimized parameters compared with the $SV{R}_{Linear}^{D}$ models. This can be attributed to the optimized values of the hyperparameter $C$, which are close to the default value.
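The size of these gains can be checked with a short calculation. The sketch below assumes that improvement is measured as the relative reduction of the default error, which is the usual convention but is not spelled out in this section; the function name `improvement_pct` is hypothetical. The input values are the ones quoted above for feature combinations 12 and 9.

```python
def improvement_pct(default_error, optimized_error):
    # ASSUMPTION: improvement = relative reduction of the default error, in percent.
    return 100.0 * (default_error - optimized_error) / default_error


# SSO-SVR_RB vs. default SVR_RB, feature combination 12 (values in kW).
rmse_gain = improvement_pct(9.206, 4.7500)
mae_gain = improvement_pct(5.269, 2.7617)

# BPNN^1 vs. default BPNN, feature combination 9 (nRMSE in percent).
nrmse_gain = improvement_pct(7.25, 5.767)

print(round(rmse_gain, 2), round(mae_gain, 2), round(nrmse_gain, 2))
```

Under this assumption the example falls near the upper end of the reported $SV{R}_{RB}$ improvement range, while the BPNN example sits in the middle of its range.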

Table 6 Statistical error results for different sets of feature combinations

In addition, Figs. 8 and 9 and Table 5 show that the default BPNN models perform better than the default SVR models with the radial basis and linear functions. This is due to the default parameters selected for the SVR models. Hence, the associated parameters should be chosen appropriately to obtain the best forecasting performance from the SVR models. The default BPNN models, on the other hand, perform well relative to the proposed models. For instance, the best $SSO-BPN{N}^{1}$ and $CSO-BPN{N}^{2}$ models give $RMSE$ values of 4.8460 kW and 4.5692 kW, respectively, while the $BPN{N}^{D}$ model gives an error of 5.2289 kW.

Performance analysis using proposed models

Tables 3 and 6 and Figs. 8 and 9 allow the proposed forecasting models to be compared with one another. On this basis, $SV{R}_{RB}$ can be considered the best prediction model for estimating the PV power generation at the study site. $SV{R}_{RB}$ has lower $RMSE$ and $MAE$ values than $SV{R}_{Linear}$, $BPN{N}^{1}$, and $BPN{N}^{2}$. For instance, $PSO- SV{R}_{RB}$ models are better than $PSO- SV{R}_{Linear}$, $PSO-BPN{N}^{1}$, and $PSO-BPN{N}^{2}$ for all the considered feature combinations. The same result is found with the other optimization methods. $BPNN$ models, on the other hand, perform better than the $SVR$ models with the linear function and show promising performance compared to the RB models. Among the proposed $BPNN$ models, $BPN{N}^{2}$ models overall have better prediction capability than $BPN{N}^{1}$ models. $PSO-BPN{N}^{2}$ with the feature combination (F11), for example, gave an $nRMSE$ of 3.251% and an $nMAE$ of 3.251%, while $PSO-BPN{N}^{1}$ resulted in error values of 6.297% and 3.819%, respectively. This implies that using more than one hidden layer with optimized node numbers leads to higher forecasting accuracy than a single hidden layer.

Performance analysis of optimization algorithms

Comparing the optimization algorithms, several approaches reach the same accuracy and performance in estimating the parameters of the forecasting models. The $\mathrm{PSO}$ and $\mathrm{SSO}$ methods produce parameter estimates that are similar or differ only negligibly. Nevertheless, the three optimization algorithms show different performances in obtaining the $C$ parameter of the linear function. Furthermore, linear models performed the worst because of their limited ability to deal with nonlinearity in the input data. Figure 10 shows the performance of all four optimization algorithms with all the forecasting models. This figure shows that the proposed hybrid forecasting models, in which the hyperparameters of $SV{R}_{Linear}$ and $SV{R}_{RB}$ and the configuration of $BPN{N}^{2}$ are selected optimally, track the actual values of the PV power output more precisely than the default models.
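To make the role of the optimizers concrete, the following is a minimal, generic particle swarm sketch that searches over two SVR hyperparameters. It is not the authors' implementation. The inertia and acceleration constants, the search bounds on $\log_2 C$ and $\log_2 \gamma$, and above all the objective `surrogate_cv_rmse` are assumptions: the objective is a hypothetical stand-in for the cross-validated error that would, in practice, be obtained by training an SVR for each candidate.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def surrogate_cv_rmse(params):
    # HYPOTHETICAL objective: a smooth bowl with its minimum at
    # log2(C) = 2, log2(gamma) = -1, standing in for a cross-validated RMSE.
    log_c, log_gamma = params
    return (log_c - 2.0) ** 2 + (log_gamma + 1.0) ** 2


best, best_val = pso_minimize(surrogate_cv_rmse, bounds=[(-5.0, 15.0), (-15.0, 3.0)])
print(best, best_val)
```

Swapping in a different metaheuristic changes only how candidate positions are proposed; the objective evaluation stays the same, which is one reason several algorithms can end up with nearly identical parameters.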

Best feature combination

Comparing the feature combinations in Tables 3 and 6 and Figs. 8 and 9 shows that the best forecasting model is attained with the feature combination (F14). This combination includes the month of the year, the day of the month, the hour of the day, air temperature ($^\circ\mathrm{C}$), global horizontal irradiance, GHI (Wh/m²), and PV power output at the same hour on the previous day (kW). Results show that the best forecasting outcomes for all considered models are obtained using this feature combination. Regarding other feature combinations, using only global horizontal irradiance (F3), PV power output at the same hour on the previous day (kW) (F4), or a mix of the two could lead to satisfactory forecasting results. GHI provided good accuracy because of its significant impact on PV system production, while the usefulness of the lagged power observation stems from the nature of solar radiation at the study site. In Riyadh city, the weather is less variable, and there are two seasons in the year, summer and winter. Therefore, the power output of the previous day may influence the production of the next day. This is depicted in Fig. 11 through the scatter plots of the measured vs. predicted PV power output values obtained by the $SV{R}_{RB}$ with the CSO algorithm. The subplot in green displayed in Fig. 12 indicates the best prediction model with the best feature combination ($CSO- SV{R}_{RB}$ with F14).
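The F14 input vector, including the previous-day lag, could be assembled from hourly records roughly as follows. This is a sketch under assumptions: the function name `build_f14_rows`, the argument names, and the synthetic series are all hypothetical, and the data are assumed to be gap-free hourly values so that a lag of 24 samples corresponds to the same hour on the previous day.

```python
from datetime import datetime, timedelta


def build_f14_rows(start, power_kw, ghi_wh_m2, temp_c, lag_hours=24):
    """Build (features, target) pairs in the F14 feature order.

    features = (month, day, hour, air temperature, GHI, PV power 24 h earlier)
    target   = PV power at the current hour
    """
    rows = []
    # The first `lag_hours` samples have no previous-day value and are skipped.
    for t in range(lag_hours, len(power_kw)):
        stamp = start + timedelta(hours=t)
        features = (stamp.month, stamp.day, stamp.hour,
                    temp_c[t], ghi_wh_m2[t], power_kw[t - lag_hours])
        rows.append((features, power_kw[t]))
    return rows


# Synthetic two-day series for illustration only (not study data).
start = datetime(2020, 1, 1, 0)
hours = 48
power = [float(h % 24) for h in range(hours)]
ghi = [10.0 * (h % 24) for h in range(hours)]
temp = [20.0] * hours

rows = build_f14_rows(start, power, ghi, temp)
print(len(rows), rows[5])
```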

Furthermore, the Decision Tree (DT) algorithm for feature selection is used to validate the conclusion on the combination of the optimal features [48]. Figure 11 displays the scores of the input features according to how relevant they are to predicting the PV power output. Figure 11 reveals that five features have the highest scores among all features: PV power output at the same hour on the previous day (kW), global horizontal irradiance, GHI (Wh/m²), direct normal irradiance, DNI (Wh/m²), the hour of the day, and air temperature ($^\circ\mathrm{C}$). This corresponds to the best feature combination obtained in this study.
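As a rough intuition for how a tree-based score ranks features, the sketch below computes the largest variance reduction a single threshold split on each feature can achieve, which is the criterion a regression tree evaluates at one node. It is a simplification and not the procedure of [48] or the scoring behind the figure; the function names and the toy data are hypothetical.

```python
def variance(values):
    # Population variance of a list of numbers.
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split_gain(feature, target):
    """Largest variance reduction obtainable by one threshold on `feature`."""
    pairs = sorted(zip(feature, target))
    n = len(pairs)
    parent = variance(target)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Weighted variance of the two child nodes after the split.
        child = (len(left) * variance(left) + len(right) * variance(right)) / n
        best = max(best, parent - child)
    return best


# Toy data: the target follows feature_a exactly but not feature_b.
target = [0.0, 0.0, 10.0, 10.0]
feature_a = [1.0, 2.0, 3.0, 4.0]
feature_b = [1.0, 3.0, 2.0, 4.0]

gain_a = best_split_gain(feature_a, target)
gain_b = best_split_gain(feature_b, target)
print(gain_a, gain_b)
```

A full decision tree accumulates such gains over all nodes where a feature is used, so informative inputs such as irradiance or lagged power would be expected to collect higher scores than weakly related ones.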

Conclusion and future work

Support Vector Regression (SVR) with radial basis and linear kernel functions and Backpropagation Neural Network (BPNN) models were investigated in this study to predict the PV output power of a rooftop PV unit installed on a mosque in Riyadh city, Saudi Arabia. The penalty factor ($C$) and kernel parameter ($\gamma$) of the SVR models with the radial and linear functions and the number of hidden nodes of the artificial neural network were optimized using four optimization algorithms: Social Spider Optimization (SSO), Particle Swarm Optimization (PSO), Cuckoo Search Optimization (CSO), and Neural Network Algorithm (NNA). Different combinations of input variables were used to select the optimal set of input features. Based on the results of the best forecasting model and the performance of the estimation algorithms, the conclusions can be summarized as follows:

1. According to the model accuracy criteria, the proposed hybrid PV power forecasting models outperform the default models based on SVR with the RB and linear functions and on the BPNN algorithm. Overall, results indicate that the proposed models with the optimized hyperparameters of the SVR with radial basis outperform the other models in forecasting PV power output at the study site.
2. Regarding the model fitting accuracy with the $SV{R}_{RB}$, the proposed models show improvements compared to the default models, where $RMSE$ improved between 12.001 and 50.079% and $MAE$ improved between 1.80291 and 50.8847%. Similarly, the $BPN{N}^{1}$ and $BPN{N}^{2}$ prediction models with optimal network configurations perform better than the default $BPNN$ models, with improvements of 1.883–46.964% and 2.0576–47.007% in the $RMSE$ and $MAE$ values, respectively.
3. The proposed BPNN models exhibit good forecasting outcomes that are comparable with those of the SVR radial basis models. On the other hand, the SVR based on the linear function showed limited forecasting performance because of its limited capability to capture the nonlinearity in the input dataset.
4. Comparing the estimation algorithms, the four optimization algorithms have almost the same performance, demonstrating their capacity to select SVR parameters and BPNN network configurations.

Finally, the framework proposed in this study can be used to forecast the PV power output in other countries. However, there is still room for further investigation to develop a model that forecasts PV power with high accuracy. Furthermore, even though the current work primarily focuses on improving SVR and ANN by optimizing their parameters, the parameters of other algorithms, such as random forests and decision trees, can also be investigated. Another possible direction is to use dimensionality reduction models to select the features for the input vector. In this study, different feature combinations are formulated based on their correlation with the output power. Thus, other researchers can examine the performance of dimensionality reduction models, such as the Monte Carlo algorithm, the Boruta feature selection algorithm, and the grouping genetic algorithm, to obtain the optimal set of features. Finally, further forecasting approaches can be applied, including the currently popular deep learning methods based on neural networks, such as recurrent neural networks and long short-term memory.

Availability of data and materials

The data that support the findings of this study are available from King Abdelaziz City for Science and Technology and Saudi Electricity Company, but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. Data are, however, available from the authors upon reasonable request and with the permission of King Abdelaziz City for Science and Technology and Saudi Electricity Company.

Abbreviations

PV: Photovoltaics
NWP: Numerical weather prediction
ARMA: Autoregressive moving average
ARIMA: Autoregressive integrated moving average
ANN: Artificial Neural Network
BPNN: Backpropagation Neural Network
SVR: Support Vector Regression
kNN: K-nearest neighbor
MLR: Multiple regression
DTR: Decision tree regression
GHI: Global Horizontal Irradiance
MOA: Metaheuristic optimization algorithms
SSO: Social Spider Optimization
PSO: Particle Swarm Optimization
CSO: Cuckoo Search Optimization
NNA: Neural Network Algorithm
RMSE: Root Mean Square Error
nRMSE: Normalized Root Mean Square Error
MAE: Mean Absolute Error
nMAE: Normalized Mean Absolute Error

References

1. MIT Energy Initiative (MITEI). Managing large-scale penetration of intermittent renewables. 2011.
2. Haque MM, Wolfs P. A review of high PV penetrations in LV distribution networks: present status, impacts and mitigation measures. Renew Sustain Energy Rev. 2016;62:1195–208.
3. Antonanzas J, Osorio N, Escobar R, Urraca R, Martinez-De-Pison FJ, Antonanzas-Torres F. Review of photovoltaic power forecasting. Sol Energy. 2016;136:78–111.
4. Wu J, Wang Y-G, Tian Y-C, Burrage K, Cao T. Support vector regression with asymmetric loss for optimal electric load forecasting. Energy. 2021;223:119969.
5. Fathi S, Srinivasan R, Fenner A, Fathi Rinker Sr SM. Machine learning applications in urban building energy performance forecasting: a systematic review. Renew Sustain Energy Rev. 2020;133:110287.
6. Cai M, Pipattanasomporn M, Rahman S. Day-ahead building-level load forecasts using deep learning vs. traditional time-series techniques. Appl Energy. 2019;236:1078–88.
7. Ferreira M, Santos A, Lucio P. Short-term forecast of wind speed through mathematical models. Energy Rep. 2019;5:1172–84.
8. Dhiman HS, Deb D, Guerrero JM. Hybrid machine intelligent SVR variants for wind forecasting and ramp events. Renew Sustain Energy Rev. 2019;108:369–79.
9. Doucoure B, Agbossou K, Cardenas A. Time series prediction using artificial wavelet neural network and multi-resolution analysis: application to wind speed data. Renew Energy. 2016;92:202–11.
10. Alrashidi M, Alrashidi M, Rahman S. Global solar radiation prediction: application of novel hybrid data-driven model. Appl Soft Comput. 2021;112:107768.
11. Alfadda A, Rahman S, Pipattanasomporn M. Solar irradiance forecast using aerosols measurements: a data driven approach. Sol Energy. 2018;170:924–39.
12. Ghofrani M, Ghayekhloo M, Azimi R. A novel soft computing framework for solar radiation forecasting. Appl Soft Comput. 2016;48:207–16.
13. Akhter MN, Mekhilef S, Mokhlis H, Shah NM. Review on forecasting of photovoltaic power generation based on machine learning and metaheuristic techniques. IET Renew Power Gener. 2019;13(7):1009–23.
14. Ahmed R, Sreeram V, Mishra Y, Arif D. A review and evaluation of the state-of-the-art in PV solar power forecasting: techniques and optimization. Renew Sustain Energy Rev. 2020;124:109792.
15. Sampath Kumar D, Gandhi O, Rodríguez-Gallegos CD, Srinivasan D. Review of power system impacts at high PV penetration Part II: Potential solutions and the way forward. Sol Energy. 2020;210:202–21.
16. Sobri S, Koohi-Kamali S, Rahim NA. Solar photovoltaic generation forecasting methods: a review. Energy Convers Manag. 2017;156:459–97.
17. de Freitas Viscondi G, Alves-Souza SN. A systematic literature review on big data for solar photovoltaic electricity generation forecasting. Sustain Energy Technol Assess. 2018;31:54–63.
18. Sharadga H, Hajimirza S, Balog RS. Time series forecasting of solar power generation for large-scale photovoltaic plants. Renew Energy. 2020;150:797–807.
19. Gómez JL, Martínez AO, Pastoriza FT, Garrido LF, Álvarez EG, García JAO. Photovoltaic power prediction using artificial neural networks and numerical weather data. Sustainability. 2020;12(10295):1–19.
20. Theocharides S, Makrides G, Livera A, Theristis M, Kaimakis P, Georghiou GE. Day-ahead photovoltaic power production forecasting methodology based on machine learning and statistical post-processing. Appl Energy. 2020;268:115023.
21. Abubakar Mas’ud A. Comparison of three machine learning models for the prediction of hourly PV output power in Saudi Arabia. Ain Shams Eng J. 2022;13(4):101648.
22. Markovics D, Mayer MJ. Comparison of machine learning methods for photovoltaic power forecasting based on numerical weather prediction. Renew Sustain Energy Rev. 2022;161:112364.
23. Fan GF, Qing S, Wang H, Hong WC, Li HJ. Support vector regression model based on empirical mode decomposition and auto regression for electric load forecasting. Energies. 2013;6(4):1887–901.
24. Schölkopf B, Smola AJ, Williamson RC, Bartlett PL. New support vector algorithms. Neural Comput. 2000;12:1207–45.
25. Almeida MP, Muñoz M, de la Parra I, Perpiñán O. Comparative study of PV power forecast using parametric and nonparametric PV models. Sol Energy. 2017;155:854–66.
26. Saini LM, Aggarwal SK, Kumar A. Parameter optimisation using genetic algorithm for support vector machine-based price-forecasting model in National electricity market. IET Gener Transm Distrib. 2010;4(1):36.
27. VanDeventer W, et al. Short-term PV power forecasting using hybrid GASVM technique. Renew Energy. 2019;140:367–79.
28. Netsanet S, Zheng D, Zhang W, Teshager G. Short-term PV power forecasting using variational mode decomposition integrated with Ant colony optimization and neural network. Energy Rep. 2022;8:2022–35.
29. Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol. 2011;2(3):1–27.
30. Solar resource maps and GIS data | Solargis. https://solargis.com/maps-and-gis-data/download/saudi-arabia. Accessed 03 Oct 2020.
31. Hsu C-W, Chang C-C, Lin C-J. A practical guide to support vector classification.
32. Niu D, Wang K, Sun L, Wu J, Xu X. Short-term photovoltaic power generation forecasting based on random forest feature selection and CEEMD: a case study. Appl Soft Comput. 2020;93:106389.
33. Miraftabzadeh SM, Longo M, Foiadelli F. A-day-ahead photovoltaic power prediction based on long short term memory algorithm. In: SEST 2020, 3rd international conference on smart energy systems and technologies. 2020. p. 1–6.
34. Konstantinou M, Peratikou S, Charalambides AG. Solar photovoltaic forecasting of power output using LSTM networks. Atmosphere. 2021;12(1):124.
35. Faraji J, Abazari A, Babaei M, Muyeen SM, Benbouzid M. Day-ahead optimization of prosumer considering battery depreciation and weather prediction for renewable energy sources. Appl Sci. 2020;10(8):1–22.
36. Leva S, Dolara A, Grimaccia F, Mussetta M, Ogliari E. Analysis and validation of 24 hours ahead neural network forecasting of photovoltaic output power. Math Comput Simul. 2017;131:88–100.
37. Tesfaye Eseye A, Zhang J, Zheng D. Short-term photovoltaic solar power forecasting using a hybrid wavelet-PSO-SVM model based on SCADA and meteorological information. Renew Energy. 2017;118:357–67.
38. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning. Math Intell. 2001;27(2):83–5.
39. Abuella M, Chowdhury B. Solar power forecasting using support vector regression. In: Proceedings of the American Society for Engineering Management 2016.
40. Smola AJ, Schölkopf B. A tutorial on support vector regression. Stat Comput. 2004;14:199–222.
41. Hong W-C. Electric load forecasting by support vector model. Appl Math Model. 2009;33:2444–54.
42. Wang J, Li L, Niu D, Tan Z. An annual load forecasting model based on support vector regression with differential evolution algorithm. Appl Energy. 2012;94:65–70.
43. Cuevas E, Cienfuegos M, Zaldívar D, Pérez-Cisneros M. A swarm optimization algorithm inspired in the behavior of the social-spider. Expert Syst Appl. 2013;40:6374–84.
44. Kennedy J, Eberhart R. Particle swarm optimization. In: Proceedings of the IEEE international conference on neural networks. vol. 4. 1995. p. 1942–8.
45. Yang X-S, Deb S. Cuckoo search via Lévy flights. 2010.
46. Sadollah A, Sayyaadi H, Yadav A. A dynamic metaheuristic optimization model inspired by biological nervous systems: neural network algorithm. Appl Soft Comput. 2018;71:747–82.
47. Renno C, Petito F, Gatto A. Artificial neural network models for predicting the solar radiation as input of a concentrating photovoltaic system. Energy Convers Manag. 2015;106:999–1012.
48. Zhou HF, Zhang JW, Zhou YQ, Guo XJ, Ma YM. A feature selection algorithm of decision tree based on feature weight. Expert Syst Appl. 2021;164:113842.

Acknowledgements

The author extends his appreciation to the Deputyship for Research & Innovation, Ministry of Education, Saudi Arabia for funding this research work through the project number (QU-IF-4-3-3-31464). The authors also thank Qassim University for technical support.

Funding

The author extends his appreciation to the Deputyship for Research & Innovation, Ministry of Education, Saudi Arabia for funding this research work through the project number (QU-IF-4-3-3-31464). The authors also thank Qassim University for technical support.

Author information

Authors and Affiliations

Department of Electrical Engineering, College of Engineering, Qassim University, Buraidah, 52571, Saudi Arabia
Musaed Alrashidi
Bradley Department of Electrical and Computer Engineering, Advanced Research Institute, Virginia Tech, Arlington, VA, 22203, USA
Saifur Rahman

Contributions

M.S.: conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing—original draft, writing—review and editing, visualization. S.R.: validation, formal analysis, supervision, project administration. All authors read and approved the final manuscript.

Authors information

Saifur Rahman (Life Fellow, IEEE) received the B.Sc. degree in electrical engineering from Bangladesh University of Engineering & Technology, Dhaka, Bangladesh, in 1973, the M.S. degree in electrical engineering from State University of New York, New York, NY, USA, in 1975, and the Ph.D. degree in electrical engineering from Virginia Tech, Blacksburg, VA, USA, in 1978. He is the Founding Director of the Advanced Research Institute, Virginia Tech, Arlington, VA, USA, where he is the J. R. Loring Professor of Electrical and Computer Engineering. He also directs the Center for Energy and the Global Environment. He has published over 140 journal papers and has given over 400 conference and invited presentations. He is a Distinguished Lecturer for the PES and has lectured on renewable energy, energy efficiency, smart grid, energy Internet, blockchain, and IoT sensor integration in over 30 countries.

Prof. Rahman is the 2022 IEEE President-Elect. He is an IEEE Millennium Medal winner. He was the Founding Editor-in-Chief of IEEE Electrification Magazine and the IEEE TRANSACTIONS ON SUSTAINABLE ENERGY. He served as the Chair of the U.S. National Science Foundation Advisory Committee for International Science and Engineering from 2010 to 2013. He was the President of the IEEE Power and Energy Society for 2018 and 2019.

Musaed Alrashidi received the Ph.D. degree in electrical and computer engineering from Virginia Tech University, Blacksburg, VA, USA, in 2021, and the M.Sc. degree in electrical engineering from the School of Engineering & Applied Science at the George Washington University, DC, USA, in 2016. He is currently an Assistant Professor in the Electrical Engineering Department, Qassim University, Saudi Arabia. His research interests lie in renewable energy resources, smart grids, machine learning algorithms, operation of distribution networks, and advanced intelligent optimization techniques.

Corresponding author

Correspondence to Musaed Alrashidi.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Alrashidi, M., Rahman, S. Short-term photovoltaic power production forecasting based on novel hybrid data-driven models. J Big Data 10, 26 (2023). https://doi.org/10.1186/s40537-023-00706-7

Received: 13 August 2022
Accepted: 23 February 2023
Published: 02 March 2023
DOI: https://doi.org/10.1186/s40537-023-00706-7
