A comparison of machine learning methods for ozone pollution prediction

Journal of Big Data

Table 4 Stage 2: comparison of machine learning methods for ozone prediction. In each cell, the value before the slash is the loss on the training dataset and the value after it is the loss on the testing dataset (training/testing). The best performance is shown in bold, and rows shaded grey indicate over-fitting

Model	rMSE \(\downarrow\)	MAE \(\downarrow\)	MAPE \(\downarrow\)	R-square \(\uparrow\)	J2 \(\downarrow\)	Time (min)
Linr	9.847/11.958	7.634/10.116	0.484/1.052	0.597/-0.111	1.475	0.01
Linr_l2	9.847/11.956	7.635/10.114	0.484/1.051	0.597/-0.11	1.474	0
Lasso	11.035/13.558	8.867/12.345	0.77/2.111	0.494/0.379	1.51	0.01
PLSR	9.956/12.148	7.761/10.339	0.496/1.067	0.588/0.013	1.489	0.01
GRP_Expo	7.558/9.968	5.696/7.383	0.31/1.051	0.763/0.302	1.739	9.04
GRP_DotProd	9.848/11.94	7.635/10.09	0.482/1.049	0.597/-0.111	1.47	132.69
GRP_Matern	0.0/20.868	0.0/17.887	0.0/1.0	1.0/0.0	inf	25.77
SVR_linear	9.901/12.106	7.582/9.98	0.464/1.075	0.593/-0.247	1.495	0.01
SVR_poly	9.937/11.656	7.663/9.096	0.56/0.949	0.59/-0.141	1.376	6.82
SVR_rbf	8.54/9.19	6.437/7.302	0.38/1.047	0.697/0.499	1.158	9.24
SVR_sigmoid	9.947/12.026	7.69/10.106	0.484/1.052	0.589/-0.163	1.462	10.24
MLP_1	8.821/9.505	6.703/7.642	0.41/1.056	0.677/0.416	1.161	1.33
MLP_2	8.809/9.985	6.704/8.022	0.397/1.23	0.678/0.457	1.285	1.42
RF	8.059/9.921	6.153/7.874	0.357/0.953	0.73/0.436	1.515	0.06
Bagging	2.919/10.164	1.94/7.99	0.097/0.918	0.965/0.377	12.124	0.1
GBoost	8.191/9.856	6.218/7.926	0.366/0.863	0.721/0.394	1.448	0.26
AdaBoost	9.413/13.832	7.46/12.113	0.561/1.758	0.638/0.445	2.159	0.11
HistGBoost	7.214/9.509	5.458/7.53	0.305/0.84	0.784/0.419	1.737	8.42
LightGBM	9.712/12.974	7.721/11.671	0.69/2.167	0.608/0.586	1.785	0.01

The testing \(R^2\) of the rows highlighted in orange is less than 0, meaning that the corresponding model performs worse than simply predicting the mean value. The \(\uparrow\) symbol indicates that larger values are better, and the \(\downarrow\) symbol indicates that smaller values are better
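As a rough illustration of how the columns above can be read, the following sketch computes the four error metrics with their standard definitions and shows how a negative \(R^2\) arises when predictions are worse than the mean. Several points are assumptions rather than statements from the table: MAPE is treated as a fraction instead of a percentage (the tabulated magnitudes suggest this), the toy arrays are hypothetical, and the helper `rmse_ratio_squared` is only a guess at the J2 column. That column is not defined in this section, but its values are numerically consistent with the squared ratio of testing to training rMSE, for example \((11.958/9.847)^2 \approx 1.475\) for Linr.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (assumes no zeros in y_true)."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


def r_square(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse_ratio_squared(train_rmse, test_rmse):
    """Assumed reading of the J2 column: (test rMSE / train rMSE) squared.

    This definition is inferred from the tabulated numbers, not taken
    from the source.
    """
    if train_rmse == 0:
        return float("inf")  # a perfect training fit gives an unbounded ratio
    return (test_rmse / train_rmse) ** 2


# Hypothetical toy targets, not data from the study.
y_test = np.array([10.0, 20.0, 30.0, 40.0])
mean_pred = np.full_like(y_test, y_test.mean())  # always predict the mean
bad_pred = np.array([40.0, 30.0, 20.0, 10.0])    # anti-correlated predictions

print("R2 of mean predictor:", r_square(y_test, mean_pred))
print("R2 of bad predictor:", r_square(y_test, bad_pred))
print("rMSE / MAE / MAPE of bad predictor:",
      rmse(y_test, bad_pred), mae(y_test, bad_pred), mape(y_test, bad_pred))
print("Ratio for Linr row:", round(rmse_ratio_squared(9.847, 11.958), 3))
print("Ratio for GRP_Matern row:", rmse_ratio_squared(0.0, 20.868))
```

Under this reading, a ratio near 1 means training and testing errors are similar, while a large value (as in the Bagging row) or an infinite one (as in the GRP_Matern row) signals a model that fits the training data far better than the testing data.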