
Table 4 Stage 2: comparison of machine learning methods for ozone prediction. In each cell, the value before the slash is the metric on the training dataset and the value after it is the metric on the testing dataset (training/testing). The best performance is shown in bold, and rows shaded grey indicate over-fitting.

From: A comparison of machine learning methods for ozone pollution prediction

| Model | rMSE \(\downarrow\) | MAE \(\downarrow\) | MAPE \(\downarrow\) | \(R^2\) \(\uparrow\) | J2 \(\downarrow\) | Time (min) |
|---|---|---|---|---|---|---|
| Linr | 9.847/11.958 | 7.634/10.116 | 0.484/1.052 | 0.597/-0.111 | 1.475 | 0.01 |
| Linr_l2 | 9.847/11.956 | 7.635/10.114 | 0.484/1.051 | 0.597/-0.11 | 1.474 | 0 |
| Lasso | 11.035/13.558 | 8.867/12.345 | 0.77/2.111 | 0.494/0.379 | 1.51 | 0.01 |
| PLSR | 9.956/12.148 | 7.761/10.339 | 0.496/1.067 | 0.588/0.013 | 1.489 | 0.01 |
| GRP_Expo | 7.558/9.968 | 5.696/7.383 | 0.31/1.051 | 0.763/0.302 | 1.739 | 9.04 |
| GRP_DotProd | 9.848/11.94 | 7.635/10.09 | 0.482/1.049 | 0.597/-0.111 | 1.47 | 132.69 |
| GRP_Matern | 0.0/20.868 | 0.0/17.887 | 0.0/1.0 | 1.0/0.0 | inf | 25.77 |
| SVR_linear | 9.901/12.106 | 7.582/9.98 | 0.464/1.075 | 0.593/-0.247 | 1.495 | 0.01 |
| SVR_poly | 9.937/11.656 | 7.663/9.096 | 0.56/0.949 | 0.59/-0.141 | 1.376 | 6.82 |
| SVR_rbf | 8.54/9.19 | 6.437/7.302 | 0.38/1.047 | 0.697/0.499 | 1.158 | 9.24 |
| SVR_sigmoid | 9.947/12.026 | 7.69/10.106 | 0.484/1.052 | 0.589/-0.163 | 1.462 | 10.24 |
| MLP_1 | 8.821/9.505 | 6.703/7.642 | 0.41/1.056 | 0.677/0.416 | 1.161 | 1.33 |
| MLP_2 | 8.809/9.985 | 6.704/8.022 | 0.397/1.23 | 0.678/0.457 | 1.285 | 1.42 |
| RF | 8.059/9.921 | 6.153/7.874 | 0.357/0.953 | 0.73/0.436 | 1.515 | 0.06 |
| Bagging | 2.919/10.164 | 1.94/7.99 | 0.097/0.918 | 0.965/0.377 | 12.124 | 0.1 |
| GBoost | 8.191/9.856 | 6.218/7.926 | 0.366/0.863 | 0.721/0.394 | 1.448 | 0.26 |
| AdaBoost | 9.413/13.832 | 7.46/12.113 | 0.561/1.758 | 0.638/0.445 | 2.159 | 0.11 |
| HistGBoost | 7.214/9.509 | 5.458/7.53 | 0.305/0.84 | 0.784/0.419 | 1.737 | 8.42 |
| LightGBM | 9.712/12.974 | 7.721/11.671 | 0.69/2.167 | 0.608/0.586 | 1.785 | 0.01 |
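The error metrics in the table can be reproduced for any fitted model with a few lines of NumPy. The sketch below is an illustration under stated assumptions, not the authors' evaluation code: it treats MAPE as a fraction (the table's MAPE values appear to be fractions rather than percentages), assumes all target values are nonzero, uses hypothetical arrays, and omits J2 and training time, which are not defined here. The helper names `regression_metrics` and `train_test_row` are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return rMSE, MAE, MAPE (as a fraction) and R^2 for one dataset split."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    # MAPE is undefined where y_true == 0; this sketch assumes nonzero targets.
    mape = np.mean(np.abs(err / y_true))
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"rMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

def train_test_row(train_true, train_pred, test_true, test_pred, digits=3):
    """Format each metric as a 'training/testing' string, as in the table."""
    tr = regression_metrics(train_true, train_pred)
    te = regression_metrics(test_true, test_pred)
    return {k: f"{round(float(tr[k]), digits)}/{round(float(te[k]), digits)}"
            for k in tr}

# Hypothetical ozone values and predictions, for illustration only.
row = train_test_row([40.0, 50.0, 60.0, 70.0], [42.0, 48.0, 61.0, 69.0],
                     [45.0, 55.0, 65.0], [50.0, 50.0, 70.0])
print(row["rMSE"])  # 1.581/5.0
print(row["R2"])    # 0.98/0.625
```

Reporting both splits side by side is what makes over-fitting visible: a model such as GRP_Matern, with near-perfect training scores and much worse testing scores, shows a large gap between the two halves of each cell.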

  1. Rows highlighted in orange have a testing \(R^2\) below 0, meaning the corresponding model performs worse than simply predicting the mean value. The \(\uparrow\) indicates that a larger value is better, and the \(\downarrow\) indicates that a smaller value is better.
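To see why a negative \(R^2\) means "worse than the mean", consider the standard definition \(R^2 = 1 - SS_{res}/SS_{tot}\), where the baseline is the mean of the data being evaluated. The minimal sketch below assumes that definition (the same one used by scikit-learn's `r2_score`); the ozone values and predictions are hypothetical. A constant prediction equal to the mean gives exactly 0, and any predictor with a larger squared error than that baseline drops below 0.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_test = np.array([30.0, 45.0, 60.0, 75.0])          # hypothetical ozone values
mean_baseline = np.full_like(y_test, y_test.mean())  # always predict the mean
poor_model = np.array([70.0, 40.0, 50.0, 35.0])      # hypothetical bad predictions

print(r_squared(y_test, mean_baseline))  # 0.0
print(r_squared(y_test, poor_model))     # about -1.956
```

Here \(SS_{tot} = 1125\) and the poor model's \(SS_{res} = 3325\), so \(R^2 = 1 - 3325/1125 \approx -1.956\).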