Runtime prediction of big data jobs: performance comparison of machine learning algorithms and analytical models

Journal of Big Data

Table 10 R-squared values for extrapolation on size

Workload	Wordcount	SVM	Pagerank	Kmeans	Graph (NWeight)
f(S)	linear	quadratic	linear	linear	quadratic
Amdahl equation (1)	0.998 ± 0.000	0.965 ± 0.001	0.994 ± 0.000	0.997 ± 0.001	0.937 ± 0.006
Gustafson equation (2)	0.996 ± 0.001	0.949 ± 0.004	0.994 ± 0.000	0.996 ± 0.001	0.913 ± 0.008
ERNEST equation (3)	0.996 ± 0.001	0.958 ± 0.002	0.990 ± 0.000	0.998 ± 0.001	0.921 ± 0.008
2D plate equation (4)	0.997 ± 0.001	0.951 ± 0.003	0.993 ± 0.000	0.997 ± 0.001	0.940 ± 0.005
Connected graph equation (5)	0.257 ± 0.061	0.981 ± 0.001	0.996 ± 0.000	0.996 ± 0.001	0.940 ± 0.006
Con. graph \(c=1\) equation (6)	0.997 ± 0.001	0.978 ± 0.001	0.996 ± 0.000	0.998 ± 0.001	0.940 ± 0.006
Kernel ridge regression	0.836 ± 0.011	0.745 ± 0.004	0.836 ± 0.011	0.836 ± 0.011	0.904 ± 0.043
Gradient boost regression	0.875 ± 0.005	0.690 ± 0.003	0.875 ± 0.005	0.875 ± 0.005	0.775 ± 0.009

Bold values indicate the largest R-squared value in each column
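
For readers unfamiliar with the metric reported in this table, the following is a minimal sketch of how an R-squared value can be computed from measured and predicted runtimes, using the standard definition of the coefficient of determination. The function name `r_squared` and the runtime values are hypothetical illustrations, not data or code from the study, and the sketch does not reproduce how the ± spreads in the table were obtained.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical runtimes (seconds) at four data sizes outside the training range.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 31.0, 39.0]
print(round(r_squared(measured, predicted), 3))  # 0.992
```

A value of 1 means the predictions match the measurements exactly, and a value of 0 means the model does no better than predicting the mean measured runtime. Under this definition the score has no lower bound, so a model that extrapolates poorly can score far below the others.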