Runtime prediction of big data jobs: performance comparison of machine learning algorithms and analytical models

Journal of Big Data

Table 11 R-squared values for Extrapolation on Number of Executors

Workload	Wordcount	SVM	Pagerank	Kmeans	Graph (NWeight)
f(S)	linear	quadratic	linear	linear	quadratic
Amdahl equation (1)	0.997 ± 0.000	0.878 ± 0.002	0.994 ± 0.000	0.994 ± 0.001	0.932 ± 0.001
Gustafson equation (2)	0.995 ± 0.000	0.728 ± 0.000	0.988 ± 0.000	0.921 ± 0.001	0.922 ± 0.000
ERNEST equation (3)	0.996 ± 0.000	0.822 ± 0.002	0.991 ± 0.000	0.992 ± 0.001	0.930 ± 0.007
2D plate equation (4)	0.997 ± 0.000	0.893 ± 0.004	0.994 ± 0.000	0.992 ± 0.002	0.945 ± 0.004
Connected graph equation (5)	0.996 ± 0.000	0.853 ± 0.072	0.994 ± 0.000	0.995 ± 0.001	0.917 ± 0.002
Con. graph \(c=1\) equation (6)	0.997 ± 0.000	0.850 ± 0.072	0.994 ± 0.000	0.995 ± 0.001	0.916 ± 0.002
Kernel ridge regression	0.786 ± 0.012	0.619 ± 0.012	0.917 ± 0.014	0.917 ± 0.014	0.150 ± 0.060
Gradient boost regression	0.535 ± 0.001	0.408 ± 0.007	0.510 ± 0.011	0.510 ± 0.011	0.371 ± 0.015

Bold values indicate the largest R-squared value in each column
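
As an illustration of how entries of the form "mean ± spread" in such a table could be produced, the following minimal sketch computes the coefficient of determination for one set of held-out runtimes and then summarizes several repeated scores. It is not the authors' evaluation code. The function names `r_squared` and `summarize` are hypothetical, and the sketch assumes that the value after "±" is a sample standard deviation over repeated runs, which the table itself does not state.

```python
from statistics import mean, stdev


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    avg = mean(observed)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the mean runtime.
    ss_tot = sum((o - avg) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


def summarize(scores):
    """Format repeated scores as 'mean ± std' with three decimals.

    Assumption: the spread is the sample standard deviation.
    """
    return f"{mean(scores):.3f} ± {stdev(scores):.3f}"
```

In an extrapolation setting, `observed` would hold the measured runtimes at executor counts outside the training range and `predicted` the model's outputs at those same counts. A model that only reproduces the mean runtime scores 0, and one that does worse than the mean scores below 0.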