Table 11 R-squared values for Extrapolation on Number of Executors

From: Runtime prediction of big data jobs: performance comparison of machine learning algorithms and analytical models

| Model | Wordcount | SVM | Pagerank | Kmeans | Graph (NWeight) |
|---|---|---|---|---|---|
| f(S) | linear | quadratic | linear | linear | quadratic |
| Amdahl equation (1) | 0.997 ± 0.000 | 0.878 ± 0.002 | 0.994 ± 0.000 | 0.994 ± 0.001 | 0.932 ± 0.001 |
| Gustafson equation (2) | 0.995 ± 0.000 | 0.728 ± 0.000 | 0.988 ± 0.000 | 0.921 ± 0.001 | 0.922 ± 0.000 |
| ERNEST equation (3) | 0.996 ± 0.000 | 0.822 ± 0.002 | 0.991 ± 0.000 | 0.992 ± 0.001 | 0.930 ± 0.007 |
| 2D plate equation (4) | 0.997 ± 0.000 | 0.893 ± 0.004 | 0.994 ± 0.000 | 0.992 ± 0.002 | 0.945 ± 0.004 |
| Connected graph equation (5) | 0.996 ± 0.000 | 0.853 ± 0.072 | 0.994 ± 0.000 | 0.995 ± 0.001 | 0.917 ± 0.002 |
| Connected graph \(c=1\) equation (6) | 0.997 ± 0.000 | 0.850 ± 0.072 | 0.994 ± 0.000 | 0.995 ± 0.001 | 0.916 ± 0.002 |
| Kernel ridge regression | 0.786 ± 0.012 | 0.619 ± 0.012 | 0.917 ± 0.014 | 0.917 ± 0.014 | 0.150 ± 0.060 |
| Gradient boost regression | 0.535 ± 0.001 | 0.408 ± 0.007 | 0.510 ± 0.011 | 0.510 ± 0.011 | 0.371 ± 0.015 |
  1. The bold data in each column indicates the largest R-squared values in the corresponding column
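
As a rough illustration of how an R-squared value for extrapolation on the number of executors might be computed, the sketch below fits an Amdahl-style runtime model on small executor counts and scores its predictions on larger, unseen counts. The model form T(n) = t1 * ((1 - p) + p / n), the function names, and the synthetic runtimes are assumptions made for this example; they are not taken from the paper and may differ from its equation (1).

```python
# Illustrative sketch only: model form, names, and data are assumptions.

def amdahl_runtime(n, t1, p):
    """Runtime on n executors: serial share (1 - p) plus parallel share p / n."""
    return t1 * ((1.0 - p) + p / n)


def fit_amdahl(executors, runtimes):
    """Ordinary least squares on T = a + b * (1 / n), then map back to (t1, p)."""
    xs = [1.0 / n for n in executors]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(runtimes) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, runtimes))
    b = sxy / sxx            # equals t1 * p
    a = mean_y - b * mean_x  # equals t1 * (1 - p)
    t1 = a + b
    return t1, b / t1


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - q) ** 2 for o, q in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic runtimes (seconds) generated from the assumed model itself.
train_n = [1, 2, 3, 4]
test_n = [8, 16, 32]
train_t = [amdahl_runtime(n, 100.0, 0.9) for n in train_n]
test_t = [amdahl_runtime(n, 100.0, 0.9) for n in test_n]

# Fit on small executor counts, then extrapolate to the larger ones.
t1, p = fit_amdahl(train_n, train_t)
predicted = [amdahl_runtime(n, t1, p) for n in test_n]
r2 = r_squared(test_t, predicted)
print(f"t1={t1:.3f} p={p:.3f} extrapolation R^2={r2:.3f}")
```

Because the synthetic data follow the assumed model exactly, the extrapolation score here is essentially perfect; measured runtimes would generally yield lower values, and R-squared can even be negative when predictions are worse than the mean of the observations.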