Table 11 R-squared values for Extrapolation on Number of Executors

From: Runtime prediction of big data jobs: performance comparison of machine learning algorithms and analytical models

| Workload | Wordcount | SVM | Pagerank | Kmeans | Graph (NWeight) |
|---|---|---|---|---|---|
| f(S) | linear | quadratic | linear | linear | quadratic |
| Amdahl equation (1) | **0.997 ± 0.000** | 0.878 ± 0.002 | **0.994 ± 0.000** | 0.994 ± 0.001 | 0.932 ± 0.001 |
| Gustafson equation (2) | 0.995 ± 0.000 | 0.728 ± 0.000 | 0.988 ± 0.000 | 0.921 ± 0.001 | 0.922 ± 0.000 |
| ERNEST equation (3) | 0.996 ± 0.000 | 0.822 ± 0.002 | 0.991 ± 0.000 | 0.992 ± 0.001 | 0.930 ± 0.007 |
| 2D plate equation (4) | **0.997 ± 0.000** | **0.893 ± 0.004** | **0.994 ± 0.000** | 0.992 ± 0.002 | **0.945 ± 0.004** |
| Connected graph equation (5) | 0.996 ± 0.000 | 0.853 ± 0.072 | **0.994 ± 0.000** | **0.995 ± 0.001** | 0.917 ± 0.002 |
| Con. graph \(c=1\) equation (6) | **0.997 ± 0.000** | 0.850 ± 0.072 | **0.994 ± 0.000** | **0.995 ± 0.001** | 0.916 ± 0.002 |
| Kernel ridge regression | 0.786 ± 0.012 | 0.619 ± 0.012 | 0.917 ± 0.014 | 0.917 ± 0.014 | 0.150 ± 0.060 |
| Gradient boost regression | 0.535 ± 0.001 | 0.408 ± 0.007 | 0.510 ± 0.011 | 0.510 ± 0.011 | 0.371 ± 0.015 |

  1. Bold values indicate the largest R-squared value in each column
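
For readers who want to check how a score of this kind is obtained, the following is a minimal sketch of the standard coefficient of determination, R² = 1 − SS_res / SS_tot, applied to observed and predicted runtimes. The function name `r_squared` and the sample runtimes are hypothetical and are not taken from the paper. The sketch does not reproduce the paper's fitting procedure or the way its ± spreads were obtained.

```python
def r_squared(observed, predicted):
    """Return the coefficient of determination, 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical runtimes in seconds for an increasing number of executors
# (illustrative values only, not data from the table).
observed = [100.0, 60.0, 40.0, 30.0]
predicted = [98.0, 62.0, 41.0, 29.0]
score = r_squared(observed, predicted)
print(f"R-squared: {score:.3f}")
```

A value close to 1 means the model explains almost all of the variance in the held-out runtimes. A value near 0 means it does no better than predicting the mean runtime.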