Table 6 R-squared estimates for all the workloads

From: A parallelization model for performance characterization of Spark Big Data jobs on Hadoop clusters

| Workloads | Size (MB/GB) | R-squared proposed model | R-squared Amdahl | R-squared Gustafson |
|---|---|---|---|---|
| WordCount (Fig. 14) | 313.6 MB | 0.945 | 0.743 | 0.344 |
| | 940 MB | 0.874 | 0.937 | 0.641 |
| | 5.9 GB | 0.999 | 0.992 | 0.956 |
| | 8.8 GB | 0.996 | 0.995 | 0.981 |
| | 19.2 GB | 0.997 | 0.997 | 0.995 |
| SVM (Fig. 16) | 34 MB | 0.958 | 0.586 | 0.175 |
| | 60 MB | 0.962 | 0.596 | 0.184 |
| | 1.2 GB | 0.971 | 0.662 | 0.249 |
| | 1.8 GB | 0.962 | 0.611 | 0.198 |
| | 2 GB | 0.956 | 0.925 | 0.827 |
| NWeight (Fig. 15) | 37 MB | 0.823 | 0.912 | 0.706 |
| | 73 MB | 0.973 | 0.997 | 0.917 |
| | 119 MB | 0.886 | 0.936 | 0.970 |
| | 155 MB | 0.893 | 0.890 | 0.843 |
| | 211 MB | 0.967 | 0.966 | 0.934 |
| K-means (large job) (Fig. 21) | 19 GB | 0.943 | 0.912 | 0.974 |
| | 56 GB | 0.999 | 0.998 | 0.979 |
| | 94 GB | 0.999 | 0.999 | 0.986 |
| | 130 GB | 0.997 | 0.997 | 0.971 |
| | 168 GB | 0.999 | 0.998 | 0.997 |
| K-means (small job) (Fig. 18) | 1.3 MB | 0.670 | 0.233 | 0.007 |
| | 2.73 MB | 0.941 | 0.338 | 0.024 |
| | 4 MB | 0.803 | 0.750 | 0.425 |
| | 5.30 MB | 0.087 | 0.346 | 0.400 |
| | 13.3 MB | 0.649 | 0.840 | 0.849 |
| Pagerank (large job) (Fig. 20) | 507 MB | 0.992 | 0.994 | 0.997 |
| | 1.6 GB | 0.974 | 0.983 | 0.997 |
| | 2.8 GB | 0.991 | 0.990 | 0.990 |
| | 4 GB | 0.995 | 0.995 | 0.995 |
| | 5 GB | 0.996 | 0.996 | 0.990 |
| Pagerank (small job) (Fig. 17) | 3.8 MB | 0.664 | 0.137 | 0.001 |
| | 5.7 MB | 0.535 | 0.372 | 0.105 |
| | 8 MB | 0.897 | 0.670 | 0.253 |
| | 10 MB | 0.541 | 0.730 | 0.474 |
| | 12.2 MB | 0.668 | 0.144 | 0.000 |
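
For readers who want to reproduce comparisons of this kind, the following is a minimal sketch of how an R-squared value can be computed for a speedup model against measured speedups. It uses the standard coefficient of determination (1 minus the ratio of residual to total sum of squares) and the textbook forms of Amdahl's and Gustafson's laws. The table does not state the exact fitting procedure behind its values, so this may differ from the authors' method. The function names, the executor counts, the measured speedups, and the parallel fraction of 0.9 are all hypothetical and chosen only for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


def amdahl_speedup(n, parallel_fraction):
    """Textbook Amdahl's law: speedup on n executors for a fixed-size job."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


def gustafson_speedup(n, parallel_fraction):
    """Textbook Gustafson's law: scaled speedup on n executors."""
    return (1.0 - parallel_fraction) + parallel_fraction * n


# Hypothetical measurements, NOT taken from the paper.
executors = [1, 2, 4, 8, 16]
measured = [1.0, 1.8, 3.1, 4.6, 5.9]
parallel_fraction = 0.9  # assumed value for illustration

amdahl_pred = [amdahl_speedup(n, parallel_fraction) for n in executors]
gustafson_pred = [gustafson_speedup(n, parallel_fraction) for n in executors]

print("R-squared Amdahl:   ", round(r_squared(measured, amdahl_pred), 3))
print("R-squared Gustafson:", round(r_squared(measured, gustafson_pred), 3))
```

With this definition, a model that predicts no better than the mean of the measurements scores 0, and a model that predicts worse than the mean scores below 0.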