From: A parallelization model for performance characterization of Spark Big Data jobs on Hadoop clusters
Workload | Input size | R-squared (proposed model) | R-squared (Amdahl) | R-squared (Gustafson) |
---|---|---|---|---|
WordCount (Fig. 14) | 313.6 MB | 0.945 | 0.743 | 0.344 |
WordCount (Fig. 14) | 940 MB | 0.874 | 0.937 | 0.641 |
WordCount (Fig. 14) | 5.9 GB | 0.999 | 0.992 | 0.956 |
WordCount (Fig. 14) | 8.8 GB | 0.996 | 0.995 | 0.981 |
WordCount (Fig. 14) | 19.2 GB | 0.997 | 0.997 | 0.995 |
SVM (Fig. 16) | 34 MB | 0.958 | 0.586 | 0.175 |
SVM (Fig. 16) | 60 MB | 0.962 | 0.596 | 0.184 |
SVM (Fig. 16) | 1.2 GB | 0.971 | 0.662 | 0.249 |
SVM (Fig. 16) | 1.8 GB | 0.962 | 0.611 | 0.198 |
SVM (Fig. 16) | 2 GB | 0.956 | 0.925 | 0.827 |
NWeight (Fig. 15) | 37 MB | 0.823 | 0.912 | 0.706 |
NWeight (Fig. 15) | 73 MB | 0.973 | 0.997 | 0.917 |
NWeight (Fig. 15) | 119 MB | 0.886 | 0.936 | 0.970 |
NWeight (Fig. 15) | 155 MB | 0.893 | 0.890 | 0.843 |
NWeight (Fig. 15) | 211 MB | 0.967 | 0.966 | 0.934 |
K-means (large job) (Fig. 21) | 19 GB | 0.943 | 0.912 | 0.974 |
K-means (large job) (Fig. 21) | 56 GB | 0.999 | 0.998 | 0.979 |
K-means (large job) (Fig. 21) | 94 GB | 0.999 | 0.999 | 0.986 |
K-means (large job) (Fig. 21) | 130 GB | 0.997 | 0.997 | 0.971 |
K-means (large job) (Fig. 21) | 168 GB | 0.999 | 0.998 | 0.997 |
K-means (small job) (Fig. 18) | 1.3 MB | 0.670 | 0.233 | 0.007 |
K-means (small job) (Fig. 18) | 2.73 MB | 0.941 | 0.338 | 0.024 |
K-means (small job) (Fig. 18) | 4 MB | 0.803 | 0.750 | 0.425 |
K-means (small job) (Fig. 18) | 5.30 MB | 0.087 | 0.346 | 0.400 |
K-means (small job) (Fig. 18) | 13.3 MB | 0.649 | 0.840 | 0.849 |
Pagerank (large job) (Fig. 20) | 507 MB | 0.992 | 0.994 | 0.997 |
Pagerank (large job) (Fig. 20) | 1.6 GB | 0.974 | 0.983 | 0.997 |
Pagerank (large job) (Fig. 20) | 2.8 GB | 0.991 | 0.990 | 0.990 |
Pagerank (large job) (Fig. 20) | 4 GB | 0.995 | 0.995 | 0.995 |
Pagerank (large job) (Fig. 20) | 5 GB | 0.996 | 0.996 | 0.990 |
Pagerank (small job) (Fig. 17) | 3.8 MB | 0.664 | 0.137 | 0.001 |
Pagerank (small job) (Fig. 17) | 5.7 MB | 0.535 | 0.372 | 0.105 |
Pagerank (small job) (Fig. 17) | 8 MB | 0.897 | 0.670 | 0.253 |
Pagerank (small job) (Fig. 17) | 10 MB | 0.541 | 0.730 | 0.474 |
Pagerank (small job) (Fig. 17) | 12.2 MB | 0.668 | 0.144 | 0.000 |
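
For readers who want to see how an R-squared comparison of this kind can be computed, the following is a minimal sketch and not the authors' procedure. It assumes that R-squared is the usual coefficient of determination, 1 - SS_res / SS_tot, evaluated on speedup versus node count, and that each law has a single free parameter (the parallel fraction `p`) chosen by a simple grid search. The helper names (`r_squared`, `best_fit`), the node counts, and the speedup values are hypothetical; the data are synthetic and are not taken from the table. The proposed model is not included because its formula does not appear in this table.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def amdahl_speedup(n, p):
    """Amdahl's law: fixed problem size, parallel fraction p, n nodes."""
    return 1.0 / ((1.0 - p) + p / n)


def gustafson_speedup(n, p):
    """Gustafson's law: scaled problem size, parallel fraction p, n nodes."""
    return (1.0 - p) + p * n


def best_fit(model, nodes, observed, steps=100):
    """Grid-search p in [0, 1] (assumed fitting method); return (p, R-squared)."""
    candidates = [i / steps for i in range(steps + 1)]
    scored = [
        (r_squared(observed, [model(n, p) for n in nodes]), p)
        for p in candidates
    ]
    best_r2, best_p = max(scored)
    return best_p, best_r2


# Synthetic, noise-free measurements generated from Amdahl's law with p = 0.9.
# These values are illustrative only and do not come from the paper.
nodes = [1, 2, 4, 8, 16]
observed = [amdahl_speedup(n, 0.9) for n in nodes]

p_amdahl, r2_amdahl = best_fit(amdahl_speedup, nodes, observed)
p_gustafson, r2_gustafson = best_fit(gustafson_speedup, nodes, observed)

print(f"Amdahl:    p = {p_amdahl:.2f}, R-squared = {r2_amdahl:.3f}")
print(f"Gustafson: p = {p_gustafson:.2f}, R-squared = {r2_gustafson:.3f}")
```

On data that saturates as nodes are added, the Amdahl fit scores higher than the linear Gustafson fit, which is the kind of gap the table reports for several of the smaller inputs. A least-squares optimizer could replace the grid search; the grid is used here only to keep the sketch dependency-free.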