Table 3 Spark HiBenchmark workload considered for this study

From: Runtime prediction of big data jobs: performance comparison of machine learning algorithms and analytical models

| Benchmark category | Application | Input data size (Multiple-Exec.) | Input data size (Single-Exec.) | Input samples |
|---|---|---|---|---|
| Micro benchmark | WordCount | 313 MB, 940 MB, 5.9 GB, 8.8 GB, and 19.2 GB | 3 GB, 5 GB, 7 GB, 10 GB, 12.8 GB, 14.4 GB, 16 GB, 18 GB, and 21.6 GB | - |
| Machine learning | kmeans | 19 GB, 56 GB, 94 GB, 130 GB, and 168 GB | 1 GB, 38 GB, 75 GB, 113 GB, 149 GB, and 187 GB | 10, 30, 50, 70, and 90 (million samples) |
| Machine learning | SVM | 34 MB, 60 MB, 1.2 GB, 1.8 GB, and 2 GB | 200 MB, 400 MB, 600 MB, 800 MB, 1.35 GB, 2 GB, 2.3 GB, and 2.5 GB | 2100, 2600, 3600, 4100, and 5100 (samples) |
| Web search | Pagerank | 507 MB, 1.6 GB, 2.8 GB, 4 GB, and 5 GB | 100 MB, 250 MB, 750 MB, 6 GB, 7 GB, 8 GB, 9 GB, and 10 GB | 1, 3, 5, 7, and 9 (million pages) |
| Graph | NWeight | 37 MB, 70 MB, 129 MB, 155 MB, and 211 MB | 20 MB, 55 MB, 99 MB, 141 MB, 175 MB, 214 MB, 247 MB, 262 MB, and 286 MB | 1, 2, 4, 5, and 7 (million edges) |