Table 3 Spark HiBenchmark workload considered for this study

From: Runtime prediction of big data jobs: performance comparison of machine learning algorithms and analytical models

| Benchmark categories | Application | Input data size (Multiple-Exec.) | Input data size (Single-Exec.) | Input samples |
| --- | --- | --- | --- | --- |
| Micro benchmark | WordCount | 313 MB, 940 MB, 5.9 GB, 8.8 GB, and 19.2 GB | 3 GB, 5 GB, 7 GB, 10 GB, 12.8 GB, 14.4 GB, 16 GB, 18 GB, and 21.6 GB | – |
| Machine learning | kmeans | 19 GB, 56 GB, 94 GB, 130 GB, and 168 GB | 1 GB, 38 GB, 75 GB, 113 GB, 149 GB, and 187 GB | 10, 30, 50, 70, and 90 (million samples) |
| Machine learning | SVM | 34 MB, 60 MB, 1.2 GB, 1.8 GB and 2 GB | 200 MB, 400 MB, 600 MB, 800 MB, 1.35 GB, 2 GB, 2.3 GB, and 2.5 GB | 2100, 2600, 3600, 4100, and 5100 (samples) |
| Web search | Pagerank | 507 MB, 1.6 GB, 2.8 GB, 4 GB, and 5 GB | 100 MB, 250 MB, 750 MB, 6 GB, 7 GB, 8 GB, 9 GB, and 10 GB | 1, 3, 5, 7, and 9 (million of pages) |
| Graph | NWeight | 37 MB, 70 MB, 129 MB, 155 MB, and 211 MB | 20 MB, 55 MB, 99 MB, 141 MB, 175 MB, 214 MB, 247 MB, 262 MB, and 286 MB | 1, 2, 4, 5, and 7 (million of edges) |