Table 3 Spark HiBenchmark workload considered in this study

From: A parallelization model for performance characterization of Spark Big Data jobs on Hadoop clusters

| Benchmark category | Application | Input data size | Input samples |
| --- | --- | --- | --- |
| Micro benchmark | WordCount | 313 MB, 940 MB, 5.9 GB, 8.8 GB, and 19.2 GB | – |
| Machine learning | K-means (small job) | 1.3 MB, 2.7 MB, 4 MB, 5.3 MB, and 13.3 MB | 3000, 5000, and 7000 samples; 1 and 3 million samples |
|  | K-means (large job) | 19 GB, 56 GB, 94 GB, 130 GB, and 168 GB | 10, 30, 50, 70, and 90 million samples |
|  | SVM | 34 MB, 60 MB, 1.2 GB, 1.8 GB, and 2 GB | 2100, 2600, 3600, 4100, and 5100 samples |
| Web search | PageRank (small job) | 3.8 MB, 5.7 MB, 8 MB, 10 MB, and 12.2 MB | 1, 15, 20, 25, and 30 thousand samples |
|  | PageRank (large job) | 507 MB, 1.6 GB, 2.8 GB, 4 GB, and 5 GB | 1, 3, 5, 7, and 9 million pages |
| Graph | NWeight | 37 MB, 70 MB, 129 MB, 155 MB, and 211 MB | 1, 2, 4, 5, and 7 million edges |
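For orientation, the WordCount micro-benchmark above simply tallies how often each word occurs in its input. HiBench runs it as a distributed Spark job over HDFS, but the underlying computation can be sketched in plain Python (a minimal single-node sketch with illustrative names, not the HiBench implementation):

```python
from collections import Counter

def word_count(text: str) -> Counter:
    # Split on whitespace and count occurrences of each token.
    # The HiBench WordCount workload performs the same aggregation,
    # distributed across HDFS blocks via Spark map and reduce stages.
    return Counter(text.split())

counts = word_count("spark hadoop spark mapreduce hadoop spark")
print(counts.most_common(1))  # the most frequent word with its count
```

In the distributed version, each executor counts words within its own input partition and the per-partition tallies are then merged, which is why input data size (the third column of the table) is the main driver of job runtime for this workload.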