From: A parallelization model for performance characterization of Spark Big Data jobs on Hadoop clusters
| Benchmark category | Application | Input data size | Input samples |
|---|---|---|---|
| Micro benchmark | WordCount | 313 MB, 940 MB, 5.9 GB, 8.8 GB, and 19.2 GB | - |
| Machine learning | K-means (small job) | 1.3 MB, 2.7 MB, 4 MB, 5.3 MB, and 13.3 MB | 3000, 5000, and 7000 (samples); 1 and 3 (million samples) |
| Machine learning | K-means (large job) | 19 GB, 56 GB, 94 GB, 130 GB, and 168 GB | 10, 30, 50, 70, and 90 (million samples) |
| Machine learning | SVM | 34 MB, 60 MB, 1.2 GB, 1.8 GB, and 2 GB | 2100, 2600, 3600, 4100, and 5100 (samples) |
| Web search | PageRank (small job) | 3.8 MB, 5.7 MB, 8 MB, 10 MB, and 12.2 MB | 1, 15, 20, 25, and 30 (thousand samples) |
| Web search | PageRank (large job) | 507 MB, 1.6 GB, 2.8 GB, 4 GB, and 5 GB | 1, 3, 5, 7, and 9 (million pages) |
| Graph | Nweight | 37 MB, 70 MB, 129 MB, 155 MB, and 211 MB | 1, 2, 4, 5, and 7 (million edges) |
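For reference, the logic of the WordCount micro-benchmark in the table can be sketched in plain Python. This is a minimal stand-in for the Spark `flatMap` → `map` → `reduceByKey` pipeline, not the benchmark implementation itself; the `word_count` helper is hypothetical:

```python
from collections import Counter

def word_count(lines):
    # Stand-in for the Spark WordCount pipeline:
    # flatMap(split on whitespace) -> map to (word, 1) -> reduceByKey(sum)
    counter = Counter()
    for line in lines:
        counter.update(line.split())
    return dict(counter)

print(word_count(["spark hadoop spark", "hadoop"]))
# {'spark': 2, 'hadoop': 2}
```

In the actual benchmark, the same aggregation is distributed across executors, which is why input size (313 MB up to 19.2 GB above) is the parameter that drives parallelism.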