From: A parallelization model for performance characterization of Spark Big Data jobs on Hadoop clusters
Workloads | Stages | Parallel Stages | Collect | Serialization | Deserialization | Shuffle | Aggregate |
---|---|---|---|---|---|---|---|
WC | 2 | Yes | Yes | – | – | Yes | – |
SVM | 209 | Yes | Yes | No | Yes | Yes | Yes |
NWeight | 9 | Yes | – | No | Yes | Yes | – |
K-means | 20 | Yes | Yes | Yes | Yes | Yes | – |
PageRank | 5 | Yes | – | No | Yes | Yes | – |