Table 5 Workload application characteristics

From: A parallelization model for performance characterization of Spark Big Data jobs on Hadoop clusters

Workloads | Stages | Parallel Stages | Collect | Serialization | Deserialization | Shuffle | Aggregate

WC: 2, Yes, Yes, Yes
SVM: 209, Yes, Yes, No, Yes, Yes, Yes
Nweight: 9, Yes, No, Yes, Yes
K-means: 20, Yes, Yes, Yes, Yes, Yes
Pagerank: 5, Yes, No, Yes, Yes