Table 4 Spark configuration parameters

From: A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench

| Configuration parameter category | Spark parameter | Tuned values |
| --- | --- | --- |
| Resource utilization | num-executors | 50 |
| | executor-cores | 4 |
| | executor-memory | 8 GB |
| Input split | spark.hadoop.mapreduce.input.fileinputformat.split.minsize | 128 MB (default), 256 MB, 512 MB, 1024 MB |
| Shuffle | spark.shuffle.file.buffer | 16 KB, 32 KB (default), 48 KB, 64 KB |
| | spark.reducer.maxSizeInFlight | 32 MB, 48 MB (default), 64 MB, 96 MB |
| | spark.hadoop.dfs.replication | 1 |
| | spark.default.parallelism | 80, 100, 200, 300 |
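To show how these parameters are supplied in practice, here is a minimal sketch of a `spark-submit` invocation picking one point from each tuned range. The application class and JAR name are placeholders, not from the paper; the resource-utilization settings map directly to `spark-submit` flags, while the remaining parameters are passed via `--conf`. The split minsize is given in bytes (268435456 B = 256 MB).

```shell
# Hypothetical spark-submit command applying one tuned value per parameter.
# org.example.WordCount and app.jar are placeholders for the benchmark workload.
spark-submit \
  --class org.example.WordCount \
  --num-executors 50 \
  --executor-cores 4 \
  --executor-memory 8g \
  --conf spark.hadoop.mapreduce.input.fileinputformat.split.minsize=268435456 \
  --conf spark.shuffle.file.buffer=48k \
  --conf spark.reducer.maxSizeInFlight=64m \
  --conf spark.hadoop.dfs.replication=1 \
  --conf spark.default.parallelism=200 \
  app.jar
```

The `spark.hadoop.*` prefix forwards a property to the underlying Hadoop configuration, which is how the input-split size and HDFS replication factor are controlled from a Spark job.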