Table 5 Description of the Spark configuration parameters selected as input to the proposed model

From: Runtime prediction of big data jobs: performance comparison of machine learning algorithms and analytical models

| Parameter | Default | Range | Value used in the experiment | Description |
|---|---|---|---|---|
| spark.executor.memory | 1 | 1–12 | 12 | Amount of memory to use per executor process, in GB |
| spark.executor.cores | 1 | 2–14 | 2–14 | Number of cores to use on each executor |
| spark.driver.memory | 1 | 1–4 | 4 | Amount of memory to use for the driver process, in GB |
| spark.driver.cores | 1 | 1–3 | 3 | Number of cores to use for the driver process |
| spark.shuffle.file.buffer | 32 | 32–48 | 48 | Size of the in-memory buffer for each shuffle file output stream, in KB |
| spark.reducer.maxSizeInFlight | 48 | 48–96 | 96 | Maximum size of map outputs to fetch simultaneously from each reduce task, in MB |
| spark.memory.fraction | 0.6 | 0.1–0.4 | 0.4 | Fraction of heap space used for execution and storage |
| spark.memory.storageFraction | 0.5 | 0.1–0.4 | 0.4 | Amount of storage memory immune to eviction, expressed as a fraction of the region set aside by spark.memory.fraction |
| spark.task.maxFailures | 4 | 4–5 | 5 | Number of failures of any particular task before giving up on the job |
| spark.speculation | false | true/false | | If set to "true", performs speculative execution of tasks |
| spark.rpc.message.maxSize | 128 | 128–256 | 256 | Maximum message size allowed in "control plane" communication, in MB |
| spark.io.compression.codec | snappy | lz4/lzf/snappy | snappy | Codec used to compress map output files |
| spark.io.compression.snappy.blockSize | 32 | 32–128 | 32 | Block size used in Snappy compression, in KB |
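As an illustration, the experimental values above could be written as a `spark-defaults.conf` fragment. This is a sketch, not the authors' actual configuration file: spark.executor.cores was varied from 2 to 14 across runs, so the value shown for it is representative only, and the table does not state the value used for spark.speculation, so it is left at its default here.

```properties
# Spark configuration values used in the experiment (Table 5).
# spark.executor.cores was swept from 2 to 14; 14 is illustrative only.
# spark.speculation's experimental value is not given, so the default applies.
spark.executor.memory                  12g
spark.executor.cores                   14
spark.driver.memory                    4g
spark.driver.cores                     3
spark.shuffle.file.buffer              48k
spark.reducer.maxSizeInFlight          96m
spark.memory.fraction                  0.4
spark.memory.storageFraction           0.4
spark.task.maxFailures                 5
spark.rpc.message.maxSize              256
spark.io.compression.codec             snappy
spark.io.compression.snappy.blockSize  32k
```

The same settings can be passed per job with `spark-submit --conf key=value` instead of editing `spark-defaults.conf`.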