Table 5 Description of the Spark configuration parameters selected as the input of the proposed model

From: Runtime prediction of big data jobs: performance comparison of machine learning algorithms and analytical models

Parameters Default Range Value used in the experiment Description
Spark.executor.memory 1 1–12 12 Amount of memory to use per executor process, in GB
Spark.executor.cores 1 2–14 2–14 The number of cores to use on each executor
Spark.driver.memory 1 1–4 4 Amount of memory to use for the driver process, in GB
Spark.driver.cores 1 1–3 3 The number of cores to use for the driver process
Spark.shuffle.file.buffer 32 32–48 48 Size of the in-memory buffer for each shuffle file output stream, in KB
Spark.reducer.maxSizeInFlight 48 48–96 96 Maximum size of map outputs to fetch simultaneously from each reduce task, in MB
Spark.memory.fraction 0.6 0.1–0.4 0.4 Fraction of heap space used for execution and storage
Spark.memory.storageFraction 0.5 0.1–0.4 0.4 Amount of storage memory immune to eviction, expressed as a fraction of the size of the region set aside by Spark.memory.fraction
Spark.task.maxFailures 4 4–5 5 Number of failures of any particular task before giving up on the job
Spark.speculation False True/false If set to “true”, performs speculative execution of tasks
Spark.rpc.message.maxSize 128 128–256 256 Maximum message size to allow in “control plane” communication, in MB
codec Snappy lz4/lzf/snappy Snappy Compress map output files
snappy.blockSize 32 32–128 32 Block size in snappy compression, in KB
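
As an illustration of how the "Value used in the experiment" column could be applied, the following minimal Python sketch collects those values into a dictionary and turns them into `--conf` arguments for `spark-submit`. It is not taken from the paper. The function names `experiment_conf` and `to_submit_args` are hypothetical, the keys are written in lowercase as Spark expects them, and the unit suffixes (`g`, `k`, `m`) are derived from the units given in the Description column. Mapping the `codec` and `snappy.blockSize` rows to `spark.io.compression.codec` and `spark.io.compression.snappy.blockSize` is an assumption. Spark.speculation is left out because the table gives no experiment value for it, and the executor core count is a parameter because the experiment varied it over 2–14.

```python
from typing import Dict, List


def experiment_conf(executor_cores: int) -> Dict[str, str]:
    """Return the experiment values from Table 5 as Spark properties.

    executor_cores was varied over 2-14 in the experiment, so it is an argument.
    """
    if not 2 <= executor_cores <= 14:
        raise ValueError("executor_cores must be within the 2-14 range of Table 5")
    return {
        "spark.executor.memory": "12g",           # GB per executor process
        "spark.executor.cores": str(executor_cores),
        "spark.driver.memory": "4g",              # GB for the driver process
        "spark.driver.cores": "3",
        "spark.shuffle.file.buffer": "48k",       # KB per shuffle output stream
        "spark.reducer.maxSizeInFlight": "96m",   # MB fetched per reduce task
        "spark.memory.fraction": "0.4",
        "spark.memory.storageFraction": "0.4",
        "spark.task.maxFailures": "5",
        "spark.rpc.message.maxSize": "256",       # plain number, interpreted as MB
        # Assumption: the "codec" and "snappy.blockSize" rows refer to these keys.
        "spark.io.compression.codec": "snappy",
        "spark.io.compression.snappy.blockSize": "32k",
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Flatten a property dictionary into spark-submit --conf arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the argument string for one point in the executor-core range.
    print(" ".join(to_submit_args(experiment_conf(executor_cores=4))))
```

The returned list can be appended to a `spark-submit` command line; sweeping `executor_cores` from 2 to 14 would give one configuration per run.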