Table 5 Description of the Spark configuration parameters selected as input to the proposed model

From: Runtime prediction of big data jobs: performance comparison of machine learning algorithms and analytical models

| Parameter | Default | Range | Value used in the experiment | Description |
|---|---|---|---|---|
| spark.executor.memory | 1 | 1–12 | 12 | Amount of memory to use per executor process, in GB |
| spark.executor.cores | 1 | 2–14 | 2–14 | Number of cores to use on each executor |
| spark.driver.memory | 1 | 1–4 | 4 | Amount of memory to use for the driver process, in GB |
| spark.driver.cores | 1 | 1–3 | 3 | Number of cores to use for the driver process |
| spark.shuffle.file.buffer | 32 | 32–48 | 48 | Size of the in-memory buffer for each shuffle file output stream, in KB |
| spark.reducer.maxSizeInFlight | 48 | 48–96 | 96 | Maximum size of map outputs to fetch simultaneously from each reduce task, in MB |
| spark.memory.fraction | 0.6 | 0.1–0.4 | 0.4 | Fraction of heap space used for execution and storage |
| spark.memory.storageFraction | 0.5 | 0.1–0.4 | 0.4 | Amount of storage memory immune to eviction, expressed as a fraction of the region set aside by spark.memory.fraction |
| spark.task.maxFailures | 4 | 4–5 | 5 | Number of failures of any particular task before giving up on the job |
| spark.speculation | false | true/false | | If set to "true", performs speculative execution of tasks |
| spark.rpc.message.maxSize | 128 | 128–256 | 256 | Maximum message size allowed in "control plane" communication, in MB |
| spark.io.compression.codec | snappy | lz4/lzf/snappy | snappy | Codec used to compress map output files |
| spark.io.compression.snappy.blockSize | 32 | 32–128 | 32 | Block size used in Snappy compression, in KB |
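As an illustration, the experimental values above could be written as a `spark-defaults.conf` fragment. This is a sketch, not the authors' actual configuration file: spark.executor.cores was varied from 2 to 14 across runs, so the value shown for it is representative only, and the table does not state the value used for spark.speculation, so it is left at its default here.

```properties
# Spark configuration values used in the experiment (Table 5).
# spark.executor.cores was swept from 2 to 14; 14 is illustrative only.
# spark.speculation's experimental value is not given, so the default applies.
spark.executor.memory                  12g
spark.executor.cores                   14
spark.driver.memory                    4g
spark.driver.cores                     3
spark.shuffle.file.buffer              48k
spark.reducer.maxSizeInFlight          96m
spark.memory.fraction                  0.4
spark.memory.storageFraction           0.4
spark.task.maxFailures                 5
spark.rpc.message.maxSize              256
spark.io.compression.codec             snappy
spark.io.compression.snappy.blockSize  32k
```

The same settings can be passed per job with `spark-submit --conf key=value` instead of editing `spark-defaults.conf`.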