Table 2 Node configurations [28]

From: Estimating runtime of a job in Hadoop MapReduce

| Parameter | Description | Notation | Default value |
|---|---|---|---|
| dfs.blocksize | Size of the blocks or splits. | BS | 128 MB |
| mapreduce.task.io.sort.mb | The total amount of buffer memory to use while sorting files, in megabytes. | Sort.mb | 100 |
| mapreduce.task.io.sort.factor | The number of streams to merge at once while sorting files; this determines the number of open file handles. | Sort.factor | 10 |
| mapreduce.map.sort.spill.percent | The threshold of sort-buffer usage at which a thread will begin to spill the contents to disk in the background. | Spill.percent | 0.80 |
| io.sort.record.percent | The percentage of sort.mb used to store map output metadata. | Record.percent | 0.05 |
| mapreduce.reduce.shuffle.parallelcopies | The default number of parallel transfers run by reduce during the copy (shuffle) phase. | parallelcopies | 5 |
| mapreduce.reduce.shuffle.merge.percent | The usage threshold at which an in-memory merge is initiated, expressed as a percentage of the total memory allocated to storing in-memory map outputs, as defined by mapreduce.reduce.shuffle.input.buffer.percent. | shuffle.merge.percent | 0.66 |
| mapreduce.reduce.shuffle.input.buffer.percent | The percentage of memory to be allocated from the maximum heap size to storing map outputs during the shuffle. | Shuffle.buffer.percent | 0.70 |
| Max heap size of reduce task | The maximum heap size that can be used by a reduce task. | Heapr | 1024 MB |
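To illustrate how the percentage-valued parameters above combine, the sketch below computes the buffer thresholds implied by the table's default values, following the standard semantics of these Hadoop settings (the sort buffer spills at Sort.mb × Spill.percent; the reduce-side in-memory merge starts at Heapr × Shuffle.buffer.percent × shuffle.merge.percent). The helper function names are illustrative, not Hadoop API calls.

```python
# Derived buffer thresholds implied by the defaults in Table 2.
# Helper names are illustrative only; they are not part of any Hadoop API.

def map_spill_threshold_mb(sort_mb=100, spill_percent=0.80):
    """Sort-buffer fill level (MB) at which the map-side spill thread starts."""
    return sort_mb * spill_percent

def shuffle_merge_threshold_mb(heap_mb=1024, input_buffer_percent=0.70,
                               merge_percent=0.66):
    """In-memory map-output usage (MB) at which a reduce-side merge begins."""
    shuffle_buffer_mb = heap_mb * input_buffer_percent  # portion of reduce heap for shuffle
    return shuffle_buffer_mb * merge_percent

print(map_spill_threshold_mb())                 # 80.0 MB
print(round(shuffle_merge_threshold_mb(), 1))   # 473.1 MB
```

With the defaults above, a map task starts spilling once its 100 MB sort buffer reaches 80 MB of use, and a reduce task with a 1024 MB heap begins merging in-memory outputs at roughly 473 MB.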