From: Estimating runtime of a job in Hadoop MapReduce
Parameters                                  Notation
Input data size                             D_input
Sample data size                            D_sample
Map run time in sample data                 T_m-sample
Reduce run time in sample data              T_r-sample
Output to input ratio in Map stage          Sel_m
Output to input ratio in Reduce stage       Sel_r
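
As a rough illustration of how the parameters in this table relate to one another, the sketch below computes a stage's output-to-input ratio and propagates data sizes through the Map and Reduce stages. The class and function names are hypothetical, and the sketch assumes that the Reduce stage's input is the Map stage's output. The last function uses a naive proportional scaling of the sample run times; that scaling is an assumption made here for illustration only and is not presented as the source's estimation model.

```python
from dataclasses import dataclass


@dataclass
class JobParameters:
    """Hypothetical container for the table's parameters (units are assumed)."""
    d_input: float     # D_input: full input data size, e.g. in MB
    d_sample: float    # D_sample: sample data size, same unit as d_input
    t_m_sample: float  # T_m-sample: Map run time on the sample, in seconds
    t_r_sample: float  # T_r-sample: Reduce run time on the sample, in seconds
    sel_m: float       # Sel_m: output-to-input ratio of the Map stage
    sel_r: float       # Sel_r: output-to-input ratio of the Reduce stage


def selectivity(output_size: float, input_size: float) -> float:
    """Output-to-input ratio of a stage, as measured on the sample run."""
    if input_size <= 0:
        raise ValueError("input_size must be positive")
    return output_size / input_size


def stage_data_sizes(p: JobParameters) -> tuple[float, float]:
    """Expected Map and Reduce output sizes for the full input.

    Assumes the Reduce input equals the Map output.
    """
    map_output = p.sel_m * p.d_input
    reduce_output = p.sel_r * map_output
    return map_output, reduce_output


def naive_scaled_runtime(p: JobParameters) -> float:
    """Illustrative assumption only: scale sample run times linearly with data size."""
    scale = p.d_input / p.d_sample
    return scale * (p.t_m_sample + p.t_r_sample)


if __name__ == "__main__":
    params = JobParameters(
        d_input=10240.0, d_sample=512.0,
        t_m_sample=30.0, t_r_sample=12.0,
        sel_m=selectivity(256.0, 512.0), sel_r=0.1,
    )
    print(stage_data_sizes(params))
    print(naive_scaled_runtime(params))
```

The selectivities are kept separate from the run times so that data-volume estimates remain usable even if a different run-time model replaces the linear one.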