Analyzing performance of Apache Tez and MapReduce with hadoop multinode cluster on Amazon cloud

Journal of Big Data

Table 1 Difference between MapReduce and Apache Tez on the basis of different parameters

Parameters	MapReduce	Apache Tez
Types of queries	MapReduce supports batch-oriented queries [7]	Apache Tez supports interactive queries
Usability	MapReduce is the backbone of the Hadoop ecosystem, and Apache Pig relies on this framework	Apache Tez also works with Apache Pig and is particularly useful in interactive scenarios
Processing model	MapReduce always requires a map phase before each reduce phase	A single map phase may be followed by multiple reduce phases
Hadoop version	MapReduce is the backbone of Hadoop and is available in all Hadoop versions	Apache Tez is available in Apache Hadoop 2.0 and above
Response time	Slower, because HDFS is accessed after every map and reduce phase	Faster, because of less job splitting and fewer HDFS accesses
Temporary data storage	Stores temporary data in HDFS after every map and reduce phase [8]	Apache Tez does not write temporary data to HDFS, so it is more efficient
Usage of Hadoop containers	MapReduce divides a task into more jobs, so more containers are required	Apache Tez reduces this inefficiency by dividing the task into fewer jobs and by reusing existing containers
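
The processing-model, temporary-storage, and container rows of Table 1 can be illustrated with a deliberately simplified toy model. It is a sketch under stated assumptions, not a measurement or a description of either engine's internals: a query is assumed to be a chain of reduce stages, MapReduce is assumed to need one job per reduce stage with one HDFS write per job, and Tez is assumed to run the whole chain as a single DAG that writes only its final output to HDFS. The names `PlanCost`, `mapreduce_cost`, and `tez_cost` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class PlanCost:
    jobs: int         # separately scheduled jobs (MapReduce) or DAGs (Tez)
    hdfs_writes: int  # times stage output is materialized to HDFS


def mapreduce_cost(reduce_stages: int) -> PlanCost:
    """Toy assumption: each reduce stage needs its own map+reduce job,
    and every job writes its output to HDFS for the next job to read."""
    if reduce_stages < 1:
        raise ValueError("reduce_stages must be at least 1")
    return PlanCost(jobs=reduce_stages, hdfs_writes=reduce_stages)


def tez_cost(reduce_stages: int) -> PlanCost:
    """Toy assumption: the whole chain runs as one DAG, and only the
    final stage writes its output to HDFS."""
    if reduce_stages < 1:
        raise ValueError("reduce_stages must be at least 1")
    return PlanCost(jobs=1, hdfs_writes=1)


if __name__ == "__main__":
    # Compare the two toy models for chains of increasing length.
    for stages in (1, 3, 5):
        print(stages, mapreduce_cost(stages), tez_cost(stages))
```

Under these assumptions the two models coincide for a single reduce stage and diverge linearly as the chain grows, which is the qualitative pattern the table describes; actual job counts and HDFS traffic depend on the query planner and cluster configuration.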