Table 1 Differences between MapReduce and Apache Tez across several parameters

From: Analyzing performance of Apache Tez and MapReduce with hadoop multinode cluster on Amazon cloud

Parameters | MapReduce | Apache Tez
Types of queries | MapReduce supports batch-oriented queries [7] | Apache Tez supports interactive queries
Usability | MapReduce is the backbone of the Hadoop ecosystem, and Apache Pig relies on this framework | Apache Tez also works with Apache Pig, and it is especially useful in interactive scenarios
Processing model | MapReduce always requires a map phase before each reduce phase | A single map phase can be followed by multiple reduce phases
Hadoop version | MapReduce is the backbone of Hadoop and is available in all Hadoop versions | Apache Tez is available in Apache Hadoop 2.0 and above
Response time | Slower, because HDFS is accessed after every map and reduce phase | Faster, because of less job splitting and less HDFS access
Temporary data storage | Stores temporary data in HDFS after every map and reduce phase [8] | Apache Tez does not write temporary data to HDFS, so it is more efficient
Usage of Hadoop containers | MapReduce divides a task into more jobs, so more containers are required | Apache Tez reduces this inefficiency by dividing the task into fewer jobs and by reusing existing containers
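
The processing-model, temporary-storage and container rows can be illustrated with a toy model. The sketch below is not the MapReduce or Tez API: the function names (mapreduce_plan, tez_plan, intermediate_hdfs_writes) are hypothetical, and it assumes, as a simplification of the table's claims, that every reduce stage in MapReduce needs its own job and that each hand-off between consecutive jobs is written to HDFS, while data passed between phases inside a single job is not.

```python
def mapreduce_plan(reduce_stages):
    """Model a pipeline as chained MapReduce jobs.

    Assumption: each reduce stage needs its own job, and every
    job starts with a map phase before its reduce phase.
    """
    return [["map", "reduce"] for _ in range(reduce_stages)]


def tez_plan(reduce_stages):
    """Model the same pipeline as a single Tez-style job:
    one map phase followed by all the reduce phases."""
    return [["map"] + ["reduce"] * reduce_stages]


def intermediate_hdfs_writes(plan):
    """Count hand-offs between consecutive jobs.

    Assumption: each hand-off from one job to the next is
    written to HDFS; the final output is not counted.
    """
    return max(len(plan) - 1, 0)


if __name__ == "__main__":
    plans = [("MapReduce", mapreduce_plan(3)), ("Tez", tez_plan(3))]
    for name, plan in plans:
        print(name, "jobs:", len(plan),
              "intermediate HDFS writes:", intermediate_hdfs_writes(plan))
```

For a pipeline with three reduce stages, this model gives three jobs and two intermediate HDFS writes for the MapReduce plan, against one job and no intermediate writes for the Tez-style plan. Fewer jobs also means fewer container launches under the same assumptions.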