Table 1 Difference between MapReduce and Apache Tez on the basis of different parameters

From: Analyzing performance of Apache Tez and MapReduce with hadoop multinode cluster on Amazon cloud

| Parameter | MapReduce | Apache Tez |
| --- | --- | --- |
| Types of queries | Supports batch-oriented queries [7] | Supports interactive queries |
| Usability | The backbone of the Hadoop ecosystem; Apache Pig relies on this framework | Also works with Apache Pig, and is particularly useful in interactive scenarios |
| Processing model | Always requires a map phase before each reduce phase | Allows a single map phase followed by multiple reduce phases |
| Hadoop version | Available in all Hadoop versions | Available in Apache Hadoop 2.0 and above |
| Response time | Slower, due to HDFS access after every map and reduce phase | Faster, due to less job splitting and fewer HDFS accesses |
| Temporary data storage | Stores temporary data in HDFS after every map and reduce phase [8] | Does not write temporary data to HDFS, so it is more efficient |
| Usage of Hadoop containers | Divides a task into more jobs, so more containers are required | Reduces this inefficiency by dividing the task into fewer jobs and by reusing existing containers |
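
The processing-model and job-splitting rows can be illustrated with a small counting sketch. This is a toy model, not code from the paper or from either framework: the function names `mapreduce_plan` and `tez_plan` are hypothetical, and the model assumes a pipeline with one logical map step followed by a chain of reduce stages. It counts only the job-to-job handoffs through HDFS, not the final output.

```python
def mapreduce_plan(reduce_stages: int) -> dict:
    """Toy model: a chain of MapReduce jobs, one job per reduce stage.

    Assumption: every reduce stage needs its own job, and each job
    starts with a map phase, so later jobs carry an extra map phase.
    """
    if reduce_stages < 1:
        raise ValueError("need at least one reduce stage")
    jobs = reduce_stages
    return {
        "jobs": jobs,
        "map_phases": jobs,                    # one map phase per job
        "reduce_phases": reduce_stages,
        "intermediate_hdfs_writes": jobs - 1,  # handoff between consecutive jobs
    }


def tez_plan(reduce_stages: int) -> dict:
    """Toy model: a single job with one map phase and several reduce phases.

    Assumption (taken from the table): intermediate data is not written to HDFS.
    """
    if reduce_stages < 1:
        raise ValueError("need at least one reduce stage")
    return {
        "jobs": 1,
        "map_phases": 1,
        "reduce_phases": reduce_stages,
        "intermediate_hdfs_writes": 0,
    }


if __name__ == "__main__":
    for n in (1, 3):
        print(n, "MapReduce:", mapreduce_plan(n))
        print(n, "Tez:      ", tez_plan(n))
```

Under these assumptions, a pipeline with three reduce stages needs three jobs and two intermediate HDFS handoffs in the MapReduce model, but one job and no intermediate HDFS writes in the Tez model. With a single reduce stage the two models coincide, which is consistent with the table attributing the difference to job splitting.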