From: Analyzing performance of Apache Tez and MapReduce with hadoop multinode cluster on Amazon cloud
Parameters | MapReduce | Apache Tez |
---|---|---|
Types of queries | MapReduce supports batch-oriented queries [7] | Apache Tez supports interactive queries |
Usability | MapReduce is the backbone of the Hadoop ecosystem, and Apache Pig relies on this framework | Apache Tez also works with Apache Pig, and it is especially useful in interactive scenarios |
Processing model | MapReduce always requires a map phase before the reduce phase | A single map phase may be followed by multiple reduce phases |
Hadoop version | MapReduce is the backbone of Hadoop and is available in all Hadoop versions | Apache Tez is available in Apache Hadoop 2.0 and above |
Response time | Slower, because HDFS is accessed after every map and reduce phase | Faster, because there is less job splitting and less HDFS access |
Temporary data storage | Stores temporary data in HDFS after every map and reduce phase [8] | Apache Tez does not write temporary data to HDFS, so it is more efficient |
Usage of Hadoop containers | MapReduce divides a task into more jobs, so more containers are required | Apache Tez reduces this inefficiency by dividing the task into fewer jobs and by reusing existing containers |
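
The "Processing model" and "Usage of hadoop containers" rows can be illustrated with a small toy model. The sketch below is not taken from the paper and does not call any Hadoop or Tez API. The function names (`mapreduce_plan`, `tez_plan`, `hdfs_handoffs`) are hypothetical, and the model assumes, as a simplification, that data is handed from one job to the next through HDFS and that a pipeline consists of one map stage followed by several reduce stages.

```python
def mapreduce_plan(num_reduce_stages):
    """Toy plan for MapReduce: every reduce stage needs its own job,
    and every job must start with a map phase."""
    jobs = []
    for i in range(num_reduce_stages):
        # Only the first job runs the real map; later jobs need a
        # pass-through map just to satisfy the map-then-reduce shape.
        map_phase = "map" if i == 0 else "identity-map"
        jobs.append([map_phase, f"reduce-{i + 1}"])
    return jobs


def tez_plan(num_reduce_stages):
    """Toy plan for Tez: a single job whose stages form one chain,
    one map stage followed by all reduce stages."""
    stages = ["map"] + [f"reduce-{i + 1}" for i in range(num_reduce_stages)]
    return [stages]


def hdfs_handoffs(plan):
    """Assumption of this toy model: data crosses HDFS once at each
    boundary between two consecutive jobs."""
    return len(plan) - 1


if __name__ == "__main__":
    for name, plan in (("MapReduce", mapreduce_plan(3)), ("Tez", tez_plan(3))):
        print(name, "jobs:", len(plan), "HDFS hand-offs:", hdfs_handoffs(plan))
```

Under these assumptions, a pipeline with three reduce stages becomes three jobs with two hand-offs in the MapReduce plan, and one job with no hand-offs in the Tez plan, which is the qualitative difference the table describes.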