Table 1 Spark and Hadoop MapReduce comparison

From: A new Internet of Things architecture for real-time prediction of various diseases using machine learning on big data environment

| Feature | Hadoop MapReduce | Apache Spark |
|---|---|---|
| Definition | Open-source big data framework that processes structured and unstructured data stored in HDFS. Hadoop MapReduce is designed to process large amounts of data on a cluster. | Open-source big data framework. It is a flexible in-memory framework that handles batch and real-time analytics and data-processing workloads. Spark is designed primarily for fast computation. |
| Speed | Reading from and writing to the file system and disk slows down processing. | Up to 100 times faster than Hadoop MapReduce in memory and up to 10 times faster even when running on disk, because computation runs in memory. |
| Ease of use | Developers must hand-code each operation and rely on additional abstraction layers, which makes many problems difficult to program. | Easier to use than Hadoop MapReduce, because it provides a rich set of high-level operators on RDDs (Resilient Distributed Datasets). |
| Real-time analysis | No | Yes |
| Execution model | Batch | Batch, streaming |
| In-memory processing | No | Yes |
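To make the Speed, Execution model and In-memory rows more concrete, here is a toy, single-process Python sketch of word count. It is not Hadoop or Spark code, and every function name is a hypothetical illustration. The `mapreduce_job` function runs explicit map, shuffle and reduce phases and writes intermediate and final results to files, loosely mimicking how MapReduce passes data through the disk. `in_memory_pipeline` chains the same steps with all intermediate data kept in memory, in the spirit of Spark's operators. The sketch only shows where data lives between steps; it says nothing about actual performance.

```python
import json
import os
import tempfile
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the grouped values for each key (here, by summing)."""
    return {key: sum(values) for key, values in groups.items()}


def mapreduce_job(lines, workdir):
    """Hypothetical MapReduce-style job: map output is written to disk and
    read back before the reduce side runs, and the final result is also
    written to disk for any follow-up job to read."""
    spill = os.path.join(workdir, "map_output.json")
    with open(spill, "w", encoding="utf-8") as f:
        json.dump(map_phase(lines), f)  # intermediate data goes to the file system
    with open(spill, encoding="utf-8") as f:
        pairs = json.load(f)  # tuples come back as 2-element lists
    counts = reduce_phase(shuffle_phase(pairs))
    out = os.path.join(workdir, "wordcount.json")
    with open(out, "w", encoding="utf-8") as f:
        json.dump(counts, f)
    return out


def in_memory_pipeline(lines):
    """Hypothetical Spark-like execution: the same steps are chained with
    every intermediate result kept in memory."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    data = ["Spark runs in memory", "Hadoop MapReduce writes to disk", "Spark and Hadoop"]
    with tempfile.TemporaryDirectory() as tmp:
        # A follow-up job would have to read this file back from disk.
        with open(mapreduce_job(data, tmp), encoding="utf-8") as f:
            disk_counts = json.load(f)
    memory_counts = in_memory_pipeline(data)
    assert disk_counts == memory_counts
    print(disk_counts["spark"], disk_counts["hadoop"])  # 2 2
```

Both paths compute identical counts. The only difference illustrated is that the MapReduce-style job routes its intermediate data through the file system, while the in-memory pipeline never leaves RAM.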