Criterion | Hadoop MapReduce | Apache Spark |
---|---|---|
Definition | Open-source big data framework that processes structured and unstructured data stored in HDFS. Hadoop MapReduce is designed to process large amounts of data across a cluster | Open-source big data framework. It is a flexible in-memory engine that handles both batch and real-time analytics and data processing workloads. Spark is designed primarily for fast computation |
Speed | Reading and writing intermediate results to the file system and disk between stages slows down processing | Up to 100 times faster in memory, and up to 10 times faster even when running on disk, than Hadoop MapReduce, because computations run in memory |
Ease of use | In Hadoop MapReduce, developers must hand-code each operation, and the framework offers few abstractions, so programs are difficult to write | Spark is easier to use than Hadoop MapReduce because it provides a rich set of high-level operators on RDDs |
Real-time analysis | No | Yes |
Execution model | Batch | Batch, streaming |
In-memory | No | Yes |
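The ease-of-use difference above comes from Spark's chainable high-level operators (`flatMap`, `map`, `reduceByKey` on RDDs), versus MapReduce's separate Mapper and Reducer classes. As a minimal sketch, the RDD-style chain can be mimicked with plain Python built-ins (no Spark installation assumed; the `word_count` helper is illustrative, not part of either framework):

```python
from collections import defaultdict

def word_count(lines):
    """Word count written in the Spark RDD style.

    In PySpark the same pipeline would be roughly:
    rdd.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
    """
    words = (w for line in lines for w in line.split())  # flatMap: line -> words
    pairs = ((w, 1) for w in words)                      # map: word -> (word, 1)
    counts = defaultdict(int)
    for w, n in pairs:                                   # reduceByKey: sum per word
        counts[w] += n
    return dict(counts)

print(word_count(["spark is fast", "spark is easy"]))
```

In MapReduce, the same job would require a Mapper class, a Reducer class, and job-configuration boilerplate, which is what the "Ease of use" row refers to.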