A new Internet of Things architecture for real-time prediction of various diseases using machine learning on big data environment

Journal of Big Data

Table 1 Spark and Hadoop MapReduce comparison

	Hadoop MapReduce	Apache Spark
Definition	Open-source big data framework that handles structured and unstructured data stored in HDFS. Hadoop MapReduce is designed to process large amounts of data on a cluster	Open-source big data framework with a flexible in-memory design that handles both batch and real-time analytics and data processing workloads. Spark is designed primarily for fast computation
Speed	Reading from and writing to the file system and disk slows down processing	Up to 100 times faster than Hadoop MapReduce in memory, because computation runs in memory, and up to 10 times faster even when running on disk
Ease of use	In Hadoop MapReduce, developers must code each operation by hand without high-level abstractions, which makes programs difficult to write	Spark is easier to use than Hadoop MapReduce because it provides a rich set of high-level operators on RDDs
Real-time analysis	No	Yes
Execution model	Batch	Batch, streaming
In-memory	No	Yes
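
The "Ease of use" row can be illustrated with a minimal word-count sketch. It is written in plain Python and calls neither the Hadoop nor the Spark API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count_highlevel`) and the sample input are hypothetical, chosen only to contrast hand-coded phases with a single chained high-level expression.

```python
from collections import Counter, defaultdict
from itertools import chain

# Hypothetical sample input, standing in for lines of a file stored in HDFS.
lines = ["spark hadoop spark", "hadoop hdfs", "spark"]


# MapReduce style: every phase is written out by hand.
def map_phase(line):
    # Emit one (word, 1) pair per word in the line.
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    # Group the emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Collapse all values of one key into a single count.
    return key, sum(values)


def word_count_mapreduce(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


# High-level operator style: the same job as one chained expression.
def word_count_highlevel(lines):
    return dict(Counter(word for line in lines for word in line.split()))


counts_mr = word_count_mapreduce(lines)
counts_hl = word_count_highlevel(lines)
```

Both functions return the same counts. The difference is how much of the plumbing the developer writes, which is the contrast the table draws between the two frameworks.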