Table 1 Data processing engines for Hadoop

From: A survey of open source tools for machine learning with big data in the Hadoop ecosystem

Engine | Current stable release (as of June 1, 2015) | Execution model | Supported languages | Associated ML tools | In-memory processing / Low latency / Fault tolerance / Enterprise support
MapReduce | 2.7.0 | Batch | Java | Mahout | × × ×
Spark | 1.3.1 | Batch, streaming | Java, Python, R, Scala | MLlib, Mahout, H2O |
Flink | 0.8.1 | Batch, streaming | Java, Scala | Flink-ML, SAMOA | ×
Storm | 0.9.4 | Streaming | Any | SAMOA | ×
H2O | | Batch | Java, Python, R, Scala | H2O, Mahout, MLlib |