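Table 1 Data processing engines for Hadoop

From: A survey of open source tools for machine learning with big data in the Hadoop ecosystem

| Engine | Current stable release (as of June 1, 2015) | Execution model | Supported languages | Associated ML tools |
|---|---|---|---|---|
| MapReduce | 2.7.0 | Batch | Java | Mahout |
| Spark | 1.3.1 | Batch, streaming | Java, Python, R, Scala | MLlib, Mahout, H2O |
| Flink | 0.8.1 | Batch, streaming | Java, Scala | Flink-ML, SAMOA |
| Storm | 0.9.4 | Streaming | Any | SAMOA |
| H2O | 3.0.0.12 | Batch | Java, Python, R, Scala | H2O, Mahout, MLlib |

The table's four remaining columns (In-memory processing, Low latency, Fault tolerance, Enterprise support) hold per-engine marks. Only the × marks are preserved here, without their column positions: MapReduce has three, Flink one, Storm one, and Spark and H2O none.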