
Table 2 Overview of machine learning toolkits

From: A survey of open source tools for machine learning with big data in the Hadoop ecosystem

 

| | Mahout | MLlib | H2O | SAMOA |
| --- | --- | --- | --- | --- |
| Interface language | Java | Java, Python, Scala | Java, Python, R, Scala | Java |
| Associated platform | MapReduce, Spark (H2O and Flink in progress) | Spark, H2O | H2O, Spark, MapReduce | Storm, S4, Samza |
| Current version (as of June 1, 2015) | 0.10.1 | 1.3.1 | 3.0.0.12 | 0.2.0 |

Graphical user interface

Classification and regression algorithms

 Decision tree

a

 Logistic regression

b

a

 Naïve Bayes

 Support vector machine

 Gradient boosted trees

 Random forest

 Adaptive model rules

a

 Generalized linear model

 Linear regression

a

Clustering algorithms

 k-Means

 Fuzzy k-means

 Streaming k-means

a

 Power iteration

 Spectral clustering

 CluStream

a

Collaborative filtering (CF) algorithms

 User-based CF

 Item-based CF

 Alternating least squares

Dimensionality reduction and feature selection tools

 Principal component analysis

 QR decomposition

 Singular value decomposition

 Chi squared

Additional algorithms

 Association rule learning

a

 Deep learning

 Topic modeling

  1. a: Real-time streaming implementation
  2. b: Single machine, trained using stochastic gradient descent