Table 2 Overview of machine learning toolkits

From: A survey of open source tools for machine learning with big data in the Hadoop ecosystem

Toolkits compared: Mahout, MLlib, H2O, SAMOA
Interface language
 Mahout: Java
 MLlib: Java, Python, Scala
 H2O: Java, Python, R, Scala
 SAMOA: Java
Associated platform
 Mahout: MapReduce, Spark (H2O and Flink in progress)
 MLlib: Spark, H2O
 H2O: H2O, Spark, MapReduce
 SAMOA: Storm, S4, Samza
Current version (as of June 1, 2015)
 Mahout: 0.10.1
 MLlib: 1.3.1
 SAMOA: 0.2.0
Graphical user interface
Classification and regression algorithms
 Decision tree (a)
 Logistic regression (a, b)
 Naïve Bayes
 Support vector machine
 Gradient boosted trees
 Random forest
 Adaptive model rules (a)
 Generalized linear model
 Linear regression (a)
Clustering algorithms
 Fuzzy k-means
 Streaming k-means (a)
 Power iteration
 Spectral clustering
 CluStream (a)
Collaborative filtering (CF) algorithms
 User-based CF
 Item-based CF
 Alternating least squares
Dimensionality reduction and feature selection tools
 Principal component analysis
 QR decomposition
 Singular value decomposition
 Chi-squared
Additional algorithms
 Association rule learning (a)
 Deep learning
 Topic modeling
(a) Real-time streaming implementation
(b) Single machine, trained using stochastic gradient descent
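Footnote (b) says that one of the logistic regression implementations runs on a single machine and is trained with stochastic gradient descent (SGD). The sketch below illustrates that idea in plain Python. It is not code from Mahout, MLlib, H2O or SAMOA. The class `SGDLogisticRegression`, its `partial_fit` method and the toy data generator `make_toy_data` are hypothetical names chosen for illustration, and the fixed learning rate is an assumption.

The model updates its weights after every single example. Because each update needs only one example, the same loop could in principle consume a live stream, which is the setting footnote (a) refers to.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class SGDLogisticRegression:
    """Hypothetical single-machine binary logistic regression trained with plain SGD."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.w = [0.0] * n_features  # one weight per feature
        self.b = 0.0                 # intercept
        self.lr = learning_rate      # fixed step size (an assumption, no decay)

    def predict_proba(self, x: list[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return sigmoid(z)

    def predict(self, x: list[float]) -> int:
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def partial_fit(self, x: list[float], y: int) -> None:
        """One SGD step on a single (x, y) example, with y in {0, 1}."""
        # For the log loss, the gradient w.r.t. each weight is (p - y) * x_i.
        err = self.predict_proba(x) - y
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * err * xi
        self.b -= self.lr * err


def make_toy_data(n: int, seed: int = 0) -> list[tuple[list[float], int]]:
    """Hypothetical toy data: points in [-1, 1]^2 labelled 1 when x0 + x1 > 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, 1 if x[0] + x[1] > 0 else 0))
    return data


if __name__ == "__main__":
    data = make_toy_data(500)
    model = SGDLogisticRegression(n_features=2, learning_rate=0.1)
    rng = random.Random(1)
    for _ in range(20):            # 20 passes over the same data
        rng.shuffle(data)
        for x, y in data:          # one update per example, as on a stream
            model.partial_fit(x, y)
    acc = sum(model.predict(x) == y for x, y in data) / len(data)
    print(f"training accuracy: {acc:.3f}")
```

The learning rate of 0.1 and the 20 passes are arbitrary choices for this toy problem. Practical SGD trainers typically add a decaying step size and regularization.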