From: A survey of open source tools for machine learning with big data in the Hadoop ecosystem
| | Mahout | MLlib | H2O | SAMOA |
---|---|---|---|---|
Interface language | Java | Java, Python, Scala | Java, Python, R, Scala | Java |
Associated platform | MapReduce, Spark (H2O and Flink in progress) | Spark, H2O | H2O, Spark, MapReduce | Storm, S4, Samza |
Current version (as of June 1, 2015) | 0.10.1 | 1.3.1 | 3.0.0.12 | 0.2.0 |
Graphical user interface | – | – | ✓ | – |
Classification and regression algorithms | ||||
Decision tree | – | ✓ | – | ✓a |
Logistic regression | ✓b | ✓a | ✓ | – |
Naïve Bayes | ✓ | ✓ | ✓ | – |
Support vector machine | – | ✓ | – | – |
Gradient boosted trees | – | ✓ | ✓ | – |
Random forest | ✓ | ✓ | ✓ | – |
Adaptive model rules | – | – | – | ✓a |
Generalized linear model | – | – | ✓ | – |
Linear regression | – | ✓a | ✓ | – |
Clustering algorithms | ||||
k-Means | ✓ | ✓ | ✓ | – |
Fuzzy k-means | ✓ | – | – | – |
Streaming k-means | ✓ | ✓a | – | – |
Power iteration | – | ✓ | – | – |
Spectral clustering | ✓ | – | – | – |
CluStream | – | – | – | ✓a |
Collaborative filtering (CF) algorithms | ||||
User-based CF | ✓ | – | – | – |
Item-based CF | ✓ | – | – | – |
Alternating least squares | ✓ | ✓ | – | – |
Dimensionality reduction and feature selection tools | ||||
Principal component analysis | ✓ | ✓ | ✓ | – |
QR decomposition | ✓ | – | – | – |
Singular value decomposition | ✓ | ✓ | – | – |
Chi-squared | ✓ | – | – | – |
Additional algorithms | ||||
Association rule learning | ✓ | ✓ | – | ✓a |
Deep learning | – | – | ✓ | – |
Topic modeling | ✓ | ✓ | – | – |