From: Programming big data analysis: principles and solutions
System | Advantages | Disadvantages
---|---|---
Hadoop | Fault tolerance, low cost, very large open-source community | Verbosity, batch processing only, small-file issues, inefficiency with iterative applications
Spark | In-memory computing, ease of use, flexibility, libraries for advanced analytics, scalable machine learning support | No automatic optimization process, small-file issues, high memory consumption
Storm | Multi-language support, low-latency response time | Message ordering not guaranteed |
Hama | Support for many distributed file systems, general-purpose computing on GPUs, avoidance of conflicts and deadlocks | Single point of failure (BSP Master), low flexibility of partitioning policies, small community
MPI | Efficiency, portability, support for shared or distributed memory | Hard to debug, network communication can become a bottleneck
Hive | Querying of large distributed datasets, SQL-like language, UDFs for advanced data analysis | Supports only OLAP, no real-time data access
Pig | High-level procedural language, UDFs for advanced data analysis, ease of learning and development | Small community, hard to tune performance