Table 3 Advantages and disadvantages of the systems

From: Programming big data analysis: principles and solutions

| System | Advantages | Disadvantages |
|---|---|---|
| Hadoop | Fault tolerance, low cost, very large open-source community | Verbosity, batch processing only, small-files problem, inefficiency with iterative applications |
| Spark | In-memory computing, ease of use, flexibility, libraries for advanced analytics, scalable machine learning support | No automatic optimization process, small-files problem, high memory consumption |
| Storm | Multi-language support, low-latency response time | Message ordering not guaranteed |
| Hama | Support for many distributed file systems, general-purpose computing on GPUs, avoidance of conflicts and deadlocks | Single point of failure (BSP Master), inflexible partitioning policies, small community |
| MPI | Efficiency, portability, support for both shared and distributed memory | Hard to debug, network communication can become a bottleneck |
| Hive | Querying of large distributed datasets, SQL-like language, UDFs for advanced data analysis | Supports only OLAP workloads, no real-time data access |
| Pig | High-level procedural language, UDFs for advanced data analysis, easy to learn and develop with | Small community, performance hard to tune |
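The contrast between Hadoop's "inefficiency with iterative applications" and Spark's "in-memory computing" can be illustrated with a toy model. The sketch below is plain Python, not Hadoop or Spark code: it assumes a MapReduce-style job re-reads its input from storage on every iteration, while a Spark-style job loads the data once and keeps it in memory. All names (`CountingReader`, `run_reloading`, `run_cached`, `demo`) are hypothetical and exist only for this illustration.

```python
import json
import os
import tempfile


def write_dataset(path, points):
    # Persist a small numeric dataset to disk as JSON.
    with open(path, "w") as f:
        json.dump(points, f)


class CountingReader:
    """Loads the dataset from disk and counts how many times it did so."""

    def __init__(self, path):
        self.path = path
        self.disk_reads = 0

    def load(self):
        self.disk_reads += 1
        with open(self.path) as f:
            return json.load(f)


def mean_step(points, estimate):
    # One iteration of a toy iterative algorithm:
    # move the current estimate halfway toward the mean of the data.
    mean = sum(points) / len(points)
    return estimate + 0.5 * (mean - estimate)


def run_reloading(reader, iterations):
    # MapReduce-style (assumed model): each iteration is a separate job
    # that reads its input from storage again.
    estimate = 0.0
    for _ in range(iterations):
        points = reader.load()
        estimate = mean_step(points, estimate)
    return estimate


def run_cached(reader, iterations):
    # Spark-style (assumed model): load once and keep the working set
    # in memory across all iterations.
    points = reader.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate = mean_step(points, estimate)
    return estimate


def demo(iterations=10):
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "points.json")
        write_dataset(path, [1.0, 2.0, 3.0, 4.0])
        reloading, cached = CountingReader(path), CountingReader(path)
        r1 = run_reloading(reloading, iterations)
        r2 = run_cached(cached, iterations)
    return r1, r2, reloading.disk_reads, cached.disk_reads


if __name__ == "__main__":
    print(demo())
```

Both runs compute the same result, but the reloading variant performs one disk read per iteration while the cached variant performs a single read. Real systems differ in many more ways (shuffles, serialization, fault tolerance), so this only isolates the I/O pattern behind the table entries.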