Table 9 Comparison of big data streaming tools and technologies

From: Big data stream analysis: a systematic literature review

| Tools and technology | Database support | Execution model | Workload | Fault tolerance | Latency | Throughput | Reliability | Operating system | Implementation/supported languages | Application |
|---|---|---|---|---|---|---|---|---|---|---|
| BlockMon | Cassandra, MongoDB, XML | Streaming | Multi-slice memory allocation and batch allocations | Checkpoint, rollback | Very low | High | At least once | Linux | C++11, Python | Anomaly detection, network optimization, multimedia content delivery, financial market analysis, web analytics |
| Spark Streaming | Kafka, HBase, Hive, Flume, HDFS/S3, Kinesis, TCP sockets, Twitter, SQL | Batch, iterative, streaming | CPU/memory intensive | RDD-based checkpointing, parallel recovery, replication | Low | High | Exactly once | Windows, macOS, Linux | Scala, Python, Java, R | Event detection, streaming machine learning, fog computing, interactive analysis, multimedia analysis, cluster analysis, filtering, re-processing, cache invalidation |
| Apache Storm | Spout, HBase, Hive, SQL, Cassandra, Memcached | Streaming | CPU/memory intensive | Replication, checkpoint, data recovery, upstream backup, record-level acknowledgement, stateless management | Very low | Low | At least once | Windows, macOS, Linux | Clojure, Java, Scala, non-JVM languages | Internet of Things, streaming machine learning, multimedia analysis |
| Yahoo! S4 | MySQL, NoSQL, Rich Data Format | Streaming | CPU/memory intensive | Replication, checkpoint, data recovery | Low | Low | Exactly once | Linux | Java, Python, C++, Perl | Online analytics, monitoring, fraud detection, financial data processing, web personalization and session modelling |
| Apache Samza | Kafka, HDFS, Kinesis, stream consumer, key-value stores | Streaming, batch processing | Memory intensive | Checkpoint | Very low | High | At least once | Linux, Windows | Java, Scala, JVM languages | Filtering, re-processing, cache invalidation |
| Apache Flink | Kafka, Flume, HDFS/S3, Kinesis, TCP sockets, Twitter, Cassandra, Redis, MongoDB, HBase, SQL | Streaming, batch, iterative, interactive | Memory intensive | Stream replay and marker checkpoint | Very low | High | Exactly once | Linux, macOS, Windows | Java, Scala, Python | Optimization of e-commerce search results, network/sensor monitoring and error detection, ETL for business intelligence infrastructure, machine learning |
| Apache Aurora | H2, Java maps, MyBatis, MySQL, PostgreSQL | Streaming | Memory and disk space | Periodic recovery checkpoint and rollback | Low | High | At least once | Linux | Python | Monitoring applications such as financial analysis and military applications |
| Redis | Key-value stores, RabbitMQ, MongoDB | Streaming | In-memory but persistent on-disk database | Replica migration, Sentinel | Low | High | At least once | Ubuntu, Linux, macOS | C, C#, Java, PHP, Python | Web analysis, cache, message queues |
| C-SPARQL | RDF, SQLJ, NoSQL, HDF | Batch, streaming | Low memory usage | Adaptation | Very low | High | Cumulative | Windows, Linux, macOS, Android | Java, Apache Jena libraries | Real-time reasoning over sensor data, social semantic data, urban computing |
| SAMOA | HBase, Hive, Cassandra | Streaming | Low memory usage | Upstream backup | Low | High | Exactly once | Linux | Java | Classification, clustering, spam detection, regression, frequent pattern mining |
| CQELS | RDF, SQLJ, NoSQL, HDF | Batch, streaming | In-memory | Adaptation | Low | High | Cumulative | Windows, Linux, macOS, Android | Java | Real-time reasoning over sensor data, social semantic data, urban computing |
| ETALIS | RDF | Streaming | Binarization | Adaptation | Low | Low | Cumulative | Windows, Linux, macOS, Android | Prolog, Java, C, SPARQL, C#, ETALIS Language for Events (ELE) | Event detection, reasoning over streaming events |
| XSEQ | XML | Batch, streaming | In-memory with buffering | Checkpoint | Low | High | At least once | Windows, Linux | Java, Apache Xerces | Biological data, social networks, user behaviour, financial data analysis, filtering |
| IBM InfoSphere Streams | Pig, Hive, Jaql, HBase, Flume, Lucene, Avro, ZooKeeper, Oozie, Oracle Database, DB2, Netezza, MySQL, Aster, Informix | Streaming | Capture database workloads and replay them in a test database environment | Automatic recovery | Low | High | Exactly once, at least once, at most once | Linux, CentOS | C++, Java, SPL | Space weather prediction, physiological data stream analysis, traffic management, real-time predictions, event detection, visualisation |
| Google MillWheel | Bigtable, Spanner | Streaming | In-memory and Bloom filtering | Uncoordinated periodic checkpoint, upstream backup | Low | High | Exactly once | Linux | Virtually any programming language | Anomaly detection, health monitoring, image processing, network switch management |
| Infochimps Cloud | SQL, NoSQL, Hive, Pig, Wukong, Hadoop, RDBMS, virtually any data format | Batch, streaming | In-memory | Upstream backup | Low | High | Exactly once | Linux | Java | Disaster discovery, text analysis, complex event processing, visualisation |
| Microsoft StreamInsight | SQL Server | Streaming | In-memory | Replication, checkpoint, data recovery | Very low | High | Exactly once | Windows | .NET, C#, LINQ, Rx | Manufacturing process monitoring and control, financial data analysis, operation analytics, web analytics, event pattern detection |
| TIBCO StreamBase | Oracle Database, SQL Server, Impala | Batch, streaming | In-memory | Synchronization, replication, rollback | Very low | High | At least once, at most once, exactly once | Windows, macOS, Linux | R, Java | Mission-critical analysis, IoT analysis, click-stream analysis, predictive analytics, workflow optimization, risk avoidance |
| Lambda Architecture | RDBMS, Cassandra, Kafka, data warehouses, Kinesis Data Streams, HDFS, HBase | Batch, streaming | In-memory/disk database | Replication, checkpoint | Low | Low | Exactly once | Ubuntu, Windows, Linux | Java, C#, Python, Pig Latin | IoT analysis, tracking real-time updates, financial risk management, click-stream analysis |
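The Reliability column in Table 9 uses three delivery guarantees: at most once, at least once and exactly once. The Python sketch below is a toy model and does not describe the mechanism of any particular tool in the table. It makes two assumptions. First, a source redelivers from the first unacknowledged message. Second, the consumer crashes once. The sketch shows three things:

- Acknowledging before applying gives at-most-once behaviour.
- Applying before acknowledging gives at-least-once behaviour.
- Deduplicating on a message id turns at-least-once delivery into an exactly-once effect on the output.

The function `run`, its parameters and the use of the stream position as the message id are all illustrative.

```python
from typing import List


def run(mode: str, messages: List[int], crash_at: int) -> List[int]:
    """Toy model of delivery guarantees (hypothetical, not any tool's API).

    The source redelivers from the first unacknowledged message, and the
    consumer fails once while handling messages[crash_at]. Returns the values
    written to the output, in order.
    """
    if mode not in ("at-most-once", "at-least-once", "exactly-once"):
        raise ValueError(f"unknown mode: {mode}")
    output: List[int] = []
    # Ids of applied messages; for exactly-once this set is assumed to be
    # stored atomically with the output, so it survives the crash.
    applied_ids = set()
    acked = 0  # source-side offset of the first unacknowledged message
    crashed = False
    while acked < len(messages):
        i = acked  # message id = its position in the stream (an assumption)
        crash_now = i == crash_at and not crashed
        if mode == "at-most-once":
            acked += 1  # acknowledge before applying
            if crash_now:
                crashed = True  # failure after the ack: the message is lost
                continue
            output.append(messages[i])
            continue
        if mode == "exactly-once" and i in applied_ids:
            acked += 1  # redelivered duplicate: acknowledge without applying
            continue
        output.append(messages[i])  # apply before acknowledging
        applied_ids.add(i)
        if crash_now:
            crashed = True  # failure before the ack: the source redelivers
            continue
        acked += 1
    return output
```

Take the messages `[10, 20, 30]` and a crash while the second message is being handled. In that case `run` returns `[10, 30]` for at-most-once, `[10, 20, 20, 30]` for at-least-once and `[10, 20, 30]` for exactly-once.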
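Table 9 lists record-level acknowledgement as a fault-tolerance mechanism of Apache Storm. Storm's documentation describes it as an XOR scheme:

- Every tuple receives a random 64-bit id.
- An acker keeps, per spout tuple, the XOR of the ids of all tuples created or acked in its tree.
- The value returns to zero exactly when every tuple in the tree has been acked.
- A tree that does not complete is replayed from the spout, which is consistent with the "At least once" entry.

The Python sketch below models only this bookkeeping. The class `XorAcker` and its methods are hypothetical and are not Storm's Java API.

```python
import random
from typing import Dict, Iterable


class XorAcker:
    """Hypothetical model of XOR-based tuple-tree tracking (not Storm's API)."""

    def __init__(self) -> None:
        # spout tuple id -> XOR of every tuple id created and/or acked so far
        self.pending: Dict[int, int] = {}

    @staticmethod
    def new_id() -> int:
        # Random 64-bit ids make an accidental zero XOR extremely unlikely
        return random.getrandbits(64)

    def start(self, root_id: int) -> None:
        # A spout emitted a new tuple: its id is the first entry of the tree
        self.pending[root_id] = root_id

    def ack(self, root_id: int, tuple_id: int, anchored: Iterable[int] = ()) -> bool:
        """Ack tuple_id and register the tuples it anchored, in one update.

        Returns True when the XOR reaches zero, i.e. every tuple in the tree
        has been acked.
        """
        value = tuple_id
        for child_id in anchored:
            value ^= child_id
        self.pending[root_id] ^= value
        if self.pending[root_id] == 0:
            del self.pending[root_id]
            return True
        return False
```

The acked id and the ids of newly anchored children are folded into one update. Because of that, the running value cannot drop to zero between "parent acked" and "children registered".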
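Several rows in Table 9 list checkpointing combined with rollback or replay as their fault-tolerance mechanism: BlockMon, Apache Aurora, Apache Flink and TIBCO StreamBase. The Python sketch below shows the basic idea for a single word-counting operator that reads from a replayable log:

- The operator's state is snapshotted together with the source offset that state reflects.
- Recovery restores the snapshot and re-reads the log from that offset.

This is a single-process simplification that assumes a replayable source. It does not model Flink's distributed barrier ("marker") alignment, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Checkpoint:
    offset: int             # position in the replayable source log
    counts: Dict[str, int]  # snapshot of the operator state at that offset


@dataclass
class CountingOperator:
    """Hypothetical word-count operator with checkpoint/rollback."""

    counts: Dict[str, int] = field(default_factory=dict)
    offset: int = 0
    last: Optional[Checkpoint] = None

    def process(self, log: List[str], upto: int, every: int) -> None:
        # Consume records until `upto`, checkpointing every `every` records
        while self.offset < upto:
            word = log[self.offset]
            self.counts[word] = self.counts.get(word, 0) + 1
            self.offset += 1
            if self.offset % every == 0:
                # Snapshot state together with the offset it reflects
                self.last = Checkpoint(self.offset, dict(self.counts))

    def recover(self) -> None:
        # Roll back to the last checkpoint; later records will be replayed
        if self.last is None:
            self.counts, self.offset = {}, 0
        else:
            self.counts, self.offset = dict(self.last.counts), self.last.offset
```

Suppose the operator processes five records of `["a", "b", "a", "c", "a", "b"]` with a checkpoint every two records. A crash followed by `recover()` then rolls it back to offset 4. Replaying the rest of the log produces the same counts as a failure-free run, because the record processed after the last snapshot is discarded rather than counted twice.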
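Table 9 lists Bloom filtering in the workload column for Google MillWheel. In exactly-once processing, a Bloom filter over the ids of already-processed records lets an operator avoid many lookups in its backing store:

- A negative answer from the filter is definitive, so the record is new.
- A positive answer may be a false positive and has to be confirmed against the store.

The Python sketch below makes several assumptions: string record ids, an in-memory `set` standing in for the backing store, and SHA-256-derived bit positions. The filter sizes and all names are illustrative and are not MillWheel's.

```python
import hashlib
from typing import Iterator, Set


class BloomFilter:
    """Minimal Bloom filter over string keys (illustrative sizes)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the key with an index
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely never added"; True may be a false positive
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def is_duplicate(record_id: str, bloom: BloomFilter, store: Set[str]) -> bool:
    # Fast path: a Bloom filter miss means the id was definitely never seen
    if not bloom.might_contain(record_id):
        return False
    # Possible hit: confirm against the (hypothetical) backing store
    return record_id in store


def mark_processed(record_id: str, bloom: BloomFilter, store: Set[str]) -> None:
    # Record the id in both structures once the record has been applied
    bloom.add(record_id)
    store.add(record_id)
```

A larger `num_bits` lowers the false-positive rate and therefore the number of store lookups, at the cost of memory.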
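The Lambda Architecture row combines two processing paths:

- A batch layer periodically recomputes views over the complete master dataset.
- A speed layer covers only the data that arrived after the last batch run.

Queries merge the two views. The Python sketch below makes some assumptions: click events are `(timestamp, page)` pairs, the batch run has a single cutoff timestamp, and both views are simple per-page counts. The function and class names are illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[int, str]  # hypothetical click event: (timestamp, page)


def batch_view(master: Iterable[Event], cutoff: int) -> Counter:
    # Batch layer: recompute from scratch over every event up to the cutoff
    return Counter(page for ts, page in master if ts <= cutoff)


class SpeedLayer:
    """Speed layer: incrementally counts events newer than the batch cutoff."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        ts, page = event
        if ts > self.cutoff:  # only events the batch view does not cover yet
            self.view[page] += 1


def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    # Serving layer: merge the batch and real-time views at query time
    return batch[page] + speed.view[page]
```

After the next batch run, the cutoff advances and the speed layer's view can be discarded. This is how the architecture keeps real-time results while periodically correcting them from the full dataset.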