Table 9 Comparison of big data streaming tools and technologies

From: Big data stream analysis: a systematic literature review

| Tool/technology | Database support | Execution model | Workload | Fault tolerance | Latency | Throughput | Reliability | Operating system | Implementation/supported languages | Application |
|---|---|---|---|---|---|---|---|---|---|---|
| BlockMon | Cassandra, MongoDB, XML | Streaming | Multi-slice memory allocation and batch allocations | Checkpoint, rollback | Very low | High | At least once | Linux | C++11, Python | Anomaly detection, network optimization, multimedia content delivery, financial market analysis, web analytics |
| Spark Streaming | Kafka, HBase, Hive, Flume, HDFS/S3, Kinesis, TCP sockets, Twitter, SQL | Batch, iterative, streaming | CPU/memory intensive | RDD-based checkpointing, parallel recovery, replication | Low | High | Exactly once | Windows, macOS, Linux | Scala, Python, Java, R | Event detection, streaming machine learning, fog computing, interactive analysis, multimedia analysis, cluster analysis, filtering, re-processing, cache invalidation |
| Apache Storm | Spout, HBase, Hive, SQL, Cassandra, Memcached | Streaming | CPU/memory intensive | Replication, checkpoint, data recovery, upstream backup, record-level acknowledgement, stateless management | Very low | Low | At least once | Windows, macOS, Linux | Clojure, Java, Scala, non-JVM languages | Internet of Things, streaming machine learning, multimedia analysis |
| Yahoo! S4 | MySQL, NoSQL, Rich Data Format | Streaming | CPU/memory intensive | Replication, checkpoint, data recovery | Low | Low | Exactly once | Linux | Java, Python, C++, Perl | Online analytics, monitoring, fraud detection, financial data processing, web personalization and session modelling |
| Apache Samza | Kafka, HDFS, Kinesis, stream consumers, key-value stores | Streaming, batch | Memory intensive | Checkpoint | Very low | High | At least once | Linux, Windows | Java, Scala, JVM languages | Filtering, re-processing, cache invalidation |
| Apache Flink | Kafka, Flume, HDFS/S3, Kinesis, TCP sockets, Twitter, Cassandra, Redis, MongoDB, HBase, SQL | Streaming, batch, iterative, interactive | Memory intensive | Stream replay and marker checkpoints | Very low | High | Exactly once | Linux, macOS, Windows | Java, Scala, Python | Optimization of e-commerce search results, network/sensor monitoring and error detection, ETL for business intelligence infrastructure, machine learning |
| Apache Aurora | H2, Java maps, MyBatis, MySQL, PostgreSQL | Streaming | Memory and disk space | Periodic recovery checkpoint and rollback | Low | High | At least once | Linux | Python | Monitoring applications such as financial analysis and military applications |
| Redis | Key-value stores, RabbitMQ, MongoDB | Streaming | In-memory but persistent on-disk database | Replica migration, Sentinel | Low | High | At least once | Ubuntu, Linux, macOS | C, C#, Java, PHP, Python | Web analysis, cache, message queues |
| C-SPARQL | RDF, SQLJ, NoSQL, HDF | Batch, streaming | Low memory usage | Adaptation | Very low | High | Cumulative | Windows, Linux, macOS, Android | Java, Apache Jena libraries | Real-time reasoning over sensor data, social semantic data, urban computing |
| SAMOA | HBase, Hive, Cassandra | Streaming | Low memory usage | Upstream backup | Low | High | Exactly once | Linux | Java | Classification, clustering, spam detection, regression, frequent pattern mining |
| CQELS | RDF, SQLJ, NoSQL, HDF | Batch, streaming | In-memory | Adaptation | Low | High | Cumulative | Windows, Linux, macOS, Android | Java | Real-time reasoning over sensor data, social semantic data, urban computing |
| ETALIS | RDF | Streaming | Binarization | Adaptation | Low | Low | Cumulative | Windows, Linux, macOS, Android | Prolog, Java, C, SPARQL, C#, ETALIS Language for Events (ELE) | Event detection, reasoning over streaming events |
| XSEQ | XML | Batch, streaming | In-memory with buffering | Checkpoint | Low | High | At least once | Windows, Linux | Java, Apache Xerces | Biological data, social networks, user behaviour, financial data analysis, filtering |
| IBM InfoSphere Streams | Pig, Hive, Jaql, HBase, Flume, Lucene, Avro, ZooKeeper, Oozie, Oracle Database, DB2, Netezza, MySQL, Aster, Informix | Streaming | Captures database workloads and replays them in a test database environment | Automatic recovery | Low | High | Exactly once, at least once, at most once | Linux, CentOS | C++, Java, SPL | Space weather prediction, physiological data stream analysis, traffic management, real-time predictions, event detection, visualisation |
| Google MillWheel | Bigtable, Spanner | Streaming | In-memory and Bloom filtering | Uncoordinated periodic checkpoints, upstream backup | Low | High | Exactly once | Linux | Virtually any programming language | Anomaly detection, health monitoring, image processing, network switch management |
| Infochimps Cloud | SQL, NoSQL, Hive, Pig, Wukong, Hadoop, RDBMS, virtually any data format | Batch, streaming | In-memory | Upstream backup | Low | High | Exactly once | Linux | Java | Disaster discovery, text analysis, complex event processing, visualisation |
| Microsoft StreamInsight | SQL Server | Streaming | In-memory | Replication, checkpoint, data recovery | Very low | High | Exactly once | Windows | .NET, C#, LINQ, Rx | Manufacturing process monitoring and control, financial data analysis, operation analytics, web analytics, event pattern detection |
| TIBCO StreamBase | Oracle Database, SQL Server, Impala | Batch, streaming | In-memory | Synchronization, replication, rollback | Very low | High | At least once, at most once, exactly once | Windows, macOS, Linux | R, Java | Mission-critical analysis, IoT analysis, click-stream analysis, predictive analytics, workflow optimization, risk avoidance |
| Lambda Architecture | RDBMS, Cassandra, Kafka, data warehouses, Kinesis Data Streams, HDFS, HBase | Batch, streaming | In-memory/disk database | Replication, checkpoint | Low | Low | Exactly once | Ubuntu, Windows, Linux | Java, C#, Python, Pig Latin | IoT analysis, tracking real-time updates, financial risk management, click-stream analysis |
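Several tools in Table 9 list checkpointing combined with rollback, replay, or upstream backup as their fault-tolerance mechanism. The Python sketch below illustrates the basic idea under simplifying assumptions: a single stateful operator, an input log that can be re-read from any offset (as a Kafka-style source allows), and a snapshot that stores the operator state together with the input offset it reflects. It is not the implementation used by any tool in the table; Flink's marker (barrier) checkpoints, for example, coordinate snapshots across many parallel operators, which this sketch does not model. `CountingOperator`, `Checkpoint`, and `run_with_failure` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Snapshot of operator state plus the input offset it corresponds to."""
    offset: int
    counts: dict


class CountingOperator:
    """Hypothetical stateful operator that counts occurrences of each key."""

    def __init__(self) -> None:
        self.counts = {}

    def process(self, key: str) -> None:
        self.counts[key] = self.counts.get(key, 0) + 1

    def snapshot(self, offset: int) -> Checkpoint:
        # Copy the state so later updates do not mutate the saved checkpoint.
        return Checkpoint(offset=offset, counts=dict(self.counts))

    def restore(self, cp: Checkpoint) -> None:
        self.counts = dict(cp.counts)


def run_with_failure(log, checkpoint_every, fail_at=None):
    """Consume a replayable input log, checkpointing every `checkpoint_every`
    records. If `fail_at` is given, simulate one crash at that offset: the
    in-memory state is lost, the operator rolls back to the last checkpoint,
    and input is replayed from the offset saved with that checkpoint."""
    op = CountingOperator()
    last_cp = op.snapshot(0)
    pos = 0
    crashed = False
    while pos < len(log):
        if pos == fail_at and not crashed:
            crashed = True
            op = CountingOperator()  # crash: in-memory state is gone
            op.restore(last_cp)      # roll back to the last consistent snapshot
            pos = last_cp.offset     # replay input from the checkpointed offset
            continue
        op.process(log[pos])
        pos += 1
        if pos % checkpoint_every == 0:
            last_cp = op.snapshot(pos)
    return op.counts


if __name__ == "__main__":
    log = ["a", "b", "a", "c", "a", "b", "c", "a"]
    print(run_with_failure(log, checkpoint_every=3))
    print(run_with_failure(log, checkpoint_every=3, fail_at=5))
```

Both calls print `{'a': 4, 'b': 2, 'c': 2}`. Because the state and the replay offset are saved together, the records processed between the last checkpoint and the crash are reprocessed from the restored state instead of being counted twice.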
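The Reliability column in Table 9 uses the usual delivery guarantees: at most once, at least once, and exactly once. A toy simulation makes the difference concrete. It assumes a lossy channel where either a message or its acknowledgement can be dropped, and it obtains the exactly-once effect by deduplicating on a record ID at the sink, which is only one of several ways systems achieve it. In a real system the set of applied IDs would have to be stored durably and updated atomically with the output; here it is just an in-memory `set`. The function `deliver` and its parameters are hypothetical.

```python
import random

MODES = ("at_most_once", "at_least_once", "exactly_once")


def deliver(records, mode, drop_prob=0.3, seed=7):
    """Send (record_id, value) pairs over a simulated lossy channel and
    return the sum the sink ends up with.

    at_most_once : send once and never retry, so lost records are gone.
    at_least_once: resend until an ack arrives; a lost ack triggers a resend
                   of a record the sink already applied, creating duplicates.
    exactly_once : at-least-once delivery plus sink-side deduplication on
                   record_id, so each record is applied once.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    applied_ids = set()
    total = 0
    for record_id, value in records:
        while True:
            message_lost = rng.random() < drop_prob
            if not message_lost:
                if mode != "exactly_once" or record_id not in applied_ids:
                    applied_ids.add(record_id)
                    total += value
            if mode == "at_most_once":
                break  # fire-and-forget: no retry either way
            # A lost message produces no ack; a delivered one may still lose its ack.
            ack_lost = message_lost or rng.random() < drop_prob
            if not ack_lost:
                break  # sender saw the ack, move on to the next record
    return total


if __name__ == "__main__":
    records = [(i, 1) for i in range(100)]
    for mode in MODES:
        print(mode, deliver(records, mode))
```

With 100 unit-valued records, `exactly_once` always returns 100, while `at_most_once` can only undercount (lost records) and `at_least_once` can only overcount (duplicates caused by lost acknowledgements).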
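The Apache Storm row pairs record-level acknowledgement with at-least-once reliability. Storm's documentation describes an acker that tracks each spout tuple's tree of derived tuples with a single 64-bit value: every tuple edge gets a random ID, and each ID is XORed into that value once when the tuple is emitted and once when it is acked, so the value returns to zero when the whole tree has been processed. Trees that do not complete within a timeout are replayed by the spout, which is why the guarantee is at least once rather than exactly once. The sketch below models only the XOR bookkeeping in a single process; the `Acker` class and its `update` method are hypothetical names, not Storm's API.

```python
import random


class Acker:
    """Simplified, single-process tracker for tuple trees using one XOR value
    per spout tuple (modeled loosely on Storm's acker)."""

    def __init__(self) -> None:
        self.pending = {}  # root (spout tuple) id -> running XOR of edge ids

    def update(self, root_id: int, ack_val: int) -> bool:
        """XOR `ack_val` into the tree's value and return True once the tree
        is fully processed. Each edge id is XORed in twice (when the tuple is
        emitted and when it is acked), so the value reaches 0 only after every
        emitted tuple has been acked, barring a ~2**-64 random id collision."""
        val = self.pending.get(root_id, 0) ^ ack_val
        if val == 0:
            self.pending.pop(root_id, None)
            return True
        self.pending[root_id] = val
        return False


if __name__ == "__main__":
    rng = random.Random(1)
    acker = Acker()
    root = 100
    e0, e1, e2 = (rng.getrandbits(64) for _ in range(3))
    print(acker.update(root, e0))            # spout emits the root tuple (edge e0)
    print(acker.update(root, e0 ^ e1 ^ e2))  # bolt acks e0 and emits children e1, e2
    print(acker.update(root, e1))            # downstream bolt acks e1
    print(acker.update(root, e2))            # downstream bolt acks e2, completing the tree
```

The acker needs constant memory per spout tuple regardless of how many tuples the tree fans out into, which is the design point that makes record-level tracking affordable.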
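Unlike most rows in Table 9, Lambda Architecture is a design pattern rather than a single product: a batch layer periodically recomputes views over the full, immutable master dataset, a speed layer incrementally covers data that has arrived since the last batch run, and a serving layer answers queries by merging the two. The sketch below shows that division of labour for a hypothetical page-view counter. It assumes a single-threaded process, so the speed-layer view can simply be reset after each batch run; a real deployment would have to discard only the speed-layer results that the new batch view actually covers. The class and method names are illustrative.

```python
from collections import Counter


class LambdaPageViews:
    """Toy Lambda Architecture counting page views per URL (hypothetical example)."""

    def __init__(self) -> None:
        self.master_log = []            # immutable, append-only master dataset
        self.batch_view = Counter()     # batch layer output, recomputed from scratch
        self.realtime_view = Counter()  # speed layer output, updated incrementally
        self.batch_high_water = 0       # number of log entries the batch view covers

    def ingest(self, url: str) -> None:
        # New data is appended to the master dataset and fed to the speed layer.
        self.master_log.append(url)
        self.realtime_view[url] += 1

    def run_batch(self) -> None:
        # Batch layer: recompute the view over the entire master dataset.
        self.batch_high_water = len(self.master_log)
        self.batch_view = Counter(self.master_log[: self.batch_high_water])
        # The speed layer now only covers entries the batch view has not
        # absorbed; in this single-threaded sketch that range is empty.
        self.realtime_view = Counter(self.master_log[self.batch_high_water :])

    def query(self, url: str) -> int:
        # Serving layer: answer queries by merging the two views.
        return self.batch_view[url] + self.realtime_view[url]


if __name__ == "__main__":
    views = LambdaPageViews()
    for url in ["/home", "/about", "/home"]:
        views.ingest(url)
    print(views.query("/home"))  # answered by the speed layer alone
    views.run_batch()
    views.ingest("/home")
    print(views.query("/home"))  # batch view (2) + real-time view (1)
```

This prints 2 and then 3. Recomputing the batch view from the raw log, rather than patching it, is what lets the pattern recover from bugs or bad data in the speed layer: the next batch run overwrites any drift.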