From: Big data stream analysis: a systematic literature review
Tools and technology | Database support | Execution model | Workload | Fault tolerance | Latency | Throughput | Reliability | Operating system | Implementation/supported languages | Application |
---|---|---|---|---|---|---|---|---|---|---|
BlockMon | Cassandra, MongoDB, XML | Streaming | Multi-slice memory allocation and batch allocations | Checkpoint, rollback | Very low | High | At least once | Linux | C++11, Python | Anomaly detection, network optimization, multimedia content delivery, financial market analysis, web analytics |
Spark Streaming | Kafka, HBase, Hive, Flume, HDFS/S3, Kinesis, TCP sockets, Twitter, SQL | Batch, Iterative, Streaming | CPU/memory intensive | RDD-based checkpointing, parallel recovery, replication | Low | High | Exactly once | Windows, macOS, Linux | Scala, Python, Java, R | Event detection, streaming machine learning, fog computing, interactive analysis, multimedia analysis, cluster analysis, filtering, re-processing, cache invalidation |
Apache Storm | Spout, HBase, Hive, SQL, Cassandra, Memcached | Streaming | CPU/memory intensive | Replication, checkpoint, data recovery, upstream backup, record-level acknowledgement, stateless management | Very low | Low | At least once | Windows, macOS, Linux | Clojure, Java, Scala, non-JVM languages | Internet of things, streaming machine learning, multimedia analysis |
Yahoo! S4 | MySQL, NoSQL, Rich Data Format | Streaming | CPU/memory intensive | Replication, checkpoint, data recovery | Low | Low | Exactly once | Linux | Java, Python, C++, Perl | Online analytics, monitoring, fraud detection, financial data processing, web personalization and session modelling |
Apache Samza | Kafka, HDFS, Kinesis, Stream consumer, Key-value stores | Streaming, batch processing | Memory intensive | Checkpoint | Very low | High | At least once | Linux, Windows | Java, Scala, JVM languages | Filtering, re-processing, cache invalidation |
Apache Flink | Kafka, Flume, HDFS/S3, Kinesis, TCP sockets, Twitter, Cassandra, Redis, MongoDB, HBase, SQL | Streaming, batch, iterative, interactive | Memory intensive | Stream replay and marker-checkpoint | Very low | High | Exactly once | Linux, macOS, Windows | Java, Scala, Python | Optimization of e-commerce search results, network/sensor monitoring and error detection, ETL for business intelligence infrastructure, machine learning |
Apache Aurora | H2, Java maps, MyBatis, MySQL, PostgreSQL | Streaming | Memory and disk space | Periodic recovery checkpoint and rollback | Low | High | At least once | Linux | Python | Monitoring applications such as financial analysis and military applications |
Redis | Key-value stores, RabbitMQ, MongoDB | Streaming | In-memory but persistent on-disk database | Replica migration, Sentinel | Low | High | At least once | Ubuntu, Linux, OS X | C, C#, Java, PHP, Python | Web analysis, cache, message queues |
C-SPARQL | RDF, SQLJ, NoSQL, HDF | Batch, streaming | Low memory usage | Adaptation | Very low | High | Cumulative | Windows, Linux, macOS, Android | Java, Apache Jena libraries | Real-time reasoning over sensor data, social semantic data, urban computing |
SAMOA | HBase, Hive, Cassandra | Streaming | Low memory usage | Upstream backup | Low | High | Exactly once | Linux | Java | Classification, clustering, spam detection, regression, frequent pattern mining |
CQELS | RDF, SQLJ, NoSQL, HDF | Batch, streaming | In-memory | Adaptation | Low | High | Cumulative | Windows, Linux, macOS, Android | Java | Real-time reasoning over sensor data, social semantic data, urban computing |
ETALIS | RDF | Streaming | Binarization | Adaptation | Low | Low | Cumulative | Windows, Linux, macOS, Android | Prolog, Java, C, SPARQL, C#, ETALIS Language for Events (ELE) | Event detection, reasoning over streaming events |
XSEQ | XML | Batch, streaming | In-memory with buffering | Checkpoint | Low | High | At least once | Windows, Linux | Java, Apache Xerces | Biological data, social networks, user behaviour, financial data analysis, filtering |
IBM InfoSphere Streams | Pig, Hive, Jaql, HBase, Flume, Lucene, Avro, ZooKeeper, Oozie, Oracle Database, DB2, Netezza, MySQL, Aster, Informix | Streaming | Capture database workloads and replay them in a test database environment | Automatic recovery | Low | High | Exactly once, at least once, at most once | Linux, CentOS | C++, Java, SPL | Space weather prediction, physiological data streams analysis, traffic management, real-time predictions, event detection, visualisation |
Google MillWheel | BigTable, Spanner | Streaming | In-memory and bloom filtering | Uncoordinated periodic checkpoint, upstream backup | Low | High | Exactly once | Linux | Virtually any programming language | Anomaly detection, health monitoring, image processing, network switch management |
Infochimps cloud | SQL, NoSQL, Hive, Pig, Wukong, Hadoop, RDBMS, virtually any data format | Batch, streaming | In-memory | Upstream backup | Low | High | Exactly once | Linux | Java | Disaster discovery, text analysis, complex event processing, visualisation |
Microsoft StreamInsight | SQL Server | Streaming | In-memory | Replication, checkpoint, data recovery | Very low | High | Exactly once | Windows | .NET, C#, LINQ, Rx | Manufacturing process monitoring and control, financial data analysis, operation analytics, web analytics, event pattern detection |
TIBCO StreamBase | Oracle database, SQL Server, Impala | Batch, Streaming | In-memory | Synchronization, replication, rollback | Very low | High | At least once, at most once, exactly once | Windows, macOS, Linux | R, Java | Mission critical analysis, IoT analysis, click-stream analysis, predictive analytics, workflow optimization, risk avoidance |
Lambda Architecture | RDBMS, Cassandra, Kafka, Data Warehouses, Kinesis Data Stream, HDFS, HBase | Batch, Streaming | In-memory/disk database | Replication, checkpoint | Low | Low | Exactly once | Ubuntu, Windows, Linux | Java, C#, Python, Pig Latin | IoT analysis, tracking real-time updates, financial risk management, click-stream analysis |