From: Manufacturing process data analysis pipelines: a requirements analysis and survey
| Pipeline stage | Requirement |
|---|---|
| Ingestion | I1: Native support of a large number of technology connectors |
| | I2: Can ingest a large variety of formats |
| | I3: Supports custom processors and connectors |
| | I4: Scales to support a large number of sources and sinks (1000s to 100,000s) |
| | I5: Native processors for data validation, transformation, filtration, compression, noise reduction, identification, and integration |
| | I6: Supports active (real-time) ingestion |
| | I7: Supports passive (batch) ingestion |
| Communication | C1: Scalable. It should be able to support a large number of sources (ms poll rate) and sinks. The combined number can range from 1000s to 100,000s |
| | C2: Secures data in transit |
| | C3: Exactly-once message delivery semantics |
| | C4: Publish-subscribe communication |
| | C5: Efficient bandwidth utilization |
| | C6: Supports both real-time data streams and bulk data transfer |
| | C7: Pull-based data consumption |
| Storage | S1: Scalable up to 10s GB/day |
| | S2: Read/Write speed independent of volume of stored data |
| | S3: Large variety of formats and types (structured, semi-structured, and unstructured) |
| | S4: Compression features for cost-efficient long-term storage (years) |
| | S5: Intolerant of data loss |
| | S6: Secures stored data |
| | S7: Exports data to relational databases |
| Analysis [16] | A1: Scalable up to 100,000 variables |
| | A2: Heterogeneous data types |
| | A3: Imperfect data |
| | A4: Real-time and batch processing required |
| | A5: Supports time-series analysis & data mining and machine learning |
| Visualization | V1: Scalable |
| | V2: Visualization methods for large data volumes, variety, and velocity |
| | V3: Dynamic and static visualization |
| | V4: Interactive |
| | V5: Extensible interfaces |
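
The requirement IDs in the table can serve as a checklist when comparing candidate tools. The following is a minimal Python sketch of that idea, not a method taken from the source: the function names `coverage` and `missing` and the `example_tool` set are hypothetical and only illustrate how per-stage coverage could be tallied.

```python
from __future__ import annotations

# Requirement IDs per pipeline stage, copied from the table above.
REQUIREMENTS: dict[str, list[str]] = {
    "Ingestion": [f"I{i}" for i in range(1, 8)],       # I1..I7
    "Communication": [f"C{i}" for i in range(1, 8)],   # C1..C7
    "Storage": [f"S{i}" for i in range(1, 8)],         # S1..S7
    "Analysis": [f"A{i}" for i in range(1, 6)],        # A1..A5
    "Visualization": [f"V{i}" for i in range(1, 6)],   # V1..V5
}


def _check_ids(supported: set[str]) -> None:
    """Reject IDs that do not appear in the table."""
    known = {r for ids in REQUIREMENTS.values() for r in ids}
    unknown = supported - known
    if unknown:
        raise ValueError(f"unknown requirement IDs: {sorted(unknown)}")


def coverage(supported: set[str]) -> dict[str, float]:
    """Fraction of each stage's requirements that a tool is marked as meeting."""
    _check_ids(supported)
    return {
        stage: sum(r in supported for r in ids) / len(ids)
        for stage, ids in REQUIREMENTS.items()
    }


def missing(supported: set[str]) -> dict[str, list[str]]:
    """Requirements per stage that a tool is not marked as meeting."""
    _check_ids(supported)
    return {
        stage: [r for r in ids if r not in supported]
        for stage, ids in REQUIREMENTS.items()
    }


# Hypothetical assessment of an unnamed messaging tool (illustrative only).
example_tool = {"C1", "C2", "C4", "C6", "C7", "I6"}

if __name__ == "__main__":
    for stage, share in coverage(example_tool).items():
        print(f"{stage}: {share:.0%} covered, missing {missing(example_tool)[stage]}")
```

A flat set of IDs keeps the assessment easy to record in a spreadsheet or survey form; weighting requirements differently would need a richer structure than this sketch provides.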