Table 2 Platform requirements by function

From: Manufacturing process data analysis pipelines: a requirements analysis and survey

Pipeline stage  Requirement
Ingestion       I1: Native support of a large number of technology connectors
                I2: Can ingest a large variety of formats
                I3: Supports custom processors and connectors
                I4: Scales to support a large number of sources and sinks (1000s to 100,000s)
                I5: Native processors for data validation, transformation, filtration, compression, noise reduction, identification, and integration
                I6: Supports active (real-time) ingestion
                I7: Supports passive (batch) ingestion
Communication   C1: Scalable: supports a large number of sources (millisecond poll rates) and sinks; the combined number can range from 1000s to 100,000s
                C2: Secures data in transit
                C3: Exactly-once message delivery semantics
                C4: Publish-subscribe communication
                C5: Efficient bandwidth utilization
                C6: Supports both real-time data streams and bulk data transfer
                C7: Pull-based data consumption
Storage         S1: Scalable up to 10s of GB/day
                S2: Read/write speed independent of the volume of stored data
                S3: Large variety of formats and types (structured, semi-structured, and unstructured)
                S4: Compression features for cost-efficient long-term storage (years)
                S5: Intolerant of data loss
                S6: Secures stored data
                S7: Exports data to relational databases
Analysis [16]   A1: Scalable up to 100,000 variables
                A2: Heterogeneous data types
                A3: Imperfect data
                A4: Real-time and batch processing required
                A5: Supports time-series analysis, data mining, and machine learning
Visualization   V1: Scalable
                V2: Visualization methods for large data volumes, variety, and velocity
                V3: Dynamic and static visualization
                V4: Interactive
                V5: Extensible interfaces
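
One possible way to put the requirement identifiers in Table 2 to work is as a checklist for comparing candidate platforms. The sketch below is illustrative only and is not taken from the article: the function names `coverage` and `missing`, and the set `example_platform`, are hypothetical, and the set of supported requirements is invented purely to show the mechanics. Only the stage names and the ID ranges (I1 to I7, C1 to C7, S1 to S7, A1 to A5, V1 to V5) come from the table.

```python
# Requirement IDs per pipeline stage, taken from Table 2.
REQUIREMENTS = {
    "Ingestion": [f"I{i}" for i in range(1, 8)],
    "Communication": [f"C{i}" for i in range(1, 8)],
    "Storage": [f"S{i}" for i in range(1, 8)],
    "Analysis": [f"A{i}" for i in range(1, 6)],
    "Visualization": [f"V{i}" for i in range(1, 6)],
}


def coverage(supported):
    """Return, per stage, the fraction of requirements present in `supported`."""
    return {
        stage: sum(req in supported for req in ids) / len(ids)
        for stage, ids in REQUIREMENTS.items()
    }


def missing(supported):
    """Return, per stage, the requirement IDs absent from `supported`."""
    return {
        stage: [req for req in ids if req not in supported]
        for stage, ids in REQUIREMENTS.items()
    }


# Hypothetical platform: these IDs are made up for illustration,
# not an assessment of any real product.
example_platform = {"I4", "I6", "C1", "C2", "C4", "C6", "C7"}

if __name__ == "__main__":
    for stage, fraction in coverage(example_platform).items():
        gaps = ", ".join(missing(example_platform)[stage]) or "none"
        print(f"{stage:<14} {fraction:.0%} covered; missing: {gaps}")
```

Treating every requirement as equally weighted is a simplification; a real evaluation would likely weight requirements by importance and record partial support rather than a yes/no flag.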