Table 2 Platform requirements by function

From: Manufacturing process data analysis pipelines: a requirements analysis and survey

Pipeline stage

Requirement

Ingestion

I1: Native support of a large number of technology connectors

I2: Can ingest a large variety of formats

I3: Supports custom processors and connectors

I4: Scales to support a large number of sources and sinks (1000s to 100,000s)

I5: Native processors for data validation, transformation, filtration, compression, noise reduction, identification, and integration

I6: Supports active (real-time) ingestion

I7: Supports passive (batch) ingestion

Communication

C1: Scalable. It should support a large number of sources (ms poll rate) and sinks; the combined number can range from 1000s to 100,000s

C2: Secures data in transit

C3: Exactly-once message delivery semantics

C4: Publish-subscribe communication

C5: Efficient bandwidth utilization

C6: Supports both real-time data streams and bulk data transfer

C7: Pull-based data consumption

Storage

S1: Scalable up to 10s of GB/day

S2: Read/write speed independent of the volume of stored data

S3: Large variety of formats and types (structured, semi-structured, and unstructured)

S4: Compression features for cost-efficient long-term storage (years)

S5: Intolerant of data loss

S6: Secures stored data

S7: Exports data to relational databases

Analysis [16]

A1: Scalable up to 100,000 variables

A2: Heterogeneous data types

A3: Imperfect data

A4: Supports real-time and batch processing

A5: Supports time-series analysis, data mining, and machine learning

Visualization

V1: Scalable

V2: Visualization methods for large data volumes, variety, and velocity

V3: Dynamic and static visualization

V4: Interactive

V5: Extensible interfaces