Manufacturing process data analysis pipelines: a requirements analysis and survey

Ismail, Ahmed; Truong, Hong-Linh; Kastner, Wolfgang

doi:10.1186/s40537-018-0162-3

Journal of Big Data

Table 2 Platform requirements by function

Pipeline stage	Requirement
Ingestion	I1: Native support of a large number of technology connectors
	I2: Can ingest a large variety of formats
	I3: Supports custom processors and connectors
	I4: Scales to support a large number of sources and sinks (1000s to 100,000s)
	I5: Native processors for data validation, transformation, filtration, compression, noise reduction, identification, and integration
	I6: Supports active (real-time) ingestion
	I7: Supports passive (batch) ingestion
Communication	C1: Scalable: supports a large number of sources (ms poll rate) and sinks; the combined number can range from 1000s to 100,000s
	C2: Secures data in transit
	C3: Exactly-once message delivery semantics
	C4: Publish-subscribe communication
	C5: Efficient bandwidth utilization
	C6: Supports both real-time data streams and bulk data transfer
	C7: Pull-based data consumption
Storage	S1: Scalable up to 10s of GB/day
	S2: Read/write speed independent of the volume of stored data
	S3: Large variety of formats and types (structured, semi-structured, and unstructured)
	S4: Compression features for cost-efficient long-term storage (years)
	S5: Intolerant of data loss
	S6: Secures stored data
	S7: Exports data to relational databases
Analysis [16]	A1: Scalable up to 100,000 variables
	A2: Heterogeneous data types
	A3: Imperfect data
	A4: Real-time and batch processing required
	A5: Supports time-series analysis, data mining, and machine learning
Visualization	V1: Scalable
	V2: Visualization methods for large data volumes, variety, and velocity
	V3: Dynamic and static visualization
	V4: Interactive
	V5: Extensible interfaces
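
The requirement identifiers above can serve as a checklist when comparing candidate platforms. The following is a minimal sketch of that idea, not a method taken from the article: the function names `coverage` and `missing`, and the example set of supported requirements, are hypothetical and used only for illustration. Only the stage names and requirement IDs come from Table 2.

```python
from typing import Dict, List, Set

# Requirement IDs per pipeline stage, as listed in Table 2
# (I1-I7, C1-C7, S1-S7, A1-A5, V1-V5).
REQUIREMENTS: Dict[str, List[str]] = {
    "Ingestion": [f"I{i}" for i in range(1, 8)],
    "Communication": [f"C{i}" for i in range(1, 8)],
    "Storage": [f"S{i}" for i in range(1, 8)],
    "Analysis": [f"A{i}" for i in range(1, 6)],
    "Visualization": [f"V{i}" for i in range(1, 6)],
}


def coverage(supported: Set[str]) -> Dict[str, float]:
    """Return, per stage, the fraction of requirements a platform meets."""
    return {
        stage: sum(1 for req in ids if req in supported) / len(ids)
        for stage, ids in REQUIREMENTS.items()
    }


def missing(supported: Set[str]) -> Dict[str, List[str]]:
    """Return, per stage, the requirement IDs a platform does not meet."""
    return {
        stage: [req for req in ids if req not in supported]
        for stage, ids in REQUIREMENTS.items()
    }


if __name__ == "__main__":
    # Hypothetical assessment of an imaginary platform (assumed data).
    example_platform = {"I1", "I2", "I6", "C3", "C4", "S5"}
    for stage, frac in coverage(example_platform).items():
        print(f"{stage}: {frac:.0%} met, missing {missing(example_platform)[stage]}")
```

A simple fraction treats every requirement as equally important; a real comparison would likely weight requirements by how much they matter for a given deployment.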
