Table 4 Data ingestion discussion

From: An industrial big data pipeline for data-driven analytics maintenance applications in large-scale smart manufacturing facilities

Requirement

Discussion

Legacy integration

The simulation illustrates how legacy and smart devices can coexist in the industrial big data pipeline. Legacy integration is realised by using the ingestion engine to abstract and encapsulate the legacy devices and instrumentation that provide the RAT measurement. The legacy and smart sensor measurements are both pushed to the cloud, where industrial analytics applications consume them without being aware of their origin (i.e. legacy or smart device)

Cross-network communication

Ingestion engines can operate across different networks because they have no local dependencies and data processing takes place in the cloud. The simulation shows that an ingestion engine can be deployed on a network that has an active internet connection, access to the site manager, and reachable data sources. Considering the flow of data depicted in the simulation, the network locations of the ingestion engine and smart sensor are irrelevant, as both measurements reach the same endpoint (i.e. the cloud platform)

Fault tolerance

Reliability and resilience can be improved by adding more ingestion engines to the process and monitoring the status of each. As illustrated by the simulation, the site manager controls the ingestion process by sending data collection instructions to the ingestion engine. If an ingestion engine fails to complete its assigned task, the instructions can be reassigned to another ingestion engine. This promotes fault tolerance in the ingestion process by removing a single point of failure, while the number of ingestion engines deployed allows for different levels of resilience
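
The reassignment behaviour described in this row can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the simulation: the `SiteManager` class, its `assign` method, the engine functions, and the instruction fields are all hypothetical, and a failed task is modelled simply as a raised exception.

```python
from typing import Callable, Dict, List

# Hypothetical types: an instruction is a plain dict, and an engine is any
# callable that takes an instruction and returns a list of measurements.
Instruction = Dict[str, str]
Engine = Callable[[Instruction], List[dict]]


class SiteManager:
    """Toy site manager that reassigns an instruction when an engine fails."""

    def __init__(self, engines: Dict[str, Engine]) -> None:
        self.engines = engines

    def assign(self, instruction: Instruction) -> List[dict]:
        failures = []
        for name, engine in self.engines.items():
            try:
                return engine(instruction)
            except Exception as exc:  # assumption: any exception means failure
                failures.append(f"{name}: {exc}")
        raise RuntimeError("all ingestion engines failed: " + "; ".join(failures))


def failing_engine(instruction: Instruction) -> List[dict]:
    # Simulates an engine that cannot complete its assigned task.
    raise ConnectionError("data source unreachable")


def healthy_engine(instruction: Instruction) -> List[dict]:
    # Returns a made-up measurement for the requested source.
    return [{"source": instruction["source"], "value": 21.5}]


manager = SiteManager({"engine-a": failing_engine, "engine-b": healthy_engine})
result = manager.assign({"source": "legacy-sensor"})
```

Here `engine-a` fails and the same instruction is handed to `engine-b`, so the caller still receives a measurement. With only one engine registered, the same failure would surface as an error, which is the single point of failure the row refers to.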

Extensibility

The data pipeline may need to be extended to support additional data sources. As ingestion engines encapsulate data integration logic, they are the logical point of extension. Given an abstract function that takes data collection instructions from the site manager as input and outputs JSON-encoded measurements, additional data sources could be supported by providing new implementations of that function
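
One way to picture the abstract function described in this row is an abstract base class with one implementation per data source. The sketch below is illustrative only: `IngestionFunction`, `CsvLogSource`, the `timestamp,value` line format, and the instruction and measurement fields are assumptions, not names or formats taken from the paper.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, List


class IngestionFunction(ABC):
    """Hypothetical contract: instruction in, JSON-encoded measurements out."""

    @abstractmethod
    def read(self, instruction: Dict[str, str]) -> List[dict]:
        """Collect raw measurements from one kind of data source."""

    def collect(self, instruction: Dict[str, str]) -> str:
        # Shared step: every implementation emits the same JSON encoding.
        return json.dumps(self.read(instruction), sort_keys=True)


class CsvLogSource(IngestionFunction):
    """Illustrative new data source that parses 'timestamp,value' lines."""

    def __init__(self, lines: List[str]) -> None:
        self.lines = lines

    def read(self, instruction: Dict[str, str]) -> List[dict]:
        rows = []
        for line in self.lines:
            timestamp, value = line.split(",")
            rows.append({
                "measurement": instruction["measurement"],
                "timestamp": timestamp,
                "value": float(value),
            })
        return rows


# Made-up sample data, used only to exercise the sketch.
source = CsvLogSource(["2017-01-01T00:00:00Z,21.5"])
encoded = source.collect({"measurement": "RAT"})
```

Supporting another data source would then mean writing one more subclass with its own `read` method, while the instruction input and JSON output stay unchanged for the rest of the pipeline.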

Scalability

The ingestion process can be scaled horizontally by deploying additional ingestion engines across machines and networks. This provides the data pipeline with a greater work capacity given that more ingestion jobs can run in parallel. The ingestion engine characteristics that make this possible have already been addressed in discussions regarding cross-network communication and fault tolerance
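
The effect of adding ingestion engines can be approximated on a single machine with a worker pool. In this sketch each worker thread stands in for one ingestion engine; in the pipeline itself the engines would be separate deployments across machines and networks. The function names and instruction fields are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_ingestion_job(instruction: Dict[str, str]) -> dict:
    """Stand-in for one ingestion engine handling one instruction."""
    return {"source": instruction["source"], "status": "collected"}


def run_jobs(instructions: List[Dict[str, str]], engines: int) -> List[dict]:
    # Each worker thread plays the role of one ingestion engine; raising
    # `engines` lets more jobs run at the same time.
    with ThreadPoolExecutor(max_workers=engines) as pool:
        # pool.map returns results in the order the instructions were given.
        return list(pool.map(run_ingestion_job, instructions))


instructions = [{"source": f"source-{i}"} for i in range(4)]
results = run_jobs(instructions, engines=2)
```

Because the jobs share no state, raising `engines` changes only how many run at once, not the results they produce.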

Openness and accessibility

The simulation shows how open standards, such as HTTP and JSON, can be used to facilitate communication and data exchange amongst distributed components in the data pipeline. This is exemplified by communication between the site manager and ingestion engine to relay data collection instructions, as well as the transmission of measurements to the cloud from the ingestion engine and smart sensor
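
As a rough illustration of how a measurement might travel to the cloud over HTTP as JSON, the sketch below builds a request without sending it. The endpoint URL and the measurement fields are assumptions made for this example; the paper does not specify them here.

```python
import json
import urllib.request

# Hypothetical endpoint, chosen only for illustration.
CLOUD_URL = "https://cloud.example.com/measurements"


def build_measurement_request(measurement: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying a JSON measurement."""
    body = json.dumps(measurement).encode("utf-8")
    return urllib.request.Request(
        CLOUD_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Made-up measurement values, used only to exercise the sketch.
request = build_measurement_request(
    {"measurement": "RAT", "timestamp": "2017-01-01T00:00:00Z", "value": 21.5}
)
decoded = json.loads(request.data.decode("utf-8"))
```

Passing `request` to `urllib.request.urlopen` would send it. The same pattern could carry data collection instructions from the site manager to an ingestion engine, since both exchanges rely only on HTTP and JSON.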