Table 1 Summary of advantages and disadvantages of the proposed approaches

From: An efficient strategy for the collection and storage of large volumes of data for computation

Approach 1: Data transformation occurs within the data pipeline
Advantages:
  Well tested approach: typical scenario in most data analytics platforms
Disadvantages:
  Complex: transformation logic is kept in the data pipeline so in the case of data pipeline replacement the transformation logic needs to be re-implemented
  Lost data authenticity: the data is transformed by the data pipeline so the raw data is lost

Approach 2: Data transformation occurs within the storage layer
Advantages:
  Easy to migrate/replace: the transformation logic is moved to a centralised location so it is easier to migrate or replace the data pipeline
  Raw data is intact: meets regulatory standards of storing the raw data both before and after transformation
Disadvantages:
  Complex: an intermediate job is required for transformation
  Large storage needed: both raw and transformed data are stored

Approach 3: Data transformation occurs within the analytics jobs
Advantages:
  Clean and simple: no complexity added to the data pipeline
  Less storage needed: only raw data is stored
  Easy to migrate or replace: the transformation logic is moved to a centralised location
Disadvantages:
  Increased execution overhead: the analytics job will transform the data
  Repetition: transformation will take place every time an analytics job is executed
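
To make the trade-offs in the table concrete, the following is a minimal Python sketch of where the transformation step could sit in each approach. It is an illustration under stated assumptions, not the authors' implementation: the `transform` function, the record layout, the in-memory dictionary standing in for the storage layer, and the names `approach_1`, `approach_2`, `approach_3` and `analytics_job` are all hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]
Store = Dict[str, List[Record]]


def transform(record: Record) -> Record:
    """Hypothetical transformation: lower-case the keys, trim the values."""
    return {key.lower(): value.strip() for key, value in record.items()}


def approach_1(raw: List[Record]) -> Store:
    # Transformation inside the data pipeline: only transformed data
    # reaches storage, so the raw records are lost.
    return {"transformed": [transform(r) for r in raw]}


def approach_2(raw: List[Record]) -> Store:
    # The pipeline stores raw data untouched; an intermediate job in the
    # storage layer then writes a transformed copy next to it.
    store: Store = {"raw": list(raw)}
    store["transformed"] = [transform(r) for r in store["raw"]]
    return store


def approach_3(raw: List[Record]) -> Store:
    # Only raw data is stored; transformation is deferred to the analytics job.
    return {"raw": list(raw)}


def analytics_job(store: Store) -> int:
    """Hypothetical analytics job: count the distinct sensors in the store."""
    records = store.get("transformed")
    if records is None:
        # Approach 3: the job transforms the raw data on every execution.
        records = [transform(r) for r in store["raw"]]
    return len({r["sensor"] for r in records})


if __name__ == "__main__":
    sample = [{"Sensor": " a "}, {"SENSOR": "a"}, {"sensor": "b"}]
    for build in (approach_1, approach_2, approach_3):
        store = build(sample)
        print(build.__name__, sorted(store), analytics_job(store))
```

In this sketch all three approaches give the same analytics result. They differ only in what ends up in storage (transformed only, raw plus transformed, or raw only) and in whether the transformation cost is paid once or on every job run, which mirrors the storage and execution-overhead rows of the table.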