Table 1 Summary of advantages and disadvantages of the proposed approaches

From: An efficient strategy for the collection and storage of large volumes of data for computation

| Approach | Data transformation occurs within | Advantages | Disadvantages |
|---|---|---|---|
| Approach 1 | The data pipeline | Well tested: the typical scenario in most data analytics platforms | Complex: the transformation logic lives in the data pipeline, so it must be re-implemented if the pipeline is replaced<br>Loss of data authenticity: the pipeline transforms the data before it is stored, so the raw data is lost |
| Approach 2 | The storage layer | Easy to migrate or replace: the transformation logic is moved to a centralised location, which makes the data pipeline easier to migrate or replace<br>Raw data intact: meets regulatory standards that require storing the data both before and after transformation | Complex: an intermediate job is required to perform the transformation<br>Large storage needed: both raw and transformed data are stored |
| Approach 3 | The analytics jobs | Clean and simple: no complexity is added to the data pipeline<br>Less storage needed: only raw data is stored<br>Easy to migrate or replace: the transformation logic is moved to a centralised location | Increased execution overhead: each analytics job must transform the data itself<br>Repetition: the transformation runs every time an analytics job is executed |
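The trade-offs in the table can be made concrete with a small simulation. The sketch below is illustrative only and is not the authors' implementation: the `to_celsius` transformation, the `temp_f`/`temp_c` fields, the `Storage` class and the `pipeline_approach_*`/`analytics_job` functions are all hypothetical, and storage is measured as a record count rather than in bytes. It shows where each approach places the transformation step, how much it stores, whether raw data survives, and how many times the transformation runs across repeated analytics jobs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, float]


def to_celsius(record: Record) -> Record:
    """Hypothetical transformation: add a Celsius field to a raw Fahrenheit reading."""
    return {**record, "temp_c": round((record["temp_f"] - 32) * 5 / 9, 2)}


class CountingTransform:
    """Wraps a transformation and counts how many times it is applied."""

    def __init__(self, fn: Callable[[Record], Record]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, record: Record) -> Record:
        self.calls += 1
        return self.fn(record)


@dataclass
class Storage:
    raw: List[Record] = field(default_factory=list)
    transformed: List[Record] = field(default_factory=list)

    def size(self) -> int:
        # Footprint approximated as a record count (a simplifying assumption)
        return len(self.raw) + len(self.transformed)


def pipeline_approach_1(batch: List[Record], storage: Storage, transform) -> None:
    # Approach 1: transform inside the pipeline; raw records are never stored
    storage.transformed.extend(transform(r) for r in batch)


def pipeline_approach_2(batch: List[Record], storage: Storage, transform) -> None:
    # Approach 2: the pipeline lands raw records, then an intermediate
    # storage-layer job writes a transformed copy of the new batch alongside them
    storage.raw.extend(batch)
    storage.transformed.extend(transform(r) for r in batch)


def pipeline_approach_3(batch: List[Record], storage: Storage) -> None:
    # Approach 3: the pipeline lands raw records only
    storage.raw.extend(batch)


def analytics_job(storage: Storage, transform: Optional[Callable] = None) -> float:
    # Mean Celsius temperature. When a transform is passed (approach 3),
    # the job transforms every raw record itself on each run.
    rows = storage.transformed if transform is None else [transform(r) for r in storage.raw]
    return sum(r["temp_c"] for r in rows) / len(rows)


def demo(runs: int = 3) -> Dict[int, dict]:
    batch = [{"temp_f": 32.0}, {"temp_f": 212.0}]
    results = {}
    for approach in (1, 2, 3):
        storage, t = Storage(), CountingTransform(to_celsius)
        if approach == 1:
            pipeline_approach_1(batch, storage, t)
        elif approach == 2:
            pipeline_approach_2(batch, storage, t)
        else:
            pipeline_approach_3(batch, storage)
        # Only approach 3 transforms inside the analytics job
        job_transform = t if approach == 3 else None
        means = [analytics_job(storage, job_transform) for _ in range(runs)]
        results[approach] = {
            "stored": storage.size(),
            "raw_kept": bool(storage.raw),
            "transform_calls": t.calls,
            "mean_c": means[-1],
        }
    return results


if __name__ == "__main__":
    for approach, stats in demo().items():
        print(f"approach {approach}: {stats}")
```

With two input records and three analytics runs, approach 1 stores two records but keeps no raw data, approach 2 stores four records (raw plus transformed), and approach 3 stores two raw records but applies the transformation six times, twice per run. All three produce the same analytical result, which mirrors the storage versus recomputation trade-off summarised in the table.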