Table 5 DQDs and their related pre-processing activities and functions

From: Big data quality framework: a holistic approach to continuous quality management

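| PPAF # | DQD | Metric | Data type | Methods | Results (%) | PPA | PPAF | PPAF-related actions or proposals |
|---|---|---|---|---|---|---|---|---|
| 11 | Accuracy/validity | Outlier detection | Numeric | Rule-based | Outlier count / total rows; list of observations with outliers (anomaly, novelty) | Data cleansing | Retention | Use robust classification methods |
| 12 | | | | Linear regression model | | | Winsorizing (dealing with outliers) | Replace outliers with the closest values |
| 13 | | | | High-dimensional outlier detection methods | | | Exclusion, truncation | Remove related rows |
| 21 | Completeness | Available data observation | All | Count the observations that are not NA, Null, or any other value expressing non-availability | Not-NA count / total observations (rows) | Data enrichment | Data correction | Replace with mean |
| 22 | | | | | | | | Replace with mode |
| 23 | | | | | | | | Replace with median |
| 24 | | | | | | | Data removal | Remove rows |
| 25 | | | | | | | | Remove columns |
| 26 | | | | | | | | Remove rows and columns |

DQD: data quality dimension; PPA: pre-processing activity; PPAF: pre-processing activity function. Blank cells continue the value from the row above.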
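The table lists a rule-based method for the outlier-detection metric (PPAF 11) but does not name the rule. The Python below is a minimal sketch that assumes Tukey's interquartile-range (IQR) fences as that rule. It computes both reported results: the outlier ratio (outlier count / total rows) and the list of flagged observations. The function name `iqr_outliers` and the multiplier `k=1.5` are illustrative choices and do not come from the source.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Returns (outlier_ratio, outlier_indices): the outlier count divided by
    the total number of rows, and the positions of the flagged observations.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles; needs at least 2 values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    indices = [i for i, v in enumerate(values) if v < low or v > high]
    return len(indices) / len(values), indices


if __name__ == "__main__":
    ratio, idx = iqr_outliers([10, 12, 11, 13, 12, 11, 95])
    print(f"outlier ratio: {ratio:.1%}, rows: {idx}")  # flags row 6 (value 95)
```

Multiply the ratio by 100 to report it in the table's percentage form.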
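PPAFs 12 and 13 handle outliers in two ways. Winsorizing keeps the outlying rows and replaces their values with the closest values. Exclusion drops those rows. The sketch below assumes three things:

* Some detection rule has already computed the lower and upper fences.
* "Closest values" means the smallest and largest observed values inside the fences.
* Rows are dictionaries.

`winsorize` and `exclude_rows` are hypothetical helper names, and the fences 8 and 16 are example values only.

```python
def winsorize(values, low, high):
    """Replace values outside [low, high] with the nearest in-range observed value."""
    inliers = [v for v in values if low <= v <= high]
    floor_val, ceil_val = min(inliers), max(inliers)  # assumes at least one inlier
    return [floor_val if v < low else ceil_val if v > high else v for v in values]


def exclude_rows(rows, column, low, high):
    """Drop every row whose `column` value lies outside [low, high]."""
    return [r for r in rows if low <= r[column] <= high]


if __name__ == "__main__":
    data = [10, 12, 11, 13, 12, 11, 95]
    print(winsorize(data, low=8, high=16))  # 95 is replaced by 13
    rows = [{"id": i, "x": v} for i, v in enumerate(data)]
    print(len(exclude_rows(rows, "x", 8, 16)))  # 6 rows remain
```

Percentile-based winsorizing is a common alternative, for example capping at the 5th and 95th percentiles instead of at the detection fences.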
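The completeness metric (PPAFs 21 to 26) divides the count of available (not NA) observations by the total number of rows. The sketch below computes it per column. It treats three kinds of value as non-available: `None`, float NaN, and a small set of string markers in `MISSING_TOKENS`. That set is hypothetical, and a real dataset may use other markers.

```python
import math

# Hypothetical string markers treated as "not available"; adjust per dataset.
MISSING_TOKENS = {"", "NA", "N/A", "NULL", "null", "None"}


def is_missing(value):
    """True for None, float NaN, or a string marker from MISSING_TOKENS."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() in MISSING_TOKENS


def completeness(rows, columns):
    """Per-column completeness in percent: not-NA count / total rows * 100.

    A key absent from a row counts as missing; assumes at least one row.
    """
    total = len(rows)
    return {
        col: 100.0 * sum(not is_missing(row.get(col)) for row in rows) / total
        for col in columns
    }


if __name__ == "__main__":
    rows = [
        {"age": 34, "city": "Paris"},
        {"age": None, "city": "NA"},
        {"age": float("nan"), "city": "Rome"},
        {"age": 51, "city": "Oslo"},
    ]
    print(completeness(rows, ["age", "city"]))
```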
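The corrective actions for completeness can be sketched as follows. `impute` fills one column with its mean, mode, or median (PPAFs 21 to 23). `drop_rows`, `drop_columns`, and `drop_rows_and_columns` implement the removal options (PPAFs 24 to 26). The sketch rests on three assumptions:

* Missing values have already been normalized to `None`.
* Rows are dictionaries that share the same keys.
* Columns are dropped above a 50% missing ratio, which is a hypothetical default and not a value from the source.

```python
from statistics import mean, median, mode

# Strategy name -> statistic used as the fill value (PPAFs 21-23).
FILL_STRATEGIES = {"mean": mean, "mode": mode, "median": median}


def impute(column, strategy="mean"):
    """Replace None entries in one column with a statistic of the present values."""
    present = [v for v in column if v is not None]
    fill = FILL_STRATEGIES[strategy](present)
    return [fill if v is None else v for v in column]


def drop_rows(rows):
    """PPAF 24: keep only rows with no missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_columns(rows, threshold=0.5):
    """PPAF 25: drop columns whose missing ratio exceeds `threshold`."""
    if not rows:
        return []
    n = len(rows)
    keep = [c for c in rows[0] if sum(r[c] is None for r in rows) / n <= threshold]
    return [{c: r[c] for c in keep} for r in rows]


def drop_rows_and_columns(rows, threshold=0.5):
    """PPAF 26: drop sparse columns first, then any row that is still incomplete."""
    return drop_rows(drop_columns(rows, threshold))


if __name__ == "__main__":
    print(impute([1, None, 3], "median"))
    print(impute(["x", None, "y", "x"], "mode"))
    rows = [
        {"a": 1, "b": None, "c": None},
        {"a": None, "b": 2, "c": None},
        {"a": 3, "b": 4, "c": 5},
    ]
    print(drop_rows_and_columns(rows))
```

Mean and median apply only to numeric columns, while mode also works for categorical ones. Dropping sparse columns before rows, as `drop_rows_and_columns` does, usually preserves more rows than removing every incomplete row outright.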