From: Big data quality framework: a holistic approach to continuous quality management
PPAF# | DQD | Metric | Data Type | Methods | Results (%) | PPA | PPAF | PPAF Related Actions or Proposals |
---|---|---|---|---|---|---|---|---|
11 | Accuracy/validity | Outlier detection | Numeric | Rule-based | Outlier count / total rows; list of observations with outliers (anomaly, novelty) | Data cleansing | Retention | Use robust classification methods |
12 | | | | Linear regression model | | | Winsorizing (dealing with outliers) | Replace outliers with closest values |
13 | | | | High-dimensional outlier detection methods | | | Exclusion, truncation | Remove related rows |
21 | Completeness | Available data observation | All | Count the values that are not NA, Null, or any other marker of unavailability | Not-NA count / total observations (rows) | Data enrichment | Data correction | Replace with mean |
22 | | | | | | | | Replace with mode |
23 | | | | | | | | Replace with median |
24 | | | | | | | Data removal | Remove rows |
25 | | | | | | | | Remove columns |
26 | | | | | | | | Remove rows and columns |
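
The two metrics and several of the actions in the table above can be illustrated with a short Python sketch. It is not the framework's implementation. The function names, the `MISSING` markers, and the fixed `[low, high]` range rule are all assumptions made for this example. In particular, winsorizing is shown as clamping to the rule's bounds, whereas percentile-based limits are also common.

```python
from statistics import mean, median, mode

# Assumed markers for "not available"; adjust to the dataset at hand.
MISSING = (None, "", "NA", "Null")


def is_missing(value):
    """Treat NaN and the assumed markers above as missing."""
    if isinstance(value, float) and value != value:  # NaN is never equal to itself
        return True
    return value in MISSING


def completeness_pct(column):
    """Metric for PPAF 21-26: not-NA count / total observations, in percent."""
    if not column:
        return 0.0
    present = sum(1 for v in column if not is_missing(v))
    return 100.0 * present / len(column)


def outlier_pct(column, low, high):
    """Rule-based metric for PPAF 11: percent of values outside [low, high],
    plus the indices of the offending observations."""
    if not column:
        return 0.0, []
    idx = [i for i, v in enumerate(column) if v < low or v > high]
    return 100.0 * len(idx) / len(column), idx


def winsorize(column, low, high):
    """PPAF 12: replace each outlier with the closest in-range bound."""
    return [min(max(v, low), high) for v in column]


def impute(column, strategy="mean"):
    """PPAF 21-23: fill missing entries with the mean, mode, or median
    of the observed values."""
    observed = [v for v in column if not is_missing(v)]
    fill = {"mean": mean, "mode": mode, "median": median}[strategy](observed)
    return [fill if is_missing(v) else v for v in column]


def drop_incomplete_rows(rows):
    """PPAF 24: remove every row that contains a missing value."""
    return [r for r in rows if not any(is_missing(v) for v in r)]
```

For a column such as `[23, 25, None, 27, 140]`, `completeness_pct` would report 80.0. After removing the missing entry, `outlier_pct(..., 0, 120)` flags the value 140, and `winsorize` with the same bounds replaces it with 120.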