From: Big data quality framework: a holistic approach to continuous quality management
Process # | BDQMF component # | Description | Input | Output |
---|---|---|---|---|
Start | BDQP | Big Data quality project (BDQP) creation with quality requirements (R) and data sources (DS) | R, DS | DQP 0 (BDQP(DS, R)) |
1 | 1 | Sampling strategy parameters (sample size, number of samples) | BDQP | DQP 0 Samples set S |
2 | 1 | Data profiling | S | DQP 1 (Data Profile) |
3 | 2 | EQP: Scenario-based (Sc) quality rule proposals | Sc, S | DQP 2 (QR Proposals) |
4 | 2 | QQE: Best ranked attributes selection lists | S | DQP 2 (Attributes Sets) |
5 | 2 | QQE: Combination of lists of best attributes | S, Sets | DQP 2 (Combined Set) |
6 | 3 | Data quality evaluation scheme specification | R, D | DQP 3 (DQES) |
7 | 4 | Quantitative quality evaluation of dataset samples | S, DQES | DQP 4 (DQES + Scores) |
8 | 4 | Quantitative quality evaluation of preprocessed samples | S’, DQES | DQP 7 (S’DQD Scores) |
9 | 5 | Control of DQES DQD scores | R, DQES + Scores | DQP 5 (DQD OK, Not OK) |
10 | 6 | Quality rules discovery based on DQES scores | DQES, PPA_QPREPO | DQP 6 (Quality Rules List) |
11 | 7 | Preprocessing samples using discovered QR | QR List, S | Preprocessed Samples set S’ |
12 | 7 | Quality rules validation | S, S’ DQES + Scores | DQP 7 (Valid, Not Valid Quality Rules) |
13 | 8 | Quality rules optimization | DQP 7 | DQP 8 (QR optimized) |
14 | 9 | Big data preprocessing | Dataset DS | Dataset DS’ |
End/Loop | 10 | Quality monitoring | DS’ samples | DQP 10 Quality Report |
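
Processes 1, 7, and 9 in the table above (sampling, quantitative evaluation of the samples, and control of the scores against the requirements R) can be illustrated with a minimal Python sketch. It is not the framework's implementation. The function names (`draw_samples`, `completeness`, `evaluate`, `control`), the dictionary layout used for the DQES, the choice of completeness as the only quality dimension, and the threshold values are all assumptions made for illustration.

```python
import random


def draw_samples(records, sample_size, n_samples, seed=0):
    """Process 1 (assumed form): draw n_samples random samples without replacement."""
    rng = random.Random(seed)
    return [rng.sample(records, sample_size) for _ in range(n_samples)]


def completeness(sample, attribute):
    """Illustrative metric: share of records whose attribute is present and not None."""
    filled = sum(1 for rec in sample if rec.get(attribute) is not None)
    return filled / len(sample)


def evaluate(samples, dqes):
    """Process 7 (assumed form): average each metric of the scheme over all samples."""
    scores = {}
    for (attribute, dimension), metric in dqes.items():
        values = [metric(sample, attribute) for sample in samples]
        scores[(attribute, dimension)] = sum(values) / len(values)
    return scores


def control(scores, requirements):
    """Process 9 (assumed form): mark each (attribute, dimension) as OK or not OK."""
    return {key: scores[key] >= threshold for key, threshold in requirements.items()}


# Hypothetical data: four records, one missing "age" value.
records = [
    {"age": 30, "city": "Oslo"},
    {"age": None, "city": "Rome"},
    {"age": 41, "city": "Lima"},
    {"age": 25, "city": "Kyiv"},
]

# Hypothetical evaluation scheme (DQES): which metric applies to which attribute.
dqes = {
    ("age", "completeness"): completeness,
    ("city", "completeness"): completeness,
}

# Hypothetical requirements R: minimum acceptable score per entry.
requirements = {
    ("age", "completeness"): 0.9,
    ("city", "completeness"): 0.9,
}

# Each sample here covers all four records, so the scores are deterministic.
samples = draw_samples(records, sample_size=4, n_samples=3)
scores = evaluate(samples, dqes)
print(scores[("age", "completeness")])   # 0.75
print(control(scores, requirements))     # age fails the threshold, city passes
```

In this sketch, a `False` entry returned by `control` corresponds to the "Not OK" branch of process 9, the point at which the table moves on to quality rules discovery (process 10) for the failing attributes.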