Table 9 Data quality profile levels dataflow

From: Big data quality framework: a holistic approach to continuous quality management

| Proc # | BDQMF comp # | Description | Input | Output |
|---|---|---|---|---|
| Start | BDQP | Big Data quality project (BDQP) creation with quality requirements R, data sources (DS) | R, DS | DQP 0 (BDQP(DS, R)) |
| 1 | 1 | Sampling strategy parameters (sample size, number) | BDQP | DQP 0 Samples set S |
| 2 | 1 | Data profiling | S | DQP 1 (Data Profile) |
| 3 | 2 | EQP: Quality rules Proposals scenarios (Sc) based | Sc, S | DQP 2 (QR Proposals) |
| 4 | 2 | QQE: Best ranked attributes selection lists | S | DQP 2 (Attributes Sets) |
| 5 | 2 | QQE: Combination of lists of best attributes | S, Sets | DQP 2 (Combined Set) |
| 6 | 3 | Data quality evaluation scheme specification | R, D | DQP 3 (DQES) |
| 7 | 4 | Quantitative quality evaluation of dataset samples | S, DQES | DQP 4 (DQES + Scores) |
| 8 | 4 | Quantitative quality evaluation of preprocessed samples | S’, DQES | DQP 7 (S’DQD Scores) |
| 9 | 5 | Control of DQES DQD scores | R, DQES + Scores | DQP 5 (DQD OK, Not) |
| 10 | 6 | Quality Rules’ discovery based on DQES scores | DQES, PPA_QPREPO | DQP 6 (Quality Rules List) |
| 11 | 7 | Preprocessing samples using discovered QR | QR List, S | Preprocessed Samples set S’ |
| 12 | 7 | Quality Rules’ validation | S, S’ DQES + Scores | DQP 7 (Valid, Not Valid Quality Rules) |
| 13 | 8 | Quality Rule’s optimization | DQP 7 | DQP 8 (QR optimized) |
| 14 | 9 | Big data preprocessing | Dataset DS | Dataset DS’ |
| End/Loop | 10 | Quality monitoring | DS’ samples | DQP 10 Quality Report |

  1. DS: the dataset; R: requirements; DQPx: data quality profile level x
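
As a reading aid, the rows of Table 9 can be held in a small data structure and queried, for example to see which processes contribute to each data quality profile level. The following is a minimal sketch in Python, not part of the framework itself: the names `Step`, `STEPS`, `dqp_level`, and `procs_by_level` are hypothetical, the descriptions are omitted for brevity, and the prime in S’ and DS’ is written as an ASCII apostrophe. It assumes that a DQP level is named only at the start of an Output cell, as in the table.

```python
import re
from collections import defaultdict
from typing import NamedTuple, Optional


class Step(NamedTuple):
    proc: str     # "Proc #" column
    comp: str     # "BDQMF comp #" column
    inputs: str   # "Input" column, copied as text
    output: str   # "Output" column, copied as text


# Rows transcribed from Table 9 (Description column omitted).
STEPS = [
    Step("Start", "BDQP", "R, DS", "DQP 0 (BDQP(DS, R))"),
    Step("1", "1", "BDQP", "DQP 0 Samples set S"),
    Step("2", "1", "S", "DQP 1 (Data Profile)"),
    Step("3", "2", "Sc, S", "DQP 2 (QR Proposals)"),
    Step("4", "2", "S", "DQP 2 (Attributes Sets)"),
    Step("5", "2", "S, Sets", "DQP 2 (Combined Set)"),
    Step("6", "3", "R, D", "DQP 3 (DQES)"),
    Step("7", "4", "S, DQES", "DQP 4 (DQES + Scores)"),
    Step("8", "4", "S', DQES", "DQP 7 (S'DQD Scores)"),
    Step("9", "5", "R, DQES + Scores", "DQP 5 (DQD OK, Not)"),
    Step("10", "6", "DQES, PPA_QPREPO", "DQP 6 (Quality Rules List)"),
    Step("11", "7", "QR List, S", "Preprocessed Samples set S'"),
    Step("12", "7", "S, S' DQES + Scores",
         "DQP 7 (Valid, Not Valid Quality Rules)"),
    Step("13", "8", "DQP 7", "DQP 8 (QR optimized)"),
    Step("14", "9", "Dataset DS", "Dataset DS'"),
    Step("End/Loop", "10", "DS' samples", "DQP 10 Quality Report"),
]


def dqp_level(output: str) -> Optional[int]:
    """Return the DQP level named at the start of an Output cell, if any."""
    match = re.match(r"DQP (\d+)", output)
    return int(match.group(1)) if match else None


def procs_by_level(steps):
    """Map each DQP level to the processes whose output carries that level."""
    grouped = defaultdict(list)
    for step in steps:
        level = dqp_level(step.output)
        if level is not None:  # rows 11 and 14 output data, not a DQP level
            grouped[level].append(step.proc)
    return dict(grouped)
```

For instance, `procs_by_level(STEPS)[7]` returns `["8", "12"]`, reflecting that both the evaluation of preprocessed samples and the quality rules' validation write to DQP 7, while no row in the table outputs a DQP 9.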