From: VADA: an architecture for end user informed data preparation
Transducer | Inputs | Outputs | Parameters |
---|---|---|---|
Matching | A data product \(P\) and the target schema \(\mathbf{T}\) | A set of matches between attributes in \(P\) and \(\mathbf{T}\) | Threshold |
Data profiling | A set of data product instances \(I(P)\) | A set of candidate keys \(CK\), and a set of inclusion dependencies \(ID\) | Overlap threshold |
Mapping generation | A set of data products \(P\), the target schema \(\mathbf{T}\), a set of matches, a set of candidate keys \(CK\), and a set of inclusion dependencies \(ID\) | A set of candidate mappings \(M\) | Max mapping size, \(k\) |
Examples generation | An instance \(I(P)\) of a data product, a set of matches, and the data context \(\mathcal{D}\) | A set of transformation examples \(E\) | Min tuple size |
Data transformation | An instance \(I(P)\) of a data product, and a set of transformation examples \(E\) produced by the examples generation transducer | A transformed instance \(I'(P)\) of the data product | n/a |
Mapping selection | The user context \(\mathcal{U}\) and a set of candidate mappings \(M\) | An instance \(I(\mathbf{T}) \in \mathcal{P}\) of the target schema \(\mathbf{T}\), i.e., a candidate end data product | Targeted size |
CFD miner | An instance of a data context resource \(I(R) \in \mathcal {D}\) | A set \(\Phi\) of CFDs | Support size |
Violation detection | An instance of a data product \(I(P) \in \mathcal {P}\), and a set \(\Phi\) of CFDs | A set of violations \(V(\Phi )\) | n/a |
Rule-based repair | A set of violations \(V(\Phi)\) of a set \(\Phi\) of CFDs, and an instance of a data product relation \(I(R)\) | A repaired instance of the data product \(I'(R)\) | n/a |
Aggregator | The user context \(\mathcal{U}\) and a set of candidate end data products | The end data product \(I(\mathbf{T})\) that best meets \(\mathcal{U}\) | n/a |
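
The input and output columns above can be read as a dependency graph: a transducer becomes runnable once everything it consumes is available. The following Python sketch illustrates that reading for five of the rows. It is not the VADA implementation; the `Transducer` class, the `run_order` function, and the resource names (`products`, `user_context`, and so on) are hypothetical labels chosen for this example, and parameters such as thresholds are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transducer:
    """One table row: a name plus the resources it consumes and produces."""
    name: str
    inputs: frozenset
    outputs: frozenset


def t(name, inputs, outputs):
    """Small helper so rows can be written with plain set literals."""
    return Transducer(name, frozenset(inputs), frozenset(outputs))


# Resource names are hypothetical labels for the table's Inputs/Outputs columns.
TRANSDUCERS = [
    t("Matching", {"products", "target_schema"}, {"matches"}),
    t("Data profiling", {"product_instances"},
      {"candidate_keys", "inclusion_dependencies"}),
    t("Mapping generation",
      {"products", "target_schema", "matches",
       "candidate_keys", "inclusion_dependencies"},
      {"candidate_mappings"}),
    t("Mapping selection", {"user_context", "candidate_mappings"},
      {"candidate_end_products"}),
    t("Aggregator", {"user_context", "candidate_end_products"},
      {"end_product"}),
]


def run_order(transducers, available):
    """Return the names of transducers in an order in which they can run.

    A transducer runs as soon as all of its inputs are available; its
    outputs then become available to the remaining transducers.
    Transducers whose inputs never become available are left out.
    """
    available = set(available)
    pending = list(transducers)
    order = []
    progressed = True
    while pending and progressed:
        progressed = False
        for tr in list(pending):  # iterate over a copy; pending is mutated
            if tr.inputs <= available:
                order.append(tr.name)
                available |= tr.outputs
                pending.remove(tr)
                progressed = True
    return order
```

Calling `run_order(TRANSDUCERS, {"products", "target_schema", "product_instances"})` yields only the first three transducers in this sketch, because mapping selection and the aggregator both require the user context \(\mathcal{U}\); adding `"user_context"` to the available set lets all five run.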