Table 1 Transducer inputs, outputs, and configuration parameters

From: VADA: an architecture for end user informed data preparation

| Transducer | Inputs | Outputs | Parameters |
|---|---|---|---|
| Matching | A data product P and the target schema \(\mathbf{T}\) | A set of matches between attributes in P and \(\mathbf{T}\) | Threshold |
| Data profiling | A set of data product instances I(P) | A set of candidate keys CK and a set of inclusion dependencies ID | Overlap threshold |
| Mapping generation | A set of data products P, the target schema \(\mathbf{T}\), a set of matches, a set of candidate keys CK, and a set of inclusion dependencies ID | A set of candidate mappings M | Max mapping size, k |
| Examples generation | An instance I(P) of a data product, a set of matches, and the data context \(\mathcal{D}\) | A set of transformation examples E | Min tuple size |
| Data transformation | An instance I(P) of a data product and a set of transformation examples E produced by the examples generation transducer | A transformed instance \(I'(P)\) of the data product | n/a |
| Mapping selection | \(\mathcal{U}\) and a set of candidate mappings M | An instance \(I(\mathbf{T}) \in \mathcal{P}\) of the target schema \(\mathbf{T}\), i.e. a candidate end data product | Targeted size |
| CFD miner | An instance of a data context resource \(I(R) \in \mathcal{D}\) | A set \(\Phi\) of CFDs | Support size |
| Violation detection | An instance of a data product \(I(P) \in \mathcal{P}\) and a set \(\Phi\) of CFDs | A set of violations \(V(\Phi)\) | n/a |
| Rule-based repair | A set of violations \(V(\Phi)\) of a set \(\Phi\) of CFDs, and an instance I(R) of a data product relation | A repaired instance \(I'(R)\) of the data product | n/a |
| Aggregator | \(\mathcal{U}\) and a set of candidate end data products | The end data product \(I(\mathbf{T})\) that best meets \(\mathcal{U}\) | n/a |
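Of the rows above, data profiling is the easiest to make concrete: it consumes instances, emits candidate keys CK and inclusion dependencies ID, and is tuned by an overlap threshold. The following is a minimal sketch under stated assumptions, not VADA's profiler. It assumes each relation instance is a list of Python dicts, searches candidate keys only up to a small attribute-set size, and considers only unary (single-attribute) partial inclusion dependencies, reading the overlap threshold as the fraction of distinct left-hand values that must also appear on the right-hand side. The function names `candidate_keys` and `inclusion_dependencies`, and the sample relations, are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Assumption: a relation instance is a list of rows, each row a dict
# mapping attribute names to hashable values.
Relation = List[Dict[str, object]]


def candidate_keys(rows: Relation, max_size: int = 1) -> List[Tuple[str, ...]]:
    """Minimal attribute sets (up to max_size attributes) whose projected values are unique in this instance."""
    if not rows:
        return []
    attrs = sorted(rows[0].keys())
    keys: List[Tuple[str, ...]] = []
    for size in range(1, max_size + 1):
        for combo in combinations(attrs, size):
            # Skip supersets of keys already found so only minimal keys are reported.
            if any(set(key) <= set(combo) for key in keys):
                continue
            projected = [tuple(row.get(a) for a in combo) for row in rows]
            if len(set(projected)) == len(projected):
                keys.append(combo)
    return keys


def inclusion_dependencies(
    relations: Dict[str, Relation], overlap_threshold: float
) -> List[Tuple[str, str, str, str, float]]:
    """Unary partial INDs R.A included in S.B (R != S): the share of distinct
    non-null R.A values that also occur in S.B must be >= overlap_threshold."""
    values: Dict[Tuple[str, str], Set[object]] = {}
    for name, rows in relations.items():
        attrs = rows[0].keys() if rows else []
        for attr in attrs:
            values[(name, attr)] = {row.get(attr) for row in rows} - {None}
    inds = []
    for (r, a), lhs in values.items():
        if not lhs:
            continue
        for (s, b), rhs in values.items():
            if r == s:
                continue
            overlap = len(lhs & rhs) / len(lhs)
            if overlap >= overlap_threshold:
                inds.append((r, a, s, b, overlap))
    return inds


# Made-up sample instances, used only to exercise the sketch.
listings = [
    {"id": 1, "postcode": "M1", "price": 200},
    {"id": 2, "postcode": "M2", "price": 250},
    {"id": 3, "postcode": "M1", "price": 300},
]
areas = [
    {"postcode": "M1", "crime_rank": 3},
    {"postcode": "M2", "crime_rank": 5},
    {"postcode": "M3", "crime_rank": 1},
]

print(candidate_keys(listings))  # [('id',), ('price',)]
print(inclusion_dependencies({"listings": listings, "areas": areas}, 0.9))
# [('listings', 'postcode', 'areas', 'postcode', 1.0)]
```

On this toy data, `price` surfaces as a candidate key only because no price repeats in three rows, and lowering the threshold to 0.6 also reports coincidental dependencies such as `listings.id` included in `areas.crime_rank` (two of the three ids happen to equal crime ranks). Both effects illustrate why profiling results are candidates rather than facts, and how the overlap threshold trades recall against spurious matches.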