Table 1 Transducer inputs, outputs, and configuration parameters

From: VADA: an architecture for end user informed data preparation

| Transducer | Inputs | Outputs | Parameters |
| --- | --- | --- | --- |
| Matching | A data product P and the target schema \(\mathbf{T}\) | A set of matches between attributes in P and \(\mathbf{T}\) | Threshold |
| Data profiling | A set of data product instances I(P) | A set of candidate keys CK, and a set of inclusion dependencies ID | Overlap threshold |
| Mapping generation | A set of data products P, the target schema \(\mathbf{T}\), a set of matches, a set of candidate keys CK, and a set of inclusion dependencies ID | A set of candidate mappings M | Max mapping size, k |
| Examples generation | An instance I(P) of a data product, a set of matches, and the data context \(\mathcal{D}\) | A set of transformation examples E | Min tuple size |
| Data transformation | An instance I(P) of a data product, and a set of transformation examples E produced by the examples generation transducer | A transformed instance \(I'(P)\) of the data product | n/a |
| Mapping selection | \(\mathcal{U}\), and a set of candidate mappings M | An instance \(I(\mathbf{T}) \in \mathcal{P}\) of the target schema \(\mathbf{T}\), i.e. a candidate end data product | Targeted size |
| CFD miner | An instance of a data context resource \(I(R) \in \mathcal{D}\) | A set \(\Phi\) of CFDs | Support size |
| Violation detection | An instance of a data product \(I(P) \in \mathcal{P}\), and a set \(\Phi\) of CFDs | A set of violations \(V(\Phi)\) | n/a |
| Rule-based repair | A set of violations \(V(\Phi)\) of a set \(\Phi\) of CFDs, and an instance of a data product relation I(R) | A repaired instance of the data product \(I'(R)\) | n/a |
| Aggregator | \(\mathcal{U}\) and a set of candidate end data products | The end data product \(I(\mathbf{T})\) that best meets \(\mathcal{U}\) | n/a |
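
The rows of Table 1 can be read as declarative signatures: each transducer consumes some artefacts and produces others. The following is a minimal Python sketch of that reading, not the VADA implementation. The `Transducer` class, the `TABLE` list, the `runnable` helper, and the rule that a transducer is worth running once all of its inputs are available and at least one output is still missing are all assumptions made for illustration. Artefact labels such as `"P"`, `"CK"`, and `"M"` simply reuse the symbols from the table, and only four rows are encoded.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Transducer:
    """One row of Table 1 (hypothetical representation)."""
    name: str
    inputs: FrozenSet[str]      # artefact symbols the transducer consumes
    outputs: FrozenSet[str]     # artefact symbols the transducer produces
    parameter: Optional[str]    # configuration parameter, None for "n/a"


def row(name, inputs, outputs, parameter=None):
    """Build a Transducer from plain sets (illustrative helper)."""
    return Transducer(name, frozenset(inputs), frozenset(outputs), parameter)


# Four rows of Table 1, using the table's own symbols as artefact labels.
TABLE = [
    row("Matching", {"P", "T"}, {"matches"}, "Threshold"),
    row("Data profiling", {"I(P)"}, {"CK", "ID"}, "Overlap threshold"),
    row("Mapping generation",
        {"P", "T", "matches", "CK", "ID"}, {"M"}, "Max mapping size, k"),
    row("Mapping selection", {"U", "M"}, {"I(T)"}, "Targeted size"),
]


def runnable(available: Set[str]) -> List[str]:
    """Names of transducers whose inputs are all available and whose
    outputs are not yet all present (an assumed scheduling rule)."""
    return [
        t.name
        for t in TABLE
        if t.inputs <= available and not t.outputs <= available
    ]


if __name__ == "__main__":
    print(runnable({"P", "T", "I(P)"}))
    print(runnable({"P", "T", "I(P)", "matches", "CK", "ID"}))
```

Under these assumptions, starting from data products, a target schema, and instances, only Matching and Data profiling are eligible; Mapping generation becomes eligible once both have contributed their outputs.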