Multi-sample $\zeta$-mixup: richer, more realistic synthetic samples from a p-series interpolant

Journal of Big Data

Table 8 Ablation study on CIFAR-100 analyzing the contribution of each of the two components of $\zeta$-mixup when used in isolation: mixing more than 2 samples, and drawing the mixing weights from a normalized p-series

Mixes $>2$ samples	Uses normalized p-series weights for mixing	Method name	CIFAR-100 ERR
✗ $T=2$	✗ weights from a Beta($\alpha $, $\alpha $) distribution; $\alpha = 1$	mixup	$21.85 \pm 0.07$
✗ $T=2$	✓ weights from a normalized p-series	-	$21.77 \pm 0.17$
✓ $T=m$	✗ weights from a Dirichlet($\alpha $) distribution; $\alpha = 1$	-	$94.69 \pm 0.08$
✓ $T=m$	✓ weights from a normalized p-series	$\zeta $-mixup	$\mathbf{21.35 \pm 0.02}$

Note that since $\zeta$-mixup is a generalization of mixup (Theorem 2), $\zeta$-mixup with neither of these components reduces to mixup (first row). Modifying mixup to use $\zeta$-mixup's weighting scheme while still mixing only 2 samples (second row) gives a lower error than mixup but a higher error than $\zeta$-mixup. Conversely, mixing the entire batch ($T=m$) with weights drawn from a Dirichlet distribution leads to extremely poor performance (third row). Using both components together, i.e., $\zeta$-mixup, yields the best performance (fourth row).
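The three weighting schemes compared in the table can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the p-series weights are assumed to take the form $w_i = i^{-\gamma} / \sum_{j=1}^{T} j^{-\gamma}$, and the default exponent `gamma=2.8` is only an illustrative value.

```python
import numpy as np


def p_series_weights(T, gamma=2.8):
    """Normalized p-series weights: w_i proportional to i^(-gamma), i = 1..T.

    gamma=2.8 is an illustrative default (assumption), not taken from the table.
    """
    idx = np.arange(1, T + 1, dtype=np.float64)
    w = idx ** (-gamma)
    return w / w.sum()


def beta_weights(alpha=1.0, rng=None):
    """mixup-style weights for T=2: (lambda, 1 - lambda), lambda ~ Beta(alpha, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    return np.array([lam, 1.0 - lam])


def dirichlet_weights(T, alpha=1.0, rng=None):
    """Weights for T samples drawn from a symmetric Dirichlet(alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.dirichlet(np.full(T, alpha))


def mix_batch(batch, weights, rng=None):
    """Build one synthetic sample as a convex combination of the batch.

    The batch is randomly permuted first, so the largest weight lands on
    a randomly chosen sample (hypothetical helper for illustration).
    """
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(len(batch))
    # Contract the weight vector with the leading (sample) axis of the batch.
    return np.tensordot(weights, batch[perm], axes=1)
```

Under these assumptions, `p_series_weights(T)` places more than half of the total weight on a single sample even for large `T`, so the mixed sample stays close to one original. `dirichlet_weights(T)` with $\alpha = 1$ spreads the weight across the whole batch, which offers one plausible reading of the very high error in the third row.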
