Table 8 Ablation study on CIFAR-100 analyzing the contribution of each of the two components of \(\zeta \)-mixup when used in isolation: mixing more than 2 samples, and using weights from a normalized p-series for the mixing

From: Multi-sample \(\zeta \)-mixup: richer, more realistic synthetic samples from a p-series interpolant

| Mixes \(>2\) samples | Uses normalized p-series weights for mixing | Method name | CIFAR-100 ERR (%) |
|---|---|---|---|
| ✗ (\(T=2\)) | ✗ (weights from a Beta(\(\alpha \), \(\alpha \)) distribution; \(\alpha = 1\)) | mixup | \(21.85 \pm 0.07\) |
| ✗ (\(T=2\)) | ✓ (weights from a normalized p-series) | - | \(21.77 \pm 0.17\) |
| ✓ (\(T=m\)) | ✗ (weights from a Dirichlet(\(\alpha \)) distribution; \(\alpha = 1\)) | - | \(94.69 \pm 0.08\) |
| ✓ (\(T=m\)) | ✓ (weights from a normalized p-series) | \(\zeta \)-mixup | \(\varvec{21.35 \pm 0.02}\) |
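The four rows differ only in how many samples are mixed (\(T\)) and how their mixing weights are drawn. The following minimal Python sketch reproduces the four weighting schemes on a toy batch; it is an illustration, not the authors' implementation. Assumptions not stated in the table: the normalized p-series weights take the form \(w_i = i^{-\gamma} / \sum_{j=1}^{T} j^{-\gamma}\) with a hypothetical exponent `gamma = 2.8`, the weights are assigned to the mixed samples in random order, labels are mixed with the same weights as inputs, and the helper names (`row_weights`, `mix`) are hypothetical.

```python
import random

def p_series_weights(T, gamma):
    """Normalized p-series weights w_i = i**-gamma / sum_{j=1..T} j**-gamma (assumed form)."""
    raw = [i ** -gamma for i in range(1, T + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def dirichlet_weights(T, alpha, rng):
    """Sample Dirichlet(alpha, ..., alpha) over T samples by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(T)]
    total = sum(draws)
    return [d / total for d in draws]

def row_weights(row, m, rng, alpha=1.0, gamma=2.8):
    """Mixing weights for each row of Table 8 (hypothetical helper; gamma value assumed)."""
    if row == 1:  # mixup: T=2, lambda ~ Beta(alpha, alpha)
        lam = rng.betavariate(alpha, alpha)
        return [lam, 1.0 - lam]
    if row == 2:  # T=2, normalized p-series weights
        return p_series_weights(2, gamma)
    if row == 3:  # T=m, Dirichlet(alpha) weights
        return dirichlet_weights(m, alpha, rng)
    if row == 4:  # zeta-mixup: T=m, normalized p-series weights
        return p_series_weights(m, gamma)
    raise ValueError("row must be 1, 2, 3, or 4")

def mix(xs, ys, weights, rng):
    """Convex combination of len(weights) distinct samples drawn in random order.

    The random order decides which sample receives the largest weight;
    inputs and one-hot labels share the same weights.
    """
    idx = rng.sample(range(len(xs)), len(weights))
    x_new = [sum(w * xs[i][d] for w, i in zip(weights, idx)) for d in range(len(xs[0]))]
    y_new = [sum(w * ys[i][c] for w, i in zip(weights, idx)) for c in range(len(ys[0]))]
    return x_new, y_new

# Toy batch: m = 8 two-dimensional inputs with one-hot labels over 4 classes.
rng = random.Random(0)
m = 8
xs = [[float(i), float(-i)] for i in range(m)]
ys = [[1.0 if c == i % 4 else 0.0 for c in range(4)] for i in range(m)]
for row in (1, 2, 3, 4):
    w = row_weights(row, m, rng)
    x_new, y_new = mix(xs, ys, w, rng)
    print(row, len(w), round(max(w), 3), [round(v, 3) for v in x_new])
```

Drawing the indices with `rng.sample` both selects the \(T\) samples and randomizes which one receives the dominant weight; for \(T=m\) it is simply a permutation of the batch.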

Note that since \(\zeta \)-mixup is a generalization of mixup (Theorem 2), \(\zeta \)-mixup with neither of these components reduces to mixup (first row). Next, modifying mixup to use the \(\zeta \)-mixup weighting scheme while still mixing only 2 samples (second row) outperforms mixup but is inferior to \(\zeta \)-mixup. On the other hand, mixing the entire batch (\(T=m\)) with weights drawn from a Dirichlet distribution leads to extremely poor performance (third row). Finally, using both components together, i.e., \(\zeta \)-mixup, leads to the best performance.
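One plausible reading of the third row, offered as an illustration under assumptions rather than as the authors' analysis: with many samples, Dirichlet(1) weights are spread almost evenly, so each synthetic sample is an average of many images, whereas normalized p-series weights keep one dominant sample. The sketch below compares the largest weight under each scheme, assuming a batch size of \(m = 128\) and a hypothetical p-series exponent \(\gamma = 2.8\) (neither value is given in the table).

```python
import random

m = 128       # assumed batch size (not given in the table)
gamma = 2.8   # hypothetical p-series exponent

# Normalized p-series: w_i = i**-gamma / sum_j j**-gamma, so w_1 is the largest weight.
raw = [i ** -gamma for i in range(1, m + 1)]
top_p_series = raw[0] / sum(raw)

def mean_max_dirichlet_weight(T, alpha, trials, rng):
    """Monte Carlo estimate of E[max_i w_i] for w ~ Dirichlet(alpha, ..., alpha) over T samples."""
    acc = 0.0
    for _ in range(trials):
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(T)]
        acc += max(draws) / sum(draws)
    return acc / trials

top_dirichlet = mean_max_dirichlet_weight(m, 1.0, 500, random.Random(0))
print(f"largest p-series weight:          {top_p_series:.3f}")
print(f"mean largest Dirichlet(1) weight: {top_dirichlet:.3f}")
```

Under these assumptions the largest p-series weight is about 0.8, while the expected largest Dirichlet(1) weight is \(H_m/m \approx 0.04\) for \(m = 128\), where \(H_m\) is the \(m\)-th harmonic number.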