
Table 9 Benchmarking various methods for training models on CIFAR-100, STL-10, Imagenette, and for augmenting a batch of 32 RGB images of \(224 \times 224\) spatial resolution

From: Multi-sample \(\zeta \)-mixup: richer, more realistic synthetic samples from a p-series interpolant

| Method | CIFAR-100 (200 epochs) | STL-10 (200 epochs) | Imagenette (80 epochs) | Wall time for a [32, 3, 224, 224] torch.Tensor |
| --- | --- | --- | --- | --- |
| mixup [36] | 1h 20m ± 23s | 24m 59s ± 16.9s | 45m 39s ± 8.5s | 745\(\mu \)s ± 9.55\(\mu \)s |
| \(\zeta \)-mixup | 1h 20m ± 17s | 24m 58s ± 4.6s | 45m 34s ± 14.1s | 345\(\mu \)s ± 2.53\(\mu \)s |
| CutMix [39] | 1h 22m ± 13s | † | † | 176\(\mu \)s ± 1.4\(\mu \)s |
| CutMix [39] + \(\zeta \)-mixup | 1h 22m ± 9s | † | † | 169\(\mu \)s ± 757ns |
| Manifold Mixup [41] | 16h 15m (2000 epochs) | ‡ | ‡ | ‡‡ |
| Co-Mixup [96] | 16h 35m (300 epochs) | ‡ | ‡ | ‡‡ |
| Local synthetic instances [57] | § | § | § | 38.7ms ± 1.33ms |

  1. Note that mixup, \(\zeta \)-mixup, CutMix, and CutMix + \(\zeta \)-mixup require 200 epochs of training on CIFAR-100, whereas Manifold Mixup and Co-Mixup require 2000 and 300 epochs, respectively. CutMix experiments were performed only on CIFAR-10 and CIFAR-100, and training times on STL-10 and Imagenette are not available from the original paper (†). Similarly, given the large computational cost of Manifold Mixup and Co-Mixup, we did not train them on STL-10 and Imagenette, and their training times are not reported in the respective papers (‡). We were also unable to benchmark these two methods on a batch of 32 images (last column; ‡‡), since they require a DNN forward pass and gradient computation, respectively, for augmentation. Finally, the local synthetic instances method [57] is not optimized for training DNNs (§): it is two orders of magnitude slower than \(\zeta \)-mixup (see last column)
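As context for how the last-column timings could be measured, the sketch below times a single augmentation call on a random [32, 3, 224, 224] torch.Tensor. The two-sample mixup follows the standard \(\tilde{x} = \lambda x_i + (1-\lambda) x_j\), \(\lambda \sim \mathrm{Beta}(\alpha, \alpha)\) formulation; the p-series-weighted multi-sample mix is only a simplified illustration of the \(\zeta \)-mixup idea, and its helper names, the choice \(\gamma = 2.4\), and the per-sample permutation scheme are assumptions rather than the authors' implementation.

```python
# Hypothetical timing sketch (not the authors' benchmark code).
import timeit

import torch


def mixup(x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Standard two-sample mixup: convex combination of the batch with a
    shuffled copy of itself, using lam ~ Beta(alpha, alpha)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    return lam * x + (1.0 - lam) * x[perm]


def p_series_mix(x: torch.Tensor, gamma: float = 2.4) -> torch.Tensor:
    """Simplified multi-sample mix: each output is a weighted sum of the
    whole batch, with weights from a normalized p-series k**(-gamma)
    scattered along an independent random permutation per output sample.
    This only illustrates the zeta-mixup idea; it is not the paper's code."""
    n = x.size(0)
    w = torch.arange(1, n + 1, dtype=x.dtype, device=x.device) ** (-gamma)
    w = w / w.sum()  # weights sum to 1; the heaviest weight dominates
    mix = torch.zeros(n, n, dtype=x.dtype, device=x.device)
    for i in range(n):
        mix[i, torch.randperm(n, device=x.device)] = w
    # A single matrix multiply mixes the flattened batch for all outputs.
    return (mix @ x.reshape(n, -1)).reshape_as(x)


if __name__ == "__main__":
    x = torch.rand(32, 3, 224, 224)  # batch matching the table's last column
    for name, fn in [("mixup", mixup), ("p-series mix", p_series_mix)]:
        sec = timeit.timeit(lambda: fn(x), number=100) / 100
        print(f"{name}: {sec * 1e6:.1f} us per batch")
```

Absolute numbers from such a script depend on hardware and implementation details, so it indicates only how per-batch augmentation cost can be isolated from training time, not the exact values reported in the table.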