From: Accelerating neural network training with distributed asynchronous and selective optimization (DASO)
Parameter | Value
---|---
Data Loader | DALI [37]
Local Optimizer | SGD
Local Optimizer Parameters | Momentum: 0.9; Weight Decay: 0.0001
Epochs | 90
Learning Rate (LR) Decay | Reduce on Stable
LR Parameters | Stable Epochs Before Change: 5; Decay Factor: 0.5
LR Warmup Phase | 5 epochs, see Goyal et al. [38]
Maximum LR | Scaled by number of GPUs [38]
Loss Function | Cross Entropy
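The optimizer and learning-rate settings in the table can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the DASO authors' implementation. The peak LR follows the linear scaling rule of Goyal et al. [38]: the single-GPU rate is multiplied by the GPU count. Warmup is modeled as a linear ramp over 5 epochs. "Reduce on Stable" is read here, as an assumption, as "halve the LR after 5 consecutive epochs without improvement in the monitored loss". The `base_lr` input and all function and class names are hypothetical, because the table gives no base LR.

```python
import math


def scaled_max_lr(base_lr: float, num_gpus: int) -> float:
    """Linear scaling rule: the peak LR grows in proportion to the GPU count.

    `base_lr` (the single-GPU learning rate) is a hypothetical input; the
    table does not state its value.
    """
    return base_lr * num_gpus


def warmup_lr(epoch: float, base_lr: float, peak_lr: float,
              warmup_epochs: int = 5) -> float:
    """Gradual warmup: ramp linearly from base_lr to peak_lr over warmup_epochs."""
    if epoch >= warmup_epochs:
        return peak_lr
    return base_lr + (peak_lr - base_lr) * epoch / warmup_epochs


class ReduceOnStable:
    """Assumed reading of "Reduce on Stable": multiply the LR by `factor`
    once the monitored loss has failed to improve for `patience`
    consecutive epochs, then restart the count."""

    def __init__(self, lr: float, factor: float = 0.5, patience: int = 5):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = math.inf
        self.stable_epochs = 0

    def step(self, loss: float) -> float:
        if loss < self.best:  # improvement: reset the stability counter
            self.best = loss
            self.stable_epochs = 0
        else:  # no improvement: the loss is "stable"
            self.stable_epochs += 1
            if self.stable_epochs >= self.patience:
                self.lr *= self.factor  # decay factor 0.5 from the table
                self.stable_epochs = 0
        return self.lr


def sgd_momentum_step(params, grads, bufs, lr,
                      momentum=0.9, weight_decay=1e-4):
    """One SGD step with momentum and L2 weight decay (PyTorch-style
    formulation: no dampening, no Nesterov) on plain lists of floats."""
    new_params, new_bufs = [], []
    for p, g, b in zip(params, grads, bufs):
        d = g + weight_decay * p       # add the weight-decay term to the gradient
        b = momentum * b + d           # update the momentum buffer
        new_params.append(p - lr * b)  # step along the buffered direction
        new_bufs.append(b)
    return new_params, new_bufs
```

For example, with a hypothetical `base_lr` of 0.1 on 4 GPUs, `scaled_max_lr` gives a peak of 0.4. `warmup_lr` then ramps from 0.1 to 0.4 over the first 5 epochs. After that, a `ReduceOnStable(0.4)` instance halves the rate each time 5 consecutive epochs pass without improvement. The table does not say whether the training loss or the validation loss is monitored.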