From: Accelerating neural network training with distributed asynchronous and selective optimization (DASO)
| Parameter | Value |
|---|---|
| Data Loader | PyTorch |
| Local Optimizer | SGD |
| Local Optimizer Parameters | Momentum: 0.9; Weight Decay: 0.0001 |
| Epochs | 175 |
| Learning Rate (LR) Decay | Reduce on Stable |
| LR Parameters | Stable Epochs Before Change: 5; Decay Factor: 0.75 |
| LR Warmup Phase | 5 epochs, see Goyal et al. [38] |
| Maximum LR | 0.4 |
| Loss Function | Region Mutual Information [42] |
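
The settings above map onto standard PyTorch objects. The sketch below is a minimal, hedged rendering of that configuration, not the paper's code: `model`, `criterion`, and `val_loss` are placeholders (the Region Mutual Information loss [42] is an external implementation, stood in for here by `CrossEntropyLoss`), and "Reduce on Stable" is interpreted, as an assumption, as PyTorch's `ReduceLROnPlateau` with patience 5 and factor 0.75.

```python
import torch
from torch import nn, optim

# Placeholders: the paper uses a segmentation network and the
# Region Mutual Information loss [42]; both are stand-ins here.
model = nn.Linear(128, 10)
criterion = nn.CrossEntropyLoss()

max_lr = 0.4        # maximum LR from the table
warmup_epochs = 5   # LR warmup phase, following Goyal et al. [38]
epochs = 175

# Local optimizer: SGD with momentum 0.9 and weight decay 1e-4.
optimizer = optim.SGD(model.parameters(), lr=max_lr,
                      momentum=0.9, weight_decay=1e-4)

# Linear warmup over the first 5 epochs, then "Reduce on Stable",
# assumed here to be ReduceLROnPlateau: decay the LR by a factor
# of 0.75 after 5 epochs without improvement.
warmup = optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=warmup_epochs)
plateau = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.75, patience=5)

for epoch in range(epochs):
    # ... training and validation passes over PyTorch DataLoaders ...
    val_loss = 0.0  # placeholder validation metric
    if epoch < warmup_epochs:
        warmup.step()
    else:
        plateau.step(val_loss)
```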