| Category | Subcategory | Technique | References |
| --- | --- | --- | --- |
| Network architecture | Convolutional layers | Network in network | Lin et al. [30] |
| | | Inception and improved Inception models | Szegedy et al. [46, 47] |
| | | Doubly convolution | Zhai et al. [60] |
| | Pooling layers | *Lp* pooling | Sermanet et al. [40] |
| | | Stochastic pooling | Zeiler and Fergus [57] |
| | | Fractional max pooling | Graham [11] |
| | | Mixed pooling | Yu et al. [55] |
| | | Gated pooling | Lee et al. [28] |
| | | Tree pooling | Lee et al. [28] |
| | | Spectral pooling | Rippel et al. [38] |
| | | Spatial pyramid pooling | Grauman and Darrell [12], He et al. [18], Lazebnik et al. [27], Yang et al. [54] |
| | | Multiscale orderless pooling | Gong et al. [9] |
| | | Transformation invariant pooling | Laptev et al. [26] |
| | Nonlinear activation functions | Rectified linear unit (ReLU) | Nair and Hinton [35] |
| | | Leaky rectified linear unit (LReLU) | Maas et al. [32] |
| | | Parametric rectified linear unit (PReLU) | He et al. [19] |
| | | Adaptive piecewise linear (APL) activation functions | Agostinelli et al. [1] |
| | | Randomized rectified linear unit (RReLU) | National Data Science Bowl, Kaggle [36] |
| | | Exponential linear unit (ELU) | Clevert et al. [4] |
| | | S-shaped rectified linear unit (SReLU) | Jin et al. [23] |
| | | Maxout activations | Goodfellow et al. [10] |
| | | Probout activations | Springenberg and Riedmiller [42] |
| | Loss functions | Softmax loss | Liu et al. [31] |
| | | Contrastive and triplet losses | Liu et al. [31] |
| | | Large margin loss | Liu et al. [31] |
| | | L2-SVM loss | Collobert and Bengio [5], Nagi et al. [34] |
| | Regularization mechanisms | Dropout | Hinton et al. [21], Srivastava et al. [43] |
| | | Fast dropout | Wang and Manning [51] |
| | | Adaptive dropout | Ba and Frey [2] |
| | | Multinomial dropout and evolutional dropout | Li et al. [29] |
| | | Spatial dropout | Tompson et al. [48] |
| | | Nested dropout | Rippel et al. [37] |
| | | Max pooling dropout | Wu and Gu [53] |
| | | DropConnect | Wan et al. [49] |
| Optimization techniques | Enhanced initialization schemes | Xavier initialization | Glorot and Bengio [8] |
| | | Theoretically derived adaptable initialization | He et al. [19] |
| | | Standard fixed initialization | Krizhevsky et al. [25] |
| | | Layer sequential unit variance initialization | Mishkin and Matas [33] |
| | Skip connections | Highway networks | Srivastava et al. [44, 45] |
| | | Residual networks | He et al. [17] |
| | | Improved residual networks | He et al. [20] |
| | | Densely connected convolutional networks | Huang et al. [22] |
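As a minimal illustration of the rectifier family catalogued above, the ReLU [35], LReLU [32], and ELU [4] activations can be sketched as elementwise NumPy functions (the function names and default parameter values here are illustrative choices, not taken from the cited papers):

```python
import numpy as np

def relu(x):
    # ReLU (Nair and Hinton [35]): zero for negative inputs, identity otherwise
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # LReLU (Maas et al. [32]): small fixed slope alpha on the negative side;
    # alpha=0.01 is a common but illustrative default
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # ELU (Clevert et al. [4]): smooth exponential saturation toward -alpha
    # for negative inputs, identity for positive inputs
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```

PReLU [19] differs from LReLU only in that `alpha` becomes a learned parameter, and RReLU [36] samples it randomly during training; the forward computation has the same shape as `leaky_relu` above.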