| Category | Subcategory | Technique | References |
| --- | --- | --- | --- |
| Network architecture | Convolutional layers | Network in network | Lin et al. [30] |
| | | Inception and improved Inception models | Szegedy et al. [46, 47] |
| | | Doubly convolution | Zhai et al. [60] |
| | Pooling layers | *Lp* pooling | Sermanet et al. [40] |
| | | Stochastic pooling | Zeiler and Fergus [57] |
| | | Fractional max pooling | Graham [11] |
| | | Mixed pooling | Yu et al. [55] |
| | | Gated pooling | Lee et al. [28] |
| | | Tree pooling | Lee et al. [28] |
| | | Spectral pooling | Rippel et al. [38] |
| | | Spatial pyramid pooling | Grauman and Darrell [12], He et al. [18], Lazebnik et al. [27], Yang et al. [54] |
| | | Multiscale orderless pooling | Gong et al. [9] |
| | | Transformation invariant pooling | Laptev et al. [26] |
| | Nonlinear activation functions | Rectified linear unit (ReLU) | Nair and Hinton [35] |
| | | Leaky rectified linear unit (LReLU) | Maas et al. [32] |
| | | Parametric rectified linear unit (PReLU) | He et al. [19] |
| | | Adaptive piecewise linear (APL) activation functions | Agostinelli et al. [1] |
| | | Randomized rectified linear unit (RReLU) | National Data Science Bowl, Kaggle [36] |
| | | Exponential linear unit (ELU) | Clevert et al. [4] |
| | | S-shaped rectified linear unit (SReLU) | Jin et al. [23] |
| | | Maxout activations | Goodfellow et al. [10] |
| | | Probout activations | Springenberg and Riedmiller [42] |
| | Loss functions | Softmax loss | Liu et al. [31] |
| | | Contrastive and triplet losses | Liu et al. [31] |
| | | Large margin loss | Liu et al. [31] |
| | | L2-SVM loss | Collobert and Bengio [5], Nagi et al. [34] |
| | Regularization mechanisms | Dropout | Hinton et al. [21], Srivastava et al. [43] |
| | | Fast dropout | Wang and Manning [51] |
| | | Adaptive dropout | Ba and Frey [2] |
| | | Multinomial dropout and evolutional dropout | Li et al. [29] |
| | | Spatial dropout | Tompson et al. [48] |
| | | Nested dropout | Rippel et al. [37] |
| | | Max pooling dropout | Wu and Gu [53] |
| | | DropConnect | Wan et al. [49] |
| Optimization techniques | Enhanced initialization schemes | Xavier initialization | Glorot and Bengio [8] |
| | | Theoretically derived adaptable initialization | He et al. [19] |
| | | Standard fixed initialization | Krizhevsky et al. [25] |
| | | Layer sequential unit variance initialization | Mishkin and Matas [33] |
| | Skip connections | Highway networks | Srivastava et al. [44, 45] |
| | | Residual networks | He et al. [17] |
| | | Improved residual networks | He et al. [20] |
| | | Densely connected convolutional networks | Huang et al. [22] |
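As a minimal illustration of the rectifier family catalogued above, the ReLU [35], LReLU [32], and ELU [4] activations can be sketched as elementwise NumPy functions (the function names and default parameter values here are illustrative choices, not taken from the cited papers):

```python
import numpy as np

def relu(x):
    # ReLU (Nair and Hinton [35]): zero for negative inputs, identity otherwise
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # LReLU (Maas et al. [32]): small fixed slope alpha on the negative side;
    # alpha=0.01 is a common but illustrative default
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # ELU (Clevert et al. [4]): smooth exponential saturation toward -alpha
    # for negative inputs, identity for positive inputs
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```

PReLU [19] differs from LReLU only in that `alpha` becomes a learned parameter, and RReLU [36] samples it randomly during training; the forward computation has the same shape as `leaky_relu` above.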