Table 1 Details of the various convolutional layers in the WCNN model

From: Image caption generation using Visual Attention Prediction and Contextual Spatial Relation Extraction

| Sl. No. | Level | Name of convolutional layers | Kernel size/No. of filters | Output size |
|---|---|---|---|---|
| 1 | L1 | Conv 1_1 | 3x3/64 | 256x256x64 |
| 2 | | Conv 1_2 | 3x3/64 | 256x256x64 |
| 3 | | Maxpool1 | 2x2/64/stride 2 | 128x128x64 |
| 4 | L2 | Conv 2_1 | 3x3/128 | 128x128x128 |
| 5 | | Conv 2_2 | 3x3/128 | 128x128x128 |
| 6 | | Maxpool2 | 2x2/128/stride 2 | 64x64x128 |
| 7 | L3 | Conv 3_1 | 5x5/256 | 64x64x256 |
| 8 | | Conv 3_2 | 5x5/256 | 64x64x256 |
| 9 | | Conv 3_3 | 5x5/256 | 64x64x256 |
| 10 | | Maxpool3 | 2x2/256/stride 2 | 32x32x256 |
| 11 | L4 | Conv 4_1 | 7x7/512 | 32x32x512 |
| 12 | | Conv 4_2 | 7x7/512 | 32x32x512 |
| 13 | | Conv 4_3 | 7x7/512 | 32x32x512 |
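
The output sizes in Table 1 can be checked with the standard output-size formula for convolution and pooling layers. The sketch below is illustrative only and is not the authors' implementation. It assumes a 256x256 input, stride-1 convolutions with "same" zero padding of (k-1)/2, and unpadded 2x2 max pooling with stride 2. The table does not state the input size or the padding; both are inferred from the listed output sizes. The names `LAYERS`, `conv_out`, and `output_shapes` are hypothetical.

```python
# Layer specs transcribed from Table 1: (name, kind, kernel size, channels).
LAYERS = [
    ("Conv 1_1", "conv", 3, 64),
    ("Conv 1_2", "conv", 3, 64),
    ("Maxpool1", "pool", 2, 64),
    ("Conv 2_1", "conv", 3, 128),
    ("Conv 2_2", "conv", 3, 128),
    ("Maxpool2", "pool", 2, 128),
    ("Conv 3_1", "conv", 5, 256),
    ("Conv 3_2", "conv", 5, 256),
    ("Conv 3_3", "conv", 5, 256),
    ("Maxpool3", "pool", 2, 256),
    ("Conv 4_1", "conv", 7, 512),
    ("Conv 4_2", "conv", 7, 512),
    ("Conv 4_3", "conv", 7, 512),
]


def conv_out(size, kernel, stride=1, padding=0):
    """Standard output-size formula for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def output_shapes(height, width, layers=LAYERS):
    """Return [(name, (H, W, C)), ...] for each layer, in order.

    Assumption (not stated in the table): convolutions use stride 1 with
    'same' padding of (kernel - 1) // 2; pooling uses stride 2, no padding.
    """
    shapes = []
    for name, kind, kernel, channels in layers:
        if kind == "conv":
            pad, stride = (kernel - 1) // 2, 1
        else:  # "pool"
            pad, stride = 0, 2
        height = conv_out(height, kernel, stride, pad)
        width = conv_out(width, kernel, stride, pad)
        shapes.append((name, (height, width, channels)))
    return shapes


if __name__ == "__main__":
    # Assumed 256x256 input, inferred from the first row of the table.
    for name, (h, w, c) in output_shapes(256, 256):
        print(f"{name}: {h}x{w}x{c}")
```

Under these assumptions, each convolution preserves the spatial size and each pooling layer halves it, which matches the Output size column row by row.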