From: A novel Multi-Layer Attention Framework for visual description prediction using bidirectional LSTM
| Model | B@4 (MSVD) | B@4 (MSRVTT) |
|---|---|---|
| STAT [45] | 52.0 | 39.3 |
| SpatioTempo [46] | 47.9 | 38.3 |
| LSTM [42] | 31.2 | – |
| LSTM-E [ALEX] [42] | 38.9 | – |
| LSTM-E [C3D] [42] | 41.7 | – |
| FGM [15] | 13.68 | – |
| LSTM-YT [24] | 31.19 | – |
| MP-LSTM [24] | 33.3 | – |
| Base model | 27.19 | 16.05 |
| Base model with BN | 16.61 | 14.02 |
| Stacked LSTM | 31.23 | 17.61 |
| Multi-layer attention (Proposed) | 33.07 | 19.61 |