A novel Multi-Layer Attention Framework for visual description prediction using bidirectional LSTM

Journal of Big Data

Table 2 Comparison of METEOR scores with state-of-the-art methods

MODEL	METEOR (MSVD)	METEOR (MSRVTT)
LSTM [42]	26.9	23.4
LSTM-E[VGG] [42]	29.5	–
LSTM-E[C3D] [42]	29.9	–
MM-VDN [43]	29.0	–
LK [44]	30.3	–
S2VT-unidirectional [17]	29.6	25.2
S2VT-bidirectional [17]	29.7	25.6
S2VT-reinforced [17]	29.9	25.9
S2VT-VGG [17]	29.2	–
S2VT-VGG+Flow (Alexnet) [17]	29.8	–
DVWA-uni [8]	29.6	25.7
DVWA-BiLSTM [8]	29.8	26.1
DVWA-ReBiLSTM [8]	30.3	26.2
DVWA-uni SA [8]	30.2	25.9
DVWA-BiLSTM SA [8]	30.5	26.2
DVWA-ReBiLSTM SA (shortcut) [8]	30.7	26.4
DVWA-ReBiLSTM SA (attention) [8]	30.9	26.6
Base model	48.14	36.25
Base model with BN	39.30	35.82
Stacked LSTM	49.19	37.88
Multi-layer attention (Proposed)	51.57	39.47