From: A novel Multi-Layer Attention Framework for visual description prediction using bidirectional LSTM
Model | METEOR (MSVD) | METEOR (MSRVTT) |
---|---|---|
LSTM [42] | 26.9 | 23.4 |
LSTM-E[VGG] [42] | 29.5 | – |
LSTM-E[C3D] [42] | 29.9 | – |
MM-VDN [43] | 29.0 | – |
LK [44] | 30.3 | – |
S2VT-unidirectional [17] | 29.6 | 25.2 |
S2VT-bidirectional [17] | 29.7 | 25.6 |
S2VT-reinforced [17] | 29.9 | 25.9 |
S2VT-VGG [17] | 29.2 | – |
S2VT-VGG+Flow (AlexNet) [17] | 29.8 | – |
DVWA-uni [8] | 29.6 | 25.7 |
DVWA-BiLSTM [8] | 29.8 | 26.1 |
DVWA-ReBiLSTM [8] | 30.3 | 26.2 |
DVWA-uni SA [8] | 30.2 | 25.9 |
DVWA-BiLSTM SA [8] | 30.5 | 26.2 |
DVWA-ReBiLSTM SA (shortcut) [8] | 30.7 | 26.4 |
DVWA-ReBiLSTM SA (attention) [8] | 30.9 | 26.6 |
Base model | 48.14 | 36.25 |
Base model with BN | 39.30 | 35.82 |
Stacked LSTM | 49.19 | 37.88 |
Multi-layer attention (Proposed) | 51.57 | 39.47 |