From: A novel Multi-Layer Attention Framework for visual description prediction using bidirectional LSTM
Model | MSVD B@1 | MSVD B@2 | MSVD B@3 | MSVD B@4 | MSVD METEOR | MSR-VTT B@1 | MSR-VTT B@2 | MSR-VTT B@3 | MSR-VTT B@4 | MSR-VTT METEOR |
---|---|---|---|---|---|---|---|---|---|---|
Base model | 66.01 | 49.42 | 38.69 | 27.19 | 48.14 | 57.75 | 37.49 | 29.50 | 16.05 | 36.25 |
Base model with BN | 62.07 | 40.28 | 27.07 | 16.61 | 39.30 | 63.09 | 38.84 | 26.99 | 14.02 | 35.82 |
Stacked LSTM | 67.49 | 51.98 | 41.90 | 31.23 | 49.19 | 58.18 | 41.41 | 32.02 | 17.61 | 37.88 |
Multi-layer attention (proposed) | 70.50 | 56.62 | 49.60 | 33.07 | 51.77 | 60.33 | 43.72 | 34.12 | 19.61 | 39.47 |