From: Bilingual video captioning model for enhanced video retrieval
Evaluation metrics (B, M, R, C) are given in the last four columns.

Ref. | Year | Method | Dataset | B | M | R | C |
---|---|---|---|---|---|---|---|
[19] | 2021 | CNN-GRU | MSVD | 57.9 | 37.4 | 74.7 | 96.3 |
 |  |  | MSR-VTT | 45.1 | 28.6 | 61.8 | 51.5 |
[32] | 2021 | CNN-GRU | MSVD | 55.1 | 36.4 | 72.2 | 85.7 |
 |  |  | MSR-VTT | 42.3 | 28.9 | 61.7 | 49.2 |
[33] | 2021 | CNN-RNN | MSVD | 54.2 | 34.8 | 71.7 | 88.2 |
 |  |  | MSR-VTT | 40.9 | 27.5 | 60.2 | 47.5 |
[34] | 2021 | CNN-BiLSTMs | MSVD | 41.8 | – | – | 60.1 |
 |  |  | ActivityNet | 32.1 | – | – | 25.7 |
[35] | 2022 | CNN-LSTM | MSVD | 43.7 | 32.3 | 68.8 | 70.7 |
[36] | 2022 | CNN-LSTM | MSVD | 57.4 | 36.9 | 75.6 | 98.1 |
 |  |  | MSR-VTT | 46.5 | 32.8 | 55.8 | 62.4 |
[37] | 2021 | CNN-LSTM and RL | MSVD | 52.3 | 35.0 | 71.9 | 84.3 |
 |  |  | MSR-VTT | 41.1 | 27.5 | 60.4 | 47.0 |
[38] | 2022 | CNN-GRU and RL | MSVD | 52.5 | 35.0 | 72.4 | 94.5 |
 |  |  | MSR-VTT | 41.3 | 28.7 | 62.1 | 53.8 |
[40] | 2018 | CNN-LSTM and GAN | MSVD | 42.9 | 30.4 | – | – |
 |  |  | MSR-VTT | 36.0 | 26.1 | – | – |
 |  |  | M-VAD | – | 63.0 | – | – |
 |  |  | MPII-MD | – | 72.0 | – | – |