Table 3 Summary of the free-template-based studies

From: Bilingual video captioning model for enhanced video retrieval

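Columns B, M, R, and C are the evaluation metrics.

| Ref. | Year | Method | Dataset | B | M | R | C |
|------|------|--------|---------|------|------|------|------|
| [19] | 2021 | CNN-GRU | MSVD | 57.9 | 37.4 | 74.7 | 96.3 |
| | | | MSR-VTT | 45.1 | 28.6 | 61.8 | 51.5 |
| [32] | 2021 | CNN-GRU | MSVD | 55.1 | 36.4 | 72.2 | 85.7 |
| | | | MSR-VTT | 42.3 | 28.9 | 61.7 | 49.2 |
| [33] | 2021 | CNN-RNN | MSVD | 54.2 | 34.8 | 71.7 | 88.2 |
| | | | MSR-VTT | 40.9 | 27.5 | 60.2 | 47.5 |
| [34] | 2021 | CNN-BiLSTMs | MSVD | 41.8 | – | – | 60.1 |
| | | | ActivityNet | 32.1 | – | – | 25.7 |
| [35] | 2022 | CNN-LSTM | MSVD | 43.7 | 32.3 | 68.8 | 70.7 |
| [36] | 2022 | CNN-LSTM | MSVD | 57.4 | 36.9 | 75.6 | 98.1 |
| | | | MSR-VTT | 46.5 | 32.8 | 55.8 | 62.4 |
| [37] | 2021 | CNN-LSTM and RL | MSVD | 52.3 | 35.0 | 71.9 | 84.3 |
| | | | MSR-VTT | 41.1 | 27.5 | 60.4 | 47.0 |
| [38] | 2022 | CNN-GRU and RL | MSVD | 52.5 | 35.0 | 72.4 | 94.5 |
| | | | MSR-VTT | 41.3 | 28.7 | 62.1 | 53.8 |
| [40] | 2018 | CNN-LSTM and GAN | MSVD | 42.9 | 30.4 | – | – |
| | | | MSR-VTT | 36.0 | 26.1 | – | – |
| | | | M-VAD | – | 63.0 | – | – |
| | | | MPII-MD | – | 72.0 | – | – |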

MSVD: Microsoft Research Video Description; MSR-VTT: Microsoft Research Video to Text; MPII-MD: Max Planck Institute for Informatics Movie Description.