Bilingual video captioning model for enhanced video retrieval

Journal of Big Data

Table 3 Summary of the free-template-based studies

Ref.	Year	Method	Dataset	Evaluation metrics
				B	M	R	C
[19]	2021	CNN-GRU	MSVD	57.9	37.4	74.7	96.3
			MSR-VTT	45.1	28.6	61.8	51.5
[32]	2021	CNN-GRU	MSVD	55.1	36.4	72.2	85.7
			MSR-VTT	42.3	28.9	61.7	49.2
[33]	2021	CNN-RNN	MSVD	54.2	34.8	71.7	88.2
			MSR-VTT	40.9	27.5	60.2	47.5
[34]	2021	CNN-BiLSTMs	MSVD	41.8	–	–	60.1
			ActivityNet	32.1	–	–	25.7
[35]	2022	CNN-LSTM	MSVD	43.7	32.3	68.8	70.7
[36]	2022	CNN-LSTM	MSVD	57.4	36.9	75.6	98.1
			MSR-VTT	46.5	32.8	55.8	62.4
[37]	2021	CNN-LSTM and RL	MSVD	52.3	35.0	71.9	84.3
			MSR-VTT	41.1	27.5	60.4	47.0
[38]	2022	CNN-GRU and RL	MSVD	52.5	35.0	72.4	94.5
			MSR-VTT	41.3	28.7	62.1	53.8
[40]	2018	CNN-LSTM and GAN	MSVD	42.9	30.4	–	–
			MSR-VTT	36.0	26.1	–	–
			M-VAD	–	63.0	–	–
			MPII-MD	–	72.0	–	–

B BLEU, M METEOR, R ROUGE, C CIDEr, MSVD Microsoft Research Video Description, MSR-VTT Microsoft Research Video to Text, MPII-MD Max Planck Institute for Informatics Movie Description