Table 1 Performance of the proposed models on the MSVD and MSR-VTT datasets

From: A novel Multi-Layer Attention Framework for visual description prediction using bidirectional LSTM

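| Model | MSVD B@1 | MSVD B@2 | MSVD B@3 | MSVD B@4 | MSVD METEOR | MSR-VTT B@1 | MSR-VTT B@2 | MSR-VTT B@3 | MSR-VTT B@4 | MSR-VTT METEOR |
|---|---|---|---|---|---|---|---|---|---|---|
| Base model | 66.01 | 49.42 | 38.69 | 27.19 | 48.14 | 57.75 | 37.49 | 29.50 | 16.05 | 36.25 |
| Base model with BN | 62.07 | 40.28 | 27.07 | 16.61 | 39.30 | 63.09 | 38.84 | 26.99 | 14.02 | 35.82 |
| Stacked LSTM | 67.49 | 51.98 | 41.90 | 31.23 | 49.19 | 58.18 | 41.41 | 32.02 | 17.61 | 37.88 |
| Multi-layer attention (Proposed) | 70.50 | 56.62 | 49.60 | 33.07 | 51.77 | 60.33 | 43.72 | 34.12 | 19.61 | 39.47 |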