From: Semantic context driven language descriptions of videos using deep neural network
Method | METEOR | CIDEr |
---|---|---|
S-VC [48] | 29.3 | – |
SA [49] | 29.6 | 51.7 |
S2VT [57] | 29.2 | – |
S2VT[VGGNet+Optical flow] [57] | 29.8 | – |
MM-VDN [50] | 29.0 | – |
MP-LSTM [9] | 29.1 | – |
LSTM-E[VGGNet] [51] | 29.5 | – |
LSTM-E[C3D] [51] | 29.9 | – |
LSTM-E[VGGNet+C3D] [51] | 31.0 | – |
LSTM-GAN [8] | 30.4 | – |
p-RNN[C3D] [11] | 30.3 | – |
p-RNN[VGGNet] [11] | 31.1 | – |
LVMVP [53] | 29.9 | 51.1 |
BPLSTM [55] | 32.0 | 62.2 |
HRNE [12] | 32.1 | – |
HBNEVC [52] | – | 63.5 |
SE-GRU [54] | – | 62.3 |
STAT [58] | – | 67.5 |
MA-LSTM [29] | – | 70.4 |
UTS [56] | 33.2 | 71.1 |
STAT_LOC_V [10] | 30.5 | 62.8 |
STAT_LOC_L [10] | 31.0 | 62.5 |
Model_3 (Proposed) | 32.3 | 70.7 |