From: Image captioning model using attention and object features to mimic human image understanding
Method | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | CIDEr | ROUGE-L | SPICE |
---|---|---|---|---|---|---|---|---|
Yin and Ordonez [9] baseline model | NA | NA | NA | 0.21 | 0.215 | 0.759 | 0.464 | NA |
Yin and Ordonez [9] results with object features | NA | NA | NA | 0.253 | 0.238 | 0.922 | 0.507 | NA |
Yin and Ordonez [9] increase (%) | NA | NA | NA | 20.47 | 10.69 | 21.47 | 9.26 | NA |
Our increase (%) | 6.26 | 8.42 | 11.53 | 16.09 | 3.82 | 15.04 | 3.76 | 5.88 |
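
The percentage rows can be checked with a short calculation. The sketch below assumes that "increase (%)" means the relative change `(with_objects - baseline) / baseline * 100`; the source does not state the formula. It also assumes the result is truncated to two decimals rather than rounded, because truncation reproduces the four Yin and Ordonez [9] values in the table. The names `relative_increase_pct` and `truncate` are illustrative, not taken from the paper. The "Our increase (%)" row cannot be recomputed here, since the underlying scores are not listed in this table.

```python
import math

# Scores copied from the Yin and Ordonez [9] rows of the table above.
BASELINE = {"BLEU-4": 0.21, "METEOR": 0.215, "CIDEr": 0.759, "ROUGE-L": 0.464}
WITH_OBJECTS = {"BLEU-4": 0.253, "METEOR": 0.238, "CIDEr": 0.922, "ROUGE-L": 0.507}


def relative_increase_pct(baseline: float, improved: float) -> float:
    """Relative increase of `improved` over `baseline`, in percent (assumed formula)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


def truncate(value: float, decimals: int = 2) -> float:
    """Drop digits beyond `decimals` places without rounding (assumed convention)."""
    factor = 10 ** decimals
    return math.trunc(value * factor) / factor


# Recompute the "increase (%)" row for every metric that has both scores.
increases = {
    metric: truncate(relative_increase_pct(BASELINE[metric], WITH_OBJECTS[metric]))
    for metric in BASELINE
}

for metric, pct in increases.items():
    print(f"{metric}: {pct:.2f}%")
```

Under these assumptions the script reproduces the reported 20.47, 10.69, 21.47, and 9.26. Rounding to two decimals instead would give 20.48, 10.70, 21.48, and 9.27.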