From: Image captioning model using attention and object features to mimic human image understanding
Method | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | CIDEr | ROUGE-L | SPICE |
---|---|---|---|---|---|---|---|---|
Sharif et al.’s baseline model | 0.4368 | NA | NA | NA | 0.1297 | 0.2517 | 0.2997 | 0.0700 |
Sharif et al.’s suggested model | 0.4462 | NA | NA | NA | 0.1350 | 0.2835 | 0.3116 | 0.0741 |
Our baseline model | 0.3990 | 0.2200 | 0.1170 | 0.0620 | 0.1230 | 0.1480 | 0.2930 | 0.0740 |
Our model (with the importance factor) | 0.3980 | 0.2210 | 0.1160 | 0.0610 | 0.1290 | 0.1500 | 0.2980 | 0.0740 |
Sharif et al.’s increase (%) | 2.15 | NA | NA | NA | 4.08 | 12.63 | 3.97 | 5.85 |
Our increase (%) | −0.25 | 0.45 | −0.86 | −1.63 | 4.87 | 1.35 | 1.7 | 0 |
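
The two "increase (%)" rows appear to be the relative change of each model over its own baseline, i.e. (model − baseline) / baseline × 100. The source does not state the formula, so this is an assumption. The minimal Python sketch below recomputes the "Our increase (%)" row from the rounded scores in the table; the function name `relative_increase` and the dictionary names are illustrative and not taken from the paper. Recomputed values match the reported ones to within about 0.02 percentage points. The small last-digit differences (for example BLEU-3 and BLEU-4) may come from the published percentages being derived from unrounded scores.

```python
def relative_increase(baseline: float, improved: float) -> float:
    """Percentage change of `improved` relative to `baseline`."""
    return (improved - baseline) / baseline * 100.0


# Scores copied from the table rows "Our baseline model" and
# "Our model (with the importance factor)".
our_baseline = {
    "BLEU-1": 0.3990, "BLEU-2": 0.2200, "BLEU-3": 0.1170, "BLEU-4": 0.0620,
    "METEOR": 0.1230, "CIDEr": 0.1480, "ROUGE-L": 0.2930, "SPICE": 0.0740,
}
our_model = {
    "BLEU-1": 0.3980, "BLEU-2": 0.2210, "BLEU-3": 0.1160, "BLEU-4": 0.0610,
    "METEOR": 0.1290, "CIDEr": 0.1500, "ROUGE-L": 0.2980, "SPICE": 0.0740,
}

# Relative change per metric, assuming the formula described above.
our_increase = {
    metric: relative_increase(our_baseline[metric], our_model[metric])
    for metric in our_baseline
}

for metric, pct in our_increase.items():
    print(f"{metric}: {pct:+.2f}%")
```

The same function applied to Sharif et al.'s rows, for example `relative_increase(0.4368, 0.4462)` for BLEU-1, gives roughly the 2.15% listed in the table.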