Paper Title
Intrinsic Image Captioning Evaluation
Paper Authors
Paper Abstract
The image captioning task is to generate suitable descriptions for images. The task poses several challenges, such as accuracy, fluency, and diversity; however, few metrics cover all of these properties when evaluating the output of captioning models. In this paper, we first conduct a comprehensive investigation of contemporary metrics. Motivated by the auto-encoder mechanism and recent advances in word embeddings, we propose a learning-based metric for image captioning, which we call Intrinsic Image Captioning Evaluation (I2CE). We select several state-of-the-art image captioning models and test their performance on the MS COCO dataset with respect to both contemporary metrics and the proposed I2CE. Experimental results show that the proposed method maintains robust performance and gives more flexible scores to candidate captions that contain semantically similar expressions or less aligned semantics. In this regard, the proposed metric can serve as a novel indicator of the intrinsic information shared between captions, which may be complementary to existing metrics.
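To illustrate the general idea of an embedding-based caption metric, the sketch below scores a candidate caption against a reference by cosine similarity between caption embeddings. This is a hypothetical, minimal stand-in: it averages toy random word vectors instead of using the paper's learned auto-encoder representation, and the function names (`embed`, `i2ce_like_score`) and vocabulary are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: embedding-based caption similarity in the spirit of I2CE.
# A toy average of word vectors stands in for a learned caption encoder.
import numpy as np

def embed(caption, vectors, dim=8):
    """Embed a caption as the mean of its word vectors (unknown words skipped)."""
    words = [vectors[w] for w in caption.lower().split() if w in vectors]
    if not words:
        return np.zeros(dim)
    return np.mean(words, axis=0)

def i2ce_like_score(candidate, reference, vectors, dim=8):
    """Cosine similarity between caption embeddings, in [-1, 1]."""
    c = embed(candidate, vectors, dim)
    r = embed(reference, vectors, dim)
    denom = np.linalg.norm(c) * np.linalg.norm(r)
    return float(c @ r / denom) if denom else 0.0

# Toy word vectors; a real system would use trained embeddings.
rng = np.random.default_rng(0)
vocab = ["a", "dog", "puppy", "runs", "on", "grass", "car"]
vectors = {w: rng.normal(size=8) for w in vocab}
# Make "dog"/"puppy" near-synonyms so semantic similarity is rewarded.
vectors["puppy"] = vectors["dog"] + 0.05 * rng.normal(size=8)

ref = "a dog runs on grass"
print(i2ce_like_score("a puppy runs on grass", ref, vectors))  # high: near-synonym swap
print(i2ce_like_score("a car", ref, vectors))
```

Unlike n-gram-overlap metrics such as BLEU, a score computed in embedding space can stay high when a candidate uses semantically similar but lexically different words, which is the flexibility the abstract attributes to I2CE.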