Attention Beam: An Image Captioning Approach

Paper title

Attention Beam: An Image Captioning Approach

Authors

Anubhav Shrimal, Tanmoy Chakraborty

Abstract

The aim of image captioning is to generate a textual description of a given image. Though seemingly an easy task for humans, it is challenging for machines, as it requires the ability to comprehend the image (computer vision) and then generate a human-like description of it (natural language understanding). In recent times, encoder-decoder based architectures have achieved state-of-the-art results for image captioning. Here, we present a beam search heuristic on top of the encoder-decoder based architecture that gives better quality captions on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.
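
For readers unfamiliar with the decoding step the abstract builds on, the following is a minimal sketch of plain beam search over a caption decoder. It is not the authors' heuristic, which the abstract does not describe. The names `beam_search`, `toy_step`, and `TOY_MODEL` are hypothetical, and the lookup table stands in for a real encoder-decoder network that would condition on image features.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (token sequence, log-probability)


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    start: str,
    end: str,
    beam_width: int = 3,
    max_len: int = 20,
) -> Hypothesis:
    """Return the highest-scoring sequence found by standard beam search.

    step_fn maps a partial sequence to next-token probabilities. In a real
    captioner this would be one decoder step; here it is an assumption.
    """
    beams: List[Hypothesis] = [([start], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for seq, score in beams:
            for token, prob in step_fn(seq).items():
                if prob > 0.0:
                    candidates.append((seq + [token], score + math.log(prob)))
        if not candidates:
            break
        # Keep only the beam_width best candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # include hypotheses cut off at max_len
    return max(finished, key=lambda c: c[1])


# Hypothetical toy "decoder": the next-token distribution depends only on
# the previous token.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.5},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def toy_step(seq: List[str]) -> Dict[str, float]:
    return TOY_MODEL.get(seq[-1], {})
```

With this toy model, `beam_search(toy_step, "<s>", "</s>", beam_width=1)` behaves greedily and returns the sequence `<s> a dog </s>` (probability 0.30), while `beam_width=2` recovers `<s> the dog </s>` (probability 0.36). The wider beam keeps a locally weaker first token that leads to a better overall caption.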
