Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards

Paper title

Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards

Authors

Xuewen Yang, Heming Zhang, Di Jin, Yingru Liu, Chi-Hao Wu, Jianchao Tan, Dongliang Xie, Jue Wang, Xin Wang

Abstract

Generating accurate descriptions for online fashion items is important not only for enhancing customers' shopping experiences, but also for the increase of online sales. Besides the need of correctly presenting the attributes of items, the expressions in an enchanting style could better attract customer interests. The goal of this work is to develop a novel learning framework for accurate and expressive fashion captioning. Different from popular work on image captioning, it is hard to identify and describe the rich attributes of fashion items. We seed the description of an item by first identifying its attributes, and introduce attribute-level semantic (ALS) reward and sentence-level semantic (SLS) reward as metrics to improve the quality of text descriptions. We further integrate the training of our model with maximum likelihood estimation (MLE), attribute embedding, and Reinforcement Learning (RL). To facilitate the learning, we build a new FAshion CAptioning Dataset (FACAD), which contains 993K images and 130K corresponding enchanting and diverse descriptions. Experiments on FACAD demonstrate the effectiveness of our model.
