Paper Title
Reward Constrained Interactive Recommendation with Natural Language Feedback

Authors

Ruiyi Zhang, Tong Yu, Yilin Shen, Hongxia Jin, Changyou Chen, Lawrence Carin

Abstract
Text-based interactive recommendation provides richer user feedback and has demonstrated advantages over traditional interactive recommender systems. However, recommendations can easily violate preferences of users from their past natural-language feedback, since the recommender needs to explore new items for further improvement. To alleviate this issue, we propose a novel constraint-augmented reinforcement learning (RL) framework to efficiently incorporate user preferences over time. Specifically, we leverage a discriminator to detect recommendations violating user historical preference, which is incorporated into the standard RL objective of maximizing expected cumulative future rewards. Our proposed framework is general and is further extended to the task of constrained text generation. Empirical results show that the proposed method yields consistent improvement relative to standard RL methods.
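The abstract describes augmenting the standard RL reward with a discriminator that flags recommendations contradicting the user's past natural-language feedback. A minimal sketch of that idea follows; the function and parameter names (`constrained_reward`, `lambda_c`, `violation_prob`) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the constraint-augmented reward: the environment
# reward is penalized by a discriminator score estimating how likely the
# recommended item is to violate the user's historical preferences.

def constrained_reward(env_reward, violation_prob, lambda_c=0.5):
    """Combine the RL reward with a discriminator-based constraint penalty.

    env_reward     -- scalar reward from user feedback on the recommendation
    violation_prob -- discriminator's estimate, in [0, 1], that the item
                      contradicts past natural-language feedback
    lambda_c       -- weight trading off exploration against the constraint
    """
    return env_reward - lambda_c * violation_prob

# A recommendation flagged as a likely violation receives a lower
# effective reward than a compliant one with the same environment reward.
compliant = constrained_reward(1.0, 0.05)
violating = constrained_reward(1.0, 0.95)
```

Under this formulation the policy is still trained to maximize expected cumulative reward, but items the discriminator flags are effectively discouraged during exploration.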