Paper Title
Neural Interactive Collaborative Filtering
Paper Authors
Paper Abstract
In this paper, we study collaborative filtering in an interactive setting, in which the recommender agent iterates between making recommendations and updating the user profile based on interactive feedback. The most challenging problem in this scenario is how to suggest items when the user profile has not been well established, i.e., recommending for cold-start users or warm-start users with drifting tastes. Existing approaches either rely on an overly pessimistic linear exploration strategy or adopt meta-learning-based algorithms in a purely exploitative manner. In this work, to quickly catch up with the user's interests, we propose to represent the exploration policy with a neural network and learn it directly from the feedback data. Specifically, the exploration policy is encoded in the weights of multi-channel stacked self-attention neural networks and trained with efficient Q-learning by maximizing users' overall satisfaction with the recommender system. The key insight is that the satisfying recommendations triggered by an exploratory recommendation can be viewed as an exploration bonus (delayed reward) for its contribution to improving the quality of the user profile. Therefore, the proposed exploration policy, which balances learning the user profile against making accurate recommendations, can be directly optimized by maximizing users' long-term satisfaction with reinforcement learning. Extensive experiments and analyses conducted on three benchmark collaborative filtering datasets demonstrate the advantage of our method over state-of-the-art methods.
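To make the abstract's idea concrete, the following is a minimal sketch (not the authors' released code) of the described setup: a Q-network whose state encoder is a stack of self-attention layers over the user's interaction history, trained with a temporal-difference update so that later satisfied recommendations reach earlier exploratory actions as a delayed reward. The module names, hyperparameters, mean-pooling of the history, and the single-channel simplification (the paper describes multi-channel stacked self-attention) are all assumptions made for illustration.

```python
# Sketch only: a self-attentive Q-network for interactive recommendation,
# trained with standard Q-learning. Architecture details are assumed, not
# taken from the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttentiveQNetwork(nn.Module):
    """Encodes the interaction history with stacked self-attention and
    outputs Q-values over all candidate items."""

    def __init__(self, num_items, dim=64, num_layers=2, num_heads=2):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, dim)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=dim * 4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.q_head = nn.Linear(dim, num_items)  # Q(s, a) for every item

    def forward(self, history):              # history: (batch, seq_len) item ids
        h = self.encoder(self.item_emb(history))
        state = h.mean(dim=1)                 # pooled user-profile representation
        return self.q_head(state)             # (batch, num_items)


def q_learning_step(q_net, target_net, optimizer, batch, gamma=0.95):
    """One TD update. The immediate reward is the user's feedback on the
    recommended item; satisfied recommendations made later flow back to the
    exploratory action through the discounted bootstrap term, playing the
    role of the delayed exploration bonus described in the abstract."""
    history, action, reward, next_history, done = batch
    q_sa = q_net(history).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_history).max(dim=1).values
        target = reward + gamma * (1.0 - done) * next_q
    loss = F.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, exploration is not a hand-designed bonus added to the objective: the policy learns to recommend informative items because doing so improves the profile and therefore the discounted sum of future satisfaction it is trained to maximize.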