Paper Title
Model-agnostic Counterfactual Synthesis Policy for Interactive Recommendation
Paper Authors
Paper Abstract
Interactive recommendation learns from the interactions between users and systems in order to track users' dynamic interests. Recent advances have shown that reinforcement learning's ability to handle dynamic processes can be applied effectively to interactive recommendation. However, the sparsity of interaction data may hamper system performance. We propose training a Model-agnostic Counterfactual Synthesis Policy that generates counterfactual data and addresses the data sparsity problem by modelling both the observational and the counterfactual distributions. The proposed policy can identify and replace the trivial components of any state during training with other agents, and it can be deployed in any RL-based algorithm. The experimental results demonstrate the effectiveness and generality of the proposed policy.