Paper Title

An Embedding Learning Framework for Numerical Features in CTR Prediction

Authors

Huifeng Guo, Bo Chen, Ruiming Tang, Weinan Zhang, Zhenguo Li, Xiuqiang He

Abstract

Click-through rate (CTR) prediction is critical for industrial recommender systems, where most deep CTR models follow an Embedding & Feature Interaction paradigm. However, the majority of methods focus on designing network architectures that better capture feature interactions, while the feature embedding, especially for numerical features, has been overlooked. Existing approaches for numerical features struggle to capture informative knowledge, either because of their low capacity or because they rely on hard discretization based on offline, expertise-driven feature engineering. In this paper, we propose AutoDis, a novel embedding learning framework for numerical features in CTR prediction that offers high model capacity, end-to-end training, and unique representations for each feature value. AutoDis consists of three core components: meta-embeddings, automatic discretization, and aggregation. Specifically, we propose meta-embeddings for each numerical field to learn global knowledge at the field level with a manageable number of parameters. A differentiable automatic discretization step then performs soft discretization and captures the correlations between the numerical features and the meta-embeddings. Finally, distinctive and informative embeddings are learned via an aggregation function. Comprehensive experiments on two public datasets and one industrial dataset validate the effectiveness of AutoDis. Moreover, AutoDis has been deployed on a mainstream advertising platform, where an online A/B test shows improvements over the base model of 2.1% in CTR and 2.7% in eCPM. The code of our framework is publicly available in MindSpore (https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/recommend/autodis).
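The three components named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring network, the aggregation function, or any hyperparameters. The sketch assumes a single linear map from the scalar feature to one logit per meta-embedding, a temperature softmax as the soft discretization, and a weighted sum as the aggregation. The class name `NumericalFieldEmbedding` and all parameter names are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


class NumericalFieldEmbedding:
    """Hypothetical sketch of one numerical field (not the published model)."""

    def __init__(self, num_meta, dim, temperature=1.0, seed=0):
        rng = random.Random(seed)
        # Meta-embeddings: one small table of num_meta vectors shared by
        # every value of this field.
        self.meta = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                     for _ in range(num_meta)]
        # Assumption: a single linear map scores the scalar against each
        # meta-embedding; the abstract does not say what network is used.
        self.w = [rng.gauss(0.0, 1.0) for _ in range(num_meta)]
        self.b = [0.0] * num_meta
        self.temperature = temperature

    def soft_discretize(self, x):
        """Map a scalar to a probability distribution over meta-embeddings."""
        logits = [w * x + b for w, b in zip(self.w, self.b)]
        return softmax(logits, self.temperature)

    def embed(self, x):
        """Aggregate meta-embeddings by weighted sum (one possible choice)."""
        probs = self.soft_discretize(x)
        dim = len(self.meta[0])
        return [sum(p * row[j] for p, row in zip(probs, self.meta))
                for j in range(dim)]


field = NumericalFieldEmbedding(num_meta=4, dim=3)
e1 = field.embed(0.2)
e2 = field.embed(0.9)
```

Because the weights over the meta-embeddings vary continuously with the input, two different feature values receive different embeddings, whereas hard bucketing would give every value in the same bucket an identical vector. In a real model the table and the scoring weights would be trained end to end with the CTR objective, which this sketch omits.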
