That Looks Interesting! Personalizing Communication and Segmentation with Random Forest Node Embeddings

Paper title

That looks interesting! Personalizing Communication and Segmentation with Random Forest Node Embeddings

Authors

Weiwei Wang, Wiebke Eberhardt, Stefano Bromuri

Abstract

Communicating effectively with customers is a challenge for many marketers, but especially in a context that is both pivotal to individual long-term financial well-being and difficult to understand: pensions. Around the world, participants are reluctant to consider their pension in advance, which leads to a lack of preparation for retirement [1], [2]. To encourage participants to obtain information on their expected pension benefits, personalizing the pension provider's email communication is a crucial first step. We describe a machine learning approach that tailors email newsletters to participants' interests. The data for the modeling and analysis were collected from newsletters sent by a large Dutch pension provider and are divided into two parts. The first part comprises 2,228,000 customers, whereas the second part comprises the data of a pilot study that took place in July 2018 with 465,711 participants. In both cases, our algorithm extracts features from continuous and categorical data using random forests, and then calculates node embeddings of the decision boundaries of the random forest. We illustrate the algorithm's effectiveness for the classification task, and how it can be used to perform data mining tasks. To confirm that the result is valid for more than one data set, we also illustrate the properties of our algorithm on benchmark data sets concerning customer churn. On the data sets considered, the proposed model performs competitively with other state-of-the-art approaches based on random forests, achieving the best Area Under the Curve (AUC) on the pension data set (0.948). For the descriptive part, the algorithm can identify customer segments that marketing departments can use to better target their communication.
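The abstract does not spell out how node embeddings are computed, so the following is only a minimal sketch of one common way to turn a forest into an embedding: each sample is encoded by the leaf it reaches in every tree. It is not the authors' implementation. The tiny forest, the toy features (`age`, `emails_opened`), the labels, and all function names are assumptions made for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow(rows, labels, depth, rng, counter):
    """Grow one tree; counter[0] holds the next unused leaf id."""
    n_feats = len(rows[0])
    best = None
    if depth > 0 and len(set(labels)) > 1:
        # Each split looks at a random subset of features, as in a random forest.
        for f in rng.sample(range(n_feats), max(1, int(n_feats ** 0.5))):
            # Dropping the largest value keeps the right-hand side non-empty.
            for t in sorted({r[f] for r in rows})[:-1]:
                left = [i for i, r in enumerate(rows) if r[f] <= t]
                right = [i for i, r in enumerate(rows) if r[f] > t]
                score = (len(left) * gini([labels[i] for i in left])
                         + len(right) * gini([labels[i] for i in right]))
                if best is None or score < best[0]:
                    best = (score, f, t, left, right)
    if best is None:  # pure node, depth limit reached, or no usable split
        counter[0] += 1
        return {"leaf": counter[0] - 1}
    _, f, t, left, right = best
    return {
        "feat": f,
        "thr": t,
        "lo": grow([rows[i] for i in left], [labels[i] for i in left],
                   depth - 1, rng, counter),
        "hi": grow([rows[i] for i in right], [labels[i] for i in right],
                   depth - 1, rng, counter),
    }


def fit_forest(rows, labels, n_trees=5, depth=3, seed=0):
    """Fit n_trees trees on bootstrap samples; leaf ids are unique forest-wide."""
    rng = random.Random(seed)
    counter = [0]
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(grow([rows[i] for i in idx], [labels[i] for i in idx],
                           depth, rng, counter))
    return forest, counter[0]


def leaf_id(tree, x):
    """Follow the split rules until a leaf is reached and return its id."""
    while "leaf" not in tree:
        tree = tree["lo"] if x[tree["feat"]] <= tree["thr"] else tree["hi"]
    return tree["leaf"]


def embed(forest, n_leaves, x):
    """Binary vector with a 1 at the leaf that x reaches in each tree."""
    vec = [0] * n_leaves
    for tree in forest:
        vec[leaf_id(tree, x)] = 1
    return vec


# Hypothetical toy data: [age, emails_opened] -> clicked the newsletter (0/1).
data_rng = random.Random(42)
rows = [[data_rng.uniform(20, 70), data_rng.randint(0, 10)] for _ in range(200)]
labels = [int(age > 45 and opened >= 4) for age, opened in rows]

forest, n_leaves = fit_forest(rows, labels)
emb = embed(forest, n_leaves, rows[0])
```

Samples that share many leaves end up with similar vectors, so such an encoding can feed both a downstream classifier and a clustering step for segmentation. With scikit-learn, `RandomForestClassifier.apply` returns the per-tree leaf indices needed for the same kind of encoding.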

Paper title