Paper title
Instance-Dependent Cost-Sensitive Learning for Detecting Transfer Fraud
Paper authors
Paper abstract
Card transaction fraud is a growing problem affecting cardholders worldwide. Financial institutions increasingly rely on data-driven methods to develop fraud detection systems that automatically detect and block fraudulent transactions. From a machine learning perspective, detecting fraudulent transactions is a binary classification problem. Classification models are commonly trained and evaluated using statistical performance measures, such as likelihood (for training) and AUC (for evaluation). These measures, however, do not take into account the actual business objective, which is to minimize the financial losses due to fraud. Fraud detection should instead be treated as an instance-dependent cost-sensitive classification problem, in which the cost of a misclassification varies between instances and which therefore requires adapted approaches for learning a classification model. In this article, an instance-dependent threshold is derived from the instance-dependent cost matrix for transfer fraud detection; it allows the optimal cost-based decision to be made for each transaction. Two novel classifiers are presented, based on lasso-regularized logistic regression and gradient tree boosting, which directly minimize the proposed instance-dependent cost measure when learning a classification model. The proposed methods are implemented in the R packages cslogit and csboost and are compared against state-of-the-art methods on a publicly available data set from the machine learning competition website Kaggle and on a proprietary card transaction data set. The experimental results highlight the potential for reducing fraud losses by adopting the proposed methods.
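The idea of an instance-dependent threshold can be illustrated with a minimal Python sketch. The abstract does not spell out the cost matrix, so the entries below are assumptions made for illustration only: a missed fraud is assumed to cost the transferred amount, flagging a transaction (correctly or not) is assumed to cost a fixed administrative fee, and a correctly passed legitimate transaction is assumed to cost nothing. The names `CostMatrix`, `transfer_cost_matrix`, `threshold`, `flag_as_fraud`, and `average_expected_cost` are hypothetical and are not taken from the cslogit or csboost packages. The threshold itself follows from the standard minimum-expected-cost decision rule: flag a transaction when the expected cost of flagging is lower than the expected cost of letting it pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostMatrix:
    """Hypothetical per-transaction misclassification costs."""
    tp: float  # fraud, flagged
    fp: float  # legitimate, flagged
    fn: float  # fraud, not flagged
    tn: float  # legitimate, not flagged


def transfer_cost_matrix(amount: float, admin_cost: float) -> CostMatrix:
    # Assumed cost structure (not stated in the abstract): flagging always
    # costs the fixed admin fee, a missed fraud costs the full amount.
    return CostMatrix(tp=admin_cost, fp=admin_cost, fn=amount, tn=0.0)


def threshold(c: CostMatrix) -> float:
    """Fraud probability above which flagging has the lower expected cost.

    Flag iff p*tp + (1-p)*fp < p*fn + (1-p)*tn, which rearranges to
    p > (fp - tn) / ((fp - tn) + (fn - tp)) when the denominator is positive.
    """
    denom = (c.fp - c.tn) + (c.fn - c.tp)
    if denom <= 0:
        raise ValueError("cost matrix does not yield a usable threshold")
    return (c.fp - c.tn) / denom


def flag_as_fraud(prob_fraud: float, c: CostMatrix) -> bool:
    # A threshold above 1 means flagging never pays off for this transaction.
    return prob_fraud > threshold(c)


def average_expected_cost(scores, labels, matrices) -> float:
    """Mean expected cost when score s is the probability of flagging.

    Assumed form of an instance-dependent cost measure, for illustration.
    """
    total = 0.0
    for s, y, c in zip(scores, labels, matrices):
        if y == 1:
            total += s * c.tp + (1.0 - s) * c.fn
        else:
            total += s * c.fp + (1.0 - s) * c.tn
    return total / len(labels)
```

Under these assumed costs the threshold reduces to the administrative fee divided by the transferred amount, so large transfers are flagged at much lower fraud probabilities than small ones, and transfers smaller than the fee are never worth flagging. A cost-sensitive learner in the spirit of the abstract would use a measure such as `average_expected_cost` as its training objective instead of the likelihood.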