Empirical study of Machine Learning Classifier Evaluation Metrics behavior in Massively Imbalanced and Noisy data

Paper title

Empirical study of Machine Learning Classifier Evaluation Metrics behavior in Massively Imbalanced and Noisy data

Authors

Kulatilleke, Gayan K., Samarakoon, Sugandika

Abstract

With growing credit card transaction volumes, fraud percentages are also rising, as are the overhead costs institutions incur to combat fraud and compensate victims. The use of machine learning in the financial sector permits more effective protection against fraud and other economic crime. Suitably trained machine learning classifiers help proactive fraud detection, improving stakeholder trust and robustness against illicit transactions. However, the design of machine learning based fraud detection algorithms has been challenging and slow, due to the massively unbalanced nature of fraud data and the difficulty of identifying frauds accurately and completely enough to create a gold standard ground truth. Furthermore, there are no benchmarks or standard classifier evaluation metrics to measure and identify better performing classifiers, which keeps researchers in the dark. In this work, we develop a theoretical foundation to model the human annotation errors and extreme imbalance typical of real world fraud detection data sets. By conducting empirical experiments on a hypothetical classifier, with a synthetic data distribution approximating a popular real world credit card fraud data set, we simulate human annotation errors and extreme imbalance to observe the behavior of popular machine learning classifier evaluation metrics. We demonstrate that a combined F1 score and g-mean, in that specific order, is the best evaluation metric for typical imbalanced fraud detection model classification.
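The two metrics named in the abstract can be illustrated with a small sketch. It uses the standard definitions of F1 score and g-mean (the geometric mean of sensitivity and specificity) and does not reproduce the paper's theoretical model or its way of combining the two metrics. The data set size, fraud rate, classifier outputs, and 0.1% label-flip rate are all hypothetical values chosen for illustration.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def flip_labels(labels, rate, rng):
    """Simulate annotation errors by flipping each label with probability `rate`."""
    return [1 - y if rng.random() < rate else y for y in labels]


# Hypothetical data: 10,000 transactions, 20 of them fraud (0.2%).
n, n_fraud = 10_000, 20
y_true = [1] * n_fraud + [0] * (n - n_fraud)

# Trivial classifier: predicts "legitimate" for every transaction.
y_trivial = [0] * n

# Hypothetical detector: catches 15 of 20 frauds and raises 30 false alarms.
y_detector = [1] * 15 + [0] * 5 + [1] * 30 + [0] * (n - n_fraud - 30)

trivial = confusion_counts(y_true, y_trivial)
detector = confusion_counts(y_true, y_detector)

for name, counts in [("trivial", trivial), ("detector", detector)]:
    print(name,
          "accuracy=%.4f" % accuracy(*counts),
          "f1=%.4f" % f1_score(*counts),
          "g_mean=%.4f" % g_mean(*counts))

# Evaluate the same detector against noisy annotations (assumed 0.1% flip rate).
rng = random.Random(0)
y_noisy = flip_labels(y_true, 0.001, rng)
noisy = confusion_counts(y_noisy, y_detector)
print("detector vs noisy labels",
      "f1=%.4f" % f1_score(*noisy),
      "g_mean=%.4f" % g_mean(*noisy))
```

In this toy setup the trivial classifier reaches very high accuracy while its F1 score and g-mean are both zero, which is the usual reason accuracy is considered uninformative on heavily imbalanced data.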
