Paper Title
Abuse and Fraud Detection in Streaming Services Using Heuristic-Aware Machine Learning
Paper Authors
Abstract
This work presents a fraud and abuse detection framework for streaming services by modeling user streaming behavior. The goal is to discover anomalous and suspicious incidents and to scale investigation efforts by creating models that characterize user behavior. We study both semi-supervised and supervised approaches to anomaly detection. In the semi-supervised approach, leveraging only a set of authenticated anomaly-free data samples, we show the use of one-class classification algorithms as well as autoencoder deep neural networks for anomaly detection. In the supervised anomaly detection task, we present a so-called heuristic-aware data labeling strategy for creating labeled data samples. We carry out binary classification as well as multi-class multi-label classification tasks, not only to detect the anomalous samples but also to identify the underlying anomaly behavior(s) associated with each one. Finally, using a systematic feature importance study, we provide insights into the underlying set of features that characterize different streaming fraud categories. To the best of our knowledge, this is the first paper to use machine learning methods for fraud and abuse detection in real-world-scale streaming services.
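To make the idea of heuristic-based labeling concrete, the following is a minimal sketch of how rule-based heuristics could produce both a binary label and multi-label anomaly categories for a sample. It is not the authors' labeling strategy: the feature names, rule names, and thresholds below are hypothetical and chosen only for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class StreamingSample:
    # Hypothetical per-account daily aggregates; not taken from the paper.
    distinct_devices: int
    distinct_ips: int
    license_requests: int
    streamed_hours: float


# Each heuristic returns True when a sample looks suspicious.
# Rule names and thresholds are illustrative assumptions.
HEURISTICS: Dict[str, Callable[[StreamingSample], bool]] = {
    "rapid_license_acquisition": lambda s: s.license_requests > 500,
    "too_many_devices": lambda s: s.distinct_devices > 20,
    "too_many_ips": lambda s: s.distinct_ips > 50,
    "impossible_watch_time": lambda s: s.streamed_hours > 24.0,
}


def label_sample(sample: StreamingSample) -> Dict[str, int]:
    """Return one 0/1 label per heuristic plus an overall binary label."""
    labels = {name: int(rule(sample)) for name, rule in HEURISTICS.items()}
    # A sample is anomalous if at least one heuristic fires.
    labels["anomalous"] = int(any(labels.values()))
    return labels


if __name__ == "__main__":
    normal = StreamingSample(2, 3, 40, 5.5)
    abuser = StreamingSample(35, 4, 900, 30.0)
    print(label_sample(normal))
    print(label_sample(abuser))
```

Under these assumptions, the per-heuristic labels could serve as targets for a multi-label classifier, while the `anomalous` flag could serve as the target for a binary classifier.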