Making Use of NXt to Nothing: The Effect of Class Imbalances on DGA Detection Classifiers

Paper title

Making Use of NXt to Nothing: The Effect of Class Imbalances on DGA Detection Classifiers

Authors

Arthur Drichel, Ulrike Meyer, Samuel Schüppen, Dominik Teubert

Abstract

Numerous machine learning classifiers have been proposed for binary classification of domain names as either benign or malicious, and even for multiclass classification to identify the domain generation algorithm (DGA) that generated a specific domain name. Both classification tasks have to deal with the class imbalance problem of strongly varying amounts of training samples per DGA. Currently, it is unclear whether the inclusion of DGAs for which only a few samples are known to the training sets is beneficial or harmful to the overall performance of the classifiers. In this paper, we perform a comprehensive analysis of various contextless DGA classifiers, which reveals the high value of a few training samples per class for both classification tasks. We demonstrate that the classifiers are able to detect various DGAs with high probability by including the underrepresented classes which were previously hardly recognizable. Simultaneously, we show that the classifiers' detection capabilities of well represented classes do not decrease.
