Paper Title

Toward More Generalized Malicious URL Detection Models

Authors

YunDa Tsai, Cayon Liow, Yin Sheng Siang, Shou-De Lin

Abstract

This paper reveals a data bias issue that can severely affect performance when building a machine learning model for malicious URL detection. We describe how such bias can be identified using interpretable machine learning techniques, and further argue that such biases naturally exist in the real-world security data used to train a classification model. We then propose a debiased training strategy that can be applied to most deep-learning-based models to alleviate the negative effects of the biased features. The solution is based on self-supervised adversarial training, which trains deep neural networks to learn invariant embeddings from biased data. We conduct a wide range of experiments to demonstrate that the proposed strategy leads to significantly better generalization for both CNN-based and RNN-based detection models.
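The abstract only names the technique; the sketch below is a loose, minimal illustration of adversarial debiasing with gradient reversal, not the authors' actual architecture. A shared linear encoder feeds both a label head and an adversary head that tries to predict a spurious bias feature; the encoder descends on the classifier loss but ascends on the adversary loss, pushing the embedding to become invariant to the bias. All data, dimensions, and hyperparameters here are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical synthetic setup: feature 0 is genuinely predictive of the
# label, feature 2 carries a spurious "bias" signal correlated with it.
n, d, k = 512, 3, 8
y = rng.integers(0, 2, n).astype(float)
b = np.where(rng.random(n) < 0.9, y, 1.0 - y)  # bias flag, 90% correlated
x = rng.normal(size=(n, d))
x[:, 0] += 2.0 * y          # real signal
x[:, 2] += 2.0 * b          # biased signal

W = rng.normal(scale=0.1, size=(d, k))  # shared encoder
u = np.zeros(k)                         # malicious/benign head
v = np.zeros(k)                         # adversary head predicting the bias

lr, lam = 0.2, 0.5
for _ in range(400):
    h = x @ W                    # shared embedding
    gy = sigmoid(h @ u) - y      # classifier error signal
    gb = sigmoid(h @ v) - b      # adversary error signal
    # Each head descends on its own logistic loss.
    u -= lr * (h.T @ gy) / n
    v -= lr * (h.T @ gb) / n
    # Encoder: descend on the classifier loss but ASCEND on the adversary
    # loss (gradient reversal), so the embedding sheds the bias signal.
    grad_h = np.outer(gy, u) - lam * np.outer(gb, v)
    W -= lr * (x.T @ grad_h) / n

acc = np.mean((sigmoid(x @ W @ u) > 0.5) == (y > 0.5))
print(f"classifier accuracy on biased training data: {acc:.2f}")
```

In a real deep model the same effect is usually obtained with a gradient-reversal layer between the encoder and the adversary head, so a single backward pass applies the flipped sign automatically.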
