Fine-grained TLS services classification with reject option

Paper title

Fine-grained TLS services classification with reject option

Authors

Jan Luxemburk, Tomáš Čejka

Abstract

The recent success and proliferation of machine learning and deep learning have provided powerful tools, which are also utilized for encrypted traffic analysis, classification, and threat detection in computer networks. These methods, neural networks in particular, are often complex and require a huge corpus of training data. Therefore, this paper focuses on collecting a large, up-to-date dataset with almost 200 fine-grained service labels and 140 million network flows extended with packet-level metadata. The number of flows is three orders of magnitude higher than in other existing public labeled datasets of encrypted traffic. The number of service labels, which is important to make the problem hard and realistic, is four times higher than in the public dataset with the most class labels. The published dataset is intended as a benchmark for identifying services in encrypted traffic. Service identification can be further extended with the task of "rejecting" unknown services, i.e., the traffic not seen during the training phase. Neural networks offer superior performance for tackling this more challenging problem. To showcase the dataset's usefulness, we implemented a neural network with a multi-modal architecture, which is the state-of-the-art approach, and achieved 97.04% classification accuracy and detected 91.94% of unknown services at a 5% false positive rate.
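The abstract does not say how the reject decision is made, so the following is only a minimal sketch of one common way to add a reject option to a classifier: threshold the top softmax probability, with the threshold calibrated on held-out flows from known services. It assumes that a "false positive" means a known-service flow wrongly rejected as unknown. The function names and the service labels are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def calibrate_threshold(known_confidences, target_fpr=0.05):
    """Pick a confidence threshold from held-out flows of KNOWN services.

    Flows whose confidence is strictly below the returned value are
    rejected, so at most `target_fpr` of the known flows are rejected.
    """
    if not known_confidences:
        raise ValueError("need at least one confidence value")
    ranked = sorted(known_confidences)
    k = min(int(math.floor(target_fpr * len(ranked))), len(ranked) - 1)
    return ranked[k]


def classify_with_reject(logits, threshold, labels):
    """Return the predicted label, or None when the flow is rejected."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None  # treated as an unknown service
    return labels[best]


if __name__ == "__main__":
    # Hypothetical labels and scores, for illustration only.
    labels = ["service-a", "service-b", "service-c"]
    print(classify_with_reject([5.0, 0.0, 0.0], 0.9, labels))
    print(classify_with_reject([0.1, 0.0, 0.0], 0.9, labels))
```

In this sketch the share of unknown-service flows that fall below the threshold would correspond to the detection rate quoted in the abstract, under the stated assumption about what counts as a false positive.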
