Paper Title

SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection

Paper Authors

Piotr Kawa, Marcin Plata, Piotr Syga

Paper Abstract

Audio DeepFakes are utterances generated with the use of deep neural networks. They are highly misleading and pose a threat due to their use in fake news, impersonation, or extortion. In this work, we focus on increasing accessibility to audio DeepFake detection methods by providing SpecRNet, a neural network architecture characterized by a quick inference time and low computational requirements. Our benchmark shows that SpecRNet, requiring up to about 40% less time to process an audio sample, provides performance comparable to the LCNN architecture, one of the best audio DeepFake detection models. Such a method can not only be used by online multimedia services to verify the large bulk of content uploaded daily but also, thanks to its low requirements, by average citizens to evaluate materials on their devices. In addition, we provide benchmarks in three unique settings that confirm the correctness of our model. They reflect scenarios of low-resource datasets, detection on short utterances, and a limited-attacks benchmark in which we take a closer look at the influence of particular attacks on the given architectures.
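The abstract's central claim is a comparison of per-sample inference time between SpecRNet and LCNN. The snippet below is a minimal, hypothetical sketch of how such a timing comparison can be set up; `specrnet_like` and `lcnn_like` are placeholder stand-in modules (not the authors' implementations), and the spectrogram input shape and timing protocol are assumptions for illustration only.

```python
# Hypothetical sketch: measuring average per-sample inference time of two
# spectrogram-based detectors. The stand-in models below are NOT the paper's
# SpecRNet or LCNN; they only illustrate the benchmarking procedure.
import time

import torch
import torch.nn as nn


def mean_inference_time(model: nn.Module, sample: torch.Tensor, runs: int = 100) -> float:
    """Return the average forward-pass time for one sample, in milliseconds."""
    model.eval()
    with torch.no_grad():
        # Warm-up passes so one-time initialization does not skew the result.
        for _ in range(5):
            model(sample)
        start = time.perf_counter()
        for _ in range(runs):
            model(sample)
        elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0


# Assumed spectrogram-style input: (batch, channels, frequency bins, time frames).
dummy_spec = torch.randn(1, 1, 80, 400)

# Placeholder architectures used purely to exercise the timing harness.
specrnet_like = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
)
lcnn_like = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2),
)

print(f"SpecRNet-like stand-in: {mean_inference_time(specrnet_like, dummy_spec):.2f} ms")
print(f"LCNN-like stand-in:     {mean_inference_time(lcnn_like, dummy_spec):.2f} ms")
```

Reported numbers from such a harness depend on hardware and batch size; the paper's "up to about 40% less time" figure refers to the authors' own benchmark setup, not to this sketch.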
