Paper Title
The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection
Paper Authors
Paper Abstract
The recent integration of generative neural strategies and audio processing techniques has fostered the widespread use of speech synthesis and transformation algorithms. This capability can prove harmful in many legal and informational processes (news, biometric authentication, audio evidence in courts, etc.). Developing efficient detection algorithms is therefore both crucial and challenging, given the heterogeneity of forgery techniques. This work investigates the discriminative role of silent segments in synthetic speech detection and shows how first digit statistics extracted from Mel-frequency cepstral coefficients (MFCCs) can efficiently enable robust detection. Because it does not rely on large neural detection architectures, the proposed procedure is computationally lightweight and effective across many different generation algorithms, and it obtains an accuracy above 90% in most of the classes of the ASVspoof dataset.
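The abstract does not spell out the feature pipeline, so the following is only a rough sketch of the general idea, not the authors' method. It assumes that MFCC frames and per-frame energies have already been computed by some front end. It then selects "silent" frames with a hypothetical relative-energy threshold (-40 dB below the loudest frame) and builds the empirical distribution of leading digits 1..9 of those frames' MFCC values. It also appends a KL divergence from Benford's law, P(d) = log10(1 + 1/d). The threshold, the helper names (`silent_frame_indices`, `silence_first_digit_features`, etc.), and the Benford divergence feature are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence


def first_digit(x: float) -> int:
    """Most significant decimal digit (1-9) of |x|; 0 for zero or non-finite input."""
    if x == 0 or not math.isfinite(x):
        return 0
    # Scientific notation puts the leading significant digit first ("3.14...e-02").
    # With 16 digits after the point, rounding cannot turn e.g. 9.999... into 10.
    return int(f"{abs(x):.16e}"[0])


def first_digit_histogram(values: Iterable[float]) -> List[float]:
    """Empirical probabilities of leading digits 1..9, ignoring zeros."""
    counts = Counter(first_digit(v) for v in values)
    counts.pop(0, None)  # zeros and non-finite values have no leading digit
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 9
    return [counts[d] / total for d in range(1, 10)]


def benford_pmf() -> List[float]:
    """Benford's law: P(d) = log10(1 + 1/d) for d = 1..9 (sums to 1)."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def kl_to_benford(hist: Sequence[float], eps: float = 1e-12) -> float:
    """KL divergence D(hist || Benford); zero-probability digits contribute 0."""
    return sum(p * math.log((p + eps) / q)
               for p, q in zip(hist, benford_pmf()) if p > 0)


def silent_frame_indices(frame_energies: Sequence[float],
                         threshold_db: float = -40.0) -> List[int]:
    """Hypothetical rule: frames more than |threshold_db| dB below the loudest frame."""
    peak = max(frame_energies, default=0.0)
    if peak <= 0:
        return list(range(len(frame_energies)))
    # Frames with zero energy carry no signal and are treated as silent.
    return [i for i, e in enumerate(frame_energies)
            if e <= 0 or 10 * math.log10(e / peak) < threshold_db]


def silence_first_digit_features(mfcc_frames: Sequence[Sequence[float]],
                                 frame_energies: Sequence[float],
                                 threshold_db: float = -40.0) -> List[float]:
    """9 first-digit probabilities of silent-frame MFCCs plus their KL divergence from Benford."""
    silent = silent_frame_indices(frame_energies, threshold_db)
    coeffs = [c for i in silent for c in mfcc_frames[i]]
    hist = first_digit_histogram(coeffs)
    return hist + [kl_to_benford(hist)]


if __name__ == "__main__":
    # Toy data: 4 frames x 2 coefficients; frames 1 and 3 fall below the threshold.
    mfcc = [[100.0, 5.0], [0.12, -3.4], [7.0, 8.0], [1.5, 0.02]]
    energy = [1.0, 1e-6, 0.5, 0.0]
    print(silence_first_digit_features(mfcc, energy))
```

The resulting 10-dimensional vector per utterance is small enough to feed a lightweight classifier (not shown), which is consistent with the abstract's point about avoiding large neural detection architectures.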