Paper Title

Attributable Watermarking of Speech Generative Models

Authors

Yongbaek Cho, Changhoon Kim, Yezhou Yang, Yi Ren

Abstract

Generative models are now capable of synthesizing images, speech, and videos that are hardly distinguishable from authentic content. Such capabilities raise concerns such as malicious impersonation and IP theft. This paper investigates a solution for model attribution, i.e., the classification of synthetic content by its source model via watermarks embedded in the content. Building on the past success of model attribution in the image domain, we discuss algorithmic improvements for generating user-end speech models that empirically achieve high attribution accuracy while maintaining high generation quality. We show the trade-off between attributability and generation quality under a variety of attacks on generated speech signals that attempt to remove the watermarks, and the feasibility of learning watermarks that are robust to these attacks.
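To make the idea of model attribution concrete, here is a minimal toy sketch (not the paper's method) of key-based watermarking: each user-end model adds a low-amplitude pseudorandom pattern derived from its key to the generated waveform, and attribution selects the candidate key whose pattern correlates most with the signal. The function names, the correlation detector, and the `strength` parameter are illustrative assumptions.

```python
import numpy as np

def embed_watermark(signal: np.ndarray, key: int, strength: float = 0.05) -> np.ndarray:
    """Add a low-amplitude pseudorandom pattern derived from `key` (toy sketch)."""
    rng = np.random.default_rng(key)  # key-seeded, so the pattern is reproducible
    pattern = rng.standard_normal(signal.shape)
    return signal + strength * pattern

def attribute(signal: np.ndarray, candidate_keys: list) -> int:
    """Attribute the signal to the key whose watermark pattern correlates most with it."""
    scores = []
    for key in candidate_keys:
        rng = np.random.default_rng(key)
        pattern = rng.standard_normal(signal.shape)
        scores.append(float(np.dot(signal, pattern)))  # correlation detector
    return candidate_keys[int(np.argmax(scores))]

# Toy demo: a "generated" 1-second waveform watermarked for the model with key 42.
clean = np.sin(2 * np.pi * 440 * np.linspace(0.0, 1.0, 16000))
marked = embed_watermark(clean, key=42)
print(attribute(marked, [7, 42, 99]))  # → 42
```

In the paper's setting the watermark is learned jointly with the generator so that it survives removal attacks while preserving perceptual quality; this sketch only shows the embed/attribute interface, not that robustness.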
