Certified Neural Network Watermarks with Randomized Smoothing

Paper title

Certified Neural Network Watermarks with Randomized Smoothing

Authors

Arpit Bansal, Ping-yeh Chiang, Michael Curry, Rajiv Jain, Curtis Wigington, Varun Manjunatha, John P. Dickerson, Tom Goldstein

Abstract

Watermarking is a commonly used strategy to protect creators' rights to digital images, videos, and audio. Recently, watermarking methods have been extended to deep learning models: in principle, the watermark should be preserved when an adversary tries to copy the model. In practice, however, an intelligent adversary can often remove the watermark. Several papers have proposed watermarking methods that claim to be empirically resistant to different types of removal attacks, but these techniques often fail against new or better-tuned adversaries. In this paper, we propose a certifiable watermarking method. Using the randomized smoothing technique proposed in Chiang et al., we show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain ℓ2 threshold. In addition to being certifiable, our watermark is also empirically more robust than previous watermarking methods. Our experiments can be reproduced with the code at https://github.com/arpitbansal297/Certified_Watermarks
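The abstract describes smoothing over model parameters rather than over inputs. As a rough illustration of that idea only, the sketch below estimates the expected trigger-set accuracy of a toy linear classifier when Gaussian noise is added to its weights. It is not the authors' implementation and does not compute the certified ℓ2 radius; the function names, the linear model, the trigger set, and the value of sigma are all assumptions made for this example.

```python
import numpy as np


def trigger_accuracy(weights, trigger_x, trigger_y):
    """Fraction of trigger-set examples that a linear classifier labels as intended."""
    logits = trigger_x @ weights
    return float(np.mean(np.argmax(logits, axis=1) == trigger_y))


def smoothed_trigger_accuracy(weights, trigger_x, trigger_y, sigma,
                              n_samples=200, seed=0):
    """Monte Carlo estimate of trigger-set accuracy under Gaussian parameter noise.

    Each sample perturbs the weights with N(0, sigma^2) noise and evaluates the
    trigger set; the mean over samples approximates the smoothed accuracy.
    """
    rng = np.random.default_rng(seed)
    accuracies = [
        trigger_accuracy(
            weights + sigma * rng.standard_normal(weights.shape),
            trigger_x,
            trigger_y,
        )
        for _ in range(n_samples)
    ]
    return float(np.mean(accuracies))


# Toy setup (hypothetical): 20 trigger inputs, 8 features, 3 classes.
data_rng = np.random.default_rng(1)
trigger_x = data_rng.standard_normal((20, 8))
weights = data_rng.standard_normal((8, 3))
# By construction, the unperturbed model fits every trigger label.
trigger_y = np.argmax(trigger_x @ weights, axis=1)

base = trigger_accuracy(weights, trigger_x, trigger_y)
smoothed = smoothed_trigger_accuracy(weights, trigger_x, trigger_y, sigma=0.1)
print(f"base accuracy: {base:.2f}, smoothed accuracy: {smoothed:.2f}")
```

A smoothed accuracy that stays high under parameter noise is the kind of quantity a certificate would be built on; turning it into a guaranteed ℓ2 threshold requires the bound derived in the paper, which this sketch does not reproduce.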
