MEGA: Model Stealing via Collaborative Generator-Substitute Networks

Paper Title

MEGA: Model Stealing via Collaborative Generator-Substitute Networks

Authors

Chi Hong, Jiyue Huang, Lydia Y. Chen

Abstract

Deep machine learning models are increasingly deployed in the wild for providing services to users. Adversaries may steal the knowledge of these valuable models by training substitute models according to the inference results of the targeted deployed models. Recent data-free model stealing methods are shown effective to extract the knowledge of the target model without using real query examples, but they assume rich inference information, e.g., class probabilities and logits. However, they are all based on competing generator-substitute networks and hence encounter training instability. In this paper we propose a data-free model stealing framework, MEGA, which is based on collaborative generator-substitute networks and only requires the target model to provide label prediction for synthetic query examples. The core of our method is a model stealing optimization consisting of two collaborative models: (i) the substitute model, which imitates the target model through the synthetic query examples and their inferred labels, and (ii) the generator, which synthesizes images such that the confidence of the substitute model over each query example is maximized. We propose a novel coordinate descent training procedure and analyze its convergence. We also empirically evaluate the trained substitute model on three datasets and its application on black-box adversarial attacks. Our results show that the accuracy of our trained substitute model and the adversarial attack success rate over it can be up to 33% and 40% higher than state-of-the-art data-free black-box attacks.
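
To make the collaborative setup described in the abstract more concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a 2-D input space, a logistic-regression substitute, an element-wise affine generator, and a hypothetical `target_label` function standing in for the label-only black-box target. All names, model choices, and hyperparameters here are illustrative assumptions. The abstract does not give the paper's actual loss functions or architectures, and they may differ. The alternating updates only illustrate the general idea of coordinate descent over two parameter blocks.

```python
import math
import random

def target_label(x):
    """Hypothetical black-box target: exposes only a hard label (0 or 1)."""
    return 1 if x[0] + 0.5 * x[1] > 0.0 else 0

def sigmoid(s):
    # Numerically safe logistic function (only exponentiates non-positive values).
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def substitute_prob(sub, x):
    """Toy substitute model: logistic regression giving P(label = 1 | x)."""
    w, b = sub
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def generate(gen, z):
    """Toy generator: element-wise affine map from noise z to a query x."""
    a, c = gen
    return [a[0] * z[0] + c[0], a[1] * z[1] + c[1]]

def cross_entropy(sub, xs, ys):
    """Mean binary cross-entropy of the substitute against labels ys."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = substitute_prob(sub, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(xs)

def substitute_step(sub, xs, ys, lr):
    """One gradient step on the substitute block; the generator is held fixed."""
    w, b = sub
    gw, gb = [0.0, 0.0], 0.0
    for x, y in zip(xs, ys):
        err = substitute_prob(sub, x) - y  # d(loss)/d(logit)
        gw[0] += err * x[0]
        gw[1] += err * x[1]
        gb += err
    n = len(xs)
    return ([w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n], b - lr * gb / n)

def generator_step(gen, sub, zs, lr):
    """One gradient step on the generator block; the substitute is held fixed.

    Assumed objective: cross-entropy against the substitute's own predicted
    label, so lowering it raises the substitute's confidence on each query.
    """
    a, c = gen
    w, _ = sub
    ga, gc = [0.0, 0.0], [0.0, 0.0]
    for z in zs:
        p = substitute_prob(sub, generate(gen, z))
        err = p - (1.0 if p >= 0.5 else 0.0)  # gradient w.r.t. the logit
        for i in range(2):
            ga[i] += err * w[i] * z[i]
            gc[i] += err * w[i]
    n = len(zs)
    new_a = [a[i] - lr * ga[i] / n for i in range(2)]
    new_c = [c[i] - lr * gc[i] / n for i in range(2)]
    return (new_a, new_c)

def train(rounds=200, batch=32, lr=0.1, seed=0):
    """Alternate substitute and generator updates (block coordinate descent)."""
    rng = random.Random(seed)
    sub = ([0.1, -0.1], 0.0)
    gen = ([1.0, 1.0], [0.0, 0.0])
    for _ in range(rounds):
        zs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(batch)]
        xs = [generate(gen, z) for z in zs]
        ys = [target_label(x) for x in xs]      # label-only queries to the target
        sub = substitute_step(sub, xs, ys, lr)  # update substitute, generator fixed
        gen = generator_step(gen, sub, zs, lr)  # update generator, substitute fixed
    return sub, gen

def agreement(sub, n=1000, seed=1):
    """Fraction of random test points where substitute and target labels match."""
    rng = random.Random(seed)
    pts = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    hits = sum((substitute_prob(sub, x) >= 0.5) == (target_label(x) == 1) for x in pts)
    return hits / n
```

Calling `sub, gen = train()` and then `agreement(sub)` gives a rough indication of how closely the toy substitute matches the toy target. Note that a pure confidence-maximizing generator can drift toward a single class in a sketch this simple, so it should be read as an illustration of the two cooperating objectives rather than as a working attack.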
