Paper Title
XMAM:X-raying Models with A Matrix to Reveal Backdoor Attacks for Federated Learning
Paper Authors
Paper Abstract
Federated Learning (FL) has received increasing attention due to its privacy protection capability. However, its base algorithm, FedAvg, is vulnerable to so-called backdoor attacks. Prior researchers have proposed several robust aggregation methods; unfortunately, many of them fail to defend against backdoor attacks. Worse, attackers have recently devised hiding techniques that further improve the stealthiness of backdoor attacks, defeating all existing robust aggregation methods. To tackle the threat of backdoor attacks, we propose a new aggregation method, X-raying Models with A Matrix (XMAM), to reveal the malicious local model updates submitted by backdoor attackers. Since we observe that the Softmax layer's output exhibits distinguishable patterns between malicious and benign updates, we focus on the Softmax layer's output, where backdoor attackers find it difficult to hide their malicious behavior. Specifically, like an X-ray examination, we probe the local model updates by feeding a matrix as input and collecting their Softmax layer outputs. We then exclude, via clustering, the updates whose outputs are abnormal. Extensive evaluations show that, without requiring any training dataset on the server, our XMAM can effectively distinguish malicious local model updates from benign ones. For instance, where other methods fail to defend against backdoor attacks with no more than 20% malicious clients, our method tolerates 45% malicious clients in the black-box mode and about 30% in the Projected Gradient Descent (PGD) mode. Moreover, under adaptive attacks, the results demonstrate that XMAM can still complete the global model training task even with 40% malicious clients. Finally, we analyze our method's screening complexity and show that XMAM is about 10–10,000 times faster than existing methods.
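The screening procedure described in the abstract lends itself to a compact sketch. Below is a minimal, hypothetical illustration of the idea, assuming PyTorch models and scikit-learn's KMeans; the names `xmam_screen`, `global_model`, and `local_updates`, the all-ones probe matrix, and the two-cluster majority rule are our assumptions for illustration, not the authors' reference implementation.

```python
# A minimal sketch of the XMAM screening idea, assuming PyTorch models and
# scikit-learn clustering. All names here are hypothetical illustrations,
# not the paper's reference implementation.
import copy
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def xmam_screen(global_model, local_updates, input_shape):
    """Feed one fixed matrix through every candidate local model and
    cluster the Softmax outputs; indices in the majority cluster are kept."""
    # The "X-ray" input: a single fixed matrix (here, all ones; the exact
    # probe matrix used by the authors is an assumption on our part).
    probe = torch.ones((1, *input_shape))

    outputs = []
    for update in local_updates:
        # Apply each client's submitted update (assumed here to be a full
        # state dict) to a copy of the current global model.
        model = copy.deepcopy(global_model)
        model.load_state_dict(update)
        model.eval()
        with torch.no_grad():
            logits = model(probe)
            outputs.append(F.softmax(logits, dim=1).flatten().numpy())

    # Cluster the Softmax outputs into two groups and keep the larger one,
    # on the assumption that benign clients form the majority.
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(outputs)
    majority = int(labels.sum() > len(labels) / 2)
    return [i for i, lab in enumerate(labels) if lab == majority]
```

Because the probe is an arbitrary matrix rather than real data, the server needs no training dataset, which matches the data-free setting claimed in the abstract; each update also requires only a single forward pass, consistent with the reported screening speedup.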