Paper Title

Threats to Pre-trained Language Models: Survey and Taxonomy

Authors

Shangwei Guo, Chunlong Xie, Jiwei Li, Lingjuan Lyu, Tianwei Zhang

Abstract

Pre-trained language models (PTLMs) have achieved great success and remarkable performance over a wide range of natural language processing (NLP) tasks. However, there are also growing concerns regarding the potential security issues in the adoption of PTLMs. In this survey, we comprehensively systematize recently discovered threats to PTLM systems and applications. We perform our attack characterization from three interesting perspectives. (1) We show threats can occur at different stages of the PTLM pipeline raised by different malicious entities. (2) We identify two types of model transferability (landscape, portrait) that facilitate attacks. (3) Based on the attack goals, we summarize four categories of attacks (backdoor, evasion, data privacy and model privacy). We also discuss some open problems and research directions. We believe our survey and taxonomy will inspire future studies towards secure and privacy-preserving PTLMs.
