Paper Title
General Framework for Reversible Data Hiding in Texts Based on Masked Language Modeling
Paper Authors
Paper Abstract
With the rapid development of natural language processing, recent advances in information hiding focus on covertly embedding secret information into texts. Existing algorithms either modify a given cover text or directly generate a text containing the secret information. Neither approach is reversible: the original text, which carries no secret information, cannot be perfectly recovered unless a large amount of side information is shared in advance. To tackle this problem, we propose a general framework for embedding secret information into a given cover text such that both the embedded information and the original cover text can be perfectly retrieved from the marked text. The main idea is to use a masked language model to generate a marked text in which the words at some positions can be collected to reconstruct the cover text, while the words at the remaining positions can be processed to extract the secret information. Our results show that the secret information can be successfully embedded and that both it and the original cover text can be successfully extracted. Meanwhile, the marked text carrying the secret information has good fluency and semantic quality, indicating that the proposed method has satisfactory security, as verified by experimental results. Furthermore, the data hider and the data receiver do not need to share the language model, which significantly reduces the side information required and gives the method good potential in applications.
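The position-based idea in the abstract can be illustrated with a small toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the fixed stride between inserted words, the keyed-hash rule that maps an inserted word to one bit, and the helper names `propose_fillers`, `word_bit`, `embed`, and `extract`. In particular, `propose_fillers` returns a fixed word list where a real system would query a masked language model for context-appropriate candidates, so the marked text produced here will not be fluent.

```python
import hashlib

# Hypothetical filler vocabulary standing in for masked-language-model predictions.
TOY_FILLERS = [
    "very", "quite", "really", "just", "also", "still", "now", "then", "often",
    "simply", "indeed", "clearly", "perhaps", "maybe", "soon", "again",
    "always", "rather", "truly", "here",
]


def propose_fillers(left_context, right_context):
    """Stand-in for a masked language model (assumption): a real system would
    rank candidate words for the masked slot given both contexts."""
    return TOY_FILLERS


def word_bit(word, key):
    """Map a word to one bit with a keyed hash (an assumed extraction rule)."""
    digest = hashlib.sha256(f"{key}:{word.lower()}".encode("utf-8")).digest()
    return digest[0] & 1


def embed(cover_words, bits, key, stride=2):
    """Insert one bit-carrying word after every `stride` cover words."""
    capacity = len(cover_words) // stride
    if len(bits) != capacity:
        raise ValueError(f"expected exactly {capacity} bits, got {len(bits)}")
    marked = []
    for i, word in enumerate(cover_words):
        marked.append(word)
        if (i + 1) % stride == 0:
            bit = bits[(i + 1) // stride - 1]
            candidates = propose_fillers(marked, cover_words[i + 1:])
            # Pick the first candidate whose keyed hash encodes the needed bit.
            filler = next((c for c in candidates if word_bit(c, key) == bit), None)
            if filler is None:
                raise ValueError("no candidate encodes the required bit")
            marked.append(filler)
    return marked


def extract(marked_words, key, stride=2):
    """Split the marked text by position: cover words versus bit carriers."""
    cover, bits = [], []
    for idx, word in enumerate(marked_words):
        if idx % (stride + 1) == stride:
            bits.append(word_bit(word, key))  # inserted position: read the bit
        else:
            cover.append(word)                # cover position: keep the word
    return cover, bits


cover = "the model hides data inside ordinary text".split()
secret = [1, 0, 1]
marked = embed(cover, secret, key="shared-key")
recovered_cover, recovered_bits = extract(marked, key="shared-key")
print(" ".join(marked))
print(recovered_cover == cover, recovered_bits == secret)
```

In this sketch the receiver needs only the key and the stride, not the candidate generator, which mirrors the abstract's point that the language model itself does not have to be shared. How the actual framework selects positions and decodes the inserted words is not specified in the abstract.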