Paper Title

I still know it's you! On Challenges in Anonymizing Source Code

Authors

Micha Horlboge, Erwin Quiring, Roland Meyer, Konrad Rieck

Abstract

The source code of a program not only defines its semantics but also contains subtle clues that can identify its author. Several studies have shown that these clues can be automatically extracted using machine learning and allow for determining a program's author among hundreds of programmers. This attribution poses a significant threat to developers of anti-censorship and privacy-enhancing technologies, as they become identifiable and may be prosecuted. An ideal protection from this threat would be the anonymization of source code. However, neither theoretical nor practical principles of such an anonymization have been explored so far. In this paper, we tackle this problem and develop a framework for reasoning about code anonymization. We prove that the task of generating a $k$-anonymous program -- a program that cannot be attributed to one of $k$ authors -- is not computable in the general case. As a remedy, we introduce a relaxed concept called $k$-uncertainty, which enables us to measure the protection of developers. Based on this concept, we empirically study candidate techniques for anonymization, such as code normalization, coding style imitation, and code obfuscation. We find that none of the techniques provides sufficient protection when the attacker is aware of the anonymization. While we observe a notable reduction in attribution performance on real-world code, a reliable protection is not achieved for all developers. We conclude that code anonymization is a hard problem that requires further attention from the research community.
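As a rough illustration of what "cannot be attributed to one of $k$ authors" might look like in practice, the sketch below counts how many candidate authors an attribution classifier cannot separate from its top guess. This is not the paper's formal definition of $k$-uncertainty; the function name `count_near_top_authors`, the `tolerance` parameter, and the example scores are all hypothetical, chosen only to make the idea concrete.

```python
from typing import Sequence


def count_near_top_authors(scores: Sequence[float], tolerance: float = 0.05) -> int:
    """Count candidate authors whose score lies within `tolerance` of the best score.

    `scores` holds one attribution score per candidate author (an assumed
    classifier output). A larger count means the classifier is less able to
    single out one author. This is an illustrative proxy, not the paper's metric.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    best = max(scores)
    return sum(1 for s in scores if best - s <= tolerance)


# Hypothetical attribution scores for five candidate authors.
example_scores = [0.30, 0.29, 0.30, 0.05, 0.06]
print(count_near_top_authors(example_scores, tolerance=0.02))
```

Under these assumed scores, three authors are effectively tied for the top position, so the classifier could not confidently pick one of them. A count of 1 would mean the author stands out clearly.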
