Paper Title
A First Step Towards Content Protecting Plagiarism Detection
Paper Authors
Paper Abstract
Plagiarism detection systems are essential tools for safeguarding academic and educational integrity. However, today's systems require disclosing the full content of the input documents and of the document collection to which the input documents are compared. Moreover, the systems are centralized and under the control of individual, typically commercial, providers. This situation raises procedural and legal concerns regarding the confidentiality of sensitive data, which can limit or prohibit the use of plagiarism detection services. To eliminate these weaknesses of current systems, we seek to devise a plagiarism detection approach that requires neither a centralized provider nor the disclosure of any content as cleartext. This paper presents the initial results of our research. Specifically, we employ Private Set Intersection to devise a content-protecting variant of the citation-based similarity measure Bibliographic Coupling implemented in our plagiarism detection system HyPlag. Our evaluation shows that the content-protecting method achieves the same detection effectiveness as the original method while making common attacks to disclose the protected content practically infeasible. Our future work will extend this successful proof-of-concept by devising plagiarism detection methods that can analyze the entire content of documents without disclosing it as cleartext.
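To make the idea in the abstract concrete, the following is a minimal sketch of how Bibliographic Coupling (the number of references two documents share) could be computed through a set intersection on blinded values. It is an illustration under stated assumptions, not the protocol used in HyPlag: the abstract does not say which Private Set Intersection construction the authors employ, so the sketch substitutes a textbook Diffie-Hellman-style commutative blinding. The modulus `P`, the function names, and the encoding of references as strings are all hypothetical, and the parameters are chosen for brevity, not security.

```python
import hashlib
import math
import secrets

# Toy modulus (assumption): the Mersenne prime 2**127 - 1.
# Chosen for brevity only; a real deployment would use a vetted group.
P = 2**127 - 1


def hash_to_group(reference: str) -> int:
    """Map a normalized reference string to an element of Z_P^* (hypothetical encoding)."""
    digest = hashlib.sha256(reference.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def new_secret() -> int:
    """Draw a secret exponent coprime to P - 1, so x -> x**k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values, secret):
    """Raise every value to the secret exponent modulo P."""
    return [pow(v, secret, P) for v in values]


def coupling_strength_plain(refs_a, refs_b):
    """Bibliographic Coupling on cleartext: count the shared references."""
    return len(set(refs_a) & set(refs_b))


def coupling_strength_blinded(refs_a, refs_b):
    """Same count, but compared only on doubly blinded values.

    Both parties are simulated in one process here. In a real exchange each
    party would keep its exponent private and shuffle its list before sending.
    """
    secret_a, secret_b = new_secret(), new_secret()
    a_once = blind((hash_to_group(r) for r in set(refs_a)), secret_a)
    b_once = blind((hash_to_group(r) for r in set(refs_b)), secret_b)
    a_twice = blind(a_once, secret_b)  # step performed by party B
    b_twice = blind(b_once, secret_a)  # step performed by party A
    # Exponentiation commutes, so shared references yield equal values.
    return len(set(a_twice) & set(b_twice))
```

Because (h^a)^b and (h^b)^a are equal modulo `P`, references held by both parties collide after double blinding, while neither party sees the other's reference hashes directly. The blinded count therefore matches the cleartext count, which mirrors the abstract's claim of unchanged detection effectiveness in spirit only; the sketch says nothing about the attack resistance evaluated in the paper.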