MISIM: A Neural Code Semantics Similarity System Using the Context-Aware Semantics Structure

Paper Title

MISIM: A Neural Code Semantics Similarity System Using the Context-Aware Semantics Structure

Authors

Fangke Ye, Shengtian Zhou, Anand Venkat, Ryan Marcus, Nesime Tatbul, Jesmin Jahan Tithi, Niranjan Hasabnis, Paul Petersen, Timothy Mattson, Tim Kraska, Pradeep Dubey, Vivek Sarkar, Justin Gottschlich

Abstract

Code semantics similarity can be used for many tasks such as code recommendation, automated software defect correction, and clone detection. Yet the accuracy of such systems has not reached a level of general-purpose reliability. To help address this, we present Machine Inferred Code Similarity (MISIM), a neural code semantics similarity system consisting of two core components: (i) MISIM uses a novel context-aware semantics structure, which was purpose-built to lift semantics from code syntax; (ii) MISIM uses an extensible neural code similarity scoring algorithm, which can be used for various neural network architectures with learned parameters. We compare MISIM to four state-of-the-art systems, including two additional hand-customized models, over 328K programs consisting of over 18 million lines of code. Our experiments show that MISIM has 8.08% better accuracy (using MAP@R) compared to the next best performing system.
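The abstract reports accuracy as MAP@R (mean average precision at R). As a rough illustration of how that metric is commonly computed, the sketch below ranks every other program by cosine similarity to a query and scores only the top R results, where R is the number of other programs sharing the query's label. The function names `cosine` and `map_at_r`, the toy vectors, and the labels are all hypothetical; this is not the authors' evaluation code, and the choice of cosine similarity as the scoring function is an assumption made here for simplicity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def map_at_r(vectors, labels):
    """Average MAP@R over all queries that have at least one same-label peer.

    For a query with R same-label peers, the R most similar items are
    retrieved; precision at rank i counts only when item i is a correct match.
    """
    scores = []
    for q, (q_vec, q_label) in enumerate(zip(vectors, labels)):
        r = sum(1 for i, lab in enumerate(labels) if i != q and lab == q_label)
        if r == 0:
            continue  # no relevant items exist for this query
        ranked = sorted(
            (i for i in range(len(vectors)) if i != q),
            key=lambda i: cosine(q_vec, vectors[i]),
            reverse=True,
        )
        hits = 0
        total = 0.0
        for rank, i in enumerate(ranked[:r], start=1):
            if labels[i] == q_label:
                hits += 1
                total += hits / rank
        scores.append(total / r)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical embeddings for four programs solving two different problems.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = ["problem_a", "problem_a", "problem_b", "problem_b"]
print(map_at_r(toy_vectors, toy_labels))
```

In this toy setup each program's nearest neighbour solves the same problem, so the score is at its maximum; shuffling the labels so that nearest neighbours disagree drives it toward zero.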
