Paper Title

Learning Semantic Program Embeddings with Graph Interval Neural Network

Authors

Wang, Yu; Gao, Fengjuan; Wang, Linzhang; Wang, Ke

Abstract

Learning distributed representations of source code has been a challenging task for machine learning models. Earlier works treated programs as text so that natural language methods could be readily applied. Unfortunately, such approaches do not capitalize on the rich structural information that source code possesses. More recently, Graph Neural Networks (GNNs) have been proposed to learn program embeddings from graph representations of programs. Due to their homogeneous and expensive message-passing procedure, GNNs can suffer from precision issues, especially when dealing with programs rendered into large graphs. In this paper, we present a new graph neural architecture, called Graph Interval Neural Network (GINN), to tackle the weaknesses of existing GNNs. Unlike a standard GNN, GINN generalizes from a curated graph representation obtained through an abstraction method designed to aid models in learning. In particular, GINN focuses exclusively on intervals for mining the feature representation of a program; furthermore, GINN operates on a hierarchy of intervals to scale learning to large graphs. We evaluate GINN on two popular downstream applications: variable misuse prediction and method name prediction. Results show that in both cases GINN outperforms the state-of-the-art models by a comfortable margin. We have also created a neural bug detector based on GINN to catch null pointer dereference bugs in Java code. When trained on the same 9,000 methods extracted from 64 projects, the GINN-based bug detector significantly outperforms a GNN-based bug detector on 13 unseen test projects. Next, we deploy our trained GINN-based bug detector and Facebook Infer to scan the codebases of 20 highly starred projects on GitHub. Through manual inspection, we confirm 38 bugs out of 102 warnings raised by the GINN-based bug detector, compared to 34 bugs out of 129 warnings for Facebook Infer.
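To make "intervals" and "a hierarchy of intervals" concrete, here is a minimal sketch assuming they correspond to the classic interval partition from control-flow analysis (Allen and Cocke) and its sequence of derived graphs. An interval I(h) is the maximal single-entry subgraph whose only entry is its header h and in which every cycle passes through h. Collapsing each interval into one node yields a derived graph, and repeating the process produces a hierarchy. The function names `interval_partition`, `derived_graph` and `interval_hierarchy` are hypothetical. The sketch builds only the graph abstraction and contains no neural network; the abstract does not specify how GINN embeds these intervals.

```python
from collections import defaultdict


def interval_partition(succ, entry):
    """Partition a CFG (dict: node -> list of successors) into intervals.

    Hypothetical helper. Assumes every node is reachable from `entry`
    and that node ids are sortable (used for a deterministic scan order).
    """
    preds = defaultdict(set)
    nodes = {entry}
    for n, targets in succ.items():
        nodes.add(n)
        for t in targets:
            preds[t].add(n)
            nodes.add(t)

    headers = [entry]   # worklist of interval headers, in discovery order
    owner = {}          # node -> header of the interval that contains it
    partition = []
    while headers:
        h = headers.pop(0)
        members = [h]
        owner[h] = h
        grew = True
        while grew:     # absorb nodes whose predecessors all lie in I(h)
            grew = False
            for m in sorted(nodes):
                if m in owner or m == entry:
                    continue
                if preds[m] and all(owner.get(p) == h for p in preds[m]):
                    members.append(m)
                    owner[m] = h
                    grew = True
        partition.append(members)
        # Unassigned successors of I(h) are entered from outside: new headers.
        for n in members:
            for m in succ.get(n, ()):
                if m not in owner and m not in headers:
                    headers.append(m)
    return partition


def derived_graph(succ, partition):
    """Collapse each interval into a single node named after its header."""
    owner = {n: iv[0] for iv in partition for n in iv}
    dsucc = {iv[0]: [] for iv in partition}
    for n, targets in succ.items():
        for t in targets:
            a, b = owner[n], owner[t]
            if a != b and b not in dsucc[a]:  # drop edges inside one interval
                dsucc[a].append(b)
    return dsucc


def interval_hierarchy(succ, entry):
    """Return the interval partition of each graph in the derived sequence."""
    levels = []
    graph = succ
    while True:
        part = interval_partition(graph, entry)
        levels.append(part)
        num_nodes = sum(len(iv) for iv in part)
        # Stop at a single interval, or when no interval grew (irreducible limit).
        if len(part) == 1 or len(part) == num_nodes:
            return levels
        graph = derived_graph(graph, part)


if __name__ == "__main__":
    # Hypothetical CFG: inner loop 2 <-> 3 nested inside outer loop 1 -> ... -> 4 -> 1.
    cfg = {0: [1], 1: [2], 2: [3, 4], 3: [2], 4: [1, 5], 5: []}
    for depth, part in enumerate(interval_hierarchy(cfg, 0)):
        print(depth, part)
```

On this nested-loop example the hierarchy has three levels, `[[0], [1], [2, 3, 4, 5]]`, then `[[0], [1, 2]]`, then `[[0, 1]]`: the inner loop's region is absorbed first and the outer loop collapses one level higher. An irreducible graph stops after one level because no interval can grow.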