Improved Code Summarization via a Graph Neural Network

Paper title

Improved Code Summarization via a Graph Neural Network

Authors

LeClair, Alexander; Haque, Sakib; Wu, Lingfei; McMillan, Collin

Abstract

Automatic source code summarization is the task of generating natural language descriptions for source code. It is a rapidly expanding research area, especially as the community has taken greater advantage of advances in neural network and AI technologies. In general, source code summarization techniques take the source code as input and output a natural language description. Yet a strong consensus is developing that using structural information as input leads to improved performance. The first approaches to use structural information flattened the AST into a sequence. Recently, more complex approaches based on random AST paths or graph neural networks have improved on the models that use flattened ASTs. However, the literature still does not describe using a graph neural network together with the source code sequence as separate inputs to a model. Therefore, in this paper, we present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries. We evaluate our technique using a data set of 2.1 million Java method-comment pairs and show improvement over four baseline techniques, two from the software engineering literature and two from the machine learning literature.
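
The difference between a flattened AST and a graph-shaped AST can be illustrated with a small sketch. This is not the authors' model or data pipeline: it uses Python's standard `ast` module in place of a Java parser, and the helper names `flatten_ast` and `ast_to_graph` are hypothetical. It only shows the two kinds of input the abstract contrasts, alongside the raw source token sequence.

```python
import ast


def flatten_ast(tree):
    """Flatten an AST into a sequence of node type names (pre-order)."""
    seq = []

    def visit(node):
        seq.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return seq


def ast_to_graph(tree):
    """Keep the tree structure: node labels plus (parent, child) index edges."""
    nodes, edges = [], []

    def visit(node, parent_idx):
        idx = len(nodes)
        nodes.append(type(node).__name__)
        if parent_idx is not None:
            edges.append((parent_idx, idx))
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(tree, None)
    return nodes, edges


# Illustrative input only; the paper's data set consists of Java methods.
source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)

source_tokens = source.split()     # source code sequence (naive whitespace split)
sequence = flatten_ast(tree)       # flattened-AST input
nodes, edges = ast_to_graph(tree)  # graph input: labels and an edge list
```

In the flattened form the parent-child relations are only implicit in the ordering, whereas the edge list states them explicitly, which is the kind of structure a graph neural network consumes. Under this reading, a model like the one the abstract describes would receive `source_tokens` and the `(nodes, edges)` graph as two separate inputs.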
