Paper Title
Python Code Generation by Asking Clarification Questions
Paper Authors
Paper Abstract
Code generation from text requires understanding the user's intent from a natural language description and generating an executable code snippet that satisfies this intent. While recent pretrained language models demonstrate remarkable performance on this task, they fail when the given natural language description is under-specified. In this work, we introduce a novel and more realistic setup for this task. We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions. Therefore, we collect and introduce a new dataset named CodeClarQA containing pairs of natural language descriptions and code, together with synthetic clarification questions and answers. Our empirical evaluation of pretrained language models on code generation shows that clarifications lead to more precisely generated code, as demonstrated by substantial improvements in model performance across all evaluation metrics. Alongside this, our task and dataset introduce new challenges to the community, including when and which clarification questions should be asked. Our code and dataset are available on GitHub.