Paper Title

Code Compliance Assessment as a Learning Problem

Paper Authors

Neela Sawant, Srinivasan H. Sengamedu

Paper Abstract

Manual code reviews and static code analyzers are the traditional mechanisms to verify if source code complies with coding policies. However, these mechanisms are hard to scale. We formulate code compliance assessment as a machine learning (ML) problem that takes as input a natural-language policy and code, and predicts whether the code is compliant, non-compliant, or irrelevant. This can help scale compliance classification and search for policies not covered by traditional mechanisms. We explore key research questions on ML model formulation, training data, and evaluation setup. The core idea is to obtain a joint code-text embedding space that preserves compliance relationships via the vector distance between code and policy embeddings. As there is no task-specific data, we re-interpret and filter commonly available software datasets with additional pre-training and pre-finetuning tasks that reduce the semantic gap. We benchmark our approach on two listings of coding policies (CWE and CBP). This is a zero-shot evaluation, as none of the policies occur in the training set. On CWE and CBP respectively, our tool Policy2Code achieves classification accuracies of (59%, 71%) and search MRR of (0.05, 0.21), compared to CodeBERT with classification accuracies of (37%, 54%) and MRR of (0.02, 0.02). In a user study, 24% of Policy2Code detections were accepted, compared to 7% for CodeBERT.
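
To make the joint-embedding idea from the abstract concrete, below is a minimal Python sketch, not the paper's Policy2Code implementation: it reuses the CodeBERT baseline as a shared encoder for policy text and code, then maps the cosine similarity of their embeddings to the three labels. The embed/assess helpers, the mean-pooling choice, and both thresholds are illustrative assumptions.

import torch
from transformers import AutoModel, AutoTokenizer

# CodeBERT is the baseline encoder named in the abstract.
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
encoder = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(text: str) -> torch.Tensor:
    # Mean-pool the last hidden states into one vector per input.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)              # (dim,)

def assess(policy: str, code: str) -> str:
    # Assumed label mapping: near the policy = compliant, mid-range =
    # non-compliant, far = irrelevant. The thresholds are placeholders;
    # the paper instead trains the embedding space so that distance
    # tracks compliance.
    sim = torch.nn.functional.cosine_similarity(
        embed(policy), embed(code), dim=0).item()
    if sim >= 0.7:
        return "compliant"
    if sim >= 0.4:
        return "non-compliant"
    return "irrelevant"

print(assess("Do not log sensitive credentials.",
             'logger.info("password=%s", password)'))

Off-the-shelf CodeBERT embeddings like these are essentially the weaker baseline the abstract compares against; the reported gains come from the additional pre-training and pre-finetuning tasks that shape the joint space so vector distance actually reflects compliance.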
