Paper Title
White-box Testing of NLP models with Mask Neuron Coverage
Paper Authors
Abstract
Recent literature has seen growing interest in using black-box strategies like CheckList for testing the behavior of NLP models. Research on white-box testing has developed a number of methods for evaluating how thoroughly the internal behavior of deep models is tested, but they are not applicable to NLP models. We propose a set of white-box testing methods that are customized for transformer-based NLP models. These include Mask Neuron Coverage (MNCOVER), which measures how thoroughly the attention layers in models are exercised during testing. We show that MNCOVER can refine test suites generated by CheckList by substantially reducing their size, by more than 60\% on average, while retaining failing tests, thereby concentrating the fault detection power of the test suite. Further, we show how MNCOVER can be used to guide CheckList input generation, evaluate alternative NLP testing methods, and drive data augmentation to improve accuracy.
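The coverage-guided reduction described in the abstract can be illustrated with a minimal, hypothetical sketch: tests are kept only if they activate attention-layer "neurons" not already covered by previously kept tests. The `get_activated_neurons` function, the binarization threshold, and the toy attention weights below are illustrative assumptions, not the paper's actual definitions of MNCOVER.

```python
# Hypothetical sketch of coverage-guided test-suite reduction in the spirit of
# MNCOVER. Assumption: a (layer, head, position) attention entry counts as
# "activated" when its weight exceeds a fixed threshold.

def get_activated_neurons(attention_weights, threshold=0.5):
    """Binarize attention weights into a set of activated (layer, head, pos) ids."""
    return {
        (layer, head, pos)
        for layer, heads in enumerate(attention_weights)
        for head, weights in enumerate(heads)
        for pos, w in enumerate(weights)
        if w > threshold
    }

def reduce_suite(tests):
    """Greedily keep only tests that increase cumulative coverage."""
    covered, kept = set(), []
    for test_input, attention_weights in tests:
        activated = get_activated_neurons(attention_weights)
        if not activated <= covered:  # test exercises previously uncovered behavior
            kept.append(test_input)
            covered |= activated
    return kept

# Toy example: three "tests" with fake single-layer, two-head attention weights.
suite = [
    ("input A", [[[0.9, 0.1], [0.2, 0.8]]]),  # new coverage -> kept
    ("input B", [[[0.9, 0.1], [0.2, 0.8]]]),  # duplicate coverage -> dropped
    ("input C", [[[0.1, 0.9], [0.7, 0.2]]]),  # new coverage -> kept
]
print(reduce_suite(suite))  # ['input A', 'input C']
```

In this toy run, "input B" activates exactly the same neurons as "input A" and is dropped, shrinking the suite while preserving every distinct activation pattern, which is the intuition behind retaining failing tests during reduction.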