Paper Title
NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers
Paper Authors
Paper Abstract
Deep-learning (DL) compilers such as TVM and TensorRT are increasingly used to optimize deep neural network (DNN) models to meet performance, resource-utilization, and other requirements. Bugs in these compilers can produce models whose semantics differ from the original ones, yielding incorrect results that compromise the correctness of downstream applications. However, finding bugs in these compilers is challenging due to their complexity. In this work, we propose a new fuzz-testing approach for finding bugs in deep-learning compilers. Our core approach consists of (i) generating diverse yet valid DNN test models that exercise a large part of the compiler's transformation logic, using lightweight operator specifications; (ii) performing gradient-based search to find model inputs that avoid any floating-point exceptional values during model execution, reducing the chance of missed bugs or false alarms; and (iii) using differential testing to identify bugs. We implemented this approach in NNSmith, which to date has found 72 new bugs in TVM, TensorRT, ONNXRuntime, and PyTorch. Of these, 58 have been confirmed and 51 have been fixed by their respective project maintainers.
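To make the differential-testing step concrete, the sketch below runs the same model eagerly in PyTorch (as the reference oracle) and through a compiled backend, checks that the reference output is free of floating-point exceptional values (NaN/Inf), and flags divergence beyond a tolerance. This is a minimal illustration, not NNSmith's actual harness: the toy model, the use of torch.compile as the stand-in compiler, and the tolerances rtol/atol are assumptions for exposition.

    # Minimal differential-testing sketch (illustrative; not NNSmith's harness).
    import numpy as np
    import torch

    class TinyModel(torch.nn.Module):
        # Stand-in for one of NNSmith's generated test models.
        def forward(self, x):
            return torch.relu(x) * torch.sigmoid(x)

    model = TinyModel().eval()
    x = torch.randn(1, 8)  # in NNSmith, inputs come from gradient-based search

    with torch.no_grad():
        reference = model(x).numpy()     # eager PyTorch serves as the oracle
        compiled = torch.compile(model)  # any DL-compiler backend could sit here
        candidate = compiled(x).numpy()

    # NaN/Inf in the reference would make the comparison meaningless, which is
    # why NNSmith first searches for inputs that avoid exceptional values.
    assert np.isfinite(reference).all(), "oracle produced exceptional values"
    if not np.allclose(reference, candidate, rtol=1e-4, atol=1e-5):
        print("Potential compiler bug: compiled output diverges from eager output")

The same pattern generalizes to cross-backend comparison (e.g., exporting the model to ONNX and comparing ONNXRuntime or TensorRT output against the eager result), which is how differential testing exposes miscompilations rather than crashes alone.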