Paper Title

DeepStability: A Study of Unstable Numerical Methods and Their Solutions in Deep Learning

Authors

Kloberdanz, E., Kloberdanz, K. G., Le, W.

Abstract

Deep learning (DL) has become an integral part of solutions to various important problems, which is why ensuring the quality of DL systems is essential. One of the challenges in achieving reliable and robust DL software is ensuring that algorithm implementations are numerically stable. DL algorithms require a large amount and a wide variety of numerical computations. A naive implementation of a numerical computation can lead to errors that may result in incorrect or inaccurate learning and results. A numerical algorithm or a mathematical formula can have several implementations that are mathematically equivalent but have different numerical stability properties. Designing numerically stable algorithm implementations is challenging because it requires interdisciplinary knowledge of software engineering, DL, and numerical analysis. In this paper, we study two mature DL libraries, PyTorch and TensorFlow, with the goal of identifying unstable numerical methods and their solutions. Specifically, we investigate which DL algorithms are numerically unstable and conduct an in-depth analysis of the root causes, manifestations, and patches of numerical instabilities. Based on these findings, we launch DeepStability, the first database of numerical stability issues and solutions in DL. Our findings and DeepStability provide future references to developers and tool builders to prevent, detect, localize, and fix numerically unstable algorithm implementations. To demonstrate this, using DeepStability we have located numerical stability issues in TensorFlow and submitted a fix, which has been accepted and merged.
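The abstract's claim that mathematically equivalent implementations can differ in numerical stability can be illustrated with softmax. The sketch below is not taken from the paper or from the DeepStability database. It is a minimal, standard-library example, and the function names `softmax_naive`, `softmax_stable`, and `naive_overflows` are hypothetical names chosen for illustration.

```python
import math


def softmax_naive(xs):
    # Direct translation of the formula exp(x_i) / sum_j exp(x_j).
    # math.exp raises OverflowError once x_i is large (roughly above 709).
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_stable(xs):
    # Mathematically identical: subtracting the maximum from every input
    # cancels in the ratio, but keeps every exponent at or below zero,
    # so exp() cannot overflow.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def naive_overflows(xs):
    # Helper for the demonstration: reports whether the naive version fails.
    try:
        softmax_naive(xs)
    except OverflowError:
        return True
    return False


if __name__ == "__main__":
    small = [1.0, 2.0, 3.0]
    large = [1000.0, 1001.0, 1002.0]
    print(softmax_naive(small))
    print(softmax_stable(small))
    print(naive_overflows(large))
    print(softmax_stable(large))
```

For small inputs the two functions agree to within rounding error. For the large inputs the naive version fails with an overflow, while the shifted version returns the same probabilities as it does for `[1.0, 2.0, 3.0]`, since softmax is invariant to adding a constant to every input.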
