Paper Title
Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder
Paper Authors
Abstract
WaveNet is a state-of-the-art text-to-speech vocoder that remains challenging to deploy due to its autoregressive loop. In this work we focus on ways to accelerate the original WaveNet architecture directly, as opposed to modifying the architecture, such that the model can be deployed as part of a scalable text-to-speech system. We survey a wide variety of model compression techniques that are amenable to deployment on a range of hardware platforms. In particular, we compare different model sparsity methods and levels, and seven widely used precisions as targets for quantization, and are able to achieve models with a compression ratio of up to 13.84 without loss in audio fidelity compared to a dense, single-precision floating-point baseline. All techniques are implemented using existing open-source deep learning frameworks and libraries to encourage their wider adoption.
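The two compression levers the abstract compares, weight sparsity and reduced-precision quantization, combine multiplicatively into a compression ratio against the dense float32 baseline. The sketch below is illustrative only: the magnitude-pruning heuristic, the 90% sparsity level, and the int8 target are assumptions for demonstration, not the paper's reported configuration (which achieves up to 13.84).

```python
# Illustrative sketch: how sparsity and bit width combine into a
# compression ratio relative to a dense float32 baseline.
# The specific heuristic and numbers are assumptions, not the paper's.
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    threshold = np.partition(flat, k)[k] if k > 0 else 0.0
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def compression_ratio(sparsity, bits):
    """Ratio vs. dense float32: only nonzero weights are stored, each at
    `bits` bits instead of 32 (sparse-index overhead ignored here)."""
    return 32.0 / ((1.0 - sparsity) * bits)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.9)
print(f"achieved sparsity: {np.mean(w_sparse == 0):.2f}")
print(f"90% sparsity + int8: {compression_ratio(0.9, 8):.1f}x vs dense fp32")
```

In practice the realized ratio is lower than this idealized formula suggests, since sparse formats must also store nonzero indices; that overhead is one reason a deployed system may land at a figure like 13.84 rather than the theoretical maximum.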