Generalized Operating Procedure for Deep Learning: An Unconstrained Optimal Design Perspective

Paper title

Generalized Operating Procedure for Deep Learning: An Unconstrained Optimal Design Perspective

Authors

Chen, Shen; Zhang, Mingwei; Cui, Jiamin; Yao, Wei

Abstract

Deep learning (DL) has brought about remarkable breakthroughs in processing images, video, and speech, owing to its efficacy in extracting highly abstract representations and learning very complex functions. However, operating procedures for applying it to real use cases are seldom reported. In this paper, we address this problem by presenting a generalized operating procedure for DL from the perspective of unconstrained optimal design, motivated by a simple intention: to remove the barrier to using DL, especially for scientists and engineers who are new to it but eager to use it. Our proposed procedure contains seven steps: project/problem statement, data collection, architecture design, initialization of parameters, defining the loss function, computing optimal parameters, and inference. Following this procedure, we build a multi-stream end-to-end speaker verification system in which the input speech utterance is processed by multiple parallel streams covering different frequency ranges, so that the diversity of features makes the acoustic modeling more robust. Trained on the VoxCeleb dataset, our experimental results verify the effectiveness of the proposed operating procedure and show that our multi-stream framework outperforms the single-stream baseline, with a 20% relative reduction in minimum decision cost function (minDCF).
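
The multi-stream idea in the abstract, in which parallel streams each see a different frequency range, can be illustrated with a minimal sketch. The abstract does not say how the frequency ranges are chosen, so the contiguous, near-equal band split below is an assumption, and the function name `split_into_bands` and the toy spectrogram are hypothetical, not taken from the paper.

```python
def split_into_bands(spectrogram, num_streams):
    """Split a (frames x frequency bins) spectrogram into contiguous bands.

    Assumption: bands are contiguous and as equal in width as possible.
    The paper's actual band layout is not given in the abstract.
    """
    if not spectrogram:
        raise ValueError("spectrogram is empty")
    num_bins = len(spectrogram[0])
    if not 1 <= num_streams <= num_bins:
        raise ValueError("num_streams must be between 1 and the number of bins")

    # Compute band edges; the first `extra` bands get one additional bin.
    base, extra = divmod(num_bins, num_streams)
    edges = [0]
    for i in range(num_streams):
        edges.append(edges[-1] + base + (1 if i < extra else 0))

    # Each stream receives every frame, restricted to its own bin range.
    return [
        [frame[lo:hi] for frame in spectrogram]
        for lo, hi in zip(edges[:-1], edges[1:])
    ]


# Toy input: 2 frames, 8 frequency bins (illustrative values only).
toy = [[0, 1, 2, 3, 4, 5, 6, 7],
       [10, 11, 12, 13, 14, 15, 16, 17]]
for band in split_into_bands(toy, 3):
    print(band)
```

In a system of the kind the abstract describes, each band would feed its own stream and the stream outputs would be combined before scoring; that combination step is left out here because the abstract gives no details of it.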
