Paper Title
Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism
Paper Authors
Paper Abstract
Formant tracking is one of the most fundamental problems in speech processing. Traditionally, formants are estimated using signal processing methods. Recent studies have shown that generic convolutional architectures can outperform recurrent networks on temporal tasks such as speech synthesis and machine translation. In this paper, we explored the use of a Temporal Convolutional Network (TCN) for formant tracking. In addition to the conventional implementation, we modified the architecture in three respects. First, we turned off the "causal" mode of the dilated convolutions so that they can see future speech frames. Second, each hidden layer reused the output of all previous layers through dense connections. Third, we adopted a gating mechanism that alleviates the vanishing gradient problem by selectively forgetting unimportant information. The model was validated on the open-access formant database VTR. The experiments showed that the proposed model converged easily and achieved an overall mean absolute percent error (MAPE) of 8.2% on speech-labeled frames, compared with 9.4% (LSTM), 9.1% (Bi-LSTM), and 8.9% (TCN) for three competitive baselines.
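The three architectural changes and the evaluation metric named in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The function names, the fixed weights of 0.5, the kernel size of 3, the doubling dilation schedule, and the tanh-times-sigmoid form of the gate are all assumptions made for illustration, and the MAPE function uses the standard textbook definition, which may differ in detail from the paper's.

```python
import math


def dilated_conv1d(x, kernel, dilation, causal=False):
    """Zero-padded 1-D dilated convolution over a list of frame values.

    causal=True uses only the current and past frames.
    causal=False centres the kernel, so future frames are also used.
    """
    k, n_frames = len(kernel), len(x)
    # Offset of the first kernel tap relative to the current frame.
    start = -(k - 1) * dilation if causal else -((k - 1) // 2) * dilation
    out = []
    for t in range(n_frames):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + start + j * dilation
            if 0 <= idx < n_frames:  # taps outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_layer(channels, filt_kernels, gate_kernels, dilation):
    """Assumed gate: tanh(filter branch) * sigmoid(gate branch).

    Each branch sums a non-causal dilated convolution over all input channels.
    """
    n_frames = len(channels[0])
    f = [0.0] * n_frames
    g = [0.0] * n_frames
    for ch, fk, gk in zip(channels, filt_kernels, gate_kernels):
        f = [a + b for a, b in zip(f, dilated_conv1d(ch, fk, dilation))]
        g = [a + b for a, b in zip(g, dilated_conv1d(ch, gk, dilation))]
    return [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]


def dense_stack(x, num_layers, kernel_size=3):
    """Dense connection: every layer reads the outputs of all earlier layers."""
    features = [list(x)]
    for i in range(num_layers):
        dilation = 2 ** i  # assumed schedule: 1, 2, 4, ...
        # Fixed illustrative weights stand in for learned parameters.
        fk = [[0.5] * kernel_size for _ in features]
        gk = [[0.5] * kernel_size for _ in features]
        features.append(gated_layer(features, fk, gk, dilation))
    return features


def mape(predicted, reference):
    """Mean absolute percent error, in percent (standard definition)."""
    total = sum(abs(p - r) / r for p, r in zip(predicted, reference))
    return 100.0 * total / len(reference)


if __name__ == "__main__":
    impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    print(dilated_conv1d(impulse, [1.0, 1.0, 1.0], 2, causal=True))
    print(dilated_conv1d(impulse, [1.0, 1.0, 1.0], 2, causal=False))
    print(len(dense_stack(impulse, num_layers=2)))
    print(mape([450.0, 1650.0], [500.0, 1500.0]))
```

With an impulse as input, the non-causal output is non-zero at a frame before the impulse, while the causal output is not. This is the sense in which turning off causal mode lets the network use future frames.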