Paper Title
A Multi-Channel Temporal Attention Convolutional Neural Network Model for Environmental Sound Classification
Authors
Abstract
Recently, many attention-based deep neural networks have emerged and achieved state-of-the-art performance in environmental sound classification. The essence of an attention mechanism is to assign contribution weights to different parts of the features, namely channels, spectral or spatial content, and temporal frames. In this paper, we propose an effective convolutional neural network structure with a multi-channel temporal attention (MCTA) block, which applies a temporal attention mechanism within each channel of the embedded features to extract channel-wise relevant temporal information. This structure yields a distinct attention vector for each channel, which enables the network to fully exploit the relevant temporal information in different channels. We evaluate the model on ESC-50 and its subset ESC-10, along with the development sets of DCASE 2018 and 2019. In our experiments, MCTA outperformed both the single-channel temporal attention model and the non-attention model with the same number of parameters. Furthermore, we compared our model with several successful attention-based models and obtained competitive results with a relatively lightweight network.
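The difference between per-channel and shared temporal attention can be illustrated with a minimal sketch. The abstract does not specify how the MCTA block computes its attention scores, so everything below is an assumption made for illustration: the embedded features are taken to have shape channels x frequency x time, each channel is scored by a hypothetical linear weight vector over the frequency axis, and the weights are normalized with a softmax over time. The function names `mcta` and `single_channel_attention` are likewise hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_scores(channel, weights):
    """One score per time frame for a single channel.

    channel: F x T nested list; weights: length-F list (assumed linear scorer).
    """
    num_frames = len(channel[0])
    return [
        sum(w * row[t] for w, row in zip(weights, channel))
        for t in range(num_frames)
    ]


def pool_over_time(channel, attention):
    """Attention-weighted sum over time for each frequency row of a channel."""
    return [sum(a * x for a, x in zip(attention, row)) for row in channel]


def mcta(features, weights):
    """Assumed multi-channel variant: a distinct attention vector per channel.

    features: C x F x T; weights: C x F.
    Returns (attention as C x T, pooled features as C x F).
    """
    attention, pooled = [], []
    for channel, w in zip(features, weights):
        a = softmax(temporal_scores(channel, w))
        attention.append(a)
        pooled.append(pool_over_time(channel, a))
    return attention, pooled


def single_channel_attention(features, weights):
    """Baseline for contrast: one attention vector shared by all channels."""
    per_channel = [temporal_scores(ch, w) for ch, w in zip(features, weights)]
    summed = [sum(frame) for frame in zip(*per_channel)]
    a = softmax(summed)
    return a, [pool_over_time(ch, a) for ch in features]


# Toy input: random values stand in for embedded CNN features.
random.seed(0)
C, F, T = 3, 4, 6
features = [[[random.gauss(0, 1) for _ in range(T)] for _ in range(F)] for _ in range(C)]
weights = [[random.gauss(0, 1) for _ in range(F)] for _ in range(C)]

mcta_attention, mcta_pooled = mcta(features, weights)
shared_attention, shared_pooled = single_channel_attention(features, weights)
```

In this sketch `mcta_attention` holds one length-T weight vector per channel, while `shared_attention` is a single length-T vector applied to every channel. That is the distinction the abstract draws between the multi-channel and single-channel models. A trained network would learn the scoring weights rather than sample them at random.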