Paper Title


TC-SKNet with GridMask for Low-complexity Classification of Acoustic Scene

Authors

Luyuan Xie, Yan Zhong, Lin Yang, Zhaoyu Yan, Zhonghai Wu, Junjie Wang

Abstract


Convolutional neural networks (CNNs) perform well in low-complexity classification tasks such as acoustic scene classification (ASC). However, there are few studies on the relationship between the length of the target speech and the size of the convolution kernels. In this paper, we combine a Selective Kernel Network with Temporal Convolution (TC-SKNet) to adjust the receptive field of the convolution kernels, solving the problem of the variable length of the target voice while keeping complexity low. GridMask is a data augmentation strategy that masks part of the raw data or feature area; it enhances the generalization of the model in a role similar to dropout. In our experiments, the performance gain brought by GridMask is stronger than that of spectrum augmentation in ASC. Finally, we adopt AutoML to search for the best structure of TC-SKNet and the hyperparameters of GridMask to improve classification performance. As a result, TC-SKNet reaches a peak accuracy of 59.87%, equivalent to the SOTA, while using only 20.9 K parameters.
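To make the GridMask idea concrete, here is a minimal NumPy sketch of masking a regular grid of blocks in a 2-D spectrogram. The grid period `d`, mask `ratio`, and `fill` value are illustrative assumptions, not the hyperparameters searched by AutoML in the paper, and the sketch omits the random grid offset used in the original GridMask augmentation.

```python
import numpy as np

def gridmask(spec, d=16, ratio=0.5, fill=0.0):
    """Simplified GridMask-style augmentation for a 2-D spectrogram.

    A grid of period `d` is tiled over the input; in each period a
    square block of side int(d * ratio) is overwritten with `fill`.
    All parameter values here are illustrative assumptions.
    """
    masked = spec.copy()          # leave the original array untouched
    block = int(d * ratio)        # side length of each masked block
    h, w = spec.shape
    for i in range(0, h, d):      # step over grid rows
        for j in range(0, w, d):  # step over grid columns
            masked[i:i + block, j:j + block] = fill
    return masked
```

For a 32x32 input with `d=16` and `ratio=0.5`, four 8x8 blocks (a quarter of the area) are zeroed out, which gives a dropout-like regularization effect at the feature level.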
