Paper title
Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting
Paper authors
Paper abstract
In the context of keyword spotting (KWS), replacing handcrafted speech features with learnable features has not yielded superior KWS performance. In this study, we demonstrate that filterbank learning outperforms handcrafted speech features for KWS whenever the number of filterbank channels is severely decreased. Reducing the number of channels may cause some drop in KWS performance, but it also substantially reduces energy consumption, which is key when deploying common always-on KWS on low-resource devices. Experimental results on a noisy version of the Google Speech Commands Dataset show that filterbank learning adapts to noise characteristics to provide a higher degree of robustness to noise, especially when dropout is integrated. Thus, switching from the typically used 40-channel log-Mel features to 8-channel learned features leads to a relative KWS accuracy loss of only 3.5% while achieving a 6.3x reduction in energy consumption.
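To make the 40-channel versus 8-channel comparison concrete, the following is a minimal sketch of how log filterbank features shrink when the channel count drops. It is not the authors' implementation: the abstract does not say how the filterbank is parameterized or trained. The function names, the 512-point FFT, the 16 kHz sample rate, and the random stand-in spectrogram are all assumptions made for illustration. In a filterbank-learning setup, the `weights` matrix would presumably be a trainable parameter (for example, initialized from the Mel filters) rather than fixed.

```python
import numpy as np


def hz_to_mel(f):
    # Common HTK-style Mel scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_channels, n_fft=512, sample_rate=16000):
    """Triangular Mel filters, shape (n_channels, n_fft // 2 + 1).

    n_fft and sample_rate are illustrative assumptions, not values
    taken from the paper.
    """
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_channels + 2 edge frequencies, equally spaced on the Mel scale.
    max_mel = hz_to_mel(sample_rate / 2.0)
    edges = mel_to_hz(np.linspace(0.0, max_mel, n_channels + 2))
    fb = np.zeros((n_channels, n_bins))
    for k in range(n_channels):
        lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        # Triangle: the smaller of the two slopes, floored at zero.
        fb[k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_filterbank_features(power_spec, weights, eps=1e-6):
    """power_spec: (frames, bins); weights: (channels, bins)."""
    return np.log(power_spec @ weights.T + eps)


# Stand-in for a real power spectrogram: 100 frames, 257 frequency bins.
rng = np.random.default_rng(0)
power_spec = rng.random((100, 257))

feats_40 = log_filterbank_features(power_spec, mel_filterbank(40))
feats_8 = log_filterbank_features(power_spec, mel_filterbank(8))
print(feats_40.shape, feats_8.shape)
```

With 8 channels, each frame carries one fifth as many feature values as with 40, so the downstream model has less input to process. The sketch shows only this change in feature dimensionality. It does not model the energy consumption or accuracy figures reported in the abstract.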