Accurate Detection of Wake Word Start and End Using a CNN

Paper Title

Accurate Detection of Wake Word Start and End Using a CNN

Authors

Christin Jose, Yuriy Mishchenko, Thibaud Senechal, Anish Shah, Alex Escott, Shiv Vitaladevuni

Abstract

Small-footprint embedded devices require keyword spotters (KWS) with small model size and low detection latency to enable voice assistants. Such a keyword is often referred to as a "wake word", since it is used to wake up voice-assistant-enabled devices. Together with wake word detection, accurate estimation of the wake word endpoints (start and end) is an important task of KWS. In this paper, we propose two new methods for detecting wake word endpoints in neural KWS systems that use single-stage, word-level neural networks. Our results show that the new techniques give superior accuracy, detecting wake word endpoints with a standard error of up to 50 msec versus human annotations, on par with the conventional Acoustic Model plus HMM forced alignment. To our knowledge, this is the first study of wake word endpoint detection methods for single-stage neural KWS.
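
The abstract reports endpoint accuracy as a standard error against human annotations. The sketch below shows how such statistics could be computed. It assumes the figure summarizes the spread of per-utterance differences between predicted and human-annotated endpoint times, and the abstract does not confirm that interpretation. The function names and the sample timings are hypothetical illustrations and are not taken from the paper. Start and end points would each be evaluated separately in the same way.

```python
from statistics import mean, stdev


def endpoint_errors(predicted_ms, annotated_ms):
    """Signed per-utterance error (predicted minus human annotation), in ms.

    Hypothetical helper: assumes one predicted and one annotated time per utterance.
    """
    if len(predicted_ms) != len(annotated_ms):
        raise ValueError("predicted and annotated lists must have the same length")
    return [p - a for p, a in zip(predicted_ms, annotated_ms)]


def summarize(errors_ms):
    """Return (bias, spread): mean signed error and sample standard deviation."""
    return mean(errors_ms), stdev(errors_ms)


# Hypothetical wake word end times in milliseconds (not data from the paper).
pred_end = [1010, 985, 1042, 998]
human_end = [1000, 1000, 1030, 1000]

errs = endpoint_errors(pred_end, human_end)
bias, spread = summarize(errs)
print(f"bias={bias:.2f} ms, std={spread:.2f} ms")
```

Reporting the mean separately from the spread distinguishes a systematic offset (for example, endpoints consistently placed late) from random jitter around the annotated boundary.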
