Paper title
EfficientNet-Absolute Zero for Continuous Speech Keyword Spotting
Authors
Abstract
Keyword spotting is the process of finding specific words or phrases in recorded speech by computer. Deep neural networks can handle this problem well when they are trained on an appropriate dataset. To this end, the Football Keyword Dataset (FKD), a new Persian keyword spotting dataset, was collected through crowdsourcing. The dataset contains nearly 31,000 samples in 18 classes. A continuous speech synthesis method is proposed to make FKD usable in practical applications that operate on continuous speech. In addition, we propose a lightweight architecture called EfficientNet-A0 (Absolute Zero), obtained by applying the compound scaling method to EfficientNet-B0 for the keyword spotting task. Finally, the proposed architecture is evaluated against various models. EfficientNet-A0 and ResNet models are found to outperform the other models on this dataset.
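The abstract does not give the scaling coefficients behind EfficientNet-A0, so the following is only a rough sketch of how compound scaling could shrink EfficientNet-B0. It assumes the base coefficients from the original EfficientNet paper (alpha = 1.2 for depth, beta = 1.1 for width, gamma = 1.15 for resolution) and a hypothetical negative compound coefficient phi. The helper names, the channel-rounding rule, and the value phi = -1 are illustrative assumptions, not the authors' configuration.

```python
import math

# Base coefficients from the original EfficientNet paper (depth, width, resolution).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

# EfficientNet-B0 MBConv stages as (output channels, block repeats).
B0_STAGES = [(16, 1), (24, 2), (40, 2), (80, 3), (112, 3), (192, 4), (320, 1)]


def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def scale_channels(channels, width_mult, divisor=8):
    """Scale a channel count and round it to a multiple of `divisor`.

    Rounding to a multiple of 8 is a common convention, assumed here.
    """
    scaled = channels * width_mult
    rounded = max(divisor, int(scaled + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * scaled:  # avoid shrinking by more than 10% through rounding
        rounded += divisor
    return rounded


def scale_repeats(repeats, depth_mult):
    """Scale the number of block repeats, keeping at least one block per stage."""
    return max(1, math.ceil(repeats * depth_mult))


def scale_stages(phi, stages=B0_STAGES):
    """Apply compound scaling to every stage of a baseline configuration."""
    depth_mult, width_mult, _ = compound_scale(phi)
    return [
        (scale_channels(c, width_mult), scale_repeats(r, depth_mult))
        for c, r in stages
    ]


# Hypothetical example: phi = -1 gives a network smaller than B0.
smaller_stages = scale_stages(-1)
```

With phi = 0 the multipliers are all 1 and the B0 configuration is returned unchanged; negative values of phi scale the network down, which is the direction a model lighter than B0 would need.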