Paper Title
Classification of Vocal Bursts for ACII 2022 A-VB-Type Competition using Convolutional Neural Networks and Deep Acoustic Embeddings
Authors
Abstract
This report briefly describes our proposed solution for the Vocal Burst Type classification task of the ACII 2022 Affective Vocal Bursts (A-VB) Competition. We experimented with two approaches. The first is based on convolutional neural networks trained on Mel spectrograms, and the second is based on average pooling of deep acoustic embeddings from a pretrained wav2vec2 model. Our best-performing model achieves an unweighted average recall (UAR) of 0.5190 on the test partition, compared to a chance-level UAR of 0.1250 and a baseline of 0.4172. This is an absolute gain of about 0.10 UAR, or roughly 24% relative to the challenge baseline. The results reported in this document demonstrate the efficacy of our proposed approaches for the A-VB Type classification task.
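To make the reported metric concrete, the following is a minimal sketch of how unweighted average recall can be computed and how the reported scores compare. It is not the authors' evaluation code. The class count of 8 is an inference from the stated chance level (0.1250 = 1/8), and the toy labels are hypothetical rather than competition data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally regardless of its size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Assumption: 8 classes, inferred from the chance-level UAR of 0.1250.
NUM_CLASSES = 8

# Hypothetical, deliberately imbalanced toy labels: class c appears c + 1 times.
y_true = [c for c in range(NUM_CLASSES) for _ in range(c + 1)]

# A classifier that always predicts class 0 gets recall 1 on that class and 0 elsewhere.
y_const = [0] * len(y_true)
chance_uar = unweighted_average_recall(y_true, y_const)

# Scores as reported in the abstract.
best_uar = 0.5190
baseline_uar = 0.4172
absolute_gain = best_uar - baseline_uar
relative_gain = absolute_gain / baseline_uar

print(f"chance-level UAR: {chance_uar:.4f}")
print(f"absolute gain:    {absolute_gain:.4f}")
print(f"relative gain:    {relative_gain:.1%}")
```

Because UAR averages recall per class, a constant predictor scores 1/8 no matter how imbalanced the classes are, which is why it is a common choice for skewed emotion and vocal-burst datasets.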