Paper Title
Fine-grained Early Frequency Attention for Deep Speaker Recognition
Paper Authors
Paper Abstract
Attention mechanisms have emerged as important tools that boost the performance of deep models by allowing them to focus on key parts of learned embeddings. However, current attention mechanisms used in speaker recognition tasks fail to consider fine-grained information items such as frequency bins in the input spectral representations used by the deep networks. To address this issue, we propose the novel Fine-grained Early Frequency Attention (FEFA) for speaker recognition in-the-wild. Once integrated into a deep neural network, our proposed mechanism works by obtaining queries from early layers of the network and generating learnable weights to attend to information items as small as the frequency bins in the input spectral representations. To evaluate the performance of FEFA, we use several well-known deep models as backbone networks and integrate our attention module into their pipelines. The overall performance of these networks (with and without FEFA) is evaluated on the VoxCeleb1 dataset, where we observe considerable improvements when FEFA is used.
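
The abstract describes FEFA only at a high level (queries taken from an early layer, learnable weights over individual frequency bins). The sketch below is a minimal, hypothetical PyTorch reading of that description, not the authors' implementation: it assumes the early-layer feature map has shape (batch, channels, freq_bins, time), and the module name FrequencyAttention, the time-pooled query, the hidden_dim parameter, and the softmax over the frequency axis are all illustrative assumptions.

```python
# Hypothetical sketch of per-frequency-bin attention, assuming an early-layer
# feature map of shape (batch, channels, freq_bins, time). Not the paper's code.
import torch
import torch.nn as nn


class FrequencyAttention(nn.Module):
    """Reweights frequency bins of a spectral feature map using a query
    derived from an early layer of the backbone (illustrative design)."""

    def __init__(self, in_channels: int, hidden_dim: int = 64):
        super().__init__()
        # Small MLP mapping a per-bin query vector to a scalar attention score.
        self.score = nn.Sequential(
            nn.Linear(in_channels, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, channels, freq_bins, time)
        # Build a query per frequency bin by averaging the early-layer features over time.
        query = feats.mean(dim=3).transpose(1, 2)          # (batch, freq_bins, channels)
        scores = self.score(query).squeeze(-1)             # (batch, freq_bins)
        weights = torch.softmax(scores, dim=1)             # attention over frequency bins
        # Broadcast the per-bin weights back over channels and time.
        return feats * weights.unsqueeze(1).unsqueeze(-1)  # (batch, channels, freq_bins, time)


if __name__ == "__main__":
    # Example: a feature map from an early convolutional layer of a backbone network.
    x = torch.randn(8, 32, 64, 200)      # batch=8, 32 channels, 64 freq bins, 200 frames
    attn = FrequencyAttention(in_channels=32)
    y = attn(x)
    print(y.shape)                       # torch.Size([8, 32, 64, 200])
```

Under this reading, the module can be dropped between an early convolutional block and the rest of any backbone (e.g. a ResNet-style speaker network), since it preserves the feature-map shape and only rescales frequency bins.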