Paper Title
An Environmental Feature Representation in I-vector Space for Room Verification and Metadata Estimation
Authors
Abstract
This paper investigates the application of environmental feature representations to room verification and acoustic metadata estimation. Audio recordings contain both speaker and non-speaker information. We refer to the non-speaker-related information, including channel and other environmental factors, as e-vectors. I-vectors, commonly used in speaker identification, are extracted in the total variability space and capture both speaker and channel-environment information without discriminating between them. Accordingly, e-vectors can be extracted from i-vectors using methods such as linear discriminant analysis. We first demonstrate that e-vectors can be applied to room verification with a low equal error rate. Second, we propose two methods for estimating metadata, namely signal-to-noise ratio (SNR) and reverberation time (T60), from these e-vectors. Compared with contemporary global SNR estimation methods, our system performs favorably in accuracy, even with low-dimensional i-vectors. Lastly, we show that room verification improves when e-vectors are augmented with the extracted metadata.
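To make the LDA step in the abstract concrete, here is a minimal sketch of projecting i-vectors into a room-discriminative subspace and scoring pairs of the resulting vectors. It is not the authors' pipeline. The synthetic "i-vectors", the dimensions, the helper names `fit_lda` and `cosine_score`, and the use of cosine similarity as the verification score are all assumptions made for illustration.

```python
import numpy as np


def fit_lda(X, labels, n_components):
    """Fisher LDA: return a (dim x n_components) projection matrix."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dim = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((dim, dim))  # within-class (within-room) scatter
    S_b = np.zeros((dim, dim))  # between-class (between-room) scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    S_w += 1e-6 * np.eye(dim)  # small ridge so the Cholesky factor exists
    # Solve S_b w = lambda * S_w w by whitening with the Cholesky factor of S_w.
    L = np.linalg.cholesky(S_w)
    L_inv = np.linalg.inv(L)
    eigvals, eigvecs = np.linalg.eigh(L_inv @ S_b @ L_inv.T)
    order = np.argsort(eigvals)[::-1][:n_components]
    return L_inv.T @ eigvecs[:, order]


def cosine_score(a, b):
    """Cosine similarity between two vectors (assumed verification score)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-in for i-vectors: 3 rooms, 50 recordings each, 20 dimensions.
# Each vector is a room-dependent offset plus room-independent variation.
rng = np.random.default_rng(0)
n_rooms, n_per_room, dim = 3, 50, 20
room_offsets = 2.0 * rng.normal(size=(n_rooms, dim))
labels = np.repeat(np.arange(n_rooms), n_per_room)
i_vectors = room_offsets[labels] + rng.normal(size=(n_rooms * n_per_room, dim))

# Project onto the room-discriminative directions (at most n_rooms - 1 of them).
W = fit_lda(i_vectors, labels, n_components=n_rooms - 1)
e_vectors = (i_vectors - i_vectors.mean(axis=0)) @ W

# Score every pair of recordings and split the scores by trial type.
same_room_scores, diff_room_scores = [], []
for i in range(len(e_vectors)):
    for j in range(i + 1, len(e_vectors)):
        score = cosine_score(e_vectors[i], e_vectors[j])
        if labels[i] == labels[j]:
            same_room_scores.append(score)
        else:
            diff_room_scores.append(score)

print("mean same-room score:", np.mean(same_room_scores))
print("mean different-room score:", np.mean(diff_room_scores))
```

In a verification setting, a threshold on such scores would decide whether two recordings come from the same room, and the equal error rate is the operating point at which the false-accept and false-reject rates coincide.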