Title
A Meeting Transcription System for an Ad-Hoc Acoustic Sensor Network
Authors
Abstract
We propose a system that transcribes the conversation in a typical meeting scenario captured by a set of initially unsynchronized microphone arrays at unknown positions. It consists of subsystems for signal synchronization, including both sampling rate offset and sampling time offset estimation; for diarization based on speaker and microphone array position estimation; for multi-channel speech enhancement; and for automatic speech recognition. The estimated diarization information is used to initialize a spatial mixture model, from which beamformer coefficients for source separation are estimated. Simulations show that speech recognition accuracy can be improved by synchronizing and combining multiple distributed microphone arrays, compared to using a single compact microphone array. Furthermore, the proposed informed initialization of the spatial mixture model delivers a clear performance advantage over random initialization.
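The abstract does not say how the sampling time offset between arrays is estimated. As an illustration of what that synchronization step computes, here is a minimal sketch that uses GCC-PHAT, a common generic time-delay estimator, as a stand-in rather than the authors' method. It assumes the sampling rate offset has already been compensated, so that the remaining misalignment is a constant integer number of samples. The function name `estimate_sto` and the search bound `max_lag` are hypothetical.

```python
import numpy as np


def estimate_sto(ref: np.ndarray, sig: np.ndarray, max_lag: int) -> int:
    """Estimate the integer sampling time offset of `sig` relative to `ref`.

    Stand-in technique: GCC-PHAT (not necessarily the estimator used in the paper).
    Assumes any sampling rate offset has already been removed.
    A positive result d means `sig` lags `ref`, i.e. sig[n] ~= ref[n - d].
    """
    # Zero-pad to at least len(ref) + len(sig) so that the circular
    # correlation computed via the FFT equals the linear correlation.
    n = len(ref) + len(sig)
    n_fft = 1 << (n - 1).bit_length()  # next power of two >= n

    ref_spec = np.fft.rfft(ref, n_fft)
    sig_spec = np.fft.rfft(sig, n_fft)

    # Cross-power spectrum; its inverse FFT at lag l is sum_n sig[n + l] * ref[n].
    cross = sig_spec * np.conj(ref_spec)
    # PHAT weighting: discard magnitude, keep only phase (guard against division by zero).
    cross /= np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, n_fft)

    # Collect lags -max_lag..-1 (wrapped to the end of the array) and 0..max_lag.
    cc = np.concatenate((cc[n_fft - max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag
```

For example, with `ref` a block of white noise and `sig = np.concatenate((np.zeros(37), ref[:-37]))`, the sketch should return 37. In a distributed setup, one array would serve as the reference and the offset of every other array would be estimated against it; a real system would also need to track the drift caused by the sampling rate offset, which this sketch deliberately ignores.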