Paper title
Reconstructing the Dynamic Directivity of Unconstrained Speech
Paper authors
Paper abstract
This article presents a method for estimating and reconstructing the spatial energy distribution pattern of natural speech, which is crucial for achieving realistic vocal presence in virtual communication settings. The method comprises two stages. First, speech recordings captured by a real, static microphone array are used to create an egocentric virtual array that tracks the speaker's movement over time. This virtual array is used to measure and encode the high-resolution directivity pattern of the speech signal as it evolves dynamically with natural speech and movement. In the second stage, the encoded directivity representation is used to train a machine learning model that estimates the full, dynamic directivity pattern from a limited set of speech signals, such as those recorded by the microphones on a head-mounted display. Our results show that neural networks can accurately estimate the full directivity pattern of natural, unconstrained speech from limited information. Together with an evaluation of various machine learning models and training paradigms, the proposed method contributes to the development of realistic vocal presence in virtual communication settings.
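The idea of an egocentric virtual array can be illustrated with a small coordinate transform. The sketch below is not the paper's implementation. It assumes, purely for illustration, that the speaker's head position and a single yaw angle are known at each instant, and it re-expresses a static microphone's position as a direction and distance in the head's own frame. The function name `to_egocentric` and the axis conventions are hypothetical.

```python
import math


def to_egocentric(mic_pos, head_pos, yaw):
    """Express a static microphone position in the speaker's head frame.

    Illustrative assumptions (not from the paper): the head rotates only
    about the vertical z axis by `yaw` radians, counterclockwise when seen
    from above, and faces +x when yaw is 0.

    Returns (azimuth, elevation, distance), with angles in radians.
    """
    # Offset from the head to the microphone in world coordinates.
    dx = mic_pos[0] - head_pos[0]
    dy = mic_pos[1] - head_pos[1]
    dz = mic_pos[2] - head_pos[2]

    # Rotate the offset by -yaw to undo the head rotation.
    c, s = math.cos(-yaw), math.sin(-yaw)
    ex = c * dx - s * dy
    ey = s * dx + c * dy

    distance = math.sqrt(ex * ex + ey * ey + dz * dz)
    if distance == 0.0:
        raise ValueError("microphone coincides with the head position")

    azimuth = math.atan2(ey, ex)          # 0 means straight ahead
    elevation = math.asin(dz / distance)  # 0 means ear level
    return azimuth, elevation, distance
```

Under these assumptions, a microphone fixed in the room traces a moving point on a sphere around the head as the speaker turns and walks, which is why a static array can sample many head-relative directions over time. A real system would need a full 3D head orientation rather than a single yaw angle.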