Paper title
Preliminary Study on SSCF-derived Polar Coordinate for ASR
Paper authors
Paper abstract
Transition angles were defined to describe vowel-to-vowel transitions in the acoustic space of the Spectral Subband Centroids, and they have been found to be similar across speakers and speaking rates. In this paper, we investigate using polar coordinates instead of angles to describe a speech signal by characterizing its acoustic trajectory, and we use them as features for Automatic Speech Recognition (ASR). In experiments on the BRAF100 dataset, the polar coordinates achieved significantly higher accuracy than the angles in mixed and cross-gender speech recognition, indicating that they represent the acoustic trajectory of the speech signal better. Accuracy improved significantly further when the polar coordinates were used together with their first- and second-order derivatives ($\Delta$, $\Delta\Delta$), especially in cross-female recognition. However, the results showed that they were not much more gender-independent than conventional Mel-frequency Cepstral Coefficients (MFCCs).
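The abstract does not give the exact feature definition, so the following is only a minimal sketch under stated assumptions: the trajectory is taken in a plane spanned by two SSCF dimensions, the polar coordinates are the radius and angle of the frame-to-frame displacement, and the derivatives are the usual regression-based deltas. The function names `polar_from_trajectory` and `delta`, and the window width, are hypothetical and not taken from the paper.

```python
import numpy as np

def polar_from_trajectory(sscf_x, sscf_y):
    """Radius and angle of frame-to-frame displacements in a 2-D SSCF plane.

    Assumption: sscf_x and sscf_y are per-frame values of two SSCF
    dimensions; the paper's actual choice of plane may differ.
    """
    dx = np.diff(np.asarray(sscf_x, dtype=float))
    dy = np.diff(np.asarray(sscf_y, dtype=float))
    radius = np.hypot(dx, dy)      # length of each displacement
    angle = np.arctan2(dy, dx)     # direction in radians, in (-pi, pi]
    return radius, angle

def delta(feat, width=2):
    """Regression-based delta over +/- width frames, with edge padding."""
    feat = np.asarray(feat, dtype=float)
    n_frames = len(feat)
    padded = np.pad(feat, (width, width), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(feat)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]
        behind = padded[width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom

# Usage with made-up trajectory values (illustrative only, not real data).
x = [500.0, 520.0, 560.0, 610.0, 640.0, 650.0]
y = [1500.0, 1480.0, 1430.0, 1390.0, 1370.0, 1365.0]
radius, angle = polar_from_trajectory(x, y)
d_radius = delta(radius)       # first-order derivative
dd_radius = delta(d_radius)    # second-order derivative
features = np.stack([radius, angle, d_radius, dd_radius], axis=1)
```

Note that the angle wraps at $\pm\pi$, so taking deltas of the raw angle can produce spurious jumps; unwrapping it first (for example with `np.unwrap`) is one option, though whether the paper does this is not stated.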