Paper Title

Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands

Authors

Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Pascale Fung

Abstract

With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component that facilitates driving and provides extra functionality. In-car smart assistants should be able to process general as well as car-related commands and perform the corresponding actions, which eases driving and improves safety. However, most datasets in this research field are in major languages, such as English and Chinese. Low-resource languages face severe data scarcity, which hinders the development of research and applications for broader communities. It is therefore crucial to have more benchmarks to raise awareness and motivate research in low-resource languages. To mitigate this problem, we collect a new dataset, Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR), for in-car speech recognition in Cantonese with video and audio data. Together with it, we propose Cantonese Audio-Visual Speech Recognition for In-car Commands as a new challenge for the community to tackle low-resource speech recognition in in-car scenarios.
