An Overview of Indian Spoken Language Recognition from Machine Learning Perspective

Paper Title

An Overview of Indian Spoken Language Recognition from Machine Learning Perspective

Authors

Dey, Spandan; Sahidullah, Md; Saha, Goutam

Abstract

Automatic spoken language identification (LID) is a very important research field in the era of multilingual voice-command-based human-computer interaction (HCI). A front-end LID module helps to improve the performance of many speech-based applications in the multilingual scenario. India is a populous country with diverse cultures and languages. The majority of the Indian population needs to use their respective native languages for verbal interaction with machines. Therefore, the development of efficient Indian spoken language recognition systems is useful for adapting smart technologies in every section of Indian society. The field of Indian LID has started gaining momentum in the last two decades, mainly due to the development of several standard multilingual speech corpora for the Indian languages. Even though significant research progress has already been made in this field, to the best of our knowledge, there are not many attempts to analytically review them collectively. In this work, we have conducted one of the very first attempts to present a comprehensive review of the Indian spoken language recognition research field. In-depth analysis has been presented to emphasize the unique challenges of low-resource and mutual influences for developing LID systems in the Indian contexts. Several essential aspects of the Indian LID research, such as the detailed description of the available speech corpora, the major research contributions, including the earlier attempts based on statistical modeling to the recent approaches based on different neural network architectures, and the future research trends are discussed. This review work will help assess the state of the present Indian LID research by any active researcher or any research enthusiasts from related fields.
