Fine-grained Language Identification with Multilingual CapsNet Model

Paper Title

Fine-grained Language Identification with Multilingual CapsNet Model

Authors

Verma, Mudit; Buduru, Arun Balaji

Abstract

Due to a drastic improvement in the quality of internet services worldwide, there is an explosion of multilingual content generation and consumption. This is especially prevalent in countries with large multilingual audiences, who are increasingly consuming media outside their linguistic familiarity or preference. Hence, there is an increasing need for real-time and fine-grained content analysis services, including language identification, content transcription, and analysis. Accurate and fine-grained spoken language detection is an essential first step for all subsequent content analysis algorithms. Current techniques in spoken language detection may fall short on one of these fronts: accuracy, fine-grained detection, data requirements, or manual effort in data collection and pre-processing. Hence, this work presents a real-time approach that detects the spoken language from 5-second audio clips with an accuracy of 91.8%, with exiguous data requirements and minimal pre-processing. Novel architectures for Capsule Networks are proposed, which operate on spectrogram images of the provided audio snippets. Results are presented alongside previous approaches based on Recurrent Neural Networks and iVectors. Finally, we show a "Non-Class" analysis to further stress why the CapsNet architecture works for the language identification (LID) task.
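
As a rough illustration of the input representation mentioned in the abstract, the sketch below turns a 5-second clip into a log-magnitude spectrogram using NumPy only. Everything specific here is an assumption for illustration and is not taken from the paper: the 16 kHz sample rate, the 25 ms frame with 10 ms hop, the Hann window, the hypothetical helper name `log_spectrogram`, and the synthetic 440 Hz tone standing in for real speech. The paper's actual pre-processing may differ.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160):
    """Hypothetical helper: log-magnitude spectrogram, shape (freq_bins, frames).

    frame_len and hop are assumed values (25 ms / 10 ms at 16 kHz),
    not parameters reported in the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT magnitude per frame, compressed with log(1 + x).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T

# Synthetic 5-second clip (assumed 16 kHz mono); a real clip would be loaded from audio.
sample_rate = 16000
t = np.arange(5 * sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(clip)
print(spec.shape)  # (frequency bins, time frames)
```

The resulting 2-D array can be treated as a single-channel image, which is the kind of input an image-based classifier such as a capsule network would consume.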
