A Benchmark Corpus and Neural Approach for Sanskrit Derivative Nouns Analysis

Paper Title

A Benchmark Corpus and Neural Approach for Sanskrit Derivative Nouns Analysis

Authors

Singh, Arun Kumar; Dave, Sushant; P., Prathosh A.; Lall, Brejesh; Mehta, Shresth

Abstract

This paper presents the first benchmark corpus of Sanskrit pratyayas (suffixes) and the inflectional words (padas) formed by suffixation, along with neural-network-based approaches to the formation and splitting of inflectional words. The scope of the current work covers inflectional words that are primary and secondary derivative nouns. Pratyayas are an important dimension of the morphological analysis of Sanskrit texts. Sanskrit computational linguistics tools exist for processing and analyzing Sanskrit texts, but no prior work has standardized and validated these tools specifically for derivative noun analysis. In this work, we prepared a Sanskrit suffix benchmark corpus called Pratyaya-Kosh to evaluate the performance of such tools. We also present our own neural approach to derivative noun analysis, alongside an evaluation of the most prominent Sanskrit morphological analysis tools. The benchmark will be freely available to researchers worldwide, and we hope it will motivate improvements in morphological analysis of the Sanskrit language.
