Paper Title
Out-of-Domain Evaluation of Finnish Dependency Parsing
Authors
Abstract
The prevailing practice in academia is to evaluate model performance on in-domain evaluation data, typically set aside from the training corpus. However, in many real-world applications the data to which the model is applied may differ substantially from the training data. In this paper, we focus on Finnish out-of-domain parsing by introducing a novel out-of-domain treebank, UD Finnish-OOD, which covers five very distinct data sources (web documents, clinical, online discussions, tweets, and poetry) and contains a total of 19,382 syntactic words in 2,122 sentences, released under the Universal Dependencies framework. Together with the new treebank, we present an extensive out-of-domain parsing evaluation that utilizes the available section-level information from three different Finnish UD treebanks (TDT, PUD, OOD). Compared to the previously existing treebanks, the new Finnish-OOD is shown to include sections that are more challenging for the general parser, creating an interesting evaluation setting and yielding valuable information for those applying the parser outside of its training domain.