Paper Title

Multilingual Argument Mining: Datasets and Analysis

Authors

Orith Toledo-Ronen, Matan Orbach, Yonatan Bilu, Artem Spector, Noam Slonim

Abstract

The growing interest in argument mining and computational argumentation brings with it a plethora of Natural Language Understanding (NLU) tasks and corresponding datasets. However, as with many other NLU tasks, the dominant language is English, with resources in other languages being few and far between. In this work, we explore the potential of transfer learning using the multilingual BERT model to address argument mining tasks in non-English languages, based on English datasets and the use of machine translation. We show that such methods are well suited for classifying the stance of arguments and detecting evidence, but less so for assessing the quality of arguments, presumably because quality is harder to preserve under translation. In addition, focusing on the translate-train approach, we show how the choice of languages for translation, and the relations among them, affect the accuracy of the resultant model. Finally, to facilitate evaluation of transfer learning on argument mining tasks, we provide a human-generated dataset with more than 10k arguments in multiple languages, as well as machine translation of the English datasets.
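The translate-train approach mentioned above can be illustrated with a minimal sketch: machine-translate the labeled English training examples into each target language and fine-tune a single model on the union. The `translate` function and the example data below are hypothetical stand-ins (the paper itself fine-tunes multilingual BERT on data translated by an MT system); this only shows how the training corpus is assembled.

```python
# Sketch of the translate-train corpus construction (assumption: a
# machine-translation callable `translate(text, lang)` is available).
# Labels are copied unchanged onto the translations -- stance and
# evidence labels survive translation, whereas, as the abstract notes,
# argument quality is harder to preserve.

def translate_train_corpus(english_examples, translate, target_langs):
    """Build a multilingual training set from labeled English examples.

    english_examples: list of (text, label) pairs in English.
    translate: callable mapping (text, lang) -> translated text.
    target_langs: language codes to translate the training data into.
    """
    corpus = list(english_examples)  # keep the original English data
    for lang in target_langs:
        for text, label in english_examples:
            corpus.append((translate(text, lang), label))
    return corpus

# Usage with a dummy translator (a real setup would call an MT system):
dummy_translate = lambda text, lang: f"[{lang}] {text}"
data = [("We should ban X", "pro"), ("Banning X is wrong", "con")]
multilingual = translate_train_corpus(data, dummy_translate, ["de", "es"])
# The result is the original pairs plus one translated copy per language.
```

The resulting corpus would then be used to fine-tune one multilingual model (e.g. multilingual BERT) covering all the target languages at once, rather than training a separate model per language.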
