Extended Multilingual Protest News Detection - Shared Task 1, CASE 2021 and 2022

Paper Title

Extended Multilingual Protest News Detection -- Shared Task 1, CASE 2021 and 2022

Authors

Hürriyetoğlu, Ali; Mutlu, Osman; Duruşan, Fırat; Uca, Onur; Gürel, Alaeddin Selçuk; Radford, Benjamin; Dai, Yaoyao; Hettiarachchi, Hansi; Stoehr, Niklas; Nomoto, Tadashi; Slavcheva, Milena; Vargas, Francielle; Javid, Aaqib; Beyhan, Fatih; Yörük, Erdem

Abstract

We report results of the CASE 2022 Shared Task 1 on Multilingual Protest Event Detection. This task is a continuation of CASE 2021 and consists of four subtasks: i) document classification, ii) sentence classification, iii) event sentence coreference identification, and iv) event extraction. The CASE 2022 extension consists of expanding the test data with more data in previously available languages, namely English, Hindi, Portuguese, and Spanish, and adding new test data in Mandarin, Turkish, and Urdu for Subtask 1, document classification. The training data from CASE 2021 in English, Portuguese, and Spanish were utilized. Therefore, predicting document labels in Hindi, Mandarin, Turkish, and Urdu occurs in a zero-shot setting. The CASE 2022 workshop accepts reports on systems developed for predicting test data of CASE 2021 as well. We observe that the best systems submitted by CASE 2022 participants achieve between 79.71 and 84.06 F1-macro for new languages in a zero-shot setting. The winning approaches mainly ensemble models and merge data in multiple languages. The best two submissions on CASE 2021 data outperform submissions from last year for Subtask 1 and Subtask 2 in all languages. Only the following scenarios were not outperformed by new submissions on CASE 2021: Subtask 3 Portuguese and Subtask 4 English.
