Paper Title
CALCS 2021 Shared Task: Machine Translation for Code-Switched Data
Paper Authors
Abstract
To date, efforts in the code-switching literature have focused for the most part on language identification, POS, NER, and syntactic parsing. In this paper, we address machine translation for code-switched social media data. We create a community shared task. We provide two modalities for participation: supervised and unsupervised. For the supervised setting, participants are challenged to translate English into Hindi-English (Eng-Hinglish) in a single direction. For the unsupervised setting, we provide the following language pairs: English and Spanish-English (Eng-Spanglish), and English and Modern Standard Arabic-Egyptian Arabic (Eng-MSAEA) in both directions. We share insights and challenges in curating the "into" code-switching language evaluation data. Further, we provide baselines for all language pairs in the shared task. The leaderboard for the shared task comprises 12 individual system submissions corresponding to 5 different teams. The best performance achieved is 12.67% BLEU score for English to Hinglish and 25.72% BLEU score for MSAEA to English.