Paper Title
XOR QA: Cross-lingual Open-Retrieval Question Answering
Paper Authors
Paper Abstract
Multilingual question answering tasks typically assume that answers exist in the same language as the question. Yet in practice, many languages face both information scarcity -- where languages have few reference articles -- and information asymmetry -- where questions reference concepts from other cultures. This work extends open-retrieval question answering to a cross-lingual setting, enabling questions from one language to be answered via answer content from another language. We construct a large-scale dataset built on questions from TyDi QA lacking same-language answers. Our task formulation, called Cross-lingual Open-Retrieval Question Answering (XOR QA), includes 40k information-seeking questions from across 7 diverse non-English languages. Based on this dataset, we introduce three new tasks that involve cross-lingual document retrieval using multilingual and English resources. We establish baselines with state-of-the-art machine translation systems and cross-lingual pretrained models. Experimental results suggest that XOR QA is a challenging task that will facilitate the development of novel techniques for multilingual question answering. Our data and code are available at https://nlp.cs.washington.edu/xorqa.