Paper title
XAlign: Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages
Authors
Abstract
Multiple critical scenarios (such as Wikipedia text generation from English infoboxes) need automated generation of descriptive text in low-resource (LR) languages from English fact triples. Previous work has focused on English fact-to-text (F2T) generation. To the best of our knowledge, there has been no previous attempt at cross-lingual alignment or generation for LR languages. Building an effective cross-lingual F2T (XF2T) system requires alignment between English structured facts and LR sentences. We propose two unsupervised methods for cross-lingual alignment. We contribute XAlign, an XF2T dataset with 0.45M pairs across 8 languages, of which 5402 pairs have been manually annotated. We also train strong baseline XF2T generation models on the XAlign dataset.