Paper Title

Multi-Task Sequence Prediction For Tunisian Arabizi Multi-Level Annotation

Paper Authors

Elisa Gugliotta, Marco Dinarelli, Olivier Kraif

Abstract

In this paper we propose a multi-task sequence prediction system, based on recurrent neural networks, and use it to annotate a Tunisian Arabizi corpus on multiple levels. The annotations performed are text classification, tokenization, PoS tagging, and encoding of Tunisian Arabizi into CODA* Arabic orthography. The system is trained to predict all annotation levels in cascade, starting from the Arabizi input. To show the effectiveness of our neural architecture, we evaluate the system on the TIGER German corpus, suitably converting the data into a multi-task problem. We also show how we used the system to annotate a Tunisian Arabizi corpus, which was afterwards manually corrected and used to further evaluate sequence models on Tunisian data. Our system is developed for the Fairseq framework, which makes it fast and easy to use for any other sequence prediction problem.
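The cascade mentioned above, in which each annotation level is predicted from the Arabizi input together with the predictions of the earlier levels, can be sketched roughly as follows. This is a minimal, framework-free illustration under stated assumptions, not the authors' Fairseq implementation: `cascade_predict`, the `toy_*` predictors, and the level names are hypothetical; each level is assumed to emit exactly one label per input token (so tokenization, which can change sequence length, is left out); and the toy rules merely stand in for trained recurrent decoders.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a level maps the input tokens plus the outputs of
# all earlier levels to one label per token.
Predictor = Callable[[Sequence[str], Dict[str, List[str]]], List[str]]


def cascade_predict(tokens: Sequence[str],
                    levels: Sequence[Tuple[str, Predictor]]) -> Dict[str, List[str]]:
    """Run annotation levels in order; each level sees the earlier levels' outputs."""
    outputs: Dict[str, List[str]] = {}
    for name, predict in levels:
        labels = predict(tokens, outputs)
        if len(labels) != len(tokens):
            raise ValueError(
                f"level {name!r} returned {len(labels)} labels for {len(tokens)} tokens")
        outputs[name] = labels
    return outputs


# Toy stand-ins for trained decoders (hypothetical rules, for illustration only).
def toy_class(tokens, prev):
    # Digits inside a word (e.g. '3' for the letter ain) are a common Arabizi cue.
    return ["arabizi" if any(ch.isdigit() for ch in t) else "foreign" for t in tokens]


def toy_pos(tokens, prev):
    # Conditions on the previous level: uses the predicted class of each token.
    return ["INTJ" if cls == "arabizi" else "X" for cls in prev["class"]]


def toy_coda(tokens, prev):
    # Only tokens classified as Arabizi are marked for CODA* conversion;
    # other tokens are copied unchanged. No real transliteration is done here.
    return [f"[CODA*:{t}]" if cls == "arabizi" else t
            for t, cls in zip(tokens, prev["class"])]


LEVELS = [("class", toy_class), ("pos", toy_pos), ("coda", toy_coda)]

if __name__ == "__main__":
    print(cascade_predict(["3aslema", "ca", "va"], LEVELS))
```

The point of the design is the ordering: because each level receives the outputs already produced, a later level (here the toy CODA* step) can behave differently depending on an earlier decision (the token class), which is what predicting the levels "in cascade" means in this sketch.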