KEIS@JUST at SemEval-2020 Task 12: Identifying Multilingual Offensive Tweets Using Weighted Ensemble and Fine-Tuned BERT

Paper Title

KEIS@JUST at SemEval-2020 Task 12: Identifying Multilingual Offensive Tweets Using Weighted Ensemble and Fine-Tuned BERT

Authors

Tawalbeh, Saja Khaled; Hammad, Mahmoud; AL-Smadi, Mohammad

Abstract

This research presents the participation of our team, KEIS@JUST, in SemEval-2020 Task 12, a shared task on multilingual offensive language identification. We participated in all subtasks for all provided languages, except Sub-task A for English. We developed two main approaches. The first, applied to Arabic and English, is a weighted ensemble of Bi-GRU and CNN models followed by Gaussian noise and a global pooling layer, multiplied by weights to improve overall performance. The second, applied to the other languages, uses transfer learning from BERT alongside recurrent neural networks such as Bi-LSTM and Bi-GRU, followed by a global average pooling layer. Word embeddings and contextual embeddings were used as features; data augmentation was used only for Arabic.
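The weighted-ensemble idea in the abstract can be illustrated with a small sketch. The abstract does not say how the weights are chosen or where exactly they are applied, so the code below assumes the simplest reading: a weighted average of per-model probabilities for the offensive class. The function names, the example probabilities, the weights `[0.6, 0.4]`, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def weighted_ensemble(probs_per_model: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Weighted average of per-model probabilities for the offensive class.

    Assumption: one probability per tweet from each model (e.g. a Bi-GRU
    and a CNN). The paper's actual weighting scheme may differ.
    """
    if len(probs_per_model) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_examples = len(probs_per_model[0])
    if any(len(p) != n_examples for p in probs_per_model):
        raise ValueError("all models must score the same number of examples")
    # Normalise by the weight total so the result stays in [0, 1].
    return [
        sum(w * p[i] for w, p in zip(weights, probs_per_model)) / total
        for i in range(n_examples)
    ]


def to_labels(scores: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Map ensemble scores to labels; the threshold is an assumed value."""
    return ["OFF" if s >= threshold else "NOT" for s in scores]


# Hypothetical model outputs for three tweets (not results from the paper).
bigru_probs = [0.9, 0.2, 0.6]
cnn_probs = [0.7, 0.4, 0.3]

scores = weighted_ensemble([bigru_probs, cnn_probs], weights=[0.6, 0.4])
labels = to_labels(scores)
print(labels)
```

In practice the weights would typically be tuned on a development set; this sketch fixes them by hand only to keep the example self-contained.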
