WLV-RIT at HASOC-Dravidian-CodeMix-FIRE2020: Offensive Language Identification in Code-switched YouTube Comments

Paper title

WLV-RIT at HASOC-Dravidian-CodeMix-FIRE2020: Offensive Language Identification in Code-switched YouTube Comments

Authors

Tharindu Ranasinghe, Sarthak Gupte, Marcos Zampieri, Ifeoma Nwogu

Abstract

This paper describes the WLV-RIT entry to the Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) shared task 2020. The HASOC 2020 organizers provided participants with annotated datasets containing code-mixed social media posts in Dravidian languages (Malayalam-English and Tamil-English). We participated in Task 1: offensive comment identification in code-mixed Malayalam YouTube comments. In our methodology, we take advantage of available English data by applying cross-lingual contextual word embeddings and transfer learning to make predictions on Malayalam data. We further improve the results using various fine-tuning strategies. Our system achieved a weighted average F1 score of 0.89 on the test set and ranked 5th out of 12 participants.