Paper Title

CUSATNLP@HASOC-Dravidian-CodeMix-FIRE2020: Identifying Offensive Language from Manglish Tweets

Authors

Sara Renjit, Sumam Mary Idicula

Abstract

With the popularity of social media, communication through blogs, Facebook, Twitter, and other platforms has increased. Initially, English was the only medium of communication. Now we can communicate in any language, which has led people to mix English with their native language. Sometimes comments in other languages are written in transliterated English; in other cases, people use the script of the intended language. Identifying sentiment and offensive content in such code-mixed tweets has become a necessary task. We present a working model submitted for Task 2 of the sub-track HASOC Offensive Language Identification - DravidianCodeMix at the Forum for Information Retrieval Evaluation (FIRE) 2020. It is a message-level classification task. In our approach, an embedding model-based classifier labels each comment as offensive or not offensive. We applied this method to the Manglish dataset provided with the sub-track.
