Paper Title
IIITDWD-ShankarB@ Dravidian-CodeMixi-HASOC2021: mBERT based model for identification of offensive content in south Indian languages
Paper Authors
Paper Abstract
Offensive content has drawn considerable attention in recent years. The volume of offensive content posted on social media is growing at an alarming rate, which makes addressing the problem more urgent than ever. To this end, the organizers of "Dravidian-Code Mixed HASOC-2020" created two challenges. Task 1 involves identifying offensive content in Malayalam data, whereas Task 2 covers Malayalam and Tamil code-mixed sentences. Our team participated in Task 2. In our proposed model, we use multilingual BERT (mBERT) to extract features and apply three different classifiers to the extracted features. Our model achieved a weighted F1 score of 0.70 on the Malayalam data, ranking fifth, and a weighted F1 score of 0.573 on the Tamil code-mixed data, ranking eleventh.
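The abstract reports results as weighted F1 scores. As a minimal sketch of how this metric is commonly defined (per-class F1 averaged with weights proportional to each class's number of true examples), the Python below may help. The function name `weighted_f1` and the labels `"OFF"` and `"NOT"` are hypothetical and chosen for illustration; they are not taken from the shared task data or from the authors' code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted `label` and actually `label`.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class's F1 by its share of the true labels.
        score += (n_true / total) * f1
    return score


# Hypothetical labels, for illustration only (not data from the shared task).
y_true = ["OFF", "OFF", "NOT", "NOT", "NOT", "NOT"]
y_pred = ["OFF", "NOT", "NOT", "NOT", "NOT", "OFF"]
score = weighted_f1(y_true, y_pred)
```

Because each class is weighted by its support, the majority class dominates the score on imbalanced data, which is worth keeping in mind when comparing results across datasets with different label distributions.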