Paper title
Effect of Word Embedding Models on Hate and Offensive Speech Detection
Paper authors
Paper abstract
Deep neural networks have been adopted successfully in hate speech detection problems. Nevertheless, the effect of word embedding models on neural network performance has not been appropriately examined in the literature. In our study, through three detection tasks (2-class, 3-class, and 6-class classification), we investigate the impact of both word embedding models and neural network architectures on predictive accuracy. Our focus is on the Arabic language. We first train several word embedding models on a large-scale unlabelled Arabic text corpus. Next, based on a dataset of Arabic hate and offensive speech, for each detection task, we train several neural network classifiers using the pre-trained word embedding models. This procedure yields a large number of learned models, which allows an exhaustive comparison. The empirical analysis demonstrates, on the one hand, the superiority of the skip-gram models and, on the other hand, the superiority of the CNN network across the three detection tasks.
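For readers unfamiliar with the skip-gram objective mentioned in the abstract, the following is a minimal sketch of how skip-gram forms its training examples: each word is paired with the words that fall inside a context window around it, and the model learns to predict the context word from the center word. The abstract does not state the toolkit, window size, or other hyperparameters used, so the function name `skipgram_pairs`, the window of 1, and the placeholder tokens below are illustrative assumptions only.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) training pairs for a skip-gram model.

    `window` is the number of neighbours taken on each side of the
    center word; the value used in the paper is not given in the abstract.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is never its own context
                pairs.append((center, tokens[j]))
    return pairs


# Placeholder tokens standing in for one tokenized sentence of the corpus.
sentence = ["w1", "w2", "w3", "w4"]
pairs = skipgram_pairs(sentence, window=1)
for center, context in pairs:
    print(center, "->", context)
```

In a full pipeline these pairs would feed an embedding trainer, and the resulting vectors would initialize the embedding layer of a classifier such as the CNN named in the abstract.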