Paper Title

SexWEs: Domain-Aware Word Embeddings via Cross-lingual Semantic Specialisation for Chinese Sexism Detection in Social Media

Authors

Aiqi Jiang, Arkaitz Zubiaga

Abstract

The goal of sexism detection is to mitigate negative online content targeting certain gender groups. However, the limited availability of labeled sexism-related datasets makes it difficult to identify online sexism in low-resource languages. In this paper, we address the task of automatic sexism detection in social media for one low-resource language, Chinese. Rather than collecting new sexism data or building cross-lingual transfer learning models, we develop a cross-lingual domain-aware semantic specialisation system to make the most of existing data. Semantic specialisation is a technique for retrofitting pre-trained distributional word vectors by integrating external linguistic knowledge (such as lexico-semantic relations) into the specialised feature space. To do this, we leverage semantic resources for sexism from a high-resource language (English) to specialise pre-trained word vectors in the target language (Chinese), thereby injecting domain knowledge. We demonstrate the benefit of the sexist word embeddings (SexWEs) specialised by our framework via intrinsic evaluation of word similarity and extrinsic evaluation of sexism detection. Compared with other specialisation approaches and Chinese baseline word vectors, our SexWEs show average score improvements of 0.033 and 0.064 in intrinsic and extrinsic evaluations, respectively. The ablative results and visualisation of SexWEs also prove the effectiveness of our framework for retrofitting word vectors in low-resource languages.
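To make the idea of retrofitting concrete, here is a minimal sketch of a generic, monolingual retrofitting-style update, in which each word vector is repeatedly averaged with the vectors of its lexicon neighbours. It is not the SexWEs framework itself, which is cross-lingual and domain-aware. The function names, the `alpha` and `beta` weights, and the toy tokens and vectors are all assumptions made for illustration.

```python
def retrofit(vectors, synonyms, iterations=10, alpha=1.0, beta=1.0):
    """Pull each word vector towards its lexicon neighbours.

    vectors:  dict mapping word -> list of floats (pre-trained vectors)
    synonyms: dict mapping word -> list of related words (external lexicon)
    alpha:    weight on staying close to the original vector (assumed value)
    beta:     weight on each neighbour vector (assumed value)
    """
    original = {w: list(v) for w, v in vectors.items()}
    fitted = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, related in synonyms.items():
            neighbours = [n for n in related if n in fitted]
            if word not in fitted or not neighbours:
                continue
            # Weighted average of the original vector and the
            # current vectors of the word's lexicon neighbours.
            total = [alpha * x for x in original[word]]
            for n in neighbours:
                for i, x in enumerate(fitted[n]):
                    total[i] += beta * x
            norm = alpha + beta * len(neighbours)
            fitted[word] = [x / norm for x in total]
    return fitted


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


# Hypothetical toy data: placeholder tokens, not real embeddings.
toy_vectors = {
    "word_a": [1.0, 0.0],
    "word_b": [0.0, 1.0],
    "word_c": [-1.0, 0.0],
}
toy_lexicon = {"word_a": ["word_b"], "word_b": ["word_a"]}

toy_fitted = retrofit(toy_vectors, toy_lexicon)
before = cosine(toy_vectors["word_a"], toy_vectors["word_b"])
after = cosine(toy_fitted["word_a"], toy_fitted["word_b"])
```

In this toy setup the two tokens linked in the lexicon move closer together (`after` is larger than `before`), while `word_c`, which has no lexicon entry, keeps its original vector. An intrinsic word-similarity evaluation of the kind mentioned in the abstract compares such similarity scores against human judgements.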
