Paper title

Twitter-Based Gender Recognition Using Transformers

Authors

Nia, Zahra Movahedi; Ahmadi, Ali; Mellado, Bruce; Wu, Jianhong; Orbinski, James; Agary, Ali; Kong, Jude Dzevela

Abstract

Social media contains useful information about people and society that could help advance research in many different areas, such as business and finance, health, socio-economic inequality and gender vulnerability (e.g., by applying opinion mining, emotion/sentiment analysis, and statistical analysis). User demographics provide rich information that could help study these subjects further. However, user demographics such as gender are considered private and are not freely available. In this study, we propose a model based on transformers to predict users' gender from their images and tweets. We fine-tune a model based on Vision Transformers (ViT) to classify female and male images. Next, we fine-tune another model based on Bidirectional Encoder Representations from Transformers (BERT) to recognize users' gender from their tweets. This is highly beneficial, because not all users provide an image that indicates their gender; the gender of such users could be detected from their tweets. The combined model improves the accuracy of the image and text classification models by 6.98% and 4.43%, respectively, which shows that the two models complement each other by providing additional information to one another. We apply our method to the PAN-2018 dataset and obtain an accuracy of 85.52%.
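The abstract does not say how the image and text predictions are combined. As an illustration only, the sketch below assumes a simple late-fusion scheme: each user's per-image P(female) scores (assumed to come from the fine-tuned ViT) are averaged, then mixed with a text P(female) score (assumed to come from the fine-tuned BERT) using a hypothetical weight `image_weight`. When a user has no usable images it falls back on the text score alone, mirroring the motivation stated above. The function names `fuse_gender_probability` and `predict_gender` are hypothetical and are not the authors' method.

```python
from statistics import mean
from typing import Optional, Sequence


def fuse_gender_probability(
    image_probs: Sequence[float],
    text_prob: Optional[float],
    image_weight: float = 0.5,
) -> float:
    """Return a fused P(female) for one user (hypothetical late fusion).

    image_probs: per-image P(female) scores, assumed to come from a fine-tuned ViT.
    text_prob: P(female) from the user's tweets, assumed to come from a fine-tuned BERT.
    image_weight: hypothetical mixing weight, not taken from the paper.
    """
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must be in [0, 1]")
    has_images = len(image_probs) > 0
    if not has_images and text_prob is None:
        raise ValueError("need at least one image score or a text score")
    if not has_images:
        # No images that indicate gender: rely on the tweets alone.
        return text_prob
    image_score = mean(image_probs)  # average over the user's images
    if text_prob is None:
        return image_score
    # Weighted average of the image and text probabilities.
    return image_weight * image_score + (1.0 - image_weight) * text_prob


def predict_gender(p_female: float, threshold: float = 0.5) -> str:
    """Map a fused probability to a label with a hypothetical 0.5 threshold."""
    return "female" if p_female >= threshold else "male"


if __name__ == "__main__":
    fused = fuse_gender_probability([0.9, 0.7], 0.4)
    print(round(fused, 2), predict_gender(fused))
```

For a user with two images scored 0.9 and 0.7 and a text score of 0.4, the image average is 0.8, the fused score is about 0.6, and the predicted label is "female". Averaging probabilities is only one possible design; the paper's combination model may instead learn the fusion, for example with a classifier on top of both models' outputs.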