Paper Title

Robust Quantification of Gender Disparity in Pre-Modern English Literature using Natural Language Processing

Authors

Akarsh Nagaraj, Mayank Kejriwal

Abstract

Research has continued to shed light on the extent and significance of gender disparity in social, cultural and economic spheres. More recently, computational tools from the Natural Language Processing (NLP) literature have been proposed for measuring such disparity using relatively extensive datasets and empirically rigorous methodologies. In this paper, we contribute to this line of research by studying gender disparity, at scale, in copyright-expired literary texts published in the pre-modern period (defined in this work as the period ranging from the mid-nineteenth through the mid-twentieth century). One of the challenges in using such tools is to ensure quality control, and by extension, trustworthy statistical analysis. Another challenge is in using materials and methods that are publicly available and have been established for some time, both to ensure that they can be used and vetted in the future, and also, to add confidence to the methodology itself. We present our solution to addressing these challenges, and using multiple measures, demonstrate the significant discrepancy between the prevalence of female characters and male characters in pre-modern literature. The evidence suggests that the discrepancy declines when the author is female. The discrepancy seems to be relatively stable as we plot data over the decades in this century-long period. Finally, we aim to carefully describe both the limitations and ethical caveats associated with this study, and others like it.
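
The abstract does not say how "prevalence" of female and male characters is computed. As a rough, hedged illustration of what one such measure and a per-decade summary might look like, the sketch below counts gendered pronouns and honorifics in a text and averages the resulting female share by publication decade. The word lists and the function names `gender_counts`, `female_share` and `share_by_decade` are hypothetical; this is a crude lexical proxy, not the authors' pipeline, which according to the abstract relies on established, publicly available NLP tools and multiple measures.

```python
import math
import re
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical word lists: a crude lexical proxy for character gender,
# NOT the measures used in the paper.
FEMALE_TERMS = {"she", "her", "hers", "herself", "woman", "women",
                "girl", "lady", "mrs", "miss"}
MALE_TERMS = {"he", "him", "his", "himself", "man", "men",
              "boy", "gentleman", "mr", "sir"}


def gender_counts(text: str) -> Counter:
    """Count tokens that fall into each hypothetical gendered word list."""
    tokens = re.findall(r"[a-z]+", text.lower())  # "Mrs." -> "mrs"
    counts = Counter()
    for tok in tokens:
        if tok in FEMALE_TERMS:
            counts["female"] += 1
        elif tok in MALE_TERMS:
            counts["male"] += 1
    return counts


def female_share(text: str) -> float:
    """Fraction of gendered mentions that are female (0.5 = parity, NaN if none)."""
    c = gender_counts(text)
    total = c["female"] + c["male"]
    return c["female"] / total if total else float("nan")


def share_by_decade(books):
    """books: iterable of (publication_year, text) -> {decade: mean female share}."""
    by_decade = defaultdict(list)
    for year, text in books:
        s = female_share(text)
        if not math.isnan(s):  # skip texts with no gendered mentions
            by_decade[year // 10 * 10].append(s)
    return {decade: mean(vals) for decade, vals in sorted(by_decade.items())}
```

For example, `female_share("He said to her that she was late. His sister smiled.")` counts two female terms (`her`, `she`) and two male terms (`He`, `His`) and returns 0.5. A per-decade mean like this is one simple way to check whether a disparity stays stable over a century-long period, though pronoun counts conflate narrators, minor mentions and named characters.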