Multilingual Detection of Personal Employment Status on Twitter

Paper title

Multilingual Detection of Personal Employment Status on Twitter

Authors

Manuel Tonneau, Dhaval Adjodah, João Palotti, Nir Grinberg, Samuel Fraiberger

Abstract

Detecting disclosures of individuals' employment status on social media can provide valuable information to match job seekers with suitable vacancies, offer social protection, or measure labor market flows. However, identifying such personal disclosures is a challenging task due to their rarity in a sea of social media content and the variety of linguistic forms used to describe them. Here, we examine three Active Learning (AL) strategies in real-world settings of extreme class imbalance, and identify five types of disclosures about individuals' employment status (e.g. job loss) in three languages using BERT-based classification models. Our findings show that, even under extreme imbalance settings, a small number of AL iterations is sufficient to obtain large and significant gains in precision, recall, and diversity of results compared to a supervised baseline with the same number of labels. We also find that no AL strategy consistently outperforms the rest. Qualitative analysis suggests that AL helps focus the attention mechanism of BERT on core terms and adjust the boundaries of semantic expansion, highlighting the importance of interpretable models to provide greater control and visibility into this dynamic learning process.
