CAREER: A Foundation Model for Labor Sequence Data

Paper title

CAREER: A Foundation Model for Labor Sequence Data

Authors

Keyon Vafa, Emil Palikot, Tianyu Du, Ayush Kanodia, Susan Athey, David M. Blei

Abstract

Labor economists regularly analyze employment data by fitting predictive models to small, carefully constructed longitudinal survey datasets. Although machine learning methods offer promise for such problems, these survey datasets are too small to take advantage of them. In recent years, large datasets of online resumes have also become available, providing data about the career trajectories of millions of individuals. However, standard econometric models cannot take advantage of their scale or incorporate them into the analysis of survey data. To this end, we develop CAREER, a foundation model for job sequences. CAREER is first fit to large, passively collected resume data and then fine-tuned to smaller, better-curated datasets for economic inferences. We fit CAREER to a dataset of 24 million job sequences from resumes, and adjust it on small longitudinal survey datasets. We find that CAREER forms accurate predictions of job sequences, outperforming econometric baselines on three widely used economics datasets. We further find that CAREER can be used to form good predictions of other downstream variables. For example, incorporating CAREER into a wage model provides better predictions than the econometric models currently in use.
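The abstract describes a two-stage recipe: fit a next-job model to large resume data, then keep training that same model on a small survey dataset. The following is a minimal Python sketch of that recipe under stated assumptions, not the paper's implementation. It swaps CAREER's transformer for a first-order softmax model of occupation transitions (previous job to next job). The class `NextJobModel`, the toy occupations, and both datasets are hypothetical. The only point it illustrates is that the second call to `fit` starts from the pretrained parameters rather than from scratch.

```python
import math

# Hypothetical stand-in for a job-sequence model: NOT the CAREER transformer,
# just a first-order softmax model predicting the next occupation from the previous one.


def transitions(sequences):
    """Yield (previous_job, next_job) pairs from each job sequence."""
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            yield prev, nxt


class NextJobModel:
    def __init__(self, occupations):
        self.occs = list(occupations)
        self.idx = {o: i for i, o in enumerate(self.occs)}
        n = len(self.occs)
        # One row of logits per previous occupation; zeros mean a uniform start.
        self.logits = [[0.0] * n for _ in range(n)]

    def probs(self, prev):
        """Softmax over next occupations given the previous one."""
        row = self.logits[self.idx[prev]]
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        z = sum(exps)
        return [e / z for e in exps]

    def prob(self, prev, nxt):
        return self.probs(prev)[self.idx[nxt]]

    def fit(self, sequences, lr=0.5, epochs=100):
        """Full-batch gradient descent on mean next-job cross-entropy.

        Training continues from the current logits, so calling fit a second
        time on new data acts as fine-tuning.
        """
        pairs = list(transitions(sequences))
        n = len(self.occs)
        for _ in range(epochs):
            grad = [[0.0] * n for _ in range(n)]
            for prev, nxt in pairs:
                i, j = self.idx[prev], self.idx[nxt]
                p = self.probs(prev)
                for k in range(n):
                    # d(-log p_j) / d(logit_k) = p_k - 1[k == j]
                    grad[i][k] += p[k] - (1.0 if k == j else 0.0)
            for i in range(n):
                for k in range(n):
                    self.logits[i][k] -= lr * grad[i][k] / len(pairs)
        return self


occs = ["cashier", "clerk", "manager"]
# Hypothetical "large" resume corpus: cashiers mostly move on to clerk jobs.
resumes = [["cashier", "clerk", "manager"]] * 50 + [["cashier", "manager"]] * 5
# Hypothetical small survey sample where cashiers more often become managers.
survey = [["cashier", "manager"]] * 3 + [["cashier", "clerk"]]

model = NextJobModel(occs).fit(resumes, epochs=200)  # stage 1: pretraining
before = model.prob("cashier", "manager")
model.fit(survey, epochs=20)  # stage 2: fine-tuning from the pretrained logits
after = model.prob("cashier", "manager")
print(round(before, 3), round(after, 3))
```

Because fine-tuning starts from pretrained parameters, transitions that never appear in the small survey (here, clerk to manager) keep the estimates learned from the resume data, while transitions the survey does cover shift toward the survey's pattern. A real foundation model shares parameters across all occupations and histories, so the transfer is broader than in this per-row toy.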
