Paper Title
Do Multilingual Language Models Capture Differing Moral Norms?
Paper Authors
Paper Abstract
Massively multilingual sentence representations are trained on large corpora of uncurated data, with a very imbalanced proportion of languages included in the training. This may cause the models to absorb cultural values, including moral judgments, from the high-resource languages and impose them on the low-resource languages. The lack of data in certain languages can also lead the models to develop random, and thus potentially harmful, beliefs. Both of these issues can negatively influence zero-shot cross-lingual model transfer and potentially lead to harmful outcomes. Therefore, we aim to (1) detect and quantify these issues by comparing different models across languages, and (2) develop methods for mitigating the models' undesirable properties. Our initial experiments using the multilingual model XLM-R show that multilingual LMs do capture moral norms, potentially with even higher agreement with human judgments than monolingual models. However, it is not yet clear to what extent these moral norms differ between languages.
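To make the probing idea concrete, here is a minimal illustrative sketch of one way to check whether XLM-R embeds morally loaded phrases differently across languages: encode paired "positive" and "negative" anchor sentences with mean-pooled XLM-R representations and compare their cosine similarity per language. The anchor phrases, the pooling, and the similarity-based score are assumptions chosen for illustration; this is not necessarily the authors' actual experimental setup.

```python
# Illustrative probe (assumption, not the paper's protocol): compare how
# XLM-R places morally positive vs. negative anchor phrases in embedding
# space, per language.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")
model.eval()

def embed(sentences):
    """Mean-pooled XLM-R sentence embeddings, ignoring padding tokens."""
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)   # (batch, seq, 1)
    return (hidden * mask).sum(1) / mask.sum(1)    # (batch, dim)

# Hypothetical anchor phrases; any morally contrastive pairs would do.
anchors = {
    "en": ("You should help people.", "You should kill people."),
    "de": ("Du solltest Menschen helfen.", "Du solltest Menschen töten."),
}

for lang, (pos, neg) in anchors.items():
    e_pos, e_neg = embed([pos, neg])
    sim = torch.cosine_similarity(e_pos, e_neg, dim=0).item()
    print(f"{lang}: cos(positive, negative) = {sim:.3f}")
```

Under this kind of probe, large cross-language differences in how far apart the contrastive anchors sit would hint that the model's representation of moral content is not language-neutral, which is the sort of discrepancy the abstract proposes to detect and quantify.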