Paper Title

Cross-lingual Contextualized Topic Models with Zero-shot Learning

Authors

Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora Nozza, Elisabetta Fersini

Abstract

Many data sets (e.g., reviews, forums, news) exist in parallel in multiple languages. They all cover the same content, but the linguistic differences make it impossible to use traditional, bag-of-words-based topic models. Models have to be either single-language or suffer from a huge, but extremely sparse vocabulary. Both issues can be addressed by transfer learning. In this paper, we introduce a zero-shot cross-lingual topic model. Our model learns topics in one language (here, English), and predicts them for unseen documents in different languages (here, Italian, French, German, and Portuguese). We evaluate the quality of the topic predictions for the same document in different languages. Our results show that the transferred topics are coherent and stable across languages, which suggests exciting future research directions.
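
The abstract mentions evaluating topic predictions for the same document in different languages. One simple way to quantify that kind of agreement is sketched below: count how often the most probable topic is identical for a document and its translation. This is an illustrative assumption, not necessarily the metric used in the paper; the function names `top_topic` and `match_rate` and all topic distributions are hypothetical.

```python
def top_topic(distribution):
    """Return the index of the most probable topic in one document's distribution."""
    return max(range(len(distribution)), key=lambda k: distribution[k])


def match_rate(source_topics, target_topics):
    """Fraction of documents whose top topic is the same in both languages.

    Both arguments are lists of per-document topic distributions, aligned so
    that position i refers to the same document in each language.
    """
    if len(source_topics) != len(target_topics):
        raise ValueError("both languages must cover the same documents")
    if not source_topics:
        raise ValueError("at least one document is required")
    matches = sum(
        top_topic(s) == top_topic(t)
        for s, t in zip(source_topics, target_topics)
    )
    return matches / len(source_topics)


# Hypothetical topic distributions (3 documents, 3 topics), made up for illustration.
english = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
italian = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.5, 0.3, 0.2]]

rate = match_rate(english, italian)
print(f"match rate: {rate:.2f}")
```

A match rate close to 1 would indicate that the predicted topics are stable across languages. Comparing only the top topic ignores the rest of the distribution, so a distribution-level measure (for example, a divergence between the two distributions) could complement it.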
