Paper Title

WeLM: A Well-Read Pre-trained Language Model for Chinese

Authors

Hui Su, Xiao Zhou, Houjin Yu, Xiaoyu Shen, Yuwen Chen, Zilin Zhu, Yang Yu, Jie Zhou

Abstract

Large Language Models pre-trained with self-supervised learning have demonstrated impressive zero-shot generalization capabilities on a wide spectrum of tasks. In this work, we present WeLM: a well-read pre-trained language model for Chinese that is able to seamlessly perform different types of tasks with zero or few-shot demonstrations. WeLM is trained with 10B parameters by "reading" a curated high-quality corpus covering a wide range of topics. We show that WeLM is equipped with broad knowledge on various domains and languages. On 18 monolingual (Chinese) tasks, WeLM can significantly outperform existing pre-trained models with similar sizes and match the performance of models up to 25 times larger. WeLM also exhibits strong capabilities in multi-lingual and code-switching understanding, outperforming existing multilingual language models pre-trained on 30 languages. Furthermore, we collected human-written prompts for a large set of supervised datasets in Chinese and fine-tuned WeLM with multi-prompted training. The resulting model can attain strong generalization on unseen types of tasks and outperform the unsupervised WeLM in zero-shot learning. Finally, we demonstrate that WeLM has basic skills at explaining and calibrating its own decisions, which can be promising directions for future research. Our models can be accessed via https://welm.weixin.qq.com/docs/api/.
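Below is a minimal sketch of the zero/few-shot usage pattern described in the abstract: a few labeled demonstrations are concatenated with a new query into a single prompt and sent to the model for completion. The endpoint path, request fields, and authentication shown here are assumptions for illustration only; the official API documentation at https://welm.weixin.qq.com/docs/api/ is authoritative.

```python
# Sketch of few-shot prompting for sentiment classification with WeLM.
# NOTE: the endpoint path, request fields, and response shape below are
# illustrative assumptions; see https://welm.weixin.qq.com/docs/api/ for
# the actual interface and authentication details.
import requests

API_URL = "https://welm.weixin.qq.com/v1/completions"  # hypothetical path
API_TOKEN = "YOUR_API_TOKEN"                            # hypothetical token

# In-context learning: prepend a few labeled demonstrations, then the query.
demonstrations = [
    ("这部电影的画面非常精美。", "正面"),
    ("剧情拖沓,看得昏昏欲睡。", "负面"),
]
query = "演员的表演很有感染力。"

prompt = "".join(f"评论:{text}\n情感:{label}\n" for text, label in demonstrations)
prompt += f"评论:{query}\n情感:"

payload = {
    "prompt": prompt,      # few-shot prompt built above
    "max_tokens": 4,       # short completion: just the predicted label
    "temperature": 0.0,    # deterministic decoding for classification
    "stop": ["\n"],        # stop once the label is emitted
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json())  # expected to contain the generated label, e.g. "正面"
```

Zero-shot usage is the same call with the demonstrations omitted, leaving only a task instruction and the query in the prompt.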
