Paper Title
Visually-augmented pretrained language models for NLP tasks without images
Paper Authors
Paper Abstract
Although pre-trained language models~(PLMs) show impressive performance through text-only self-supervised training, they have been found to lack visual semantics or commonsense. Existing solutions often rely on explicit images for visual knowledge augmentation (requiring time-consuming retrieval or generation), and they augment the entire input text without considering whether augmentation is actually needed for the specific input or task. To address these issues, we propose a novel \textbf{V}isually-\textbf{A}ugmented fine-tuning approach that can be generally applied to various PLMs and NLP tasks, \textbf{W}ithout using any retrieved or generated \textbf{I}mages, namely \textbf{VAWI}. Experimental results show that our approach consistently improves the performance of BERT, RoBERTa, BART, and T5 at different scales, and outperforms several competitive baselines on ten tasks. Our code and data are publicly available at~\url{https://github.com/RUCAIBox/VAWI}.
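
The abstract does not spell out the mechanism, but the central idea (supplying visual semantics to a PLM from a text-side encoder rather than from actual images) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the use of a frozen CLIP text encoder, the hand-picked list of "visually-hungry" words, and the linear-projection fusion are all assumptions made for demonstration.

import torch
from transformers import AutoModel, AutoTokenizer, CLIPTextModel, CLIPTokenizer

# A standard text-only PLM (BERT here, as one of the backbones named above).
plm_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
plm = AutoModel.from_pretrained("bert-base-uncased")

# A frozen CLIP *text* encoder: because CLIP was trained on image-text pairs,
# its text embeddings carry visual semantics, so no image retrieval or
# generation is needed at fine-tuning time.
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
clip_text = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")
clip_text.requires_grad_(False)

text = "A red apple on a wooden table"
# Hypothetical selection of words that benefit from visual grounding;
# in practice this would be done automatically (e.g., by POS tags).
visually_hungry = ["red", "apple", "wooden", "table"]

# Visually-aligned embeddings from CLIP's text encoder (no images involved).
clip_inputs = clip_tok(visually_hungry, padding=True, return_tensors="pt")
with torch.no_grad():
    vis_emb = clip_text(**clip_inputs).pooler_output  # (num_words, 512)

# Contextual embeddings from the PLM.
plm_inputs = plm_tok(text, return_tensors="pt")
txt_emb = plm(**plm_inputs).last_hidden_state  # (1, seq_len, 768)

# One simple fusion choice (an assumption): project the CLIP features to the
# PLM's hidden width and append them as extra "visual token" states that a
# downstream task head could attend over.
proj = torch.nn.Linear(vis_emb.size(-1), txt_emb.size(-1))
fused = torch.cat([txt_emb, proj(vis_emb).unsqueeze(0)], dim=1)
print(fused.shape)  # (1, seq_len + num_visual_words, 768)

Because only the small projection layer (and the PLM, if desired) is trained while CLIP stays frozen, this style of augmentation avoids the retrieval and generation overhead of image-based methods, which is the trade-off the abstract highlights.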