Paper Title
BERT for Monolingual and Cross-Lingual Reverse Dictionary
Paper Authors
Paper Abstract
Reverse dictionary is the task of finding the proper target word given a description of that word. In this paper, we try to incorporate BERT into this task. However, since BERT is based on byte-pair encoding (BPE) subword tokenization, it is nontrivial to make BERT generate a word given a description. We propose a simple but effective method that enables BERT to generate the target word for this specific task. In addition, cross-lingual reverse dictionary is the task of finding the proper target word when the description is given in another language. Previous models had to maintain two separate word embeddings and learn to align them. By using multilingual BERT (mBERT), however, we can efficiently perform the cross-lingual reverse dictionary task with a single subword embedding, and no alignment between the languages is required. More importantly, mBERT achieves remarkable cross-lingual reverse dictionary performance even without a parallel corpus, which means it can perform the cross-lingual task using only the corresponding monolingual data. Code is publicly available at https://github.com/yhcc/BertForRD.git.
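The abstract does not spell out the generation method; as a minimal sketch of the underlying idea, the snippet below uses a masked-LM head (via the HuggingFace `transformers` library) to rank candidate target words for a definition by appending one `[MASK]` slot per subword of the candidate and averaging the predicted log-probabilities. The model name, the candidate list, and the length-normalized scoring scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Illustrative sketch, not the paper's model: rank candidate words for a
# definition with BERT's masked-LM head. The target word is represented by
# one [MASK] position per BPE/WordPiece subword.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()


def score_candidate(definition: str, candidate: str) -> float:
    """Average log-probability of the candidate's subwords at masked slots
    following the definition (averaging normalizes for subword count)."""
    sub_ids = tokenizer(candidate, add_special_tokens=False)["input_ids"]
    def_ids = tokenizer(definition, add_special_tokens=False)["input_ids"]
    # Input layout: [CLS] definition [SEP] [MASK] ... [MASK] [SEP]
    input_ids = (
        [tokenizer.cls_token_id]
        + def_ids
        + [tokenizer.sep_token_id]
        + [tokenizer.mask_token_id] * len(sub_ids)
        + [tokenizer.sep_token_id]
    )
    with torch.no_grad():
        logits = model(torch.tensor([input_ids])).logits[0]
    log_probs = torch.log_softmax(logits, dim=-1)
    mask_start = 1 + len(def_ids) + 1  # position of the first [MASK]
    total = sum(
        log_probs[mask_start + i, sid].item() for i, sid in enumerate(sub_ids)
    )
    return total / len(sub_ids)


definition = "a building where books are kept and can be borrowed"
candidates = ["library", "museum", "bookstore"]  # hypothetical candidate set
print(max(candidates, key=lambda w: score_candidate(definition, w)))
```

For the cross-lingual setting described above, one would presumably load `bert-base-multilingual-cased` instead and pass a definition in the source language with candidates in the target language; because mBERT shares a single subword vocabulary across languages, the same scoring code applies without any embedding alignment step.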