Contextual embedding and model weighting by fusing domain knowledge on Biomedical Question Answering

Paper title

Contextual embedding and model weighting by fusing domain knowledge on Biomedical Question Answering

Authors

Lu, Yuxuan; Yan, Jingya; Qi, Zhixuan; Ge, Zhongzheng; Du, Yongping

Abstract

Biomedical Question Answering aims to obtain an answer to a given question from the biomedical domain. Because the task demands extensive biomedical domain knowledge, it is difficult for a model to learn that knowledge from limited training data. We propose a contextual embedding method that combines the open-domain QA model AoA with the BioBERT model, which is pre-trained on biomedical domain data. We adopt unsupervised pre-training on a large biomedical corpus and supervised fine-tuning on a biomedical question answering dataset. Additionally, we adopt an MLP-based model weighting layer to automatically exploit the advantages of the two models to provide the correct answer. The public dataset BIOMRC, constructed from the PubMed corpus, is used to evaluate our method. Experimental results show that our model outperforms the state-of-the-art system by a large margin.
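
The abstract does not say how the MLP-based model weighting layer is wired, so the following is only a minimal sketch of one plausible reading: each model scores the same candidate answers, a small MLP maps the concatenated scores to two mixing weights, and the weighted sum picks the answer. The function names, layer sizes, example scores, and the untrained random parameters are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def init_params(n_in, n_hidden, seed=0):
    """Random, untrained MLP parameters (placeholder for learned weights)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(2)]
    b2 = [0.0, 0.0]
    return w1, b1, w2, b2

def mlp_weights(features, w1, b1, w2, b2):
    """One tanh hidden layer, then a softmax over two model weights."""
    hidden = [math.tanh(sum(f * w for f, w in zip(features, row)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(h * w for h, w in zip(hidden, row)) + b
              for row, b in zip(w2, b2)]
    return softmax(logits)

def combine(scores_a, scores_b, params):
    """Mix two models' candidate scores with MLP-predicted weights."""
    weight_a, weight_b = mlp_weights(scores_a + scores_b, *params)
    return [weight_a * a + weight_b * b for a, b in zip(scores_a, scores_b)]

# Hypothetical scores over three candidate answers from the two models.
scores_aoa = [0.6, 0.3, 0.1]
scores_biobert = [0.2, 0.7, 0.1]

params = init_params(n_in=6, n_hidden=4)
fused = combine(scores_aoa, scores_biobert, params)
best = max(range(len(fused)), key=fused.__getitem__)
print("fused scores:", fused, "chosen candidate:", best)
```

Because the two weights come from a softmax, the fused scores remain a convex combination of the two models' scores. In a real system the MLP parameters would be learned jointly with, or on top of, the fine-tuned models.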
