BioADAPT-MRC: Adversarial Learning-based Domain Adaptation Improves Biomedical Machine Reading Comprehension Task

Paper Title

BioADAPT-MRC: Adversarial Learning-based Domain Adaptation Improves Biomedical Machine Reading Comprehension Task

Authors

Mahbub, Maria; Srinivasan, Sudarshan; Begoli, Edmon; Peterson, Gregory D.

Abstract

Biomedical machine reading comprehension (biomedical-MRC) aims to comprehend complex biomedical narratives and to assist healthcare professionals in retrieving information from them. The high performance of modern neural network-based MRC systems depends on high-quality, large-scale, human-annotated training datasets. In the biomedical domain, a crucial challenge in creating such datasets is that annotation requires domain knowledge, which leads to a scarcity of labeled data and creates the need for transfer learning from the labeled general-purpose (source) domain to the biomedical (target) domain. However, the marginal distributions of the general-purpose and biomedical domains differ because the two domains cover different topics. Therefore, directly transferring representations learned by a model trained on the general-purpose domain to the biomedical domain can hurt the model's performance. We present BioADAPT-MRC, an adversarial learning-based domain adaptation framework for the biomedical machine reading comprehension task: a neural network-based method that addresses the discrepancy between the marginal distributions of the general and biomedical domain datasets. BioADAPT-MRC relaxes the requirement of generating pseudo labels to train a well-performing biomedical-MRC model. We extensively evaluate BioADAPT-MRC by comparing it with the best existing methods on three widely used benchmark biomedical-MRC datasets: BioASQ-7b, BioASQ-8b, and BioASQ-9b. Our results suggest that, without using any synthetic or human-annotated data from the biomedical domain, BioADAPT-MRC can achieve state-of-the-art performance on these datasets.

Availability: BioADAPT-MRC is freely available as an open-source project at https://github.com/mmahbub/BioADAPT-MRC.
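The abstract does not describe the architecture in detail. As a hedged illustration of the general idea of adversarial domain adaptation only, the pure-Python sketch below uses a DANN-style gradient-reversal update. That choice is an assumption here, not necessarily how BioADAPT-MRC is built. A toy linear feature extractor feeds two heads: a task head trained on labeled source examples, and a domain discriminator that tries to tell source features from unlabeled target features. The extractor descends the task loss but ascends the discriminator's loss, which pushes source and target features toward the same marginal distribution. No target labels are used. All names (`train_step`, `extractor_grad`, `lam`, the parameter keys) are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def extractor_grad(task_gw, domain_gw, lam):
    """Assumed DANN-style gradient reversal: the feature extractor descends
    the task loss but ascends the domain-classification loss (scaled by lam)."""
    return [t - lam * d for t, d in zip(task_gw, domain_gw)]


def train_step(params, src, tgt, lam=0.1, lr=0.1):
    """One adversarial update on a hypothetical toy model.

    params: dict with keys
      "w"      feature-extractor weights; feature f = w . x
      "a", "b" task head; logit = a * f + b
      "c", "d" domain discriminator; logit = c * f + d (1 = source, 0 = target)
    src: list of (x, y) labeled source-domain examples, y in {0, 1}
    tgt: list of x unlabeled target-domain examples (no labels needed)
    """
    w, a, b, c, d = params["w"], params["a"], params["b"], params["c"], params["d"]
    dim = len(w)

    # Task loss: binary cross-entropy on labeled SOURCE data only.
    ga = gb = 0.0
    task_gw = [0.0] * dim
    for x, y in src:
        f = dot(w, x)
        g = sigmoid(a * f + b) - y  # dLoss/dlogit for sigmoid + BCE
        ga += g * f
        gb += g
        for i in range(dim):
            task_gw[i] += g * a * x[i]
    n_src = len(src)
    ga, gb = ga / n_src, gb / n_src
    task_gw = [v / n_src for v in task_gw]

    # Domain loss: the discriminator separates source (1) from target (0) features.
    gc = gd = 0.0
    domain_gw = [0.0] * dim
    domain_data = [(x, 1.0) for x, _ in src] + [(x, 0.0) for x in tgt]
    for x, label in domain_data:
        f = dot(w, x)
        h = sigmoid(c * f + d) - label
        gc += h * f
        gd += h
        for i in range(dim):
            domain_gw[i] += h * c * x[i]
    n_dom = len(domain_data)
    gc, gd = gc / n_dom, gd / n_dom
    domain_gw = [v / n_dom for v in domain_gw]

    # Both heads minimize their own losses; the extractor gets the reversed domain gradient.
    gw = extractor_grad(task_gw, domain_gw, lam)
    return {
        "w": [wi - lr * gi for wi, gi in zip(w, gw)],
        "a": a - lr * ga,
        "b": b - lr * gb,
        "c": c - lr * gc,
        "d": d - lr * gd,
    }
```

In this sketch, `train_step` would be called repeatedly over mini-batches. The hypothetical weight `lam` trades off accuracy on the labeled source task against how domain-invariant the learned features become. With `lam=0` the update reduces to ordinary source-only training.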
