Automated Lay Language Summarization of Biomedical Scientific Reviews

Title

Automated Lay Language Summarization of Biomedical Scientific Reviews

Authors

Guo, Yue; Qiu, Wei; Wang, Yizhong; Cohen, Trevor

Abstract

Health literacy has emerged as a crucial factor in making appropriate health decisions and ensuring treatment outcomes. However, medical jargon and the complex structure of professional language in this domain make health information especially hard to interpret. Thus, there is an urgent unmet need for automated methods to enhance the accessibility of the biomedical literature to the general population. This problem can be framed as a type of translation problem between the language of healthcare professionals, and that of the general public. In this paper, we introduce the novel task of automated generation of lay language summaries of biomedical scientific reviews, and construct a dataset to support the development and evaluation of automated methods through which to enhance the accessibility of the biomedical literature. We conduct analyses of the various challenges in solving this task, including not only summarization of the key points but also explanation of background knowledge and simplification of professional language. We experiment with state-of-the-art summarization models as well as several data augmentation techniques, and evaluate their performance using both automated metrics and human assessment. Results indicate that automatically generated summaries produced using contemporary neural architectures can achieve promising quality and readability as compared with reference summaries developed for the lay public by experts (best ROUGE-L of 50.24 and Flesch-Kincaid readability score of 13.30). We also discuss the limitations of the current attempt, providing insights and directions for future work.

Title