Paper Title
Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine
Paper Authors
Paper Abstract
Language models (LMs), including large language models (such as ChatGPT), have the potential to assist clinicians in generating various clinical notes. However, LMs are prone to producing ``hallucinations'', i.e., generated content that is not aligned with facts and knowledge. In this paper, we propose the Re$^3$Writer method with retrieval-augmented generation and knowledge-grounded reasoning to enable LMs to generate faithful clinical texts. We demonstrate the effectiveness of our method in generating patient discharge instructions. This task requires the LMs not only to understand the patients' long clinical documents, i.e., the health records during hospitalization, but also to generate the critical instructional information provided to both carers and the patient at the time of discharge. The proposed Re$^3$Writer imitates the working patterns of physicians: it first \textbf{re}trieves related working experience from historical instructions written by physicians, then \textbf{re}asons over related medical knowledge. Finally, it \textbf{re}fines the retrieved working experience and reasoned medical knowledge to extract useful information, which is used to generate discharge instructions for previously unseen patients. Our experiments show that, using our method, the performance of five representative LMs can be substantially boosted across all metrics. We also report human evaluation results measuring effectiveness in terms of fluency, faithfulness, and comprehensiveness.
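To make the retrieve-reason-refine workflow described in the abstract concrete, the sketch below shows, in plain Python, how retrieved historical instructions and reasoned medical facts could be assembled into a grounded prompt for an LM. This is a minimal illustration under assumed data structures (a toy bag-of-words retriever, a keyword-triggered knowledge lookup, and hypothetical function names such as `retrieve`, `reason`, and `refine`); it is not the authors' released implementation.

```python
# Hypothetical sketch of a retrieve -> reason -> refine pipeline for discharge
# instructions. All names and data structures here are illustrative, not the
# paper's actual code.

from collections import Counter
from math import sqrt


def bow_similarity(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words token counts (toy retriever)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def retrieve(record: str, corpus: list[tuple[str, str]], k: int = 2) -> list[str]:
    """Step 1: retrieve instructions written for the most similar past records."""
    ranked = sorted(corpus, key=lambda pair: bow_similarity(record, pair[0]), reverse=True)
    return [instruction for _, instruction in ranked[:k]]


def reason(record: str, knowledge: dict[str, list[str]]) -> list[str]:
    """Step 2: collect medical facts whose trigger terms appear in the record."""
    tokens = set(record.lower().split())
    return [fact for term, facts in knowledge.items() if term in tokens for fact in facts]


def refine(record: str, experience: list[str], facts: list[str]) -> str:
    """Step 3: assemble a grounded prompt; a real system would pass this to an LM."""
    return (
        "Patient record:\n" + record + "\n\n"
        "Related past instructions:\n" + "\n".join(f"- {e}" for e in experience) + "\n\n"
        "Relevant medical knowledge:\n" + "\n".join(f"- {f}" for f in facts) + "\n\n"
        "Write the discharge instructions grounded in the material above."
    )


if __name__ == "__main__":
    corpus = [
        ("admitted with pneumonia treated with antibiotics",
         "Finish the antibiotic course; return if fever persists."),
        ("knee replacement surgery rehabilitation",
         "Attend physiotherapy; avoid full weight-bearing for two weeks."),
    ]
    knowledge = {"pneumonia": ["Pneumonia patients should complete the full antibiotic course."]}
    record = "elderly patient admitted with pneumonia and shortness of breath"
    print(refine(record, retrieve(record, corpus), reason(record, knowledge)))
```

In this toy version, grounding comes entirely from what is retrieved and reasoned before generation; any LM used in the final step would condition on that assembled context rather than on its parametric memory alone, which is the intuition the paper's method builds on.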