Where's the Question? A Multi-channel Deep Convolutional Neural Network for Question Identification in Textual Data

Paper title

Where's the Question? A Multi-channel Deep Convolutional Neural Network for Question Identification in Textual Data

Authors

George Michalopoulos, Helen Chen, Alexander Wong

Abstract

In most clinical practice settings, clinical documentation is not rigorously reviewed, which results in inaccurate information being captured in patient medical records. The gold standard in clinical data capture is achieved via "expert review", where clinicians can have a dialogue with domain experts (reviewers) and ask them questions about data entry rules. Automatically identifying "real questions" in these dialogues could uncover ambiguities or common problems in data capture in a given clinical setting. In this study, we propose a novel multi-channel deep convolutional neural network architecture, namely Quest-CNN, for the purpose of separating real questions that expect an answer (information or help) about an issue from sentences that are not questions, as well as from questions referring to an issue mentioned in a nearby sentence (e.g., "can you clarify this?"), which we refer to as "c-questions". We conducted a comprehensive performance comparison of the proposed multi-channel deep convolutional neural network against other deep neural networks. Furthermore, we evaluated the performance of traditional rule-based and learning-based methods for detecting question sentences. The proposed Quest-CNN achieved the best F1 score both on a dataset of data entry-review dialogue in a dialysis care setting and on a general domain dataset.
