Paper Title
Exemplar Auditing for Multi-Label Biomedical Text Classification
Paper Authors
Paper Abstract
Many practical applications of AI in medicine consist of semi-supervised discovery: the investigator aims to identify features of interest at a resolution more fine-grained than that of the available human labels. This is often the scenario faced in healthcare applications, because coarse, high-level labels (e.g., billing codes) are often the only sources that are readily available. These challenges are compounded for modalities such as text, where the feature space is very high-dimensional and often contains considerable amounts of noise. In this work, we generalize a recently proposed zero-shot sequence labeling method, "binary labeling via a convolutional decomposition", to the case where the available document-level human labels are themselves relatively high-dimensional. The approach yields classification with "introspection", relating the fine-grained features of an inference-time prediction to their nearest neighbors from the training set, under the model. The approach is effective, yet parsimonious, as demonstrated on a well-studied MIMIC-III multi-label classification task of electronic health record data, and is useful as a tool for organizing the analysis of neural model predictions and high-dimensional datasets. Our proposed approach yields both a competitively effective classification model and an interrogation mechanism to aid healthcare workers in understanding the salient features that drive the model's predictions.