Paper title
Assessing the trade-off between prediction accuracy and interpretability for topic modeling on energetic materials corpora
Paper authors
Abstract
As the amount and variety of energetics research increases, machine-aware topic identification is necessary to streamline future research pipelines. An automatic topic identification process consists of creating document representations and performing classification. However, implementing these processes on energetics research poses new challenges. First, energetics datasets contain many scientific terms that are necessary to understand the context of a document but may require more complex document representations. Second, the classifier's predictions must be understandable to, and trusted by, the chemists within the pipeline. In this work, we study the trade-off between prediction accuracy and interpretability by implementing three document embedding methods that vary in computational complexity. Alongside our accuracy results, we apply local interpretable model-agnostic explanations (LIME) to provide a localized understanding of each prediction and to validate classifier decisions with our team of energetics experts. This study was carried out on a novel labeled energetics dataset created and validated by our team of energetics experts.
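The pipeline summarized in the abstract (document representation, classification, then a per-prediction explanation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corpus and its labels are hypothetical, the bag-of-words representation and nearest-centroid classifier are stand-ins for the paper's embedding methods and classifier, and the leave-one-word-out scoring is a much simpler perturbation scheme than LIME, not LIME itself.

```python
import math
from collections import Counter

# Hypothetical toy corpus; NOT the paper's energetics dataset.
TRAIN = [
    ("detonation velocity of the explosive was measured", "performance"),
    ("detonation pressure and velocity increase with density", "performance"),
    ("synthesis of the compound by nitration of the precursor", "synthesis"),
    ("the precursor was prepared by nitration and purified", "synthesis"),
]

def embed(text):
    """Bag-of-words document representation: token -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fit_centroids(examples):
    """Sum the bag-of-words vectors per label (nearest-centroid classifier)."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(embed(text))
    return centroids

def scores(centroids, text):
    """Similarity of the document to each class centroid."""
    vec = embed(text)
    return {label: cosine(vec, c) for label, c in centroids.items()}

def predict(centroids, text):
    """Return the label whose centroid is most similar to the document."""
    s = scores(centroids, text)
    return max(s, key=s.get)

def explain(centroids, text):
    """Leave-one-word-out explanation of a single prediction.

    Each word's contribution is the drop in the predicted class's score
    when that word is removed. Repeated words keep only the last value.
    """
    label = predict(centroids, text)
    base = scores(centroids, text)[label]
    tokens = text.lower().split()
    contributions = {}
    for i, tok in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        contributions[tok] = base - scores(centroids, reduced)[label]
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return label, ranked

centroids = fit_centroids(TRAIN)
label, ranked = explain(centroids, "nitration of the precursor")
for word, delta in ranked:
    print(f"{label}: {word} {delta:+.3f}")
```

A ranked word list of this kind is what a domain expert would inspect to judge whether a prediction rests on meaningful scientific terms or on incidental vocabulary. With a representation this simple, common function words can rank highly, which shows why explanations need expert validation.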