Title
Explainable Predictive Modeling for Limited Spectral Data
Authors
Abstract
Feature selection for high-dimensional labeled data with limited observations is critical for making powerful predictive modeling accessible, scalable, and interpretable for domain experts. Spectroscopy data, which records the interaction between matter and electromagnetic radiation, holds a particularly large amount of information in each sample. Because acquiring such high-dimensional data is complex, it is crucial to use the best available analytical tools to extract the necessary information. In this paper, we investigate the most commonly used feature selection techniques and introduce the application of recent explainable AI (XAI) techniques to interpret the prediction outcomes for high-dimensional, limited spectral data. Interpreting prediction outcomes benefits domain experts because it ensures that the machine learning (ML) models are transparent and faithful to domain knowledge. Given the resolution limitations of the instruments, pinpointing the important regions of the spectroscopy data opens a pathway to optimizing the data collection process by miniaturizing the spectrometer device. Reducing the device's size and power consumption, and therefore its cost, is a requirement for the real-world deployment of such a sensor-to-prediction system as a whole. We design three distinct scenarios to ensure that the evaluation of the ML models is robust for real-time use of the developed methodologies and to uncover the hidden effect of noise sources on the final outcome.
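The abstract does not say which feature selection techniques are compared. The following is a minimal sketch of the general idea of pinpointing informative spectral regions in a few-sample, high-dimensional setting. It assumes a simple univariate filter (absolute Pearson correlation between each spectral channel and the target) and purely synthetic spectra; the helper names `correlation_ranking` and `synthetic_spectra` are hypothetical and not the paper's method or data.

```python
import numpy as np

def correlation_ranking(X, y):
    """Rank spectral channels (columns of X) by |Pearson r| with the target y.

    Returns the channel indices sorted from most to least correlated,
    plus the per-channel correlation coefficients.
    """
    Xc = X - X.mean(axis=0)          # center each channel across samples
    yc = y - y.mean()                # center the target
    num = Xc.T @ yc                  # per-channel covariance numerator
    # Product of norms; the small epsilon guards against constant channels.
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    r = num / den
    return np.argsort(-np.abs(r)), r

def synthetic_spectra(n_samples=40, n_channels=500, informative=(120, 121, 122), seed=0):
    """Hypothetical limited spectral data: far more channels than samples.

    Every spectrum shares a smooth baseline plus random noise; only a narrow
    band of channels (``informative``) varies with the target concentration.
    """
    rng = np.random.default_rng(seed)
    wavelengths = np.linspace(0.0, 1.0, n_channels)
    baseline = np.sin(2 * np.pi * wavelengths)      # shared background signal
    X = baseline + 0.1 * rng.standard_normal((n_samples, n_channels))
    concentration = rng.uniform(0.0, 1.0, n_samples)
    X[:, list(informative)] += concentration[:, None]  # band tracks the label
    y = concentration + 0.01 * rng.standard_normal(n_samples)
    return X, y

X, y = synthetic_spectra()
order, r = correlation_ranking(X, y)
top = sorted(order[:3].tolist())
print("top-ranked channels:", top)
```

A univariate filter like this ignores interactions between channels. Model-based importance or post-hoc XAI attributions (for example, permutation importance or SHAP values computed on a trained model) are common alternatives for the same goal of locating the wavelength regions a model relies on.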