Paper title
SIBILA: A novel interpretable ensemble of general-purpose machine learning models applied to medical contexts
Authors
Abstract
Personalized medicine remains a major challenge for scientists. The rapid growth of machine learning and deep learning has made them a feasible alternative for predicting the most appropriate therapy for individual patients. However, the need to develop a custom model for every dataset, the lack of interpretation of their results, and their high computational requirements make many reluctant to use these methods. SIBILA has been developed to save time and to shed light on the way models work internally. SIBILA is an ensemble of machine learning and deep learning models that applies a range of interpretability algorithms to identify the most relevant input features. Since the interpretability algorithms may not agree with each other, a consensus stage has been implemented to estimate the global attribution of each variable to the predictions. SIBILA is containerized so that it can run on any high-performance computing platform. Although conceived as a command-line tool, it is also available to all users free of charge as a web server at https://bio-hpc.ucam.edu/sibila. Thus, even users with limited technical skills can take advantage of it. SIBILA has been applied to two medical case studies to show its predictive ability in classification problems. Although it has been developed with the aim of becoming a powerful decision-making tool for clinicians, it is a general-purpose tool that can be used in many other domains. Two further non-medical examples are therefore supplied as supplementary material to show that SIBILA still works well with noisy data and in regression problems.
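The abstract does not say how the consensus stage combines the interpretability algorithms, so the following is only a minimal sketch of one plausible scheme, not SIBILA's actual method. It assumes each algorithm yields one score per input feature, rescales the absolute scores of each algorithm so they sum to 1, and averages them per feature. The names `normalize`, `consensus`, `method_a`, `method_b`, the feature names, and all numbers are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def normalize(scores):
    """Rescale absolute attribution scores so they sum to 1 (assumed scheme)."""
    magnitudes = {name: abs(value) for name, value in scores.items()}
    total = sum(magnitudes.values())
    if total == 0:
        return {name: 0.0 for name in magnitudes}
    return {name: value / total for name, value in magnitudes.items()}


def consensus(attributions):
    """Average the normalized scores of every method, feature by feature."""
    normalized = [normalize(scores) for scores in attributions.values()]
    features = sorted(normalized[0])
    return {f: mean(n[f] for n in normalized) for f in features}


# Hypothetical scores from two interpretability methods on different scales.
attributions = {
    "method_a": {"age": 0.6, "dose": -0.3, "weight": 0.1},
    "method_b": {"age": 2.0, "dose": 6.0, "weight": 2.0},
}

result = consensus(attributions)
# Features ordered from most to least relevant under this consensus.
ranking = sorted(result, key=result.get, reverse=True)
print(ranking)
```

Normalizing before averaging keeps a method with large raw scores from dominating the others; a rank-based or weighted aggregation would be an equally reasonable alternative under different assumptions.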