Paper Title

MultiCrossViT: Multimodal Vision Transformer for Schizophrenia Prediction using Structural MRI and Functional Network Connectivity Data

Paper Authors

Yuda Bi, Anees Abrol, Zening Fu, Vince Calhoun

Paper Abstract

Vision Transformer (ViT) is a pioneering deep learning framework for real-world computer vision problems such as image classification and object recognition, and ViTs have been shown to outperform traditional deep learning models such as convolutional neural networks (CNNs). Relatively recently, a number of ViT variants have been adapted to the field of medical imaging, addressing a variety of critical classification and segmentation challenges, especially for brain imaging data. In this work, we introduce a novel multimodal deep learning pipeline, MultiCrossViT, which analyzes both structural MRI (sMRI) and static functional network connectivity (sFNC) data to predict schizophrenia. On a dataset with a limited number of training subjects, our model achieves an AUC of 0.832. Finally, by extracting features from the transformer encoders, we visualize the brain regions and covariance patterns most relevant to schizophrenia based on the resulting ViT attention maps.
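The abstract does not spell out the MultiCrossViT architecture, so the following is only a minimal PyTorch sketch of how two modality-specific token streams (sMRI patch tokens and sFNC tokens) might be fused with cross-attention for a binary prediction. All class names, token counts, and hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of cross-attention fusion between two imaging
# modalities; dimensions and names are assumptions, not the paper's code.
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Tokens from one modality attend to tokens from the other modality."""
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                 nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, q_tokens, kv_tokens):
        # Queries come from one modality; keys/values from the other.
        attended, _ = self.attn(self.norm_q(q_tokens),
                                self.norm_kv(kv_tokens),
                                self.norm_kv(kv_tokens))
        x = q_tokens + attended          # residual connection
        return x + self.mlp(x)           # feed-forward with residual

class MultimodalClassifier(nn.Module):
    """Fuses sMRI and sFNC token streams, then predicts patient vs. control."""
    def __init__(self, dim=256):
        super().__init__()
        self.smri_to_sfnc = CrossAttentionBlock(dim)
        self.sfnc_to_smri = CrossAttentionBlock(dim)
        self.head = nn.Linear(2 * dim, 2)  # binary classification logits

    def forward(self, smri_tokens, sfnc_tokens):
        # Cross-attend in both directions, mean-pool tokens, then classify.
        a = self.smri_to_sfnc(smri_tokens, sfnc_tokens).mean(dim=1)
        b = self.sfnc_to_smri(sfnc_tokens, smri_tokens).mean(dim=1)
        return self.head(torch.cat([a, b], dim=-1))

# Example: batch of 4 subjects; 64 sMRI patch tokens and 53 sFNC tokens
# per subject are arbitrary illustrative numbers.
logits = MultimodalClassifier()(torch.randn(4, 64, 256), torch.randn(4, 53, 256))
```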
