Paper title
Colonoscopy Landmark Detection using Vision Transformers
Paper authors
Paper abstract
Colonoscopy is a routine outpatient procedure used to examine the colon and rectum for abnormalities, including polyps, diverticula, and narrowing of colon structures. A significant amount of the clinician's time is spent post-processing snapshots taken during the procedure, either to maintain medical records or for further investigation. Automating this step can save time and improve the efficiency of the process. In this work, we collected a dataset of 120 colonoscopy videos and 2416 snapshots taken during the procedure, annotated by experts. We further developed a novel vision transformer-based landmark detection algorithm that identifies key anatomical landmarks (the appendiceal orifice, the ileocecal valve/cecum landmark, and rectum retroflexion) from snapshots taken during colonoscopy. The algorithm applies adaptive gamma correction during preprocessing to maintain consistent brightness across images. We then use a vision transformer as the feature extraction backbone and a fully connected network as the classifier head to categorize a given frame into one of four classes: the three landmarks or a non-landmark frame. We compare the vision transformer (ViT-B/16) backbone with ResNet-101 and ConvNeXt-B backbones that were trained similarly. We report an accuracy of 82% with the vision transformer backbone on a test dataset of snapshots.
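The abstract does not say how the adaptive gamma correction is computed. As a minimal sketch of one common variant, the snippet below picks a per-image gamma so that the image's mean intensity would map to a target value. The function name `adaptive_gamma`, the `target_mean` of 0.5, and the grayscale list-of-rows input are illustrative assumptions, not details taken from the paper.

```python
import math


def adaptive_gamma(pixels, target_mean=0.5, eps=1e-6):
    """Apply a per-image gamma chosen from the image's mean brightness.

    pixels: list of rows, each a list of grayscale intensities in [0, 1].
    target_mean: assumed desired brightness, strictly between 0 and 1.
    Returns (corrected_pixels, gamma).
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    # Clamp so the logarithm stays finite for all-black or all-white frames.
    mean = min(max(mean, eps), 1.0 - eps)
    # Solve mean ** gamma == target_mean for gamma. This maps the current
    # mean intensity to the target; the mean of the corrected image only
    # approximates the target unless the image is uniform.
    gamma = math.log(target_mean) / math.log(mean)
    corrected = [[p ** gamma for p in row] for row in pixels]
    return corrected, gamma


if __name__ == "__main__":
    dark_frame = [[0.10, 0.20], [0.15, 0.25]]
    corrected, gamma = adaptive_gamma(dark_frame)
    print("gamma:", gamma)
    print("corrected:", corrected)
```

Under this assumption, a dark frame yields a gamma below 1 (brightening) and a bright frame yields a gamma above 1 (darkening), which pushes frames toward a similar brightness before they reach the backbone.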