Paper Title

Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations

Authors

Yang Yu, Zixu Zhao, Yueming Jin, Guangyong Chen, Qi Dou, Pheng-Ann Heng

Abstract

Surgical scene segmentation is fundamentally crucial for prompting cognitive assistance in robotic surgery. However, pixel-wise annotating surgical video in a frame-by-frame manner is expensive and time consuming. To greatly reduce the labeling burden, in this work, we study semi-supervised scene segmentation from robotic surgical video, which is practically essential yet rarely explored before. We consider a clinically suitable annotation situation under the equidistant sampling. We then propose PGV-CL, a novel pseudo-label guided cross-video contrast learning method to boost scene segmentation. It effectively leverages unlabeled data for a trusty and global model regularization that produces more discriminative feature representation. Concretely, for trusty representation learning, we propose to incorporate pseudo labels to instruct the pair selection, obtaining more reliable representation pairs for pixel contrast. Moreover, we expand the representation learning space from previous image-level to cross-video, which can capture the global semantics to benefit the learning process. We extensively evaluate our method on a public robotic surgery dataset EndoVis18 and a public cataract dataset CaDIS. Experimental results demonstrate the effectiveness of our method, consistently outperforming the state-of-the-art semi-supervised methods under different labeling ratios, and even surpassing fully supervised training on EndoVis18 with 10.1% labeling.
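To make the core idea concrete, below is a minimal sketch of pseudo-label guided pixel contrast with a cross-video memory bank, written in PyTorch. It is not the authors' released implementation: the function name, confidence threshold, temperature, anchor subsampling, and memory-bank layout are illustrative assumptions based on the abstract's description (pseudo labels select trustworthy pixels, and positives/negatives are drawn from embeddings collected across videos).

```python
# Minimal sketch (illustrative, not the paper's official code) of a pixel-level
# InfoNCE loss whose positive/negative pairs are selected by pseudo labels and
# contrasted against a cross-video memory bank.
import torch
import torch.nn.functional as F

def pixel_contrast_loss(embed, logits, memory_embed, memory_label,
                        conf_thresh=0.9, temperature=0.1, max_pixels=256):
    """
    embed:        (N, D) pixel embeddings from unlabeled frames
    logits:       (N, C) classifier outputs for the same pixels
    memory_embed: (M, D) embeddings stored from other videos (cross-video bank)
    memory_label: (M,)   pseudo/ground-truth classes of the stored embeddings
    """
    probs = logits.softmax(dim=1)
    conf, pseudo = probs.max(dim=1)
    keep = conf > conf_thresh                      # trust only confident pixels
    if keep.sum() == 0:
        return embed.sum() * 0.0                   # no reliable anchors this batch

    idx = torch.nonzero(keep, as_tuple=False).squeeze(1)
    if idx.numel() > max_pixels:                   # subsample anchors to bound memory
        idx = idx[torch.randperm(idx.numel(), device=idx.device)[:max_pixels]]

    anchors = F.normalize(embed[idx], dim=1)       # (A, D)
    labels = pseudo[idx]                           # (A,) pseudo labels guide pairing
    bank = F.normalize(memory_embed, dim=1)        # (M, D)

    sim = anchors @ bank.t() / temperature         # (A, M) scaled cosine similarity
    pos_mask = labels[:, None].eq(memory_label[None, :]).float()

    # InfoNCE: pull bank pixels of the same pseudo class, push the rest away.
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(pos_mask * log_prob).sum(dim=1) / pos_count

    valid = pos_mask.sum(dim=1) > 0                # anchors with at least one positive
    return loss[valid].mean() if valid.any() else embed.sum() * 0.0
```

In this reading, filtering anchors by prediction confidence is what makes the representation pairs "trusty", while drawing the contrast set from a bank populated by many videos (rather than only the current image) provides the global, cross-video regularization described in the abstract.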
