Paper Title
Multi-frame Feature Aggregation for Real-time Instrument Segmentation in Endoscopic Video
Paper Authors
Paper Abstract
Deep learning-based methods have achieved promising results on surgical instrument segmentation. However, the high computation cost may limit the application of deep models to time-sensitive tasks such as online surgical video analysis for robotic-assisted surgery. Moreover, current methods may still suffer from challenging conditions in surgical images, such as varying lighting and the presence of blood. We propose a novel Multi-frame Feature Aggregation (MFFA) module to aggregate video frame features temporally and spatially in a recurrent mode. By distributing the computation load of deep feature extraction over sequential frames, we can use a lightweight encoder to reduce the computation cost at each time step. Moreover, public surgical videos are usually not labeled frame by frame, so we develop a method that can randomly synthesize a surgical frame sequence from a single labeled frame to assist network training. We demonstrate that our approach achieves superior performance to corresponding deeper segmentation models on two public surgery datasets.
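The abstract mentions synthesizing a surgical frame sequence from a single labeled frame for training. A minimal sketch of that idea, assuming the synthesis amounts to applying cumulative random spatial transforms to the image and its mask (the paper's actual transforms are not specified here; `synthesize_sequence`, `max_shift`, and the use of wrap-around translation via `np.roll` are illustrative assumptions):

```python
import numpy as np

def synthesize_sequence(frame, mask, length=4, max_shift=5, rng=None):
    """Build a pseudo video sequence from one labeled frame by applying
    cumulative random translations, roughly mimicking camera/instrument
    motion. Hypothetical stand-in for the paper's synthesis method."""
    rng = np.random.default_rng(rng)
    frames, masks = [frame], [mask]
    dy = dx = 0
    for _ in range(length - 1):
        # Accumulate small random shifts so motion is smooth across frames.
        dy += int(rng.integers(-max_shift, max_shift + 1))
        dx += int(rng.integers(-max_shift, max_shift + 1))
        # Apply the identical transform to image and label to keep them aligned.
        frames.append(np.roll(frame, (dy, dx), axis=(0, 1)))
        masks.append(np.roll(mask, (dy, dx), axis=(0, 1)))
    return frames, masks
```

Because each synthesized frame and its mask receive the same transform, the sequence remains labeled frame by frame, which is what the recurrent aggregation module needs during training.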