Paper Title
LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition
Paper Authors
Paper Abstract
Automatic surgical workflow recognition in video is a fundamental yet challenging problem for developing computer-assisted and robotic-assisted surgery. Existing deep learning approaches have achieved remarkable performance on the analysis of surgical videos; however, they rely heavily on large-scale labelled datasets. Unfortunately, annotations are often scarce, because producing them requires the domain knowledge of surgeons. In this paper, we propose a novel active learning method for cost-effective surgical video analysis. Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture the long-range temporal dependency (LRTD) among continuous frames. We then formulate an intra-clip dependency score to represent the overall dependency within each clip. By ranking the scores of clips in the unlabelled data pool, we select the clips with weak dependencies to annotate, as these are the most informative ones for network training. We validate our approach on a large surgical video dataset (Cholec80) by performing the surgical workflow recognition task. Using our LRTD-based selection strategy, we outperform other state-of-the-art active learning methods. With only up to 50% of the samples, our approach can exceed the performance of full-data training.
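The abstract describes ranking unlabelled clips by an intra-clip dependency score derived from a non-local block's pairwise frame affinities, then annotating the clips with the weakest dependencies. The sketch below illustrates this selection idea only; it is not the paper's implementation. The dot-product affinity, the mean off-diagonal aggregation, and all function names are assumptions for illustration.

```python
import numpy as np

def intra_clip_dependency_score(clip_feats):
    """Hypothetical intra-clip dependency score.

    clip_feats: (T, D) array of per-frame features from a backbone CNN.
    A non-local block computes pairwise affinities between frames; here
    we use dot-product affinities with a row-wise softmax, and take the
    mean off-diagonal affinity as the clip's overall dependency.
    """
    T = clip_feats.shape[0]
    affinity = clip_feats @ clip_feats.T                # (T, T) pairwise dot products
    affinity = np.exp(affinity - affinity.max(axis=1, keepdims=True))
    affinity /= affinity.sum(axis=1, keepdims=True)     # row-wise softmax
    off_diag = affinity[~np.eye(T, dtype=bool)]         # cross-frame dependencies only
    return float(off_diag.mean())

def select_clips_to_annotate(clips, budget):
    """Rank unlabelled clips by dependency score (ascending) and return
    the `budget` clips with the weakest dependencies, i.e. the ones the
    abstract identifies as most informative to annotate."""
    scores = [intra_clip_dependency_score(c) for c in clips]
    order = np.argsort(scores)
    return [clips[i] for i in order[:budget]]
```

In this sketch, a low score means the non-local block finds little temporal dependency among the clip's frames, so the clip is harder for the temporal model to infer from context and is prioritized for annotation.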