Paper Title

Human Gaze Guided Attention for Surgical Activity Recognition

Paper Authors

Abdishakour Awale, Duygu Sarikaya

Paper Abstract

Modeling and automatically recognizing surgical activities are fundamental steps toward automation in surgery and play an important role in providing timely feedback to surgeons. Accurately recognizing surgical activities in video poses a challenging problem that requires an effective means of learning both spatial and temporal dynamics. Human gaze and visual saliency carry important information about visual attention and can be used to extract more relevant features that better reflect these spatial and temporal dynamics. In this study, we propose using human gaze with a spatio-temporal attention mechanism for activity recognition in surgical videos. Our model consists of an I3D-based architecture that learns spatio-temporal features using 3D convolutions and learns an attention map using human gaze as supervision. We evaluate our model on the Suturing task of JIGSAWS, a publicly available surgical video understanding dataset. To our knowledge, we are the first to use human gaze for surgical activity recognition. Our results and ablation studies support the contribution of using human gaze to guide attention: our model outperforms state-of-the-art models with an accuracy of 85.4%.
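
The sketch below is not the authors' implementation; it only illustrates, in PyTorch, the kind of gaze-supervised spatio-temporal attention the abstract describes. The backbone is a small stand-in for I3D, and the layer sizes, gaze-map format, and loss weighting are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code) of gaze-guided attention over
# 3D-convolutional video features: a predicted spatial attention map pools the
# features for activity classification and is supervised by human-gaze heatmaps.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GazeGuidedAttention(nn.Module):
    def __init__(self, in_channels=64, num_classes=10):
        super().__init__()
        # Small stand-in for an I3D-style 3D-conv feature extractor.
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, in_channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        # 1x1x1 conv predicts a single-channel attention map per frame.
        self.attn_head = nn.Conv3d(in_channels, 1, kernel_size=1)
        self.classifier = nn.Linear(in_channels, num_classes)

    def forward(self, clip):
        # clip: (B, 3, T, H, W) video tensor
        feats = self.backbone(clip)                       # (B, C, T, H, W)
        attn_logits = self.attn_head(feats)               # (B, 1, T, H, W)
        b, _, t, h, w = attn_logits.shape
        attn = F.softmax(attn_logits.view(b, t, h * w), dim=-1).view(b, 1, t, h, w)
        # Attention-weighted spatial pooling, then temporal averaging.
        pooled = (feats * attn).sum(dim=(3, 4)).mean(dim=2)   # (B, C)
        return self.classifier(pooled), attn


def gaze_supervision_loss(attn, gaze_map, eps=1e-8):
    # KL divergence between the predicted attention and a human-gaze heatmap,
    # both treated as per-frame spatial probability distributions.
    b, _, t, h, w = attn.shape
    p = attn.view(b, t, h * w).clamp_min(eps)
    g = F.interpolate(gaze_map, size=(t, h, w), mode="trilinear", align_corners=False)
    q = g.view(b, t, h * w)
    q = (q / q.sum(dim=-1, keepdim=True).clamp_min(eps)).clamp_min(eps)
    return (q * (q.log() - p.log())).sum(dim=-1).mean()


# Usage sketch: joint recognition and gaze-supervision objective.
model = GazeGuidedAttention()
clip = torch.randn(2, 3, 16, 56, 56)    # batch of 16-frame RGB clips
gaze = torch.rand(2, 1, 16, 56, 56)     # per-frame gaze heatmaps (assumed format)
labels = torch.randint(0, 10, (2,))
logits, attn = model(clip)
loss = F.cross_entropy(logits, labels) + 0.5 * gaze_supervision_loss(attn, gaze)
```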
