Paper Title

Towards Generalizable Surgical Activity Recognition Using Spatial Temporal Graph Convolutional Networks

Authors

Duygu Sarikaya, Pierre Jannin

Abstract

Modeling and recognition of surgical activities poses an interesting research problem. Although a number of recent works studied automatic recognition of surgical activities, generalizability of these works across different tasks and different datasets remains a challenge. We introduce a modality that is robust to scene variation, and that is able to infer part information such as orientation and relative spatial relationships. The proposed modality is based on spatial temporal graph representations of surgical tools in videos, for surgical activity recognition. To explore its effectiveness, we model and recognize surgical gestures with the proposed modality. We construct spatial graphs connecting the joint pose estimations of surgical tools. Then, we connect each joint to the corresponding joint in the consecutive frames, forming inter-frame edges that represent the trajectory of the joint over time. We then learn hierarchical spatial temporal graph representations using Spatial Temporal Graph Convolutional Networks (ST-GCN). Our experiments show that learned spatial temporal graph representations perform well in surgical gesture recognition even when used individually. We experiment with the Suturing task of the JIGSAWS dataset, where the chance baseline for gesture recognition is 10%. Our results demonstrate 68% average accuracy, which suggests a significant improvement. Learned hierarchical spatial temporal graph representations can be used individually, in cascades, or as a complementary modality in surgical activity recognition, and therefore provide a benchmark for future studies. To our knowledge, our paper is the first to use spatial temporal graph representations of surgical tools, and pose-based skeleton representations in general, for surgical activity recognition.
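The graph construction described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the tool skeleton (four hypothetical joints and three bones) and all function names are assumptions. It builds the spatial adjacency matrix of a per-frame tool graph and stacks the per-frame 2D joint estimates into the `(C, T, V)` tensor shape an ST-GCN typically consumes; the inter-frame (temporal) edges connecting joint `v` at frame `t` to joint `v` at frame `t+1` are left implicit, as they are realized by the temporal convolution along the `T` axis.

```python
import numpy as np

# Hypothetical skeleton for one surgical tool: 4 joints
# (e.g. shaft base, wrist, left gripper tip, right gripper tip).
# The joint indices and edges are illustrative only.
SPATIAL_EDGES = [(0, 1), (1, 2), (1, 3)]
NUM_JOINTS = 4

def spatial_adjacency(num_joints, edges):
    """Symmetric adjacency matrix with self-loops, a common
    form of the fixed graph input to an ST-GCN layer."""
    A = np.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

def to_stgcn_input(joint_xy):
    """Reorder per-frame 2D joint estimates (T, V, 2) into the
    (C=2, T, V) tensor layout used by ST-GCN-style models."""
    return np.transpose(joint_xy, (2, 0, 1))

# Toy example: 5 frames of 2D pose estimates for one tool.
T = 5
poses = np.random.rand(T, NUM_JOINTS, 2)  # stand-in for detector output
A = spatial_adjacency(NUM_JOINTS, SPATIAL_EDGES)
X = to_stgcn_input(poses)
print(A.shape, X.shape)  # (4, 4) (2, 5, 4)
```

In this layout, a spatial graph convolution mixes features across the `V` axis using `A`, while a 1D temporal convolution over the `T` axis plays the role of the inter-frame trajectory edges.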
