Detection and Localization of Robotic Tools in Robot-Assisted Surgery Videos Using Deep Neural Networks for Region Proposal and Detection

Paper title

Detection and Localization of Robotic Tools in Robot-Assisted Surgery Videos Using Deep Neural Networks for Region Proposal and Detection

Authors

Duygu Sarikaya, Jason J. Corso, Khurshid A. Guru

Abstract

Video understanding of robot-assisted surgery (RAS) videos is an active research area. Modeling the gestures and skill level of surgeons presents an interesting problem. The insights drawn may be applied to effective skill acquisition, objective skill assessment, real-time feedback, and human-robot collaborative surgeries. We propose a solution to the open problem of tool detection and localization in RAS video understanding, using a strictly computer vision approach and recent advances in deep learning. We propose an architecture using multimodal convolutional neural networks for fast detection and localization of tools in RAS videos. To our knowledge, this approach will be the first to incorporate deep neural networks for tool detection and localization in RAS videos. Our architecture applies a Region Proposal Network (RPN) and a multimodal two-stream convolutional network for object detection to jointly predict objectness and localization on a fusion of image and temporal motion cues. Our results, with an Average Precision (AP) of 91% and a mean computation time of 0.1 seconds per test frame, indicate that our approach outperforms methods conventionally used in medical imaging, and they highlight the benefits of using an RPN for precision and efficiency. We also introduce a new dataset, ATLAS Dione, for RAS video understanding. Our dataset provides video data of ten surgeons from Roswell Park Cancer Institute (RPCI) (Buffalo, NY) performing six different surgical tasks on the daVinci Surgical System (dVSS®), with annotations of robotic tools per frame.
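
The abstract reports detection quality as Average Precision (AP) but does not spell out the evaluation protocol. As a rough illustration only, the sketch below computes AP for one set of scored bounding boxes in the common PASCAL VOC style: detections are ranked by score, matched greedily to ground-truth boxes by intersection over union (IoU), and AP is taken as the area under the interpolated precision-recall curve. The 0.5 IoU threshold, the box format, and the function names `iou` and `average_precision` are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, Box]],
    ground_truth: List[Box],
    iou_threshold: float = 0.5,  # assumed threshold, not from the paper
) -> float:
    """Area under the interpolated precision-recall curve."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    hits = []
    # Visit detections from most to least confident.
    for _score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # True positive only if the best-overlapping box is still unmatched.
        is_tp = best_j >= 0 and best_iou >= iou_threshold and not matched[best_j]
        if is_tp:
            matched[best_j] = True
        hits.append(is_tp)

    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))

    # Make precision non-increasing from right to left (interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

A call such as `average_precision([(0.9, (0, 0, 10, 10))], [(0, 0, 10, 10)])` scores a single perfectly placed detection against one annotated tool. A dataset-level figure like the one quoted above would normally pool detections across all test frames before ranking them.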
