Abstract:
Surgical interaction recognition (SIR) plays a crucial role in navigation and decision support for minimally invasive surgery (MIS) and robot-assisted MIS. Current SIR research operates at a coarse-grained level and barely considers surgical interaction dependencies beyond the endoscopic images themselves. This work proposes a fine-grained SIR method named SIRNet that predicts surgical interaction triplets. In the proposed SIRNet, a multi-head self-attention mechanism learns the relations among surgical interaction triplets without requiring them to be defined before training, while a multi-head cross-attention mechanism learns the relationships between the endoscopic images and each triplet. A bipartite matching loss, which accounts for the permutations and combinations of instruments, verbs, and targets, is adopted so that each component of the surgical interaction triplet is learned and predicted appropriately. Moreover, a weight attention module is designed to weigh the importance of each predicted surgical interaction triplet, and of each component within a triplet, when producing the final valid surgical interaction triplets. Experimental results show that the proposed method improves the performance of fine-grained SIR, and further experiments demonstrate the effectiveness of each module. The code is available at https://github.com/cynerelee/SIRNet.
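The self- and cross-attention mechanisms described above follow the standard scaled dot-product attention pattern: a set of learnable triplet queries first attend to one another (self-attention), then attend to the endoscopic image features (cross-attention). The sketch below is a minimal illustration of that pattern only; it omits the learned projection matrices, layer normalization, and the weight attention module of the actual SIRNet, and all shapes (10 triplet queries, 49 image patches, dimension 32) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q, k, v, num_heads):
    """Simplified multi-head scaled dot-product attention.

    q: (Lq, d) queries; k, v: (Lk, d) keys/values; d % num_heads == 0.
    Learned input/output projections are omitted for brevity.
    """
    Lq, d = q.shape
    dh = d // num_heads
    out = np.zeros((Lq, d))
    for h in range(num_heads):
        sl = slice(h * dh, (h + 1) * dh)
        attn = softmax(q[:, sl] @ k[:, sl].T / np.sqrt(dh), axis=-1)
        out[:, sl] = attn @ v[:, sl]
    return out

# Hypothetical shapes: 10 triplet queries, 49 image-feature tokens, dim 32.
rng = np.random.default_rng(0)
queries = rng.standard_normal((10, 32))
image_feats = rng.standard_normal((49, 32))

# Self-attention: triplet queries model relations among themselves.
queries = multi_head_attention(queries, queries, queries, num_heads=4)
# Cross-attention: updated queries attend to endoscopic image features.
decoded = multi_head_attention(queries, image_feats, image_feats, num_heads=4)
```

Each row of `decoded` would then be fed to classification heads for the instrument, verb, and target components of one candidate triplet; the bipartite matching loss pairs these candidates with ground-truth triplets during training.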
Published in: IEEE Robotics and Automation Letters ( Volume: 7, Issue: 2, April 2022)