DOI: 10.1055/s-0042-1743643
Automatic Assessment of Surgical Performance Using Intraoperative Video and Deep Learning: A Comparison with Expert Surgeon Video Review
Introduction: Intraoperative video contains sufficient information for experts to assess surgeon skill and provide feedback, but expert evaluation is rarely available. Deep neural networks (DNNs) capable of interpreting video can provide rich feedback but often require large amounts of training data. The ability of a DNN to predict the outcome of an attempt at surgical hemostasis and to accurately quantify blood loss has never been compared with that of expert skull base surgeons.
Methods: Simulated outcomes following carotid artery laceration (SOCAL) is a publicly available dataset of 154 videos of surgeons managing a simulated carotid artery injury during endoscopic endonasal surgery. We developed a model that first extracted features from each video frame individually using a DNN and then passed these features, in sequence, to a long short-term memory (LSTM) recurrent neural network. The model was trained on 134 videos (each 1 minute long) and tested on the remaining 20 videos to predict two outcomes: the participant's hemorrhage control (dichotomous success/failure) and cumulative blood loss (mL). Four skull base neurosurgeons with endoscopic expertise viewed the 20 test videos and predicted hemorrhage control, blood loss, and overall technical skill (Likert scale).
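The frame-wise feature extraction followed by an LSTM can be sketched as below. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the per-frame DNN is replaced by a hypothetical random-projection stand-in, all sizes (`FRAME_SHAPE`, `FEAT_DIM`, `HIDDEN`) are assumptions, and the two output heads (success probability and blood loss) are an assumed design, since the abstract does not say whether one model or two produced the predictions. The weights are untrained, so the outputs only illustrate the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME_SHAPE = (32, 32, 3)  # hypothetical downsampled frame size
FEAT_DIM = 64              # hypothetical per-frame feature size
HIDDEN = 32                # hypothetical LSTM hidden size


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Stand-in for the per-frame DNN (assumption): a fixed random projection + ReLU.
# In practice a pretrained image network would produce these features.
W_frame = rng.normal(0.0, 0.01, size=(int(np.prod(FRAME_SHAPE)), FEAT_DIM))


def frame_features(frames):
    """(T, H, W, C) -> (T, FEAT_DIM); each frame is processed independently."""
    flat = frames.reshape(frames.shape[0], -1)
    return np.maximum(flat @ W_frame, 0.0)


# Single-layer LSTM: input, forget, output gates and candidate cell, stacked.
W = rng.normal(0.0, 0.1, size=(FEAT_DIM + HIDDEN, 4 * HIDDEN))
b = np.zeros(4 * HIDDEN)


def lstm_last_state(features):
    """Run the LSTM over (T, FEAT_DIM) features in order; return the final hidden state."""
    h = np.zeros(HIDDEN)
    c = np.zeros(HIDDEN)
    for x_t in features:
        z = np.concatenate([x_t, h]) @ W + b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g          # update cell memory
        h = o * np.tanh(c)         # expose gated hidden state
    return h


# Hypothetical output heads on the final state.
w_cls = rng.normal(0.0, 0.1, size=HIDDEN)  # hemorrhage control success
w_reg = rng.normal(0.0, 0.1, size=HIDDEN)  # cumulative blood loss


def predict(frames):
    h = lstm_last_state(frame_features(frames))
    p_success = float(sigmoid(h @ w_cls))
    blood_loss_ml = float(np.maximum(h @ w_reg, 0.0))  # blood loss cannot be negative
    return p_success, blood_loss_ml


# Usage: a hypothetical 60-frame clip (e.g., one minute sampled at 1 frame/s).
clip = rng.random((60, *FRAME_SHAPE))
p_success, blood_loss_ml = predict(clip)
```

Separating the per-frame extractor from the sequence model means the extractor sees no temporal context; ordering and timing of events within the clip are captured only by the LSTM's recurrent state.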
Results: Surgeons correctly predicted the outcome of the hemorrhage control attempt in 14/20 trials (sensitivity 79%, specificity 56%, positive predictive value [PPV] 69%, negative predictive value [NPV] 71%). Interrater reliability for predicting outcome among the surgeons was 0.95. After training, the model correctly predicted the outcome of the same videos in 17/20 trials (sensitivity 100%, specificity 66%, PPV 79%, NPV 100%). Interrater reliability between the model and the expert cohort was 0.43. Expert surgeons estimated blood loss with a root mean square error (RMSE) of 350 mL (R² = 0.64), whereas the model had a lower (better) RMSE of 295 mL (R² = 0.74). We validated the model by inputting video segments of known high and low technical proficiency, as adjudicated by the experts, and the model predicted success and failure appropriately in every case. In further validation testing, we examined two trials in which both the experts and the model incorrectly predicted the outcome. Providing the model with video containing the critical error from these trials resulted in a correct prediction of task failure.
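The reported classification and regression metrics follow standard definitions, sketched below in plain Python. The confusion matrix used here (TP = 11, FN = 0, FP = 3, TN = 6, treating success as the positive class) is our inference, not stated in the abstract: it is one matrix consistent with the model's 17/20 accuracy, 100% sensitivity, 79% PPV, and 100% NPV, and it gives a specificity of 6/9 ≈ 0.667, close to the reported 66%. The abstract also does not state how R² was computed; the `r_squared` helper assumes the 1 - SS_res/SS_tot definition, and the blood-loss values are hypothetical.

```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, PPV, NPV, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


def rmse(y_true, y_pred):
    """Root mean square error, in the units of y (here mL)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Inferred confusion matrix consistent with the model's reported results.
m = classification_metrics(tp=11, fn=0, fp=3, tn=6)
for name, value in m.items():
    print(f"{name}: {value:.3f}")

# Hypothetical blood-loss values (mL), for illustration only.
print(rmse([100, 300, 500], [150, 250, 550]))       # 50.0
print(r_squared([100, 300, 500], [150, 250, 550]))  # 0.90625
```

Because RMSE is in millilitres, the 350 mL vs 295 mL comparison reads directly as typical estimation error, while R² is unitless and describes the share of variance in blood loss explained by the predictions.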
Conclusion: DNNs can be trained to generate accurate predictions of hemorrhage control outcome and blood loss using small surgical video datasets. After training, the DNN demonstrated similar outcome prediction and superior blood loss prediction compared with expert surgeons. Validation testing revealed that the DNN's predictions improved when it was given video of critical moments, just as a human expert's would. Broad collection, classification, and annotation of surgical video could help develop DNNs capable of predictions across a wide range of surgical tasks.
No conflict of interest has been declared by the author(s).
Publication History
Article published online: 15 February 2022
© 2022. Thieme. All rights reserved.
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany