Article

Speech-Based Surgical Phase Recognition for Non-Intrusive Surgical Skills' Assessment in Educational Contexts

Journal

SENSORS
Volume 21, Issue 4

Publisher

MDPI
DOI: 10.3390/s21041330

Keywords

natural language processing; surgical workflow analysis; speech analysis; machine learning; feature extraction; surgical process model

Funding

  1. Bioengineering and Telemedicine Center, associated with the Universidad Politécnica de Madrid, Madrid, Spain

Surgeons' procedural skills and intraoperative decision making are essential in clinical practice, but their objective assessment remains a challenge. Surgical workflow analysis (SWA), which typically relies on video signals, could be complemented by natural language processing (NLP) of surgeons' speech. This study classified the phases of a laparoscopic cholecystectomy from transcribed operating room audio and found that a support vector machine (SVM) coupled with a hidden Markov model (HMM) was the most effective model for phase recognition, with an average accuracy of 82.95%.
Surgeons' procedural skills and intraoperative decision making are key elements of clinical practice. However, the objective assessment of these skills remains a challenge to this day. Surgical workflow analysis (SWA) is emerging as a powerful tool to address this issue in real time in surgical educational environments. Typically, SWA uses video signals to automatically identify the surgical phase. We hypothesize that analyzing surgeons' speech with natural language processing (NLP) can provide deeper insight into surgical decision-making processes. As a preliminary step, this study proposes using audio signals recorded in the educational operating room (OR) to classify the phases of a laparoscopic cholecystectomy (LC). To do this, we first created a database of transcriptions of audio recorded in surgical educational environments, each labeled with its corresponding phase. Second, we compared the performance of four feature extraction techniques and four machine learning models to find the most appropriate model for phase recognition. The best resulting model was a support vector machine (SVM) coupled with a hidden Markov model (HMM), trained with features obtained with Word2Vec (82.95% average accuracy). Analysis of this model's confusion matrix shows that some phrases are misplaced because of the similarity of the words used. The study of the model's temporal component suggests that further attention should be paid to accurately detecting surgeons' normal conversation. This study shows that speech-based classification of LC phases can be effectively achieved. This lays the foundation for using audio signals in SWA and for creating an LC framework for surgical training, especially for the training and assessment of procedural and decision-making skills (e.g., to assess residents' procedural knowledge and their ability to react to adverse situations).
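
The abstract does not say how the SVM and the HMM are coupled. One common arrangement, shown here only as a minimal sketch under that assumption, treats the classifier's per-utterance phase probabilities as emission scores and applies Viterbi decoding with a phase transition matrix, so that isolated misclassifications are smoothed by the expected phase order. The function name `viterbi_smooth`, the three-phase setup, and all probability values below are hypothetical and are not taken from the study.

```python
import math

def viterbi_smooth(utterance_probs, transition, initial):
    """Most likely phase sequence given per-utterance classifier probabilities.

    utterance_probs[t][k]: classifier probability of phase k for utterance t
    transition[i][j]:      probability of moving from phase i to phase j
    initial[k]:            probability of starting in phase k
    """
    eps = 1e-12  # avoids log(0) for forbidden transitions
    n = len(initial)
    # Log-score of the best path ending in each phase after the first utterance.
    score = [math.log(initial[k] + eps) + math.log(utterance_probs[0][k] + eps)
             for k in range(n)]
    backptr = []
    for t in range(1, len(utterance_probs)):
        new_score, ptr = [], []
        for j in range(n):
            # Best predecessor phase for ending in phase j at utterance t.
            best_i = max(range(n),
                         key=lambda i: score[i] + math.log(transition[i][j] + eps))
            ptr.append(best_i)
            new_score.append(score[best_i]
                             + math.log(transition[best_i][j] + eps)
                             + math.log(utterance_probs[t][j] + eps))
        score = new_score
        backptr.append(ptr)
    # Backtrack from the best final phase.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical classifier outputs for six utterances and three phases.
probs = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.30, 0.60, 0.10],
    [0.55, 0.40, 0.05],  # taken alone, this looks like a jump back to phase 0
    [0.10, 0.70, 0.20],
    [0.05, 0.05, 0.90],
]
# Hypothetical left-to-right transitions: a phase can persist or advance.
transition = [
    [0.8, 0.2, 0.0],
    [0.0, 0.8, 0.2],
    [0.0, 0.0, 1.0],
]
initial = [1.0, 0.0, 0.0]

raw = [max(range(3), key=lambda k: p[k]) for p in probs]
smoothed = viterbi_smooth(probs, transition, initial)
print(raw)       # [0, 0, 1, 0, 1, 2]
print(smoothed)  # [0, 0, 1, 1, 1, 2]
```

In this toy example the per-utterance argmax jumps back to phase 0 at the fourth utterance, while the decoded sequence stays in phase 1 because the assumed transition matrix makes backward jumps very unlikely.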
