Review

Wide range of applications for machine-learning prediction models in orthopedic surgical outcome: a systematic review

Journal

ACTA ORTHOPAEDICA
Volume 92, Issue 5, Pages 526-+

Publisher

Medical Journal Sweden AB
DOI: 10.1080/17453674.2021.1932928


The study evaluated the development of clinical prediction models based on machine learning in orthopedic surgery, finding that medical management is the most commonly studied topic and neural networks are the most frequently employed algorithm. However, calibration and decision-curve analysis were generally poorly reported in these studies.
Background and purpose - Advancements in software and hardware have enabled the rise of clinical prediction models based on machine learning (ML) in orthopedic surgery. Given their growing popularity and their likely implementation in clinical practice, we evaluated which outcomes these new models have focused on and what methodologies are being employed.

Material and methods - We performed a systematic search in PubMed, Embase, and Cochrane Library for studies published up to June 18, 2020. Studies reporting on non-ML prediction models or non-orthopedic outcomes were excluded. After screening 7,138 studies, 59 studies reporting on 77 prediction models were included. We extracted data regarding outcome, study design, and reported performance metrics.

Results - Of the 77 identified ML prediction models, the most commonly reported outcome domain was medical management (17/77). Spinal surgery was the most commonly involved orthopedic subspecialty (28/77). The most frequently employed algorithm was neural networks (42/77). The median dataset size was 5,507 (IQR 635-26,364). The median area under the curve (AUC) was 0.80 (IQR 0.73-0.86). Calibration was reported for 26 of the models, and 14 provided decision-curve analysis.

Interpretation - ML prediction models have been developed for a wide variety of topics in orthopedics. Topics regarding medical management were the most commonly studied. Heterogeneity between studies is based on study size, algorithm, and time-point of outcome. Calibration and decision-curve analysis were generally poorly reported.
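The three performance measures named in the abstract (AUC, calibration, and decision-curve analysis) can be illustrated with a minimal Python sketch. This is not code from the review or from any included study: the function names and the four-patient toy dataset are hypothetical, and calibration is shown only in its simplest form (calibration-in-the-large), whereas published models typically also report a calibration slope or plot.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Discrimination: chance that a random event case is scored above a
    random non-event case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean predicted risk minus observed event rate.
    Zero means no systematic over- or underestimation of risk."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """One point on a decision curve: TP/n - FP/n * pt / (1 - pt),
    where pt is the risk threshold at which a clinician would intervene."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical toy data: observed outcomes and predicted risks for four patients.
y_true = [0, 0, 1, 1]
y_prob = [0.10, 0.40, 0.35, 0.80]

print("AUC:", auc(y_true, y_prob))
print("Calibration-in-the-large:", calibration_in_the_large(y_true, y_prob))
for pt in (0.2, 0.5):
    print(f"Net benefit at threshold {pt}:", net_benefit(y_true, y_prob, pt))
```

A full decision curve repeats the net-benefit calculation across a range of thresholds and compares the model against the "treat all" and "treat none" strategies. A model can have a high AUC and still be poorly calibrated or offer little net benefit, which is why the absence of these two measures in most studies matters.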
