Article

Comparison of machine learning models for predicting the risk of breast cancer-related lymphedema in Chinese women

Journal

ASIA-PACIFIC JOURNAL OF ONCOLOGY NURSING
Volume 9, Issue 12

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.apjon.2022.100101

Keywords

Breast cancer-related lymphedema; Machine learning; Naïve Bayes; Logistic regression; K-nearest neighbors; Support vector machine; Multilayer perceptron; Prediction

Funding

  1. National Natural Science Foundation of China [72004039]

This study aimed to develop and validate classification models using machine learning algorithms to predict breast cancer-related lymphedema (BCRL) in Chinese women. The logistic regression model achieved the best performance, and the most important predictors were the number of positive lymph nodes, BCRL occurring on the same side as the surgery, a history of sentinel lymph node biopsy, a dietary preference for meat and fried food, and an exercise frequency of less than three times per week.
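The abstract does not describe how the logistic regression model was fitted or which software was used. As a hedged illustration of the technique only, the following minimal Python sketch fits a binary logistic regression by batch gradient descent on log-loss. The feature names (`positive_nodes`, `slnb_history`), the toy data, and the helper functions are all hypothetical and are not taken from the study.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit weights [bias, w1, ..., wd] by batch gradient descent on log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            # Predicted probability minus observed label is the log-loss gradient factor.
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Average the gradient over cases and take one step.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of the positive class (here: BCRL) for one case."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Hypothetical toy data: [positive_nodes, slnb_history]; label 1 = BCRL.
X = [[0, 0], [1, 0], [0, 1], [2, 1], [5, 1], [6, 0], [7, 1], [8, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights = fit_logistic(X, y)
print(round(predict_proba(weights, [0, 0]), 3))
print(round(predict_proba(weights, [8, 1]), 3))
```

In practice a library implementation with regularization and feature scaling would normally be preferred; the explicit loop is used here only to make the update rule visible.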
Objective: Predictive models for the occurrence of cancer symptoms built with machine learning (ML) algorithms could aid clinical decision-making and thereby enhance the quality of cancer care. This study aimed to develop and validate a selection of classification models that used ML algorithms to predict the occurrence of breast cancer-related lymphedema (BCRL) among Chinese women.

Methods: This was a retrospective cohort study of consecutive cases diagnosed with stage I-IV breast cancer. Forty-eight variables were grouped into five feature sets. Five classification models based on ML algorithms were developed, and the models' performance and the variables' relative importance were assessed.

Results: Of 370 eligible female participants, 91 (24.6%) had BCRL. The mean age of the sample was 49.89 years (SD = 7.45). All participants had undergone breast cancer surgery, and more than half had undergone a modified radical mastectomy (n = 206, 55.5%). The mean follow-up time after breast cancer surgery was 28.73 months (SD = 11.71). Most tumors were either stage I (n = 49, 13.2%) or stage II (n = 252, 68.1%). More than half of the sample had received postoperative chemotherapy (n = 227, 61.4%). Overall, the logistic regression model achieved the best performance in terms of accuracy (91.6%), precision (82.1%), and recall (91.4%) for BCRL. Although the study included 48 predictor variables, the five models required only 22 of them to achieve their predictive performance. The most important variable was the number of positive lymph nodes, followed in descending order by BCRL occurring on the same side as the surgery, a history of sentinel lymph node biopsy, a dietary preference for meat and fried food, and an exercise frequency of less than three times per week. These factors contributed most to the ML models' predictive performance.

Conclusions: In the ML training dataset, the multilayer perceptron and logistic regression models were the best discrimination models for predicting BCRL, and in the ML validation dataset the k-nearest neighbors and support vector machine models demonstrated good calibration performance. Future research will need to use large-sample datasets to establish a more robust ML model that predicts BCRL reliably.
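The abstract reports accuracy, precision, and recall and refers to calibration, but does not give formulas or name a calibration metric. The sketch below assumes the standard confusion-matrix definitions of the three metrics and uses the Brier score as one common calibration-related measure; the Brier score, the 0.5 threshold, the helper names, and all labels and probabilities are hypothetical illustrations, not the study's data or necessarily its method.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives/negatives for a binary outcome (1 = BCRL)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            counts["tp"] += 1
        elif p == 1 and t == 0:
            counts["fp"] += 1
        elif p == 0 and t == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision (TP / predicted positives), recall (TP / actual positives)."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


# Hypothetical labels and predicted probabilities, thresholded at 0.5.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
probs = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.55, 0.7]
y_pred = [1 if p >= 0.5 else 0 for p in probs]

print(classification_metrics(y_true, y_pred))
print(round(brier_score(y_true, probs), 4))
```

Precision and recall depend on the chosen decision threshold, whereas the Brier score uses the predicted probabilities directly, which is one reason probability-based measures are often reported alongside discrimination metrics when judging calibration.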