Journal
COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING
Volume 25, Issue 8, Pages 887-895
Publisher
TAYLOR & FRANCIS LTD
DOI: 10.1080/10255842.2021.1985476
Keywords
chronic kidney disease (CKD); clinical risk assessment; genetic programming; imbalanced classification; fitness function
Chronic kidney disease is a serious health concern affecting millions of Americans, and early diagnosis through machine learning can help prevent loss of life. This study proposes a new fitness function for dealing with imbalanced data sets in genetic programming, which outperforms other classification techniques in terms of accuracy and AUC values.
Chronic kidney disease (CKD) is one of the serious health concerns of the twenty-first century, affecting over 37 million Americans. Applying machine learning (ML) techniques to clinical data allows CKD to be diagnosed early, and early detection can prevent many deaths. This work uses a clinical data set of 400 patients, available from the UCI repository. The data set does not have an equal distribution of CKD and non-CKD samples, and this imbalance strongly influences the learning capabilities of classifiers. Genetic Programming (GP) is an ML technique based on the evolution of species, and GP with the standard fitness function is also affected by the imbalance. A new Euclidean distance-based fitness function for GP is proposed to handle it. To assess the robustness of the proposed approach, other classification techniques are also applied: k-nearest neighbors (KNN), KNN with particle swarm optimization (PSO-KNN), and GP with the standard fitness function. Under ten-fold cross-validation, KNN achieves an accuracy of 83.54% with an AUC of 0.69, PSO-KNN achieves an accuracy of 96.79% with an AUC of 0.94, and GP with the newly proposed fitness function outperforms both, achieving an accuracy of 99.33% with an AUC of 0.99.
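The abstract does not give the formula for the proposed fitness function, so the following is only an illustrative sketch of how a Euclidean distance-based fitness can behave differently from plain accuracy on imbalanced labels. The function `distance_fitness`, its use of the point (TPR, TNR), the normalization by sqrt(2), and the toy labels are all assumptions made for illustration, not the authors' definition or data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy_fitness(y_true, y_pred):
    """Standard fitness: fraction of samples classified correctly."""
    tp, tn, _, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def distance_fitness(y_true, y_pred):
    """Hypothetical distance-based fitness (an assumption, not the paper's formula).

    Measures the Euclidean distance from (TPR, TNR) to the ideal point (1, 1)
    and rescales it so that 1.0 is perfect and 0.0 is the worst case.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity
    tnr = tn / (tn + fp) if (tn + fp) else 0.0  # specificity
    distance = math.hypot(1.0 - tpr, 1.0 - tnr)
    return 1.0 - distance / math.sqrt(2.0)


# Made-up imbalanced labels: 90 positives followed by 10 negatives.
y_true = [1] * 90 + [0] * 10

# Classifier A ignores the minority class and always predicts positive.
always_positive = [1] * 100

# Classifier B gets 81 of 90 positives and 9 of 10 negatives right.
balanced = [1] * 81 + [0] * 9 + [0] * 9 + [1] * 1

for name, y_pred in [("always_positive", always_positive), ("balanced", balanced)]:
    print(name, accuracy_fitness(y_true, y_pred), distance_fitness(y_true, y_pred))
```

Both toy classifiers reach the same accuracy, yet the distance-based score penalizes the one that never predicts the minority class. This is the kind of behavior a fitness function needs if GP is to avoid evolving majority-class predictors.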