Journal
JOURNAL OF BIOMEDICAL INFORMATICS
Volume 116, Issue -, Pages -
Publisher
ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jbi.2021.103695
Keywords
Coronary artery disease; Data mining; Profile-based fuzzy association rule mining; Risk factor; Patient profile
Existing data mining solutions for identifying risk factors associated with diseases have limitations due to the use of crisp partitions for numerical features and lack of patient-specific profiles. This paper introduces a profile-based fuzzy association rule mining approach to accurately assess risk factors correlated with diseases. The proposed method shows higher partitioning accuracy and reasonable execution time compared to other methods.
Existing data mining solutions for identifying risk factors associated with diseases have several shortcomings. They usually use crisp partitions for numerical features and do not use patient-specific profiles, which limits their usefulness on real problems. Discretizing a numerical feature through crisp partitions can generate substantial partitioning errors, particularly for values that lie close to the crisp boundaries. Moreover, since the normal range of each numerical feature varies with the age, gender, and medical conditions of the patient, ignoring these differences can undermine the accuracy of the extracted itemsets and rules. This paper presents a profile-based fuzzy association rule mining (PB-FARM) approach for assessing risk factors that are highly correlated with diseases. The proposed approach has three phases. Phase I creates profiles for patients based on their age, gender, and medical conditions in order to determine the normal range of each numerical feature. Fuzzy partitioning is then applied to all features (both numerical and categorical), and a structure called FirstScan is created. In Phase II, the FirstScan structure is used to mine large fuzzy k-itemsets. Finally, in Phase III, these k-itemsets are used to generate fuzzy rules describing associations between risk factors and diseases. To evaluate the performance of the proposed method, the Z-Alizadeh Sani coronary artery disease (CAD) dataset, containing 303 records and 54 features, was used. The results show that typical chest pain and old age are both positively correlated with the incidence of CAD. The comparisons made in this study show that the proposed algorithm has a higher partitioning accuracy than other methods and a reasonably short execution time.
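The difference between crisp and profile-based fuzzy partitioning can be illustrated with a minimal Python sketch. This is not the PB-FARM implementation: the profile table, the age bands, the numeric ranges, the trapezoidal membership shape, the `margin` parameter, and the min-based fuzzy support are all assumptions chosen for illustration, and the range values are placeholders rather than clinical reference values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalRange:
    low: float
    high: float


# Hypothetical profile table: (gender, age band) -> normal range of one
# numerical feature. Placeholder numbers, not clinical reference values.
PROFILE_RANGES = {
    ("F", "under_60"): NormalRange(90.0, 120.0),
    ("F", "60_plus"): NormalRange(100.0, 135.0),
    ("M", "under_60"): NormalRange(95.0, 125.0),
    ("M", "60_plus"): NormalRange(100.0, 140.0),
}


def profile_key(age, gender):
    """Map a patient to a profile (assumed: two age bands per gender)."""
    return (gender, "60_plus" if age >= 60 else "under_60")


def crisp_partition(value, rng):
    """Assign exactly one label; values near a boundary flip abruptly."""
    if value < rng.low:
        return "low"
    if value > rng.high:
        return "high"
    return "normal"


def _ramp_up(x, a, b):
    """Linear ramp: 0 at or below a, 1 at or above b."""
    return min(1.0, max(0.0, (x - a) / (b - a)))


def fuzzy_partition(value, rng, margin=0.1):
    """Return membership degrees in 'low', 'normal', and 'high'.

    Each boundary is replaced by a linear transition zone whose half-width
    is `margin` times the width of the normal range (assumed, margin < 0.5).
    """
    w = margin * (rng.high - rng.low)
    low = 1.0 - _ramp_up(value, rng.low - w, rng.low + w)
    high = _ramp_up(value, rng.high - w, rng.high + w)
    return {"low": low, "normal": 1.0 - low - high, "high": high}


def fuzzy_support(records, itemset):
    """Mean over records of the minimum membership among the itemset's items.

    `records` is a list of dicts mapping (feature, label) -> membership.
    Using min as the combination operator is an assumption of this sketch.
    """
    total = sum(min(r.get(item, 0.0) for item in itemset) for r in records)
    return total / len(records)
```

For a hypothetical 45-year-old female patient with a feature value of 119, `crisp_partition` returns only `"normal"`, while `fuzzy_partition` assigns partial membership to both `"normal"` and `"high"`, so a value just inside the boundary still contributes to itemsets that involve the `"high"` label.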