Article

A computational workflow for the detection of candidate diagnostic biomarkers of Kawasaki disease using time-series gene expression data

Journal

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL
Volume 19, Pages 3058-3068

Publisher

ELSEVIER
DOI: 10.1016/j.csbj.2021.05.036

Keywords

Systemic autoinflammatory diseases (SAIDs); Kawasaki disease (KD); Self-Organizing Maps (SOMs); Diagnostic biomarkers; Boosting ensembles

Funding

  1. European Union [779295]

A computational workflow was developed to cluster Kawasaki disease patients and identify candidate diagnostic biomarker genes. Five novel genes were identified as potential biomarkers for Kawasaki disease diagnosis. Classifiers trained on these genes showed higher accuracy, sensitivity, and specificity than classifiers trained on known genes.
Unlike autoimmune diseases, systemic autoinflammatory diseases (SAIDs) have no known constitutive, disease-defining biomarker. Kawasaki disease (KD) is one of the undiagnosed types of SAIDs, and its pathogenic mechanism and gene mutation remain unknown. To address this issue, we developed a sequential computational workflow that clusters KD patients with similar gene expression profiles across the three KD phases (Acute, Subacute, and Convalescent) and uses the resulting clustermap to detect prominent genes that can serve as diagnostic biomarkers for KD. Self-Organizing Maps (SOMs) were employed to cluster patients with similar gene expression profiles across the three phases through inter-phase and intra-phase clustering. False discovery rate (FDR)-based feature selection was then applied to detect genes that deviate significantly across the per-phase clusters. Our results revealed five genes as candidate biomarkers for KD diagnosis, namely HLA-DQB1, HLA-DRA, ZBTB48, TNFRSF13C, and CASD1. To our knowledge, this is the first time these five genes have been reported in the literature as candidate KD biomarkers. The value of the discovered genes for KD diagnosis, relative to the known ones, was demonstrated by training boosting ensembles (AdaBoost and XGBoost) for KD classification on common-platform and cross-platform datasets. Compared with classifiers trained on the known genes, classifiers trained on the proposed genes from the common-platform data yielded an average increase of 4.40% in accuracy, 5.52% in sensitivity, and 3.57% in specificity in the Acute and Subacute phases, and an increase of 2.30% in accuracy, 2.20% in sensitivity, and 4.70% in specificity in the cross-platform analysis. © 2021 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
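
The FDR-based selection step can be illustrated with a minimal sketch. The abstract does not say which FDR procedure was used, so the Benjamini-Hochberg step-up rule below is an assumption, and the gene names and p-values are hypothetical placeholders rather than data from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return sorted indices of hypotheses rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg step-up rule; the abstract only
    says "FDR-based feature selection" without naming a procedure.
    """
    m = len(p_values)
    # Indices ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Step-up: reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical per-gene p-values (e.g. from a test of expression
# differences across the per-phase clusters); not taken from the paper.
gene_p_values = {
    "GENE_A": 0.001,
    "GENE_B": 0.008,
    "GENE_C": 0.039,
    "GENE_D": 0.041,
    "GENE_E": 0.60,
}

names = list(gene_p_values)
kept = benjamini_hochberg(list(gene_p_values.values()), alpha=0.05)
selected = [names[i] for i in kept]
print(selected)
```

In a workflow like the one described, the genes retained by such a filter would then serve as the input features for the boosting classifiers.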
