Article

N-glycan fingerprint predicts alpha-fetoprotein negative hepatocellular carcinoma: A large-scale multicenter study

Journal

INTERNATIONAL JOURNAL OF CANCER
Volume 149, Issue 3, Pages 717-727

Publisher

WILEY
DOI: 10.1002/ijc.33564

Keywords

hepatocellular carcinoma (HCC); logistic regression (LR); N-glycan; random forest (RF); support vector machine (SVM)

Funding

  1. China National Key Projects for Infectious Disease [2018ZX10302205-003]
  2. Shanghai Science and Technology Commission [17411960500, 17JC1404500]
  3. Innovation Group Project of Shanghai Municipal Health Commission [2019CXJQ03]

This study successfully identified 13 N-glycan structures as effective biomarkers for AFP-negative hepatocellular carcinoma (ANHCC) using differential gene expression screening and machine learning algorithms. The LR algorithm showed the best diagnostic performance in identifying ANHCC patients and demonstrated high accuracy in independent validation.
Alpha-fetoprotein (AFP)-negative hepatocellular carcinoma (ANHCC) patients account for more than 30% of all HCC patients and are easily misdiagnosed. This three-phase study was designed to find and validate new ANHCC N-glycan markers, identified from The Cancer Genome Atlas (TCGA) database and by noninvasive detection. Differentially expressed genes (DEGs) among genes related to N-glycan biosynthesis and degradation were screened from the TCGA database. Serum N-glycan structure abundances were analyzed using N-glycan fingerprint (NGFP) technology. A total of 1340 participants, including ANHCC patients, patients with chronic liver diseases and healthy controls, were enrolled after propensity score matching (PSM). The Lasso algorithm was used to select the most significant N-glycan structure abundances. Three machine learning models [random forest (RF), support vector machine (SVM) and logistic regression (LR)] were used to construct the diagnostic algorithms. All 13 N-glycan structure abundances analyzed by NGFP were significant and were selected by Lasso. Among the three machine learning models, the LR algorithm demonstrated the best diagnostic performance for identifying ANHCC in the training cohort (LR: AUC 0.842, 95% CI 0.784-0.899; RF: AUC 0.825, 95% CI 0.766-0.885; SVM: AUC 0.610, 95% CI 0.527-0.684). The LR algorithm again achieved high diagnostic performance in the independent validation cohort (AUC 0.860, 95% CI 0.824-0.897). Furthermore, the LR algorithm could stratify ANHCC patients into two distinct subgroups with high or low risk with respect to overall survival and recurrence in follow-up validation. In conclusion, a biomarker panel consisting of 13 N-glycan structure abundances, combined with the best-performing algorithm (LR), was defined and shown to be an effective tool for HCC prediction and prognosis estimation in AFP-negative subjects.
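The models above are compared by AUC with a 95% CI. As a rough illustration of what such a figure means, the sketch below computes an AUC from predicted scores and attaches a percentile bootstrap interval. The abstract does not state how the intervals were obtained, so the bootstrap is an assumption made here; the function names `auc` and `bootstrap_ci` and the toy labels and scores are likewise hypothetical and are not study data.

```python
import random


def auc(labels, scores):
    """AUC as the Mann-Whitney statistic: the probability that a random
    positive case scores higher than a random negative case (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method,
    not necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example: 1 = ANHCC, 0 = control; scores stand in for model outputs.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.10, 0.20, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]
print(auc(toy_labels, toy_scores))           # 15 of 16 pairs ranked correctly
print(bootstrap_ci(toy_labels, toy_scores))  # (lower, upper) bounds
```

Read this way, an AUC of 0.842 means that in roughly 84% of ANHCC/control pairs the model assigns the higher score to the ANHCC case, which is why the SVM value of 0.610 sits much closer to chance (0.5).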
