4.8 Article

Machine Learning Analysis of Naive B-Cell Receptor Repertoires Stratifies Celiac Disease Patients and Controls

Journal

FRONTIERS IN IMMUNOLOGY
Volume 12, Issue -, Pages -

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fimmu.2021.627813

Keywords

celiac disease; BCR repertoire; immune response; machine learning; naïve B-cells

Funding

  1. Research Council of Norway [179573/V40]
  2. South-Eastern Norway Regional Health Authority [2016113]
  3. Stiftelsen KG Jebsen [SKGMED-017]
  4. ISF [832/16]
  5. European Union's Horizon 2020 research and innovation program [825821]
  6. H2020 Societal Challenges Programme [825821]

Celiac disease is a common, highly heritable autoimmune disorder caused by an abnormal immune response to dietary gluten proteins. Interactions between disease-specific antibodies and factors such as HLA are likely crucial in pathogenesis, and machine learning classification models can be used to analyze naive B cell receptor repertoires from patients with celiac disease and healthy controls.
Celiac disease (CeD) is a common autoimmune disorder caused by an abnormal immune response to dietary gluten proteins. The disease has high heritability. HLA is the major susceptibility factor, and the HLA effect is mediated via presentation of deamidated gluten peptides by disease-associated HLA-DQ variants to CD4+ T cells. In addition to gluten-specific CD4+ T cells the patients have antibodies to transglutaminase 2 (autoantigen) and deamidated gluten peptides. These disease-specific antibodies recognize defined epitopes and they display common usage of specific heavy and light chains across patients. Interactions between T cells and B cells are likely central in the pathogenesis, but how the repertoires of naive T and B cells relate to the pathogenic effector cells is unexplored. To this end, we applied machine learning classification models to naive B cell receptor (BCR) repertoires from CeD patients and healthy controls. Strikingly, we obtained a promising classification performance with an F1 score of 85%. Clusters of heavy and light chain sequences were inferred and used as features for the model, and signatures associated with the disease were then characterized. These signatures included amino acid (AA) 3-mers with distinct bio-physiochemical characteristics and enriched V and J genes. We found that CeD-associated clusters can be identified and that common motifs can be characterized from naive BCR repertoires. The results may indicate a genetic influence by BCR encoding genes in CeD. Analysis of naive BCRs as presented here may become an important part of assessing the risk of individuals to develop CeD. Our model demonstrates the potential of using BCR repertoires and in particular, naive BCR repertoires, as disease susceptibility markers.
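
To make two terms from the abstract concrete, the following is a minimal Python sketch of how overlapping amino acid 3-mers can be counted in a sequence and how an F1 score is computed from predicted labels. It is not the authors' pipeline: the function names `aa_kmers` and `f1_score`, the toy sequence, and the toy labels are all hypothetical and chosen only for illustration, and the clustering and model-fitting steps described in the abstract are not reproduced here.

```python
from collections import Counter


def aa_kmers(sequence, k=3):
    """Count overlapping amino acid k-mers in a sequence (hypothetical helper)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy amino acid string, invented for illustration (not data from the study).
toy_sequence = "CARDYYGMDVW"
kmers = aa_kmers(toy_sequence)

# Toy labels, invented for illustration: 1 = CeD patient, 0 = healthy control.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
score = f1_score(y_true, y_pred)
```

Because F1 is the harmonic mean of precision and recall, it stays low when either one is low, which is why it is often reported for patient-versus-control classification alongside or instead of plain accuracy.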
