4.7 Article

CRISPRcasIdentifier: Machine learning for accurate identification and classification of CRISPR-Cas systems

Journal

GIGASCIENCE
Volume 9, Issue 6, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/gigascience/giaa062

Keywords

CRISPR-Cas; machine learning; Cas genes; Cas proteins

Funding

  1. Federal Agency for Support and Evaluation of Graduate Education within the Ministry of Education of Brazil (CAPES) (Probral CAPES/DAAD grant) [88887.302257/2018-00]
  2. Sao Paulo Research Foundation (FAPESP) [2013/07375-0, 2016/18615-0, 2019/21300-9]
  3. Intel
  4. German Research Foundation (DFG) [BA 2168/13-1 SPP 1590, BA 2168/23-1 SPP 2141]
  5. Baden-Wuerttemberg Ministry of Science, Research and Art
  6. University of Freiburg

Background: CRISPR-Cas genes are extraordinarily diverse and evolve rapidly when compared to other prokaryotic genes. With the rapid increase in newly sequenced archaeal and bacterial genomes, manual identification of CRISPR-Cas systems is no longer viable. Thus, an automated approach is required for advancing our understanding of the evolution and diversity of these systems and for finding new candidates for genome engineering in eukaryotic models.

Results: We introduce CRISPRcasIdentifier, a new machine learning-based tool that combines regression and classification models for the prediction of potentially missing proteins in instances of CRISPR-Cas systems and the prediction of their respective subtypes. In contrast to other available tools, CRISPRcasIdentifier can both detect cas genes and extract potential association rules that reveal functional modules for CRISPR-Cas systems. In our experimental benchmark on the most recently published and comprehensive CRISPR-Cas system dataset, CRISPRcasIdentifier was compared with recent and state-of-the-art tools. According to the experimental results, CRISPRcasIdentifier presented the best Cas protein identification and subtype classification performance.

Conclusions: Overall, our tool greatly extends the classification of CRISPR cassettes and, for the first time, predicts missing Cas proteins and association rules between Cas proteins. Additionally, we investigated the properties of CRISPR subtypes. The proposed tool relies not only on the knowledge of manual CRISPR annotation but also on models trained using machine learning.
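The abstract mentions association rules between Cas proteins but does not say how they are extracted. As a rough illustration of the general idea only, the sketch below computes the standard support and confidence measures for pairwise rules over a handful of made-up cassettes. The cassette contents, the function names, and the thresholds are all hypothetical and are not taken from CRISPRcasIdentifier.

```python
from itertools import permutations

# Hypothetical toy data: each set lists the Cas proteins found in one cassette.
cassettes = [
    {"Cas1", "Cas2", "Cas9"},
    {"Cas1", "Cas2", "Cas9"},
    {"Cas1", "Cas2", "Cas3"},
    {"Cas1", "Cas3"},
]


def support(itemset, transactions):
    """Fraction of cassettes that contain every protein in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent present | antecedent present)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


def pairwise_rules(transactions, min_support=0.5, min_confidence=0.9):
    """Return (a, b, support, confidence) for every rule a -> b that
    passes both (assumed, illustrative) thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        c = confidence({a}, {b}, transactions)
        if s >= min_support and c >= min_confidence:
            rules.append((a, b, s, c))
    return rules


if __name__ == "__main__":
    for a, b, s, c in pairwise_rules(cassettes):
        print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

On this toy input, a rule such as Cas9 -> Cas2 passes because every cassette containing Cas9 also contains Cas2, while the reverse rule does not. A real analysis would run over annotated cassettes and would likely consider larger itemsets than pairs.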
