Article

CRISPRidentify: identification of CRISPR arrays using machine learning approach

Journal

NUCLEIC ACIDS RESEARCH
Volume 49, Issue 4, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nar/gkaa1158

Funding

  1. German Research Foundation (DFG) [BA 2168/13-1 SPP 1590]
  2. Much more than Defence: the Multiple Functions and Facets of CRISPR-Cas
  3. Baden-Wuerttemberg Ministry of Science, Research and Art
  4. University of Freiburg
  5. Intel

CRISPR-Cas are adaptive immune systems that rely on RNA components to degrade foreign genetic elements. Our tool, CRISPRidentify, uses machine learning to distinguish true CRISPR arrays from false candidates based on multiple features, and provides detailed annotations to users. This approach not only accurately identifies known CRISPR arrays but also detects CRISPR array candidates missed by other tools, with a significantly reduced false positive rate.
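To make "features" of a candidate array concrete, the Python sketch below computes a few plausible ones from a candidate's repeats and spacers. It is an illustration under stated assumptions, not CRISPRidentify's feature set: the `CandidateArray` class, the feature names, and the use of `difflib` similarity in place of a proper sequence alignment are all hypothetical choices made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher
from statistics import mean, pstdev


@dataclass
class CandidateArray:
    """Hypothetical container for a putative array: repeats in genomic order,
    plus the spacers lying between consecutive repeats."""
    repeats: list[str]
    spacers: list[str]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] from difflib's ratio(); a cheap stand-in for a real alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def extract_features(array: CandidateArray) -> dict[str, float]:
    """Illustrative features only (not the tool's actual feature set).
    Assumes at least one repeat and at least one spacer."""
    # Most frequent repeat serves as a crude consensus sequence.
    consensus = Counter(array.repeats).most_common(1)[0][0]
    spacer_lengths = [len(s) for s in array.spacers]
    # Pairwise spacer similarities: spacers of a genuine array are expected to differ.
    spacer_pairs = [
        similarity(a, b)
        for i, a in enumerate(array.spacers)
        for b in array.spacers[i + 1:]
    ]
    return {
        "n_repeats": float(len(array.repeats)),
        # Mean similarity of each repeat to the consensus (1.0 = all identical).
        "repeat_identity": mean(similarity(r, consensus) for r in array.repeats),
        # Coefficient of variation of spacer length (0.0 = all the same length).
        "spacer_length_cv": pstdev(spacer_lengths) / mean(spacer_lengths),
        # Highest similarity between any two spacers (0.0 if fewer than two spacers).
        "max_spacer_similarity": max(spacer_pairs, default=0.0),
    }


# Toy usage with made-up sequences.
toy = CandidateArray(
    repeats=["GTTTTAGAGCTATGCTGTTTTG"] * 3,
    spacers=["ACGTACGTTGCAAGCTTAGCCATGGA", "TTGACCAGTAGGCATCCGATAACGTC"],
)
print(extract_features(toy))
```

A feature vector like this is what a trained classifier would consume; which features actually discriminate well is something a data-driven tool learns from curated examples rather than fixing in advance.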
CRISPR-Cas are adaptive immune systems that degrade foreign genetic elements in archaea and bacteria. In carrying out their immune functions, CRISPR-Cas systems heavily rely on RNA components. These CRISPR (cr) RNAs are repeat-spacer units that are produced by processing of pre-crRNA, the transcript of CRISPR arrays, and guide Cas protein(s) to the cognate invading nucleic acids, enabling their destruction. Several bioinformatics tools have been developed to detect CRISPR arrays based solely on DNA sequences, but all these tools employ the same strategy of looking for repetitive patterns, which might correspond to CRISPR array repeats. The identified patterns are evaluated using a fixed, built-in scoring function, and arrays exceeding a cut-off value are reported. Here, we instead introduce a data-driven approach that uses machine learning to detect and differentiate true CRISPR arrays from false ones based on several features. Our CRISPR detection tool, CRISPRidentify, performs three steps: detection, feature extraction and classification based on manually curated sets of positive and negative examples of CRISPR arrays. The identified CRISPR arrays are then reported to the user accompanied by detailed annotation. We demonstrate that our approach identifies not only previously detected CRISPR arrays, but also CRISPR array candidates not detected by other tools. Compared to other methods, our tool has a drastically reduced false positive rate. In contrast to the existing tools, our approach not only provides the user with the basic statistics on the identified CRISPR arrays but also produces a certainty score as a practical measure of the likelihood that a given genomic region is a CRISPR array.
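The abstract contrasts a fixed, built-in scoring function with a cut-off against a classifier learned from curated positive and negative examples that also reports a certainty score. The Python sketch below illustrates that contrast under explicit assumptions: the two features, the toy training vectors, the 0.9 cut-off, and the use of logistic regression fitted by plain gradient descent are all hypothetical stand-ins, not the model or data CRISPRidentify uses. The toy set deliberately includes one false candidate with near-identical repeats but erratic spacer lengths, so that repeat identity alone cannot separate the classes.

```python
from math import exp

# Hypothetical toy training set: (repeat_identity, spacer_length_cv) -> label,
# where 1 = curated true array and 0 = curated false candidate. Invented values.
TRAINING = [
    ((0.98, 0.05), 1), ((0.95, 0.10), 1), ((0.99, 0.08), 1),
    ((0.70, 0.60), 0), ((0.80, 0.45), 0), ((0.65, 0.30), 0),
    ((0.96, 0.65), 0),  # near-identical repeats, but very irregular spacers
]


def fixed_rule(features: tuple[float, float], cutoff: float = 0.9) -> bool:
    """Fixed-scoring style: one hand-picked score against a hard cut-off, yes/no only."""
    repeat_identity, _ = features
    return repeat_identity >= cutoff


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def certainty(x, w, b) -> float:
    """Score in (0, 1) that x is a true array under weights w and bias b."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def train_logistic(data, lr: float = 0.5, epochs: int = 10000):
    """Fit logistic-regression weights by batch gradient descent on the mean log-loss."""
    n = len(data)
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in data:
            err = certainty(x, w, b) - y  # derivative of log-loss w.r.t. the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    w, b = train_logistic(TRAINING)
    candidate = (0.95, 0.70)  # high repeat identity, erratic spacer lengths
    print("fixed rule calls it an array:", fixed_rule(candidate))
    print("learned certainty:", round(certainty(candidate, w, b), 3))
```

On this toy data the fixed rule accepts the example candidate, while the fitted model gives it a certainty below 0.5, because spacer-length irregularity receives weight during training but is ignored by the single hand-picked score. The graded output, rather than a bare yes/no, is what plays the role of a certainty score here.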
