Journal
CRISPR JOURNAL
Volume 4, Issue 4, Pages 558-574
Publisher
MARY ANN LIEBERT, INC
DOI: 10.1089/crispr.2021.0021
Funding
- NC State University
- NC Ag Foundation
In this study, a machine learning approach was developed to detect and classify CRISPR loci using repeat sequences, allowing for efficient identification of CRISPR loci when cas gene information is unavailable. By utilizing biological attributes of CRISPR repeats and methods from natural language processing, the model achieved accurate classification of CRISPR loci in extensive metagenomic datasets.
Detection and classification of CRISPR-Cas systems in metagenomic data have become increasingly prevalent in recent years due to their potential for diverse applications in genome editing. Traditionally, CRISPR-Cas systems are classified through reference-based identification of proximate cas genes. Here, we present a machine learning approach for the detection and classification of CRISPR loci using repeat sequences in a cas-independent context, enabling identification of unclassified loci missed by traditional cas-based approaches. Using biological attributes of the CRISPR repeat, the core element in CRISPR arrays, and leveraging methods from natural language processing, we developed a machine learning model capable of accurate classification of CRISPR loci in an extensive set of metagenomes, resulting in an F1 measure of 0.82 across all predictions and an F1 measure of 0.97 when limiting to classifications with probabilities >0.85. Furthermore, assessing performance on novel repeats yielded an F1 measure of 0.96. Although the performance of cas-based identification will exceed that of a repeat-based approach in many cases, CRISPRclassify provides an efficient approach to classification of CRISPR loci for cases in which cas gene information is unavailable, such as metagenomes and fragmented genome assemblies.
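The two ideas in the abstract that lend themselves to a small illustration are treating a repeat as text and reporting F1 only for predictions above a probability cutoff. The sketch below is not the CRISPRclassify implementation. It assumes, purely for illustration, that the natural language processing step resembles counting overlapping k-mers (a bag-of-words view of the repeat) and that the F1 measure is macro-averaged over subtypes; neither detail is stated in the abstract. The function names `kmer_counts`, `macro_f1`, and `f1_above_threshold` are hypothetical, as are the toy labels and probabilities.

```python
from collections import Counter


def kmer_counts(repeat, k=3):
    """Count overlapping k-mers in a repeat (assumed bag-of-words featurization)."""
    repeat = repeat.upper()
    return Counter(repeat[i:i + k] for i in range(len(repeat) - k + 1))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def f1_above_threshold(y_true, y_pred, probs, threshold=0.85):
    """Macro F1 restricted to predictions whose probability exceeds the threshold.

    Returns None when no prediction passes the cutoff.
    """
    kept = [(t, p) for t, p, pr in zip(y_true, y_pred, probs) if pr > threshold]
    if not kept:
        return None
    kept_true, kept_pred = zip(*kept)
    return macro_f1(kept_true, kept_pred)


# Toy data: subtype labels and probabilities are made up for illustration.
y_true = ["I-E", "I-E", "II-A", "II-A"]
y_pred = ["I-E", "II-A", "II-A", "II-A"]
probs = [0.90, 0.50, 0.95, 0.99]

overall = macro_f1(y_true, y_pred)
confident = f1_above_threshold(y_true, y_pred, probs)
```

In this toy case the single misclassification carries a low probability, so filtering at 0.85 removes it and the restricted score is higher than the overall one. This mirrors the pattern in the abstract, where F1 rises from 0.82 across all predictions to 0.97 for high-probability classifications, at the cost of leaving some loci unclassified.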