Article

A signal processing method for alignment-free metagenomic binning: multi-resolution genomic binary patterns

Journal

SCIENTIFIC REPORTS
Volume 9, Issue -, Pages -

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41598-018-38197-9

Funding

  1. VIROGENESIS project
  2. European Union [634650]
  3. BBSRC [BB/M001121/1]
  4. BBSRC [BB/M001121/1] Funding Source: UKRI
  5. MRC [MC_UU_12014/12] Funding Source: UKRI

Algorithms in bioinformatics use textual representations of genetic information: sequences of the characters A, T, G and C, represented computationally as strings or sub-strings. Signal and related image processing methods offer a rich source of alternative descriptors, as they are designed to work in the presence of noisy data without the need for exact matching. Here we introduce a method, multi-resolution local binary patterns (MLBP), adapted from image processing to extract local 'texture' changes from nucleotide sequence data. We apply this feature space to the alignment-free binning of metagenomic data. The effectiveness of MLBP is demonstrated using both simulated and real human gut microbial communities. Sequence reads or contigs can be represented as vectors and their 'texture' compared efficiently using machine learning algorithms that perform dimensionality reduction, to capture eigengenome information, followed by clustering (here using randomized singular value decomposition and BH-tSNE). The intuition behind our method is that the MLBP feature vectors permit sequence comparisons without the need for explicit pairwise matching. We demonstrate that this approach outperforms existing methods based on k-mer frequencies. The signal processing method, MLBP, thus offers a viable alternative feature space to textual representations of sequence data. The source code for our Multi-resolution Genomic Binary Patterns method can be found at http://github.com/skouchaki/MrGBP.
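To make the idea of a 'texture' descriptor for sequence data more concrete, the following is a minimal sketch of a one-dimensional local binary pattern computed at several window sizes. It is not the MrGBP implementation. The numeric mapping `NUC_TO_INT`, the function names `lbp_histogram` and `mlbp_features`, the `half_windows` parameter, and the thresholding rule (neighbor greater than or equal to center) are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

# Hypothetical nucleotide-to-integer mapping (an assumption for illustration;
# the abstract does not state how sequences are converted to a signal).
NUC_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}


def lbp_histogram(signal, half_window):
    """Return the normalized histogram of 1D local binary pattern codes.

    For each position, the `half_window` neighbors on each side are compared
    with the center value; each comparison contributes one bit of the code.
    """
    n_neighbors = 2 * half_window
    weights = 1 << np.arange(n_neighbors)  # bit weights 1, 2, 4, ...
    hist = np.zeros(2 ** n_neighbors)
    for i in range(half_window, len(signal) - half_window):
        neighbors = np.concatenate(
            (signal[i - half_window:i], signal[i + 1:i + half_window + 1])
        )
        bits = (neighbors >= signal[i]).astype(np.int64)
        hist[int(bits @ weights)] += 1
    total = hist.sum()
    return hist / total if total > 0 else hist


def mlbp_features(seq, half_windows=(1, 2, 3)):
    """Concatenate LBP histograms computed at several window sizes."""
    # Characters outside A, C, G, T (for example N) are skipped in this sketch.
    signal = np.array(
        [NUC_TO_INT[c] for c in seq.upper() if c in NUC_TO_INT], dtype=np.int64
    )
    return np.concatenate([lbp_histogram(signal, p) for p in half_windows])


features = mlbp_features("ACGTACGTAC")
print(features.shape)
```

Each read or contig is reduced to a fixed-length vector regardless of its length, so vectors from different sequences can be compared directly. In the pipeline described in the abstract, such vectors would then be passed to dimensionality reduction and clustering.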
