Sequence embedding for fast construction of guide trees for multiple sequence alignment

ALGORITHMS FOR MOLECULAR BIOLOGY (2010)

Journal

ALGORITHMS FOR MOLECULAR BIOLOGY

Volume 5

Publisher

BMC

DOI: 10.1186/1748-7188-5-21

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Mathematical & Computational Biology

Funding

Science Foundation Ireland [07/IN.1/B1783]

Abstract

Background: The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments.

Results: In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pairwise distances.

Conclusions: We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.
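
The embedding idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the mBed code distributed at clustal.org. The k-mer distance, the function names (`kmer_distance`, `embed`, `euclidean`), and the way reference sequences are chosen are all assumptions made for illustration. Each sequence is represented by its distances to a small set of r reference sequences, so roughly N × r distance calculations are needed instead of one for every pair.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=2):
    """Count the overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=2):
    """Illustrative distance: 1 minus the fraction of shared k-mers.

    This is a stand-in for whatever (possibly expensive) pairwise
    distance a real pipeline would use.
    """
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    shared = sum((pa & pb).values())  # multiset intersection of k-mer counts
    denom = min(sum(pa.values()), sum(pb.values()))
    return 1.0 - shared / denom if denom else 1.0


def embed(seqs, refs, k=2):
    """Map each sequence to a vector of distances to the reference sequences.

    Cost is len(seqs) * len(refs) distance calculations, not all pairs.
    """
    return [[kmer_distance(s, r, k) for r in refs] for s in seqs]


def euclidean(u, v):
    """Cheap distance between two embedded vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Hypothetical toy data: two loose groups of sequences.
seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA"]
refs = [seqs[0], seqs[2]]  # assumed choice: one reference per group
vectors = embed(seqs, refs)
```

Once the sequences are vectors, a clustering method that works on coordinates can build the guide tree using the cheap `euclidean` distance. In this toy case the two sequences in each group land close together in the embedded space, while the groups stay far apart.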
