Article

Accelerating sequence searching: dimensionality reduction method

KNOWLEDGE AND INFORMATION SYSTEMS (2009)

Journal

KNOWLEDGE AND INFORMATION SYSTEMS

Volume 20, Issue 3, Pages 301-322

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s10115-008-0180-0

Keywords

Sequence similarity search; Sequence embedding; Index; Dimension reduction

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Funding

National Natural Science Foundation of China [60703066]
National High-Tech Research and Development Plan of China (863) [2006AA12Z217]


Abstract

Similarity search over long sequence datasets is becoming increasingly popular in emerging applications such as text retrieval and genetic sequence exploration. This paper proposes a novel index structure, the Sequence Embedding Multiset tree (SEM-tree), to speed up searching over long sequences. The SEM-tree is a multi-level structure in which each level represents the sequence data as multisets at a different compression level; the multisets grow longer toward the leaf level, which contains the original sequences. The multisets, obtained using sequence embedding algorithms, have the desirable property that they need not keep the order of characters in a sequence, which gives a shorter representation, yet they preserve most of the distance information between sequences. Each level of the tree prunes the search space more efficiently, because the multisets make it possible to predict the outcome and finish the search early, greatly reducing the computational cost. A comprehensive set of experiments evaluates the performance of the SEM-tree, and the results show that the proposed method is much more efficient than existing representative methods.
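The pruning idea in the abstract can be illustrated with a minimal single-level sketch. It is not the SEM-tree or the paper's embedding algorithm: it assumes edit distance as the sequence distance (the abstract does not name one) and swaps in plain character counts as the order-free multiset, using the well-known frequency-distance lower bound on edit distance. The function names `multiset_lower_bound`, `edit_distance`, and `range_search` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def multiset_lower_bound(a: Counter, b: Counter) -> int:
    """Lower bound on edit distance computed from character counts alone.

    Assumption: character-count multisets stand in for the paper's embedded
    multisets. An insertion or deletion changes one count by 1 and a
    substitution changes two counts by 1 each, so the larger of the surplus
    and the deficit can never exceed the true edit distance.
    """
    surplus = sum((a - b).values())  # characters a has in excess of b
    deficit = sum((b - a).values())  # characters b has in excess of a
    return max(surplus, deficit)


def edit_distance(s: str, t: str) -> int:
    """Standard Levenshtein distance using a rolling dynamic-programming row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # match or substitute
        prev = cur
    return prev[-1]


def range_search(query: str, sequences: list[str], radius: int):
    """Filter-and-refine search: return (hits, number of full comparisons)."""
    q_counts = Counter(query)
    hits, verified = [], 0
    for seq in sequences:
        # Cheap filter: if the lower bound already exceeds the radius,
        # the expensive distance computation can be skipped safely.
        if multiset_lower_bound(q_counts, Counter(seq)) > radius:
            continue
        verified += 1
        if edit_distance(query, seq) <= radius:
            hits.append(seq)
    return hits, verified


if __name__ == "__main__":
    db = ["ACGTACGT", "ACGTACGA", "TTTTTTTT", "ACGT"]
    print(range_search("ACGTACGT", db, radius=1))
```

In this sketch the multisets are recomputed on every query for brevity; an index would store them ahead of time, and a multi-level structure such as the one the abstract describes would apply progressively finer representations before reaching the original sequences.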
