Article

A data mining approach to discover unusual folding regions in genome sequences

KNOWLEDGE-BASED SYSTEMS (2002)

Journal

KNOWLEDGE-BASED SYSTEMS

Volume 15, Issue 4, Pages 243-250

Publisher

ELSEVIER SCIENCE BV

DOI: 10.1016/S0950-7051(01)00146-0

Keywords

data mining; statistical model; RNA/DNA folding; UFR

Category

Computer Science, Artificial Intelligence

Abstract

Numerous experiments and analyses of RNA structures have revealed that distinct local structures correlate closely with biological function. In this study, we present a data mining approach to discover such unusual folding regions (UFRs) in genome sequences. Our approach is a three-step procedure. In the first step, the degree to which a local structure in a genomic sequence differs from a random folding is evaluated by two z-scores of the local segment: the significance score (SIGSCR) and the stability score (STBSCR). The two scores are computed by sliding a fixed-size window along the sequence, one base at a time, from the start to the end position. In the second step, based on the theory of the non-central Student's t distribution, we derive a linearly transformed non-central Student's t distribution (LTNSTD) to describe the distribution of the SIGSCR and STBSCR values computed in the sequence. In the third step, we extract the significant UFRs, namely those segments whose SIGSCR and/or STBSCR lie above or below a given threshold calculated from the derived LTNSTD. Our data mining approach is successfully applied to the complete genome of Mycoplasma genitalium (M. gen), where it discovers these statistical extremes. Comparison with the two scores computed from randomly shuffled sequences of the entire M. gen genome demonstrates that the UFRs in the M. gen sequence do not arise by chance. These UFRs may indicate an important structural role encoded in their sequence information. (C) 2002 Elsevier Science B.V. All rights reserved.
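The sliding-window scoring and threshold extraction described in the abstract can be illustrated with a minimal Python sketch. Several things here are assumptions, not taken from the paper: `toy_energy` is a hypothetical stand-in for a real folding free energy (a real analysis would use a secondary-structure folding program); SIGSCR is assumed to compare a window against shuffled copies of itself, and STBSCR against all windows of the same size in the sequence, which the abstract does not spell out; and the thresholds `low` and `high` are plain user-supplied numbers rather than values derived from the LTNSTD.

```python
import random
import statistics

# Watson-Crick pairs for a DNA alphabet (assumption: genome given as A/C/G/T).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def toy_energy(segment):
    """Hypothetical energy proxy, NOT a real folding model.

    Pairs base i with its mirror position and counts complementary
    pairs; more pairs means a lower (more stable) toy energy.
    """
    n = len(segment)
    pairs = sum((segment[i], segment[n - 1 - i]) in PAIRS for i in range(n // 2))
    return float(-pairs)


def zscore(value, sample):
    """Standard z-score of value against sample; 0.0 if sample has no spread."""
    mu = statistics.fmean(sample)
    sd = statistics.pstdev(sample)
    return 0.0 if sd == 0 else (value - mu) / sd


def sigscr(segment, n_shuffles, rng):
    """Assumed SIGSCR: window energy versus shuffled copies of the same window."""
    bases = list(segment)
    shuffled = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        shuffled.append(toy_energy("".join(bases)))
    return zscore(toy_energy(segment), shuffled)


def scan(seq, window, n_shuffles=50, seed=0):
    """Slide a fixed window one base at a time; return (start, SIGSCR, STBSCR)."""
    rng = random.Random(seed)
    starts = range(len(seq) - window + 1)
    energies = [toy_energy(seq[i:i + window]) for i in starts]
    return [
        (i, sigscr(seq[i:i + window], n_shuffles, rng), zscore(energies[i], energies))
        for i in starts
    ]


def extract_ufrs(scores, low, high):
    """Keep windows whose SIGSCR and/or STBSCR fall outside [low, high].

    low/high are hypothetical fixed cutoffs; the paper derives its
    thresholds from the fitted LTNSTD instead.
    """
    return [
        (i, s, t) for i, s, t in scores
        if s < low or s > high or t < low or t > high
    ]
```

A call such as `extract_ufrs(scan(genome, window=100), low=-3.0, high=3.0)` would then list candidate windows, where `genome`, the window size, and the cutoffs are all illustrative choices. Seeding the random generator keeps the shuffle-based scores reproducible between runs.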