Article; Proceedings Paper

A scalable machine-learning approach to recognize chemical names within large text databases

BMC BIOINFORMATICS (2006)

Journal

BMC BIOINFORMATICS

Volume 7, Issue -, Pages -

Publisher

BIOMED CENTRAL LTD

DOI: 10.1186/1471-2105-7-S2-S3

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Mathematical & Computational Biology

Abstract

Motivation: The use or study of chemical compounds permeates almost every scientific field, and in each of them the amount of textual information is growing rapidly. There is a need to accurately identify chemical names within text for a number of informatics efforts, such as database curation, report summarization, tagging of named entities and keywords, and the development or curation of reference databases.

Results: A first-order Markov Model (MM) was evaluated for its ability to distinguish chemical names from words, yielding ~93% recall in recognizing chemical terms and ~99% precision in rejecting non-chemical terms on smaller test sets. However, because the total number of false-positive events increases with the number of words analyzed, the scalability of name recognition was measured by processing 13.1 million MEDLINE records. The method yielded precision ranging from 54.7% to 100%, depending upon the cutoff score used, averaging 82.7% for approximately 1.05 million putative chemical terms extracted. Extracted chemical terms were analyzed to estimate the number of spelling variants per term, which correlated with the total number of times the chemical name appeared in MEDLINE. This variability in term construction was found to affect both information retrieval and term mapping when using PubMed and Ovid.
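To make the idea of scoring terms with a first-order Markov model and a cutoff more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a character-level model (each character depends only on the previous one), add-one smoothing, and a log-odds score comparing a model trained on chemical names with one trained on ordinary words. The abstract does not specify any of these details, and the class name, function names, alphabet size, and tiny training lists are all hypothetical.

```python
import math
from collections import defaultdict

START, END = "^", "$"  # hypothetical boundary markers for each term


class CharMarkovModel:
    """First-order Markov model over characters (assumed design, add-one smoothing)."""

    def __init__(self, alphabet_size=64):
        # alphabet_size is an assumed smoothing constant, not a value from the paper
        self.alphabet_size = alphabet_size
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def train(self, terms):
        """Count character-to-character transitions in the training terms."""
        for term in terms:
            chars = [START] + list(term.lower()) + [END]
            for prev, cur in zip(chars, chars[1:]):
                self.counts[prev][cur] += 1
                self.totals[prev] += 1

    def log_prob(self, term):
        """Log-probability of a term under the smoothed transition estimates."""
        chars = [START] + list(term.lower()) + [END]
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            seen = self.counts.get(prev, {}).get(cur, 0)
            prev_total = self.totals.get(prev, 0)
            total += math.log((seen + 1) / (prev_total + self.alphabet_size))
        return total


# Toy training data, invented for illustration only.
CHEMICAL_NAMES = ["methanol", "ethanol", "propanol", "butanol",
                  "benzene", "toluene", "hexane", "acetone"]
ORDINARY_WORDS = ["the", "patient", "study", "results",
                  "method", "clinical", "treatment", "between"]

chem_model = CharMarkovModel()
chem_model.train(CHEMICAL_NAMES)
word_model = CharMarkovModel()
word_model.train(ORDINARY_WORDS)


def log_odds(term):
    """Positive values mean the term looks more like a chemical name."""
    return chem_model.log_prob(term) - word_model.log_prob(term)


def is_chemical(term, cutoff=0.0):
    """Classify a term by comparing its log-odds score with a cutoff."""
    return log_odds(term) > cutoff


if __name__ == "__main__":
    for term in ["pentanol", "ethanol", "patient"]:
        print(term, round(log_odds(term), 2), is_chemical(term))
```

Raising `cutoff` in this sketch accepts fewer terms, which mirrors the trade-off described in the abstract, where precision depended on the cutoff score used. With training lists this small the scores are only illustrative.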
