Article

A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts

PLOS COMPUTATIONAL BIOLOGY (2018)

Journal

PLOS COMPUTATIONAL BIOLOGY

Volume 14, Issue 2, Pages -

Publisher

PUBLIC LIBRARY OF SCIENCE

DOI: 10.1371/journal.pcbi.1005962

Categories

Biochemical Research Methods; Mathematical & Computational Biology

Funding

Danish e-Infrastructure Cooperation (ActionableBiomarkersDK)
Novo Nordisk Foundation [NNF14CC0001]

Abstract

Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.
