4.7 Article

Annotating genes and genomes with DNA sequences extracted from biomedical articles

Journal

BIOINFORMATICS
Volume 27, Issue 7, Pages 980-986

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btr043

Funding

  1. European Science Foundation
  2. Biotechnology and Biological Sciences Research Council [2076, BB/G000093/1, BB/E012868/1]
  3. European Commission [HEALTH-F4-2008-223210]
  4. BBSRC [BB/G000093/1, BB/E012868/1] Funding Source: UKRI
  5. Biotechnology and Biological Sciences Research Council [BB/E012868/1, BB/G000093/1] Funding Source: researchfish

Motivation: Increasing rates of publication and DNA sequencing make the problem of finding relevant articles for a particular gene or genomic region more challenging than ever. Existing text-mining approaches focus on finding gene names or identifiers in English text. These are often not unique and do not identify the exact genomic location of a study.

Results: Here, we report the results of a novel text-mining approach that extracts DNA sequences from biomedical articles and automatically maps them to genomic databases. We find that ~20% of open access articles in PubMed Central (PMC) have extractable DNA sequences that can be accurately mapped to the correct gene (91%) and genome (96%). We illustrate the utility of data extracted by text2genome from more than 150 000 PMC articles for the interpretation of ChIP-seq data and the design of quantitative reverse transcriptase (RT)-PCR experiments.

Conclusion: Our approach links articles to genes and organisms without relying on gene names or identifiers. It also produces genome annotation tracks of the biomedical literature, thereby allowing researchers to use the power of modern genome browsers to access and analyze publications in the context of genomic data.
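To illustrate the idea of pulling DNA sequences out of article text, here is a minimal sketch that assumes a simple regular-expression heuristic. It is not the text2genome pipeline: the function name `extract_dna_candidates`, the pattern, and the `MIN_LENGTH` threshold are all hypothetical choices made for illustration, and the step of mapping candidates to genomic databases is not shown.

```python
import re

# Hypothetical threshold (an assumption, not taken from the article):
# shorter runs are too likely to be ordinary words such as "cat" or "tag".
MIN_LENGTH = 20

# A run that starts and ends with a nucleotide letter and may contain
# whitespace or digits in between, as in numbered sequence blocks.
_CANDIDATE = re.compile(r"[ACGT][ACGT\s\d]*[ACGT]", re.IGNORECASE)


def extract_dna_candidates(text, min_length=MIN_LENGTH):
    """Return upper-cased nucleotide runs of at least min_length letters."""
    candidates = []
    for match in _CANDIDATE.finditer(text):
        # Drop the spaces and position numbers, keeping only the letters.
        sequence = re.sub(r"[^ACGTacgt]", "", match.group()).upper()
        if len(sequence) >= min_length:
            candidates.append(sequence)
    return candidates


if __name__ == "__main__":
    sample = "The forward primer was 5'-ATGGCCAAGTTGACCAGTGCCGTT-3' and a control."
    print(extract_dna_candidates(sample))
```

In a real pipeline the candidates would still have to be aligned against genomes before an article could be linked to a gene or region; a length filter alone only reduces false positives from ordinary words.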
