Article

Automatic construction of a large-scale and accurate drug-side-effect association knowledge base from biomedical literature

JOURNAL OF BIOMEDICAL INFORMATICS (2014)

Journal

JOURNAL OF BIOMEDICAL INFORMATICS

Volume 51, Pages 191-199

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jbi.2014.05.013

Keywords

Text mining; Drug side effect; Drug discovery; Drug repositioning; Drug toxicity prediction

Categories

Computer Science, Interdisciplinary Applications; Medical Informatics

Funding

CWRU/Cleveland Clinic CTSA [UL1 RR024989]
Computational Genomic Epidemiology of Cancer (CoGEC)
ThinTek LLC

Abstract

Systems approaches to studying drug-side-effect (drug-SE) associations are emerging as an active research area for drug target discovery, drug repositioning, and drug toxicity prediction. However, currently available drug-SE association databases are far from complete. To increase the completeness of current drug-SE relationship resources, we present an automatic learning approach that accurately extracts drug-SE pairs from the vast body of published biomedical literature, a rich source of side-effect information for commercial, experimental, and even failed drugs. As the text corpus, we used 119,085,682 MEDLINE sentences and their parse trees. We used known drug-SE associations derived from US Food and Drug Administration (FDA) drug labels as prior knowledge to find relevant sentences and parse trees, and we extracted syntactic patterns associated with drug-SE pairs from the resulting set of parse trees. We developed pattern-ranking algorithms to prioritize drug-SE-specific patterns, and then selected a set of patterns with both high precision and high recall to extract drug-SE pairs from the entire MEDLINE corpus. In total, the learned patterns extracted 38,871 drug-SE pairs from MEDLINE, the majority of which have not been captured in FDA drug labels to date. On average, our knowledge-driven pattern-learning approach achieved a precision of 0.833, a recall of 0.407, and an F1 of 0.545 in extracting drug-SE pairs from MEDLINE. We compared our approach with a support vector machine (SVM)-based machine learning approach and a co-occurrence statistics-based approach. The pattern-learning approach is largely complementary to the SVM- and co-occurrence-based approaches, with significantly higher precision and F1 but lower recall. Correlation analysis showed that the extracted drug side effects correlate positively with drug targets, metabolism, and indications. © 2014 Elsevier Inc. All rights reserved.
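The abstract outlines a seed-driven pattern-learning loop: known drug-SE pairs locate relevant sentences, the patterns linking the two mentions are ranked, and high-scoring patterns are applied to the whole corpus. The Python below is a minimal sketch of that loop under simplifying assumptions. It uses the word sequence between two mentions as a stand-in for the parse-tree patterns the paper describes. The lexicons, sentences, seed pairs, and gold pairs are hypothetical toy data. The scoring rule (the fraction of a pattern's matched pairs that are seed pairs) and the 0.5 cutoff are illustrative choices, not the authors' ranking algorithms.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicons (assumption): single-token drug and side-effect names.
DRUGS = {"aspirin", "warfarin", "metformin"}
SIDE_EFFECTS = {"bleeding", "nausea", "hypoglycemia"}
MAX_GAP = 4  # hypothetical cap on the number of tokens between the two mentions


def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def candidate_patterns(sentence):
    """Yield ((drug, se), pattern) for every nearby drug/side-effect mention pair.

    A pattern is the mention order plus the words between the two mentions;
    this surface pattern stands in for the paper's parse-tree patterns.
    """
    tokens = tokenize(sentence)
    drugs = [(i, t) for i, t in enumerate(tokens) if t in DRUGS]
    effects = [(j, t) for j, t in enumerate(tokens) if t in SIDE_EFFECTS]
    for i, drug in drugs:
        for j, se in effects:
            lo, hi = min(i, j), max(i, j)
            if hi - lo - 1 > MAX_GAP:
                continue
            direction = "D->S" if i < j else "S->D"
            yield (drug, se), (direction, tuple(tokens[lo + 1:hi]))


def rank_patterns(sentences, seed_pairs):
    """Score each pattern by the share of its matched pairs found in the seeds."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for sentence in sentences:
        for pair, pattern in candidate_patterns(sentence):
            totals[pattern] += 1
            hits[pattern] += pair in seed_pairs  # bool counts as 0 or 1
    scores = {p: hits[p] / totals[p] for p in totals}
    # Highest seed precision first; ties broken by how often the pattern occurs.
    return sorted(scores.items(), key=lambda kv: (-kv[1], -totals[kv[0]]))


def select_patterns(ranked, min_score=0.5):
    return {pattern for pattern, score in ranked if score >= min_score}


def extract_pairs(sentences, patterns):
    return {pair for s in sentences for pair, p in candidate_patterns(s) if p in patterns}


def precision_recall_f1(predicted, gold):
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Toy corpus and labels (hypothetical).
sentences = [
    "Aspirin induced bleeding in elderly patients.",
    "Warfarin induced bleeding was reported.",
    "Metformin induced nausea in some patients.",
    "Nausea associated with metformin therapy.",
    "Aspirin and nausea were both recorded.",
    "Metformin and bleeding were both recorded.",
]
seeds = {("aspirin", "bleeding"), ("metformin", "nausea")}
gold = seeds | {("warfarin", "bleeding"), ("metformin", "hypoglycemia")}

ranked = rank_patterns(sentences, seeds)
patterns = select_patterns(ranked)
pairs = extract_pairs(sentences, patterns)
print(sorted(pairs))  # [('aspirin', 'bleeding'), ('metformin', 'nausea'), ('warfarin', 'bleeding')]
print(precision_recall_f1(pairs, gold))
```

In this toy run the "induced" pattern recovers (warfarin, bleeding), which is not among the seeds, mirroring how learned patterns can surface pairs absent from the labels used as prior knowledge, while the uninformative "and" pattern scores 0 and is dropped. Applying the same F1 formula to the reported averages, 2 × 0.833 × 0.407 / (0.833 + 0.407), gives about 0.547 rather than 0.545. A small gap like this is expected if F1 was averaged across evaluations rather than computed from the averaged precision and recall, though the abstract does not say which.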