4.6 Article

LabeledIn: Cataloging labeled indications for human drugs

期刊

JOURNAL OF BIOMEDICAL INFORMATICS
卷 52, 期 -, 页码 448-456

出版社

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jbi.2014.08.004

关键词

Corpus annotation; Drug labels; Drug indications; Natural language processing

资金

  1. Intramural Research Program of the NIH - National Library of Medicine
  2. National Key Technology R&D Program of China [2013BA106B01]
  3. Fundamental Research Funds for the Central Universities [13R0101]

向作者/读者索取更多资源

Drug-disease treatment relationships, i.e., which drug(s) are indicated to treat which disease(s), are among the most frequently sought information in PubMed (R). Such information is useful for feeding the Google Knowledge Graph, designing computational methods to predict novel drug indications, and validating clinical information in EMRs. Given the importance and utility of this information, there have been several efforts to create repositories of drugs and their indications. However, existing resources are incomplete. Furthermore, they neither label indications in a structured way nor differentiate them by drug-specific properties such as dosage form, and thus do not support computer processing or semantic interoperability. More recently, several studies have proposed automatic methods to extract structured indications from drug descriptions: however, their performance is limited by natural language challenges in disease named entity recognition and indication selection. In response, we report Labeledln: a human-reviewed, machine-readable and source-linked catalog of labeled indications for human drugs. More specifically, we describe our semi-automatic approach to derive Labeledln from drug descriptions through human annotations with aids from automatic methods. As the data source, we use the drug labels (or package inserts) submitted to the FDA by drug manufacturers and made available in DailyMed. Our machine-assisted human annotation workflow comprises: (i) a grouping method to remove redundancy and identify representative drug labels to be used for human annotation, (ii) an automatic method to recognize and normalize mentions of diseases in drug labels as candidate indications, and (iii) a two-round annotation workflow for human experts to judge the precomputed candidates and deliver the final gold standard. In this study, we focused on 250 highly accessed drugs in PubMed Health, a newly developed public web resource for consumers and clinicians on prevention and treatment of diseases. These 250 drugs corresponded to more than 8000 drug labels (500 unique) in DailyMed in which 2950 candidate indications were pre-tagged by an automatic tool. After being reviewed independently by two experts, 1618 indications were selected, and additional 97 (missed by computer) were manually added, with an inter-annotator agreement of 88.35% as measured by the Kappa coefficient. Our final annotation results in Labeledln consist of 7805 drug-disease treatment relationships where drugs are represented as a triplet of ingredient, dose form, and strength. A systematic comparison of Labeledln with an existing computer-derived resource revealed significant discrepancies, confirming the need to involve humans in the creation of such a resource. In addition, Labeledln is unique in that it contains detailed textual context of the selected indications in drug labels, making it suitable for the development of advanced computational methods for the automatic extraction of indications from free text. Finally, motivated by the studies on drug nomenclature and medication errors in EMRs, we adopted a fine-grained drug representation scheme, which enables the automatic identification of drugs with indications specific to certain dose forms or strengths. Future work includes expanding our coverage to more drugs and integration with other resources. The Labeledln dataset and the annotation guidelines are available at http://ftp.ncbi.nlm.nih.gov/pub/lu/LabeledIn/. Published by Elsevier Inc.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据