Article

An Unsupervised Text Mining Method for Relation Extraction from Biomedical Literature

Journal

PLOS ONE
Volume 9, Issue 7, Pages -

Publisher

Public Library of Science
DOI: 10.1371/journal.pone.0102039


Funding

  1. National High-Tech Research & Development Program of China 863 Program [2012AA011103]
  2. National Program on Key Basic Research Project of China 973 Program [2014CB347600]
  3. National Natural Science Foundation of China [61203312]
  4. Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry [1206c0805039]
  5. Key Science, Technology Program of Anhui Province [1206c0805039]
  6. Ministry of Education, Science, Sports and Culture, Japan
  7. Program for New Century Excellent Talents in University [NCET-12-0836]
  8. Open Project Program of the National Laboratory of Pattern Recognition (NLPR)
  9. Grants-in-Aid for Scientific Research [22240021] Funding Source: KAKEN


The wealth of interaction information in biomedical articles has motivated text mining approaches that automatically extract biomedical relations. This paper presents an unsupervised method for biomedical relation extraction based on pattern clustering and sentence parsing. The pattern clustering algorithm is based on a polynomial kernel method and identifies interaction words from unlabeled data; these interaction words are then used to extract relations between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Using a semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach that combines pattern clustering, dependency parsing, and phrase structure parsing rules. We evaluated the approaches on two tasks: (1) protein-protein interaction extraction and (2) gene-suicide association extraction. On task (1), evaluation on the benchmark AImed corpus showed that the proposed unsupervised approach outperformed three supervised methods, which are rule-based, SVM-based, and kernel-based, respectively. The proposed semi-supervised approach is superior to the existing semi-supervised methods. On task (2), evaluation on a smaller dataset from the Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than a co-occurrence-based method.
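To make the comparison baseline concrete, the following is a minimal sketch of a co-occurrence-based extractor and the F-score used to evaluate it. It assumes that the baseline predicts a relation for every pair of entities appearing in the same sentence; the function names, entity labels, and toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def cooccurrence_pairs(sentences):
    """Predict a relation for every pair of distinct entities sharing a sentence.

    `sentences` is a list of entity-name lists, one list per sentence
    (an assumed input format, not the paper's).
    """
    predicted = set()
    for entities in sentences:
        # Sort so that each unordered pair has one canonical form.
        for a, b in combinations(sorted(set(entities)), 2):
            predicted.add((a, b))
    return predicted


def precision_recall_f1(predicted, gold):
    """Standard precision, recall, and F-score over sets of entity pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: entities found in three sentences and the gold relations.
sentences = [["P1", "P2"], ["P1", "P3", "P4"], ["P2", "P4"]]
gold = {("P1", "P2"), ("P3", "P4")}

predicted = cooccurrence_pairs(sentences)
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On this toy input the baseline recovers every gold relation but also predicts three spurious pairs, which illustrates why a co-occurrence method tends toward high recall and low precision, and hence a modest F-score.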

