4.7 Article

Enhancing MEDLINE document clustering by incorporating MeSH semantic similarity

Journal

BIOINFORMATICS
Volume 25, Issue 15, Pages 1944-1951

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btp338


Funding

  1. BIRD of Japan Science and Technology Agency (JST)
  2. Shanghai Committee of Science and Technology, China [08DZ2271800, 09DZ2272800]
  3. State Key Lab of Bio-Organic & Natural Products Chemistry, CAS


Motivation: MEDLINE documents are usually clustered with the vector space model, which computes the content similarity between two documents essentially as the inner product of their word vectors. Recently, the semantic information in the MeSH (Medical Subject Headings) thesaurus has been applied to clustering MEDLINE documents by mapping documents into MeSH concept vectors, which are then clustered. However, current approaches that use the MeSH thesaurus have two serious limitations: first, important semantic information may be lost when the MeSH concept vectors are generated, and second, the content information of the original text is discarded.

Methods: Our new strategy has three key points. First, we develop a sound method for measuring the semantic similarity between two documents over the MeSH thesaurus. Second, we combine the semantic and content similarities to generate an integrated similarity matrix between documents. Third, we apply a spectral approach to cluster documents over the integrated similarity matrix.

Results: Using 100 different datasets of MEDLINE records, we conduct extensive experiments, varying the alternative measures and parameters. The experimental results show that integrating the semantic and content similarities outperforms using either similarity alone, and the improvement is statistically significant. We further find a best parameter setting that is consistent across all experimental conditions tested. Finally, we show a typical example of the resulting clusters, confirming the effectiveness of our strategy in improving MEDLINE document clustering.
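The three steps under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two similarities are combined or which spectral algorithm is used. The sketch assumes a weighted linear combination with a hypothetical weight `alpha`, a hand-written placeholder `semantic` matrix standing in for the MeSH-based similarity, and a simple two-way spectral split computed by power iteration on the normalized affinity matrix. All function names are illustrative.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two sparse term-count vectors (dict-like)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def content_similarity_matrix(docs):
    """Vector space model: whitespace tokens, raw counts, cosine similarity."""
    vecs = [Counter(d.lower().split()) for d in docs]
    n = len(vecs)
    return [[cosine_similarity(vecs[i], vecs[j]) for j in range(n)]
            for i in range(n)]


def integrate(content, semantic, alpha=0.5):
    """Assumed combination rule: alpha * semantic + (1 - alpha) * content."""
    n = len(content)
    return [[alpha * semantic[i][j] + (1 - alpha) * content[i][j]
             for j in range(n)] for i in range(n)]


def spectral_bipartition(W, iters=200):
    """Split items into two clusters using the second eigenvector of
    N = D^-1/2 W D^-1/2. W must be symmetric, non-negative, with
    positive row sums."""
    n = len(W)
    deg = [sum(row) for row in W]
    N = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # The leading eigenvector of N is proportional to sqrt(deg); deflate it.
    total = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / total for d in deg]
    x = [i + 1.0 for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        # Multiply by (I + N) / 2, whose eigenvalues lie in [0, 1].
        y = [0.5 * (x[i] + sum(N[i][j] * x[j] for j in range(n)))
             for i in range(n)]
        proj = sum(yi * vi for yi, vi in zip(y, v1))
        y = [yi - proj * vi for yi, vi in zip(y, v1)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    # Rescale by D^-1/2 and threshold at zero.
    return [0 if x[i] / math.sqrt(deg[i]) >= 0 else 1 for i in range(n)]


# Toy input: four made-up "abstracts" and a placeholder semantic matrix.
docs = [
    "protein kinase signaling pathway",
    "kinase signaling in cancer cells",
    "document clustering with word vectors",
    "spectral clustering of document vectors",
]
semantic = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]

content = content_similarity_matrix(docs)
integrated = integrate(content, semantic, alpha=0.5)
labels = spectral_bipartition(integrated)
```

On this toy input the first two documents fall into one cluster and the last two into the other; which cluster is labelled 0 depends on the sign of the eigenvector and carries no meaning. A real pipeline would replace `semantic` with a similarity computed over MeSH and would likely use a k-way spectral method rather than a single split.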
