4.7 Article

Genetic algorithm for text clustering using ontology and evaluating the validity of various semantic similarity measures

期刊

EXPERT SYSTEMS WITH APPLICATIONS
卷 36, 期 5, 页码 9095-9104

出版社

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2008.12.046

关键词

Genetic algorithm; Text clustering; Ontology; Wordnet; Latent semantic indexing

资金

  1. Korea Research Foundation [KRF-2006-321-A00012]
  2. third stage of Brain Korea 21
  3. Korea Technology & Information Promotion Agency for SMEs (TIPA) [산협-07-02-113] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)
  4. National Research Foundation of Korea [과C6B1618, 2006-321-A00012, 2009-0076721] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

向作者/读者索取更多资源

This paper proposes a self-organized genetic algorithm for text clustering based on ontology method. The common problem in the fields of text clustering is that the document is represented as a bag of words, while the conceptual similarity is ignored. We take advantage of thesaurus-based and corpus-based ontology to overcome this problem. However, the traditional corpus-based method is rather difficult to tackle. A transformed latent semantic indexing (LSI) model which can appropriately capture the associated semantic similarity is proposed and demonstrated as corpus-based ontology in this article. To investigate how ontology methods could be used effectively in text clustering, two hybrid strategies using various similarity measures are implemented. Experiments results show that our method of genetic algorithm in conjunction with the ontology strategy, the combination of the transformed LSI-based measure with the thesaurus-based measure, apparently outperforms that with traditional similarity measures. Our clustering algorithm also efficiently enhances the performance in comparison with standard GA and k-means in the same similarity environments. (C) 2008 Elsevier Ltd. All rights reserved.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据