Article

Knowledge-based vector space model for text clustering

KNOWLEDGE AND INFORMATION SYSTEMS (2010)

Journal

KNOWLEDGE AND INFORMATION SYSTEMS

Volume 25, Issue 1, Pages 35-55

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s10115-009-0256-5

Keywords

Text clustering; Knowledge-based VSM; Term similarity; Semantic relationship

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Funding

National Natural Science Foundation of China [90820013, 60875031, 60905028]
973 project [2007CB311002]
Program for New Century Excellent Talents in University [NCET-06-0078]

Abstract

This paper presents a new knowledge-based vector space model (VSM) for text clustering. In the new model, semantic relationships between terms (e.g., words or concepts) are included when representing text documents as a set of vectors. The idea is to calculate the dissimilarity between two documents more effectively so that text clustering results can be enhanced. In this paper, the semantic relationship between two terms is defined by the similarity of the two terms. This similarity is used to re-weight term frequency in the VSM. We consider and study two similarity measures for computing the semantic relationship between two terms, based on two different approaches. The first approach is based on existing ontologies such as WordNet and MeSH. We define a new similarity measure that combines the edge-counting technique, the average distance and the position weighting method to compute the similarity of two terms from an ontology hierarchy. The second approach makes use of text corpora to construct the relationships between terms and then calculate their semantic similarities. Three clustering algorithms (bisecting k-means, feature weighting k-means and a hierarchical clustering algorithm) have been used to cluster real-world text data represented in the new knowledge-based VSM. The experimental results show that the clustering performance based on the new model was much better than that based on the traditional term-based VSM.
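
The re-weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formula, so the code assumes one common form, in which each term's frequency is spread onto related terms through a term-similarity matrix. The function names, the toy vocabulary and the similarity values are all hypothetical, and cosine dissimilarity is used only as a stand-in document measure.

```python
import math

def reweight(tf, sim):
    """Assumed re-weighting: new_tf[j] = sum over i of tf[i] * sim[i][j]."""
    n = len(tf)
    return [sum(tf[i] * sim[i][j] for i in range(n)) for j in range(n)]

def cosine_dissimilarity(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

# Hypothetical vocabulary: ["car", "automobile", "banana"].
# Similarity values are made up for illustration, not taken from WordNet or MeSH.
SIM = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

doc_a = [2, 0, 0]  # mentions only "car"
doc_b = [0, 3, 0]  # mentions only "automobile"

# Term-based VSM: the documents share no terms, so they look unrelated.
plain = cosine_dissimilarity(doc_a, doc_b)

# Knowledge-based VSM (sketch): related terms now contribute to both vectors.
knowledge = cosine_dissimilarity(reweight(doc_a, SIM), reweight(doc_b, SIM))

print(plain, knowledge)
```

In this toy case the two documents share no terms, so the term-based representation treats them as maximally dissimilar, while the re-weighted vectors overlap on "car" and "automobile" and the dissimilarity drops close to zero. Any clustering algorithm that consumes document vectors could then be run on the re-weighted representation.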
