☆ 4.5 Article

On ontology-driven document clustering using core semantic features

KNOWLEDGE AND INFORMATION SYSTEMS (2011)

期刊

KNOWLEDGE AND INFORMATION SYSTEMS

卷 28, 期 2, 页码 395-421

出版社

SPRINGER LONDON LTD

DOI: 10.1007/s10115-010-0370-4

关键词

Clustering; Information gain; Semantic features; Ontology; Dimensionality reduction

类别

Computer Science, Artificial Intelligence Computer Science, Information Systems

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Incorporating semantic knowledge from an ontology into document clustering is an important but challenging problem. While numerous methods have been developed, the value of using such an ontology is still not clear. We show in this paper that an ontology can be used to greatly reduce the number of features needed to do document clustering. Our hypothesis is that polysemous and synonymous nouns are both relatively prevalent and fundamentally important for document cluster formation. We show that nouns can be efficiently identified in documents and that this alone provides improved clustering. We next show the importance of the polysemous and synonymous nouns in clustering and develop a unique approach that allows us to measure the information gain in disambiguating these nouns in an unsupervised learning setting. In so doing, we can identify a core subset of semantic features that represent a text corpus. Empirical results show that by using core semantic features for clustering, one can reduce the number of features by 90% or more and still produce clusters that capture the main themes in a text corpus.

On ontology-driven document clustering using core semantic features

期刊

KNOWLEDGE AND INFORMATION SYSTEMS

出版社

SPRINGER LONDON LTD

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

On ontology-driven document clustering using core semantic features

期刊

KNOWLEDGE AND INFORMATION SYSTEMS

出版社

SPRINGER LONDON LTD

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文