
An improved K-nearest-neighbor algorithm for text categorization

EXPERT SYSTEMS WITH APPLICATIONS (2012)

Journal

EXPERT SYSTEMS WITH APPLICATIONS

Volume 39, Issue 1, Pages 1503-1509

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.eswa.2011.08.040

Keywords

Text categorization; KNN text categorization; One-pass clustering; Spam filtering

Funding

National Natural Science Foundation of China [60673191]
Guangdong Province's Institutes of Higher Education [06Z012]
National Natural Science Foundation of Guangdong Province of China [9151026005000002]

Abstract

Text categorization is an important tool for managing and organizing rapidly growing volumes of text data. Many text categorization algorithms have been explored in the literature, such as KNN, Naive Bayes, and Support Vector Machine. KNN text categorization is effective but computationally inefficient. In this paper, we propose an improved KNN algorithm for text categorization, which builds the classification model by combining a constrained one-pass clustering algorithm with KNN text categorization. Empirical results on three benchmark corpora show that our algorithm substantially reduces the number of text similarity computations and outperforms state-of-the-art KNN, Naive Bayes, and Support Vector Machine classifiers. In addition, the classification model constructed by the proposed algorithm can be updated incrementally, so it scales well in many real-world applications. (C) 2011 Elsevier Ltd. All rights reserved.
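The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes, not the authors' method. It assumes that "constrained" means each cluster holds documents of a single class, that documents are sparse term-weight dictionaries compared by cosine similarity, and that classification is a similarity-weighted vote over the k nearest cluster centroids. The names `cosine`, `one_pass_cluster`, `classify`, and the `threshold` parameter are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def one_pass_cluster(docs, labels, threshold, clusters=None):
    """Scan the documents once; each joins its most similar same-label
    cluster if the similarity reaches `threshold`, else opens a new one.

    Passing an existing `clusters` list updates the model incrementally.
    Assumption: the single-label constraint is our reading of "constrained".
    """
    clusters = [] if clusters is None else clusters
    for vec, label in zip(docs, labels):
        best, best_sim = None, threshold
        for c in clusters:
            if c["label"] != label:
                continue  # constraint: never mix classes in one cluster
            sim = cosine(vec, c["sum"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"sum": dict(vec), "label": label, "size": 1})
        else:
            # The summed vector has the same direction as the centroid,
            # so cosine similarity against it is unchanged.
            for t, w in vec.items():
                best["sum"][t] = best["sum"].get(t, 0.0) + w
            best["size"] += 1
    return clusters


def classify(vec, clusters, k):
    """Similarity-weighted vote over the k most similar clusters
    (expects a non-empty `clusters` list)."""
    scored = sorted(((cosine(vec, c["sum"]), c["label"]) for c in clusters),
                    key=lambda pair: pair[0], reverse=True)[:k]
    votes = defaultdict(float)
    for sim, label in scored:
        votes[label] += sim
    return max(votes, key=votes.get)


train = [{"spam": 1, "offer": 1}, {"spam": 1, "free": 1},
         {"meeting": 1, "agenda": 1}, {"meeting": 1, "notes": 1}]
train_labels = ["spam", "spam", "ham", "ham"]
model = one_pass_cluster(train, train_labels, threshold=0.3)
prediction = classify({"free": 1, "offer": 1}, model, k=1)
```

Under these assumptions a query is compared against cluster summaries rather than every training document, which is where the reduction in similarity computations would come from, and new labeled documents can be folded in by calling `one_pass_cluster` again with the existing model.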
