Article

Sample cutting method for imbalanced text sentiment classification based on BRC

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 37, Pages 451-461

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2012.09.003

Keywords

Imbalanced text set; Text sentiment classification; Sample cutting algorithm; Boundary region; Feature weight

Funding

  1. National Natural Science Foundation [61175067, 60970014, 61272095, 71031006]
  2. Natural Science Foundation of Shanxi Province [2010011021-1]
  3. Shanxi Foundation of Tackling Key Problem in Science and Technology [20110321027-02]
  4. Foundation of Doctoral Program Research of Ministry of Education of China [200801080006]

The vast amount of subjective text spreading across the Internet has increased the demand for text sentiment classification technology. A well-known factor that often weakens classifier performance is the imbalanced distribution of review texts between the positive and negative classes. In this paper, we address the sentiment classification problem for imbalanced text sets. To clarify the disordered boundary between the classes, we propose the algorithm BRC, which cuts majority-class samples in the dense boundary region. The classifier is built on a Support Vector Machine. To find the better feature weight scheme, combination strategy of sample cutting, and parameters in BRC, three groups of experiments are designed on six text sets covering five domains. The experimental results show that the feature weight scheme Presence performs best, and that the combination strategy BRC + RS offers a tradeoff between the evaluation measures Precision and Recall on the two categories and yields a larger increase in the overall evaluation measure, Accuracy. It should be noted that the method of determining the parameters alpha and beta in BRC is empirical. Although the boundary region cutting algorithm BRC is aimed at text sentiment classification, we believe that it is also suitable for any two-category classification problem with imbalanced sample data. © 2012 Elsevier B.V. All rights reserved.
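
To make the two ideas in the abstract more concrete, the following is a minimal Python sketch, not the authors' BRC algorithm, whose details are not given here. It assumes that Presence weighting means a binary indicator of whether a term occurs in a text, and it illustrates boundary cutting with a simple nearest-neighbour rule. The function names, the Hamming distance, and the parameters `k` and `max_minority_neighbours` are all hypothetical and do not correspond to the paper's alpha and beta.

```python
from typing import List


def presence_vector(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """Assumed Presence weighting: 1 if the term occurs in the text, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def cut_boundary_majority(
    majority: List[List[int]],
    minority: List[List[int]],
    k: int = 2,
    max_minority_neighbours: int = 1,
) -> List[List[int]]:
    """Illustrative stand-in for boundary cutting (hypothetical rule).

    A majority sample is dropped when more than `max_minority_neighbours`
    of its k nearest neighbours belong to the minority class, i.e. when it
    sits in a region where the two classes are mixed.
    """
    kept = []
    for i, sample in enumerate(majority):
        # (distance, is_minority) for every other sample in the data set.
        neighbours = [
            (hamming(sample, other), 0)
            for j, other in enumerate(majority)
            if j != i
        ]
        neighbours += [(hamming(sample, other), 1) for other in minority]
        neighbours.sort(key=lambda pair: pair[0])  # stable sort by distance
        minority_hits = sum(flag for _, flag in neighbours[:k])
        if minority_hits <= max_minority_neighbours:
            kept.append(sample)
    return kept


# Toy data: vocabulary and hand-made presence vectors (illustrative only).
vocabulary = ["good", "bad", "great", "poor"]
majority = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
minority = [[0, 1, 0, 1], [0, 1, 0, 0]]
reduced_majority = cut_boundary_majority(majority, minority)
```

In this toy data the last majority sample lies among the minority samples, so the rule removes it and keeps the other three. The reduced majority set and the untouched minority set would then be passed to a classifier such as an SVM.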
