Journal
INTELLIGENT DATA ANALYSIS
Volume 23, Issue 1, Pages 133-158
Publisher
IOS PRESS
DOI: 10.3233/IDA-173740
Keywords
Sentiment analysis; metaheuristic algorithm; ant colony optimization; k-nearest neighbour; text feature selection
Funding
- Universiti Pertahanan Nasional Malaysia
- Ministry of Education Malaysia
- Fundamental Research Grant Scheme [FRGS/1/2016/ICT02/UKM/01/2]
In sentiment analysis, the high dimensionality of the feature vector is a key problem: it can reduce the accuracy of sentiment classification and makes it difficult to obtain the optimum subset of features. To address this problem, this study proposes a new text feature selection method based on a wrapper approach, in which ant colony optimization (ACO) guides the feature selection process and the k-nearest neighbour (KNN) classifier evaluates and generates candidate subsets of optimum features. To test the optimum feature subset, a dependency-relation algorithm was used to find the relationship between each feature and the sentiment word in customer reviews. The feature subset produced by the proposed ACO-KNN algorithm served as the input for identifying and extracting sentiment words from sentences in customer reviews. The resulting relationships between features and sentiment words were evaluated for accuracy using precision, recall, and F-score. The performance of the proposed ACO-KNN algorithm on customer review datasets was compared with that of two hybrid algorithms from the literature: the genetic algorithm with information gain, and information gain with rough set attribute reduction. The experimental results showed that the proposed ACO-KNN algorithm obtained the optimum subset of features and improved the accuracy of sentiment classification.
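The wrapper idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the pheromone update rule, the heuristic term, the subset size, or the KNN settings, so everything below is an assumption made for illustration. That includes the function names `knn_accuracy`, `make_toy_data`, and `aco_knn_select`, the parameters `n_ants`, `n_iters`, `subset_size`, and `rho`, the leave-one-out evaluation, and the synthetic data. Ants sample feature subsets with probability proportional to pheromone, a KNN classifier scores each subset, and better subsets deposit more pheromone.

```python
import random


def knn_accuracy(X, y, subset, k=3):
    """Leave-one-out accuracy of k-NN using only the features in `subset`."""
    if not subset:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample, selected features only.
        neighbours = sorted(
            (sum((xi[f] - xj[f]) ** 2 for f in subset), y[j])
            for j, xj in enumerate(X)
            if j != i
        )
        votes = [label for _, label in neighbours[:k]]
        predicted = max(set(votes), key=votes.count)
        correct += predicted == y[i]
    return correct / len(X)


def make_toy_data(n_per_class=10, seed=1):
    """Hypothetical data: features 0-1 separate the classes, 2-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [label + rng.gauss(0, 0.05) for _ in range(2)]
            noise = [rng.uniform(0, 3) for _ in range(2)]
            X.append(informative + noise)
            y.append(label)
    return X, y


def aco_knn_select(X, y, n_ants=8, n_iters=15, subset_size=2, rho=0.2, seed=0):
    """Simplified ACO wrapper: pheromone-guided sampling, KNN as the evaluator."""
    rng = random.Random(seed)
    n_features = len(X[0])
    pheromone = [1.0] * n_features
    best_subset, best_score = [], -1.0
    for _ in range(n_iters):
        trails = []
        for _ in range(n_ants):
            # Each ant builds a subset, picking features in proportion to pheromone.
            candidates = list(range(n_features))
            subset = []
            for _ in range(subset_size):
                weights = [pheromone[f] for f in candidates]
                chosen = rng.choices(candidates, weights=weights)[0]
                candidates.remove(chosen)
                subset.append(chosen)
            score = knn_accuracy(X, y, subset)
            trails.append((subset, score))
            if score > best_score:
                best_subset, best_score = sorted(subset), score
        # Evaporate old pheromone, then deposit in proportion to KNN accuracy.
        pheromone = [(1 - rho) * p for p in pheromone]
        for subset, score in trails:
            for f in subset:
                pheromone[f] += score
    return best_subset, best_score


X, y = make_toy_data()
subset, score = aco_knn_select(X, y)
print("selected features:", subset, "leave-one-out accuracy:", score)
```

In this sketch the classifier sits inside the search loop, which is what distinguishes a wrapper method from a filter such as information gain alone. The cost is one KNN evaluation per ant per iteration, which is why wrapper methods are usually paired with a metaheuristic rather than an exhaustive search over subsets.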