Proceedings Paper

Detection of Opinion Spam with Character n-grams

COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT II (2015)

Journal

COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT II

Volume 9042, Issue -, Pages 285-294

Publisher

SPRINGER-VERLAG BERLIN

DOI: 10.1007/978-3-319-18117-2_21

Keywords

Opinion spam; deceptive detection; character n-grams; word n-grams

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Theory & Methods; Robotics

Abstract

In this paper we consider the detection of opinion spam as a stylistic classification task because, given a particular domain, deceptive and truthful opinions are similar in content but differ in the way they are written (style). In particular, we propose using character n-grams as features, since they have been shown to capture lexical content as well as stylistic information. We evaluated our approach on a standard corpus of 1600 hotel reviews, considering both positive and negative reviews. We compared the results obtained with character n-grams against those obtained with word n-grams. Moreover, we evaluated the effectiveness of character n-grams while decreasing the training set size in order to simulate real training conditions. The results show that character n-grams are good features for the detection of opinion spam; they seem to capture the content of deceptive opinions and the writing style of the deceiver better than word n-grams do. In particular, the results show improvements of 2.3% and 2.1% over the word-based representations in the detection of positive and negative deceptive opinions, respectively. Furthermore, character n-grams yield good performance even with a very small training corpus: using only 25% of the training set, a Naive Bayes classifier achieved F1 values of up to 0.80 for both opinion polarities.
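To make the feature representation concrete, the following is a minimal sketch of character n-gram extraction paired with a multinomial Naive Bayes classifier. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the n-gram length, the smoothing scheme, or any preprocessing, so `n=4`, `alpha=1.0` (Laplace smoothing), and the names `char_ngrams` and `CharNgramNaiveBayes` are all hypothetical choices.

```python
from collections import Counter
import math


def char_ngrams(text, n=4):
    """Count overlapping character n-grams, including spaces and punctuation."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


class CharNgramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts (illustrative only)."""

    def __init__(self, n=4, alpha=1.0):
        self.n = n          # n-gram length (assumed; not stated in the abstract)
        self.alpha = alpha  # additive smoothing constant (assumed)

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)  # documents per class, for the prior
        self.counts = {}                   # per-class n-gram counts
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, cnt in self.counts.items():
            # Log prior plus smoothed log likelihood of each observed n-gram.
            score = math.log(self.doc_counts[label] / n_docs)
            denom = self.totals[label] + self.alpha * vocab_size
            for gram, k in grams.items():
                if gram not in self.vocab:
                    continue  # ignore n-grams never seen in training
                score += k * math.log((cnt[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because the n-grams span word boundaries and keep punctuation, the same feature set carries both lexical fragments and surface style cues, which is the property the abstract attributes to character n-grams. A word n-gram baseline would replace `char_ngrams` with a function that counts token sequences instead.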
