Article

Data Analytics for the Identification of Fake Reviews Using Supervised Learning

Journal

CMC-COMPUTERS MATERIALS & CONTINUA
Volume 70, Issue 2, Pages 3189-3204

Publisher

TECH SCIENCE PRESS
DOI: 10.32604/cmc.2022.019625

Keywords

E-commerce; fake reviews detection; methodologies; machine learning; hotel reviews

Fake reviews have gained importance due to the increase in online marketing transactions. This study proposes an intelligent system that uses n-grams and sentiment scores to detect and classify fake reviews on e-commerce platforms. Four supervised machine learning techniques were used, and the results outperformed existing methods evaluated on the same dataset in terms of accuracy.

Fake reviews, also known as deceptive opinions, are used to mislead people and have become more prominent recently because of the rapid increase in online marketing transactions such as selling and purchasing. E-commerce platforms allow customers to post reviews and comments about a product or service after purchasing it, and new customers usually read these posted reviews before making a purchase decision. The current challenge is how new customers can distinguish truthful reviews from fake ones, which deceive customers, inflict losses, and tarnish the reputation of companies. The present paper develops an intelligent system that detects fake reviews on e-commerce platforms using n-grams of the review text and the sentiment scores given by the reviewer. The proposed methodology uses a standard fake hotel review dataset for experimentation, applies data preprocessing, and uses the term frequency-inverse document frequency (TF-IDF) approach to extract and represent features. For detection and classification, n-grams of the review texts are fed into the constructed models, which classify each review as fake or truthful. The experiments were carried out using four supervised machine learning techniques, trained and tested on a dataset collected from the TripAdvisor website. Based on testing accuracy and F1-score, naive Bayes (NB), support vector machine (SVM), adaptive boosting (AB), and random forest (RF) achieved 88%, 93%, 94%, and 95%, respectively. The obtained results were compared with existing works that used the same dataset, and the proposed methods outperformed the comparable methods in terms of accuracy.
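
To make the TF-IDF n-gram pipeline concrete, the following is a minimal sketch in Python using only the standard library; it is not the authors' code. It assumes whitespace tokenization, word unigrams and bigrams (n_max=2), the same smoothed IDF formula that scikit-learn's TfidfVectorizer applies with smooth_idf=True (but without its L2 row normalization), and a multinomial naive Bayes classifier that treats TF-IDF weights as pseudo-counts. The helper names (ngrams, fit_idf, tfidf, TfidfNaiveBayes, accuracy_and_f1) and the four toy reviews are hypothetical. The reviewer sentiment scores and the other three classifiers are omitted.

```python
# Hypothetical helpers: a sketch of TF-IDF n-grams + naive Bayes, not the paper's implementation.
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Word n-grams of length 1..n_max from a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def fit_idf(docs, n_max=2):
    """Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1, over the training docs."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc, n_max)))  # count each n-gram once per document
    n_docs = len(docs)
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf, n_max=2):
    """Raw n-gram count times IDF; n-grams unseen during fitting are dropped."""
    counts = Counter(g for g in ngrams(doc, n_max) if g in idf)
    return {t: c * idf[t] for t, c in counts.items()}


class TfidfNaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing, fed TF-IDF weights as pseudo-counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for v in vectors for t in v}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            totals = Counter()
            for v in rows:
                totals.update(v)  # sums float weights per n-gram
            denom = sum(totals.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((totals[t] + self.alpha) / denom) for t in self.vocab}
        return self

    def predict(self, vector):
        # Log prior plus weighted log likelihood of each known n-gram; pick the best class.
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_lik[c][t] for t, w in vector.items() if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


def accuracy_and_f1(y_true, y_pred, positive="fake"):
    """Accuracy plus F1 for the positive ("fake") class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1


# Invented toy reviews, NOT from the TripAdvisor dataset used in the paper.
train = [
    ("best hotel ever amazing stay amazing staff", "fake"),
    ("amazing luxury best experience of my life", "fake"),
    ("room was small but clean and the desk staff were helpful", "truthful"),
    ("parking cost extra and the elevator was slow", "truthful"),
]
test = [
    ("amazing stay best hotel", "fake"),
    ("the elevator was slow but the room was clean", "truthful"),
]

idf = fit_idf([text for text, _ in train])
model = TfidfNaiveBayes().fit([tfidf(text, idf) for text, _ in train], [y for _, y in train])
predictions = [model.predict(tfidf(text, idf)) for text, _ in test]
acc, f1 = accuracy_and_f1([y for _, y in test], predictions)
print(predictions, acc, f1)
```

On the two invented test reviews the model predicts fake and truthful, giving an accuracy and F1 of 1.0; with four training reviews this says nothing about real performance. For the SVM, AdaBoost, and random forest models, the same TF-IDF n-gram features could be passed to library implementations such as scikit-learn's LinearSVC, AdaBoostClassifier, and RandomForestClassifier.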
