Article

Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style

EXPERT SYSTEMS WITH APPLICATIONS (2013)

Journal

EXPERT SYSTEMS WITH APPLICATIONS

Volume 40, Issue 9, Pages 3756-3763

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.eswa.2012.12.082

Keywords

Text mining; Text classification; Plagiarism; Copy detection; Intrinsic plagiarism detection

Funding

INNOVA CORFO project [11DL2-10399]

Abstract

Plagiarism detection is of special interest to educational institutions, and with the proliferation of digital documents on the Web, computational systems for this task have become important. Traditional methods for automatic plagiarism detection compute similarity measures on a document-to-document basis, but this is not always possible because the potential source documents are not always available. We apply text mining, using words as a linguistic feature to model the writing style of a document. The main goal is to discover deviations in style, looking for segments of the document that could have been written by another person. This can be treated as a classification problem using self-based information, where paragraphs with significant deviations in style are treated as outliers. This so-called intrinsic plagiarism detection approach needs no comparison against possible sources, and our model relies only on the use of words, so it is not language specific. We demonstrate that this feature shows promise in this area, achieving reasonable results compared to benchmark models. (C) 2013 Elsevier Ltd. All rights reserved.
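The idea summarized in the abstract (model a document's style from its own word usage, then treat paragraphs that deviate strongly as outliers) can be illustrated with a small Python sketch. This is not the authors' model: the scoring rule (the share of a paragraph's tokens whose word also appears elsewhere in the same document), the z-score cutoff, and the names `tokenize`, `style_scores`, and `flag_outliers` are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    # Lowercased runs of letters; no language-specific resources are used.
    return re.findall(r"[^\W\d_]+", text.lower())


def style_scores(paragraphs):
    """Illustrative score (an assumption, not the paper's measure):
    fraction of a paragraph's tokens whose word also occurs in some
    other paragraph of the same document."""
    para_counts = [Counter(tokenize(p)) for p in paragraphs]
    doc_counts = Counter()
    for counts in para_counts:
        doc_counts.update(counts)

    scores = []
    for counts in para_counts:
        n_tokens = sum(counts.values())
        if n_tokens == 0:
            scores.append(0.0)
            continue
        # A word is "shared" if the document uses it outside this paragraph.
        shared = sum(k for w, k in counts.items() if doc_counts[w] - k > 0)
        scores.append(shared / n_tokens)
    return scores


def flag_outliers(scores, z_threshold=1.5):
    """Return indices of paragraphs whose score lies well below the
    document mean. The threshold is a hypothetical default."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return []
    return [i for i, s in enumerate(scores) if (mu - s) / sigma > z_threshold]


if __name__ == "__main__":
    # Toy document: the fourth paragraph uses vocabulary found nowhere else.
    paragraphs = [
        "the cat sat on the mat and the dog sat on the rug",
        "the dog sat on the mat and the cat sat on the rug",
        "the cat and the dog sat on the rug",
        "quantum chromodynamics describes gluon interactions within hadrons",
        "the dog and the cat sat on the mat",
    ]
    print(flag_outliers(style_scores(paragraphs)))
```

Because the score uses only the document itself, no source corpus is needed, which is the property that distinguishes intrinsic detection from document-to-document comparison. A realistic system would need longer segments and a more robust style measure than this toy overlap ratio.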
