Article

Unmasking text plagiarism using syntactic-semantic based natural language processing techniques: Comparisons, analysis and challenges

INFORMATION PROCESSING & MANAGEMENT (2018)

Journal

INFORMATION PROCESSING & MANAGEMENT

Volume 54, Issue 3, Pages 408-432

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.ipm.2018.01.008

Keywords

Natural language processing; Plagiarism detection; Syntactic-semantic; POS tagging; Chunking; Semantic role labelling

Categories

Computer Science, Information Systems; Information Science & Library Science

Funding

Department of Science and Technology (DST), Govt. of India under SERB [SB/FTP/ETA-0212/2013]

Abstract

This work explores and compares the effectiveness of syntactic-semantic linguistic structures for plagiarism detection using natural language processing techniques. It uses three linguistic features, namely part-of-speech tags, chunks, and semantic roles, to detect plagiarized fragments, together with a combined syntactic-semantic similarity metric that draws semantic concepts from the WordNet lexical database. The linguistic information serves two purposes: more effective pre-processing and semantically relevant comparisons. Another major contribution is an analysis of the approach on plagiarism cases of varying complexity, including a discussion of how plagiarism type and complexity level affect the extracted features. Further, unlike existing systems, which were evaluated on limited data sets, the proposed approach is evaluated on a larger scale using the plagiarism corpora provided by the PAN competitions from 2009 to 2014. The approach shows considerable improvement over the top-ranked systems of the respective years. The evaluation and analysis across the various plagiarism cases also indicate that deeper linguistic features are better suited to identifying manually plagiarized text.
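
The idea of combining syntactic and semantic evidence in one sentence-level score can be illustrated with a minimal sketch. It is not the authors' metric: the abstract does not give the formula, so everything here is an assumption made for illustration. In particular, the hand-written `TOY_CONCEPTS` table stands in for WordNet, the coarse POS tags are supplied by hand rather than by a tagger, and the names `word_similarity` and `sentence_similarity` and the symmetric averaging are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical stand-in for WordNet: each word maps to a set of concept ids.
# A real system would look these up in a lexical database instead.
TOY_CONCEPTS: Dict[str, FrozenSet[str]] = {
    "car": frozenset({"concept:vehicle"}),
    "automobile": frozenset({"concept:vehicle"}),
    "buy": frozenset({"concept:purchase"}),
    "purchase": frozenset({"concept:purchase"}),
}

# A tagged token is (lowercased word, coarse POS tag); tags are assumed given.
Tagged = Tuple[str, str]


def word_similarity(a: Tagged, b: Tagged) -> float:
    """Return 1.0 if two tokens share a POS tag and a word or concept, else 0.0."""
    (word_a, tag_a), (word_b, tag_b) = a, b
    if tag_a != tag_b:
        # Syntactic filter: only compare words that play the same grammatical role.
        return 0.0
    if word_a == word_b:
        return 1.0
    concepts_a = TOY_CONCEPTS.get(word_a, frozenset())
    concepts_b = TOY_CONCEPTS.get(word_b, frozenset())
    # Semantic match: the two words share at least one concept.
    return 1.0 if concepts_a & concepts_b else 0.0


def sentence_similarity(s1: List[Tagged], s2: List[Tagged]) -> float:
    """Average, in both directions, of each token's best match in the other sentence."""
    if not s1 or not s2:
        return 0.0

    def directed(src: List[Tagged], dst: List[Tagged]) -> float:
        return sum(max(word_similarity(w, v) for v in dst) for w in src) / len(src)

    return 0.5 * (directed(s1, s2) + directed(s2, s1))


source = [("john", "NOUN"), ("buy", "VERB"), ("car", "NOUN")]
paraphrase = [("john", "NOUN"), ("purchase", "VERB"), ("automobile", "NOUN")]
unrelated = [("mary", "NOUN"), ("eat", "VERB"), ("apple", "NOUN")]

print(sentence_similarity(source, paraphrase))
print(sentence_similarity(source, unrelated))
```

In this toy setting the paraphrase scores as a full match even though only one surface word is shared, while the unrelated sentence scores zero. Graded concept similarity, chunk boundaries, and semantic roles, which the abstract lists as features, are left out of the sketch.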
