Article

Paraphrase type identification for plagiarism detection using contexts and word embeddings

INTERNATIONAL JOURNAL OF EDUCATIONAL TECHNOLOGY IN HIGHER EDUCATION (2021)

Journal

INTERNATIONAL JOURNAL OF EDUCATIONAL TECHNOLOGY IN HIGHER EDUCATION

Volume 18, Issue 1, Pages -

Publisher

SPRINGER

DOI: 10.1186/s41239-021-00277-8

Keywords

Plagiarism; Plagiarism detection; Paraphrase types; Synonymous substitution; Word reordering; Context matching; Word embeddings

Category

Education & Educational Research

Summary

This research examines the importance of paraphrase types in plagiarism detection and proposes a three-stage approach that uses context matching and pretrained word embeddings to identify synonymous substitution and word reordering. Experimental results suggest that combining the Smith-Waterman algorithm with ConceptNet Numberbatch pretrained word embeddings achieves the best F1 scores.

Abstract

Paraphrase types have been proposed by researchers as the paraphrasing mechanisms underlying acts of plagiarism. Synonymous substitution, word reordering and insertion/deletion have been identified as some of the common paraphrasing strategies used by plagiarists. However, the similarity reports generated by most plagiarism detection systems provide only a similarity score and the matching sections of text with their possible sources; they do not indicate which paraphrase types were used. In this research we propose methods to identify two important paraphrase types, synonymous substitution and word reordering, in paraphrased, plagiarised sentence pairs. We propose a three-stage approach that uses context matching and pretrained word embeddings to identify synonymous substitution and word reordering. Our results indicate that combining the Smith-Waterman algorithm for plagiarism detection with ConceptNet Numberbatch pretrained word embeddings produces the best performance in terms of F1 scores. This research can complement the similarity reports generated by currently available plagiarism detection systems by adding methods that identify paraphrase types.
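
The abstract does not spell out the three stages, so the following is only a minimal sketch of the general idea, not the authors' pipeline. It aligns two tokenised sentences with a word-level Smith-Waterman local alignment, labels an aligned pair as a synonymous substitution when the cosine similarity of the two word vectors passes a threshold, and flags word reordering when shared words appear in a different relative order. The tiny `TOY_VECTORS` table, the scoring values, the 0.8 threshold and all function names are assumptions made for illustration; in practice the vectors would be looked up in a pretrained resource such as ConceptNet Numberbatch.

```python
import math

# Hypothetical 3-dimensional vectors standing in for pretrained embeddings.
TOY_VECTORS = {
    "big": (0.9, 0.1, 0.0),
    "large": (0.85, 0.15, 0.05),
    "dog": (0.0, 0.9, 0.1),
    "barked": (0.1, 0.0, 0.9),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_score(a, b, threshold=0.8):
    """Return (score, label) for aligning word a with word b (assumed scoring scheme)."""
    if a == b:
        return 2, "match"
    if a in TOY_VECTORS and b in TOY_VECTORS:
        if cosine(TOY_VECTORS[a], TOY_VECTORS[b]) >= threshold:
            return 1, "synonym"
    return -1, "mismatch"


def smith_waterman(src, sus, gap=-1):
    """Word-level local alignment; returns (best score, aligned word pairs with labels)."""
    rows, cols = len(src) + 1, len(sus) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s, _ = word_score(src[i - 1], sus[j - 1])
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until the score drops to zero.
    i, j = best_pos
    pairs = []
    while i > 0 and j > 0 and H[i][j] > 0:
        s, label = word_score(src[i - 1], sus[j - 1])
        if H[i][j] == H[i - 1][j - 1] + s:
            pairs.append((src[i - 1], sus[j - 1], label))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


def has_reordering(src, sus):
    """True if shared words occur in a different relative order (assumes no repeated words)."""
    positions = [sus.index(w) for w in src if w in sus]
    return positions != sorted(positions)


source = "the big dog barked loudly".split()
suspicious = "a large dog barked".split()
score, aligned = smith_waterman(source, suspicious)
substitutions = [(a, b) for a, b, label in aligned if label == "synonym"]
print(score, aligned, substitutions)
print(has_reordering("john met mary yesterday".split(), "yesterday john met mary".split()))
```

Because Smith-Waterman preserves word order, reordering is checked separately here; a real system would also need to handle repeated words, tokenisation and out-of-vocabulary terms.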
