4.7 Article

Understanding and predicting Web content credibility using the Content Credibility Corpus

Journal

INFORMATION PROCESSING & MANAGEMENT
Volume 53, Issue 5, Pages 1043-1061

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.ipm.2017.04.003

Keywords

Web credibility; Crowdsourcing; Evaluating web site content; Credibility evaluation; Credibility issues

Funding

  1. Polish National Science Centre [2015/19/13/ST6/03179]
  2. European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement [690962]

Abstract

The goal of our research is to create a predictive model of Web content credibility evaluations, based on human evaluations. The model must be based on a comprehensive set of independent factors that can be used to guide users' credibility evaluations in crowdsourced systems like WOT, but also to design machine classifiers of Web content credibility. The factors described in this article are based on empirical data. We have created a dataset obtained from an extensive crowdsourced Web credibility assessment study (over 15,000 evaluations of over 5,000 Web pages from over 2,000 participants). First, online participants evaluated a multi-domain corpus of selected Web pages. Using the acquired data and text mining techniques, we prepared a code book and conducted another crowdsourcing round to label the textual justifications given in the earlier responses. We have extended the list of significant credibility assessment factors described in previous research and analyzed their relationships to credibility evaluation scores. The discovered factors that affect Web content credibility evaluations are only weakly correlated with one another, which makes them more useful for modeling and predicting credibility evaluations. Based on the newly identified factors, we propose a predictive model for Web content credibility. The model can be used to determine the significance and impact of the discovered factors on credibility evaluations. These findings can guide future research on the design of automatic or semiautomatic systems for Web content credibility evaluation support. This study also contributes the largest credibility dataset currently publicly available for research: the Content Credibility Corpus (C3). (C) 2017 The Authors. Published by Elsevier Ltd.
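
The abstract describes the predictive model only at a high level. As a rough illustration of the general approach of regressing crowd credibility ratings on labeled factors, the sketch below fits a plain linear regression; the factor names, the toy data, and the choice of model are assumptions for illustration and are not taken from the paper.

```python
# A minimal, hypothetical sketch of a factor-based credibility model: regressing
# crowdsourced credibility ratings on binary factor labels. Column names, toy
# data, and the plain linear regression are illustrative assumptions only; the
# paper's actual factors and model are not reproduced here.
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy stand-in for C3-style data: one row per evaluated Web page, with binary
# credibility factors and the mean crowd credibility rating on a 1-5 scale.
data = pd.DataFrame({
    "cites_sources":   [1, 0, 1, 1, 0, 0, 1, 0],
    "has_author_info": [1, 1, 0, 1, 0, 1, 0, 0],
    "contains_ads":    [0, 1, 1, 0, 1, 1, 0, 1],
    "credibility":     [4.5, 3.0, 3.5, 4.8, 2.1, 2.9, 4.0, 1.8],
})

X = data.drop(columns="credibility")
y = data["credibility"]

model = LinearRegression().fit(X, y)

# Each coefficient estimates the impact of the corresponding factor on the
# predicted credibility rating, holding the other factors fixed.
for name, coef in zip(X.columns, model.coef_):
    print(f"{name}: {coef:+.2f}")

print(f"intercept: {model.intercept_:.2f}")
```

Because the identified factors are only weakly correlated, such a model's coefficients remain individually interpretable as per-factor impacts, which is the property the abstract highlights.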
