☆ 4.7 Article

Conflicts to Harmony: A Framework for Resolving Conflicts in Heterogeneous Data by Truth Discovery

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2016)

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

Volume 28, Issue 8, Pages 1986-1999

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TKDE.2016.2559481

Keywords

Data fusion; truth discovery; heterogeneous data

Funding

US National Science Foundation under Grant US National Science Foundation [IIS-1319973, CNS-1566374]
US Army Research Laboratory [W911NF-09-2-0053]
US Army Research Office [W911NF-13-1-0193]
Direct For Computer & Info Scie & Enginr
Division Of Computer and Network Systems [1566374] Funding Source: National Science Foundation
Div Of Information & Intelligent Systems
Direct For Computer & Info Scie & Enginr [1017362, 1320617] Funding Source: National Science Foundation
Div Of Information & Intelligent Systems
Direct For Computer & Info Scie & Enginr [1319973, 1618481] Funding Source: National Science Foundation

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

In many applications, one can obtain descriptions about the same objects or events from a variety of sources. As a result, this will inevitably lead to data or information conflicts. One important problem is to identify the true information (i.e., the truths) among conflicting sources of data. It is intuitive to trust reliable sources more when deriving the truths, but it is usually unknown which one is more reliable a priori. Moreover, each source possesses a variety of properties with different data types. An accurate estimation of source reliability has to be made by modeling multiple properties in a unified model. Existing conflict resolution work either does not conduct source reliability estimation, or models multiple properties separately. In this paper, we propose to resolve conflicts among multiple sources of heterogeneous data types. We model the problem using an optimization framework where truths and source reliability are defined as two sets of unknown variables. The objective is to minimize the overall weighted deviation between the truths and the multi-source observations where each source is weighted by its reliability. Different loss functions can be incorporated into this framework to recognize the characteristics of various data types, and efficient computation approaches are developed. The proposed framework is further adapted to deal with streaming data in an incremental fashion and large-scale data in MapReduce model. Experiments on real-world weather, stock, and flight data as well as simulated multi-source data demonstrate the advantage of jointly modeling different data types in the proposed framework.

Conflicts to Harmony: A Framework for Resolving Conflicts in Heterogeneous Data by Truth Discovery

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

Publisher

IEEE COMPUTER SOC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Conflicts to Harmony: A Framework for Resolving Conflicts in Heterogeneous Data by Truth Discovery

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

Publisher

IEEE COMPUTER SOC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper