Article

CLARITY: comparing heterogeneous data using dissimilarity

Journal

ROYAL SOCIETY OPEN SCIENCE
Volume 8, Issue 12, Pages -

Publisher

ROYAL SOC
DOI: 10.1098/rsos.202182

Keywords

linguistics; visualization; comparative statistics

Funding

  1. Wellcome Trust
  2. Royal Society Sir Henry Dale Fellowship [WT104125MA]
  3. OCSEAN - EU Research Executive Agency (Horizon 2020 MSCA RISE 2019) [873207]
  4. German Research Foundation (DFG) [FOR 2237]
  5. European Research Council (ERC) under the Horizon 2020 research and innovation programme [834050]
  6. German Research Foundation (DFG) under Emmy-Noether [FOR 2237, NWG 391377018]


Integrating datasets from different disciplines is challenging because the data differ qualitatively. The CLARITY method quantifies consistency across datasets, identifies where inconsistencies arise, and allows similarity matrices to be compared. It is robust to noise and to differences in scaling, and makes only weak assumptions about how the data were generated.
Integrating datasets from different disciplines is hard because the data are often qualitatively different in meaning, scale and reliability. When two datasets describe the same entities, many scientific questions can be phrased around whether the (dis)similarities between entities are conserved across such different data. Our method, CLARITY, quantifies consistency across datasets, identifies where inconsistencies arise and aids in their interpretation. We illustrate this using three diverse comparisons: gene methylation versus expression, evolution of language sounds versus word use, and country-level economic metrics versus cultural beliefs. The non-parametric approach is robust to noise and differences in scaling, and makes only weak assumptions about how the data were generated. It operates by decomposing similarities into two components: a 'structural' component analogous to a clustering, and an underlying 'relationship' between those structures. This allows a 'structural comparison' between two similarity matrices using their predictability from 'structure'. Significance is assessed with the help of re-sampling appropriate for each dataset. The software, CLARITY, is available as an R package from github.com/danjlawson/CLARITY.
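The decomposition described in the abstract can be illustrated with a deliberately simplified sketch. It is not the CLARITY R package or its API. It assumes the 'structure' is a hard clustering supplied as one label per entity (the abstract only says the structural component is analogous to a clustering), and it takes the 'relationship' to be the least-squares block means between clusters. The function names, labels and toy matrices are all hypothetical. The structure is taken from a reference dataset, a new relationship is fitted to a second dataset measured on a different scale, and entities with large residuals are the ones whose similarities are poorly predicted by the reference structure.

```python
# Illustrative sketch only; NOT the CLARITY R package API.
# Assumption: "structure" is a hard clustering, one label per entity.


def fit_relationship(Y, labels, k):
    """Least-squares 'relationship' X for Y ~ A X A^T when A is a one-hot
    membership matrix: X[a][b] is the mean of Y over cluster a x cluster b."""
    sums = [[0.0] * k for _ in range(k)]
    counts = [[0] * k for _ in range(k)]
    n = len(Y)
    for i in range(n):
        for j in range(n):
            a, b = labels[i], labels[j]
            sums[a][b] += Y[i][j]
            counts[a][b] += 1
    return [
        [sums[a][b] / counts[a][b] if counts[a][b] else 0.0 for b in range(k)]
        for a in range(k)
    ]


def residual_per_entity(Y, labels, X):
    """Sum of squared prediction errors in each entity's row of Y."""
    n = len(Y)
    return [
        sum((Y[i][j] - X[labels[i]][labels[j]]) ** 2 for j in range(n))
        for i in range(n)
    ]


# Hypothetical structure learned from the reference dataset: two clusters.
labels = [0, 0, 0, 1, 1, 1]

# Toy reference similarities: 1.0 within a cluster, 0.2 between clusters.
Y_ref = [[1.0 if labels[i] == labels[j] else 0.2 for j in range(6)]
         for i in range(6)]

# Toy second dataset on a different scale (10 within, 2 between), in which
# entity 5 groups with entities 0-2 instead of with entities 3-4.
groups_other = [0, 0, 0, 1, 1, 0]
Y_other = [[10.0 if groups_other[i] == groups_other[j] else 2.0
            for j in range(6)] for i in range(6)]

X_ref = fit_relationship(Y_ref, labels, 2)
X_other = fit_relationship(Y_other, labels, 2)  # new relationship, same structure

ref_residuals = residual_per_entity(Y_ref, labels, X_ref)
other_residuals = residual_per_entity(Y_other, labels, X_other)

print("reference residuals:", ref_residuals)
print("second-dataset residuals:", other_residuals)
```

Because the relationship is refitted for the second dataset, the overall change of scale is absorbed and only the structural disagreement shows up in the residuals. In this toy example the largest residual belongs to the entity that changed groups. Assessing whether such a residual is significant would need a re-sampling step suited to the data, which this sketch omits.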
