4.7 Article

The effect of statistical normalization on network propagation scores

Journal

BIOINFORMATICS
Volume 37, Issue 6, Pages 845-852

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btaa896

Funding

  1. Spanish Ministry of Economy and Competitiveness (MINECO) [TEC2014-60337-R, DPI2017-89827-R]
  2. National Institutes of Health (NIH) [R01GM104400]
  3. Networking Biomedical Research Centre in the subject area of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), initiatives of Instituto de Investigacion Carlos III (ISCIII)
  4. Share4Rare project [780262]

This study analyzed the statistical properties and bias of diffusion scores. Diffusion scores starting from binary labels are affected by the label codification and carry a problem-dependent topological bias that statistical normalization can remove. Parametric and non-parametric normalization methods address the two sources of bias, mean value and variance, and improve performance when the sought positive labels are not aligned with the bias. The decision on bias removal should be data-driven, based on a quantitative analysis of the bias and its relation to the positive entities.

Motivation: Network diffusion and label propagation are fundamental tools in computational biology, with applications such as gene-disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, owing to concerns that network topology can bias diffusion scores. This raises the question of what statistical properties such diffusion processes have, and whether they are biased, in each of their applications. In this work, we characterized some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein-protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels.

Results: Diffusion scores starting from binary labels were affected by the label codification and exhibited a problem-dependent topological bias that could be removed by statistical normalization. Parametric and non-parametric normalization addressed both points by being codification-independent and by equalizing the bias. We identified and quantified two sources of bias (mean value and variance) that yielded performance differences when normalizing the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Although none of the proposed scores systematically outperformed the others, normalization was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities.
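To make the idea of normalizing diffusion scores concrete, the following is a minimal Python sketch, not the authors' implementation. It makes several assumptions that the abstract does not state: the diffusion uses a regularised Laplacian kernel K = (I + alpha L)^-1, every node carries a label, and the null model permutes the labels uniformly over all nodes. Under that null, the mean and variance of each score K y follow from the standard moments of a linear statistic under random permutation, which may differ in form from the closed formulae derived in the paper. The toy graph and all function names are hypothetical.

```python
import numpy as np


def regularised_laplacian_kernel(adj, alpha=1.0):
    """Return K = (I + alpha * L)^-1, with L = D - A the unnormalised Laplacian."""
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.inv(np.eye(adj.shape[0]) + alpha * lap)


def raw_scores(kernel, labels):
    """Unnormalised diffusion scores f = K y."""
    return kernel @ labels


def z_scores(kernel, labels):
    """Parametric normalisation: z-scores against a uniform label-permutation null.

    For a uniformly permuted label vector with mean m and sample variance s2,
    the score of node i has mean m * sum_j K_ij and variance
    s2 * (sum_j K_ij^2 - (sum_j K_ij)^2 / n).
    """
    n = labels.size
    row_sum = kernel.sum(axis=1)
    row_sq_sum = (kernel ** 2).sum(axis=1)
    null_mean = labels.mean() * row_sum
    null_var = labels.var(ddof=1) * (row_sq_sum - row_sum ** 2 / n)
    return (kernel @ labels - null_mean) / np.sqrt(null_var)


def empirical_z_scores(kernel, labels, n_perm=2000, seed=0):
    """Non-parametric counterpart: estimate the null moments by Monte Carlo."""
    rng = np.random.default_rng(seed)
    null = np.stack([kernel @ rng.permutation(labels) for _ in range(n_perm)])
    return (kernel @ labels - null.mean(axis=0)) / null.std(axis=0, ddof=1)


# Hypothetical toy network: a hub (node 0) with a short tail (3 - 4 - 5).
edges = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

kernel = regularised_laplacian_kernel(adj)

# The same positives (nodes 4 and 5) under two label codifications.
labels_01 = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])
labels_pm = 2.0 * labels_01 - 1.0  # coded as -1 / +1

raw_01, raw_pm = raw_scores(kernel, labels_01), raw_scores(kernel, labels_pm)
z_01, z_pm = z_scores(kernel, labels_01), z_scores(kernel, labels_pm)

print("raw scores (0/1):  ", np.round(raw_01, 3))
print("raw scores (-1/+1):", np.round(raw_pm, 3))
print("z-scores (either): ", np.round(z_01, 3))
```

In this sketch the raw values change when the labels are recoded from 0/1 to -1/+1, whereas the z-scores are identical under both codings, because any positive affine recoding of the labels cancels in the standardization. With this particular kernel and all nodes labelled, the rows of K sum to one, so the null mean is the same at every node and only the null variance differs between nodes; other kernels or partially labelled settings would behave differently.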
