Article

Denoising large-scale biological data using network filters

Journal

BMC BIOINFORMATICS
Volume 22, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s12859-021-04075-x

Keywords

Networks; Denoising; Machine learning

Funding

  1. Interdisciplinary Quantitative Biology (IQ Biology) Program (NSF IGERT) at the BioFrontiers Institute, University of Colorado, Boulder [1144807]
  2. National Science Foundation [IIS-1452718]

The study introduces a method for reducing noise in large-scale biological data sets by using an interaction network to identify related measurements that can be combined or filtered. Applying network filters before machine learning tasks improves accuracy in predicting changes in protein expression.
Background

Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation.

Results

We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or filtered to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may be first decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy up to 43% compared to using unfiltered data.

Conclusions

Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters to applications in systems biology.
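
The idea of combining related measurements over an interaction network can be illustrated with a minimal sketch. It assumes the simplest possible smoothing filter for the correlated case: each node's value is replaced by the mean over itself and its network neighbors. The function name `mean_network_filter`, the toy path graph, and the noise model are hypothetical choices made for illustration; the filters defined in the paper (including those for anti-correlated measurements and for partitioned networks) may differ in detail.

```python
import numpy as np

def mean_network_filter(x, adjacency):
    """Smooth node values over a network (illustrative assumption, not the paper's exact filter).

    Each node's new value is the mean of its own value and the values of
    its neighbors, as given by a symmetric 0/1 adjacency matrix.
    """
    x = np.asarray(x, dtype=float)
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)               # number of neighbors per node
    return (x + A @ x) / (1.0 + degree)  # average over node plus neighbors

# Toy interaction network: a path graph 0 - 1 - 2 - 3 (hypothetical example).
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

# A constant underlying signal corrupted by independent Gaussian noise.
rng = np.random.default_rng(0)
signal = np.full(4, 5.0)
noisy = signal + rng.normal(scale=1.0, size=4)
filtered = mean_network_filter(noisy, adjacency)

print("noisy:   ", noisy)
print("filtered:", filtered)
```

Averaging over neighbors only helps when connected nodes carry similar underlying values. For anti-correlated neighbors or networks with heterogeneous modules, a different filter per module would be needed, which this sketch does not attempt.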
