Article

DISCO-SCA and Properly Applied GSVD as Swinging Methods to Find Common and Distinctive Processes

Journal

PLOS ONE
Volume 7, Issue 5

Publisher

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.pone.0037840

Funding

  1. Research Fund of Katholieke Universiteit Leuven [SymBioSys: CoE EF/05/007, OPTEC: CoE EF/05/006, GOA/2005/04, GOA-MaNet, CIF1, STRT1/08/023]
  2. IWT-Flanders [IWT/060045/SBO Bioframe]
  3. Fund for Scientific Research - Flanders [G.0321.06]
  4. Research Community (ICCoS)
  5. Research Community (ANMMM)
  6. Research Community (MLDM)
  7. Belgian Federal Science Policy Office [IUAP P6/03, P6/04]
  8. European Union (ERNSI)

Background: In systems biology, information from multiple sources is often obtained for the same set of biological entities. Examples include expression data for the same set of orthologous genes screened in different organisms, and data on the same set of culture samples obtained with different high-throughput techniques. A major challenge is to find the important biological processes underlying the data and, among these, to disentangle the processes common to all data sources from those distinctive for a specific source. Two promising simultaneous data integration methods have recently been proposed to attain this goal: generalized singular value decomposition (GSVD) and simultaneous component analysis with rotation to common and distinctive components (DISCO-SCA).
Results: Both theoretical analyses and applications to biologically relevant data show that (1) straightforward applications of GSVD yield unsatisfactory results, (2) DISCO-SCA performs well, (3) given proper pre-processing and algorithmic adaptations, GSVD reaches a performance level similar to that of DISCO-SCA, and (4) DISCO-SCA generalizes directly to more than two data sources. The biological relevance of DISCO-SCA is illustrated with two applications. First, in a comparative genomics setting, DISCO-SCA recovers a common theme of cell cycle progression and a yeast-specific response to pheromones; the biological annotation was obtained by applying Gene Set Enrichment Analysis in an appropriate way. Second, applying DISCO-SCA to metabolomics data for Escherichia coli obtained with two different chemical analysis platforms shows that the metabolites involved in some of the biological processes underlying the data are detected by only one of the two platforms; therefore, platforms for microbial metabolomics should be tailored to the biological question.
Conclusions: Both DISCO-SCA and properly applied GSVD are promising integrative methods for finding common and distinctive processes in multisource data. Open source code for both methods is provided.
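To make the notion of common versus distinctive components concrete, here is a minimal Python/NumPy sketch. It assumes the entities (e.g. genes) are rows shared by all sources, fits an unrotated simultaneous component analysis via an SVD of the concatenated, block-scaled data, and reports for each component the share of its squared loadings that falls in each source. The function name `sca_block_proportions`, the centering and block-scaling choices, and the toy data are illustrative assumptions, not the authors' code; the DISCO rotation step and the GSVD alternative are not implemented here.

```python
import numpy as np


def sca_block_proportions(blocks, n_components):
    """Unrotated simultaneous component analysis (SCA) of blocks sharing rows.

    blocks: list of 2-D arrays, each (n_entities, n_vars_k), same row order.
    Returns a (K, R) array: for each of the R components, the share of its
    squared loadings that falls in each of the K blocks (columns sum to 1).
    Hypothetical helper for illustration only.
    """
    processed = []
    for Xk in blocks:
        Xc = Xk - Xk.mean(axis=0)          # center every variable (column)
        Xc = Xc / np.linalg.norm(Xc)       # give each block unit sum of squares
        processed.append(Xc)

    X = np.hstack(processed)               # concatenate sources side by side
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    loadings = Vt[:n_components].T * s[:n_components]  # (total vars, R)

    # Column ranges of each block inside the concatenated matrix.
    edges = np.cumsum([0] + [Xk.shape[1] for Xk in blocks])
    ssq = np.array([
        (loadings[edges[k]:edges[k + 1]] ** 2).sum(axis=0)
        for k in range(len(blocks))
    ])
    return ssq / ssq.sum(axis=0)


# Toy example: 100 entities, two exactly orthogonal latent patterns.
t_common = np.tile([1.0, -1.0], 50)                # present in both sources
t_distinct = np.tile([1.0, 1.0, -1.0, -1.0], 25)   # present in source 2 only

X1 = np.column_stack([t_common, t_common])     # source 1: common pattern only
X2 = np.column_stack([t_common, t_distinct])   # source 2: common + distinctive

props = sca_block_proportions([X1, X2], n_components=2)
print(np.round(props, 3))
```

Proportions near 0 or 1 flag a component as distinctive for one source, while intermediate values flag a common one. In this toy case the first component, driven by the pattern shared by both sources, splits its loadings roughly 2/3 to 1/3, and the second lies entirely in the second source. With real data, unrotated components usually mix common and distinctive variation, which is what the rotation in DISCO-SCA is meant to resolve.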