Article

Hub Discovery in Partial Correlation Graphs

Journal

IEEE Transactions on Information Theory
Volume 58, Issue 9, Pages 6064-6078

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/TIT.2012.2200825

Keywords

Asymptotic Poisson limits; correlation networks; discovery rate phase transitions; Gaussian graphical models (GGMs); nearest neighbor dependence; node degree and connectivity; p-value trajectories

Funding

  1. DIGITEO
  2. U.S. Army Research Office (ARO) [W911NF-11-1-0391]
  3. U.S. National Science Foundation (NSF) [CCF 0830490]
  4. NSF [DMS-05-05303, DMS-09-06392, SUFSC08-SUSHSTF09-SMSCVISG0906]
  5. Directorate for Computer & Information Science & Engineering [1217880, 0830490] Funding Source: National Science Foundation
  6. Directorate for Mathematical & Physical Sciences
  7. Division of Mathematical Sciences [1106642] Funding Source: National Science Foundation
  8. Division of Computing and Communication Foundations [1217880, 0830490] Funding Source: National Science Foundation
  9. Division of Mathematical Sciences
  10. Directorate for Mathematical & Physical Sciences [1025465] Funding Source: National Science Foundation


One of the most important problems in large-scale inference is the identification of variables that are highly dependent on several other variables. When dependence is measured by partial correlations, these variables correspond to rows of the partial correlation matrix that have several entries with large magnitudes, i.e., hubs in the associated partial correlation graph. This paper develops theory and algorithms for discovering such hubs from a few observations of these variables. We introduce a hub screening framework in which the user specifies both a minimum (partial) correlation ρ and a minimum degree δ, and a vertex is retained if at least δ of its partial correlations have magnitude at least ρ. The choice of ρ and δ can be guided by our mathematical expressions for the phase transition correlation threshold ρ_c governing the average number of discoveries. They can also be guided by our asymptotic expressions for familywise discovery rates under the assumption of a large number of variables, a fixed number of multivariate samples, and weak dependence. Under the null hypothesis that the dispersion (covariance) matrix is sparse, these limiting expressions can be used to enforce familywise error constraints and to rank the discoveries in order of increasing statistical significance. For n ≪ p (n samples, p variables), the computational complexity of the proposed partial correlation screening method is low, so the method is highly scalable and can be applied to significantly larger problems than previous approaches. The theory is applied to discovering hubs in a high-dimensional gene microarray dataset.
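The screening rule in the abstract (keep a vertex if at least δ of its partial correlations reach magnitude ρ) can be illustrated with a minimal NumPy sketch. Several pieces are assumptions of this sketch, not the paper's method: because the sample correlation matrix is singular when n < p, the partial correlations are estimated here from its Moore-Penrose pseudo-inverse (computed through a thin SVD of the n × p standardized data, so the cost grows like n²p rather than p³); the null model is i.i.d. Gaussian data (identity covariance, the simplest sparse covariance); and the discovery rate is a Monte Carlo estimate plugged into the generic Poisson approximation P(N > 0) ≈ 1 − exp(−E[N]), not the paper's closed-form expressions for ρ_c or the familywise rates. The function names `partial_correlation_sketch`, `screen_hubs`, and `null_discovery_rate` are hypothetical.

```python
import numpy as np


def partial_correlation_sketch(X, tol=1e-10):
    """Assumed estimator: pseudo-inverse-based partial correlations from an n x p matrix X.

    pinv(R) of the p x p sample correlation R is normalized like a precision
    matrix: P_ij = -pinv_ij / sqrt(pinv_ii * pinv_jj).
    """
    n, _ = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize so R = Z.T @ Z / n has unit diagonal
    # Thin SVD: Z / sqrt(n) = U diag(s) Vt, hence R = Vt.T diag(s**2) Vt
    _, s, Vt = np.linalg.svd(Z / np.sqrt(n), full_matrices=False)
    keep = s > tol * s.max()                        # drop null directions (rank <= n - 1 after centering)
    Y = Vt[keep].T / s[keep]                        # p x r matrix with pinv(R) = Y @ Y.T
    W = Y / np.linalg.norm(Y, axis=1, keepdims=True)  # unit rows: normalized pinv entries are W_i . W_j
    P = -(W @ W.T)
    np.fill_diagonal(P, 1.0)
    return P


def screen_hubs(P, rho, delta):
    """Return (hub indices, degrees): vertex i is a hub if >= delta entries j != i have |P_ij| >= rho."""
    A = np.abs(P) >= rho
    np.fill_diagonal(A, False)                      # a variable is not its own neighbor
    degree = A.sum(axis=1)
    return np.flatnonzero(degree >= delta), degree


def null_discovery_rate(n, p, rho, delta, trials=50, seed=0):
    """Monte Carlo E[N] under i.i.d. Gaussian data, plus the Poisson approximation 1 - exp(-E[N])."""
    rng = np.random.default_rng(seed)
    counts = [
        screen_hubs(partial_correlation_sketch(rng.standard_normal((n, p))), rho, delta)[0].size
        for _ in range(trials)
    ]
    mean_n = float(np.mean(counts))
    return mean_n, 1.0 - np.exp(-mean_n)


if __name__ == "__main__":
    # Sweep rho with n << p to see the mean number of null discoveries fall off as rho grows.
    for rho in (0.6, 0.7, 0.8, 0.9):
        mean_n, fwer = null_discovery_rate(n=20, p=200, rho=rho, delta=1)
        print(f"rho={rho:.1f}  mean discoveries={mean_n:.2f}  Poisson FWER approx={fwer:.3f}")
```

In practice one would raise ρ (or δ) until the estimated familywise rate under the null falls below a target level; the sweep above only shows that behavior empirically and makes no claim about where the paper's ρ_c lies for these n and p.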
