GRAPHICAL MODELS FOR ZERO-INFLATED SINGLE CELL GENE EXPRESSION

Journal

ANNALS OF APPLIED STATISTICS
Volume 13, Issue 2, Pages 848-873

Publisher

INST MATHEMATICAL STATISTICS
DOI: 10.1214/18-AOAS1213

Keywords

Gene network; single cell gene expression; graphical model; group lasso

Funding

  1. National Institute of Biomedical Imaging and Bioengineering, US National Institutes of Health [R01 EB008400]
  2. Bill and Melinda Gates Foundation
  3. Vaccine and Immunology Statistical Center (VISC) [OPP1032317]
  4. NSF [DMS-1561814]

Bulk gene expression experiments relied on aggregations of thousands of cells to measure the average expression in an organism. Advances in microfluidic and droplet sequencing now permit expression profiling in single cells. This study of cell-to-cell variation reveals that individual cells lack detectable expression of transcripts that appear abundant on a population level, giving rise to zero-inflated expression patterns. To infer gene coregulatory networks from such data, we propose a multivariate Hurdle model composed of a mixture of singular Gaussian distributions. We employ neighborhood selection with the pseudo-likelihood and a group lasso penalty to select and fit undirected graphical models that capture conditional independences between genes. In simulations, the proposed method is more sensitive than existing approaches, even under departures from our Hurdle model. The method is applied to data for T follicular helper cells and to a high-dimensional profile of mouse dendritic cells. It infers network structure not revealed by other methods or in bulk data sets. An R implementation is available at https://github.com/amcdavid/HurdleNormal.
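
The two ingredients named in the abstract, a zero/non-zero "hurdle" encoding of expression and a group lasso penalty, can be illustrated with a minimal sketch. It is not taken from the HurdleNormal package. The function names `hurdle_features` and `group_soft_threshold` are hypothetical, and the sketch assumes that each candidate edge carries a small group of coefficients that the penalty shrinks jointly, so that an edge is dropped only when its whole group becomes zero.

```python
import math


def hurdle_features(y):
    """Encode one expression value as (indicator, value).

    The indicator is 1.0 when expression is detected (y != 0) and 0.0
    otherwise; the second entry is the measured value itself.
    Hypothetical helper, for illustration only.
    """
    return (1.0 if y != 0 else 0.0, float(y))


def group_soft_threshold(group, lam):
    """Block soft-thresholding: proximal operator of lam * ||group||_2.

    The whole group is set to zero when its Euclidean norm is at most
    lam; otherwise every coefficient is shrunk by the same factor.
    """
    norm = math.sqrt(sum(g * g for g in group))
    if norm <= lam:
        return [0.0] * len(group)
    scale = 1.0 - lam / norm
    return [scale * g for g in group]


# A weak group of edge coefficients is removed entirely,
# while a stronger group is shrunk but kept.
weak_edge = group_soft_threshold([0.1, -0.2, 0.05], lam=0.5)
strong_edge = group_soft_threshold([3.0, 4.0], lam=2.5)
```

Because the penalty acts on the norm of the whole group, sparsity appears at the level of edges rather than individual coefficients, which is what allows a penalized fit to be read as a graph.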
