Article

Robust and Scalable Learning of Complex Intrinsic Dataset Geometry via ElPiGraph

Journal

ENTROPY
Volume 22, Issue 3

Publisher

MDPI
DOI: 10.3390/e22030296

Keywords

data approximation; principal graphs; principal trees; topological grammars; software

Funding

  1. Ministry of Science and Higher Education of the Russian Federation [14.Y26.31.0022]
  2. Agence Nationale de la Recherche in the program Investissements d'Avenir (PRAIRIE 3IA Institute) [ANR-19-P3IA-0001]
  3. European Union [826121]
  4. Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation [2018-182734]
  5. ITMO Cancer SysBio program (MOSAIC)
  6. INCa PLBIO program (CALYS) [INCA_11692]
  7. Association Science et Technologie
  8. Institut de Recherches Internationales Servier
  9. doctoral school Frontieres de l'Innovation en Recherche et Education Programme Bettencourt

Abstract

Multidimensional data point clouds representing large datasets are frequently characterized by non-trivial low-dimensional geometry and topology, which can be recovered by unsupervised machine learning approaches, in particular by principal graphs. Principal graphs approximate the multivariate data by a graph injected into the data space, with some constraints imposed on the node mapping. Here we present ElPiGraph, a scalable and robust method for constructing principal graphs. ElPiGraph exploits and further develops the concept of elastic energy, the topological graph grammar approach, and a gradient descent-like optimization of the graph topology. The method is able to withstand high levels of noise and is capable of approximating data point clouds via principal graph ensembles. This strategy can be used to estimate the statistical significance of complex data features and to summarize them into a single consensus principal graph. ElPiGraph deals efficiently with large datasets in various fields such as biology, where it can be used, for example, with single-cell transcriptomic or epigenomic datasets to infer gene expression dynamics and recover differentiation landscapes.
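The abstract names elastic energy as the quantity being optimized but does not state its form. The sketch below is only an illustration of the general idea, assuming a three-term energy: a data approximation term, an edge-stretching penalty, and a bending penalty on nodes with two or more neighbours. The function name `elastic_energy` and the weights `lam` and `mu` are hypothetical, and this is not the ElPiGraph API or necessarily the exact energy used in the paper.

```python
from math import dist


def elastic_energy(points, nodes, edges, lam=0.01, mu=0.1):
    """Illustrative elastic energy of a graph embedded in data space.

    points: list of data points (tuples of floats)
    nodes:  list of node positions (tuples of floats, same dimension)
    edges:  list of (i, j) index pairs into `nodes`
    lam:    assumed weight of the edge-stretching penalty (hypothetical)
    mu:     assumed weight of the bending penalty (hypothetical)
    """
    # Approximation term: mean squared distance from each point to its nearest node.
    approx = sum(min(dist(p, n) ** 2 for n in nodes) for p in points) / len(points)

    # Stretching term: sum of squared edge lengths.
    stretch = sum(dist(nodes[i], nodes[j]) ** 2 for i, j in edges)

    # Build the neighbour list of every node from the edge list.
    neighbours = {k: [] for k in range(len(nodes))}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Bending term: for each node with at least two neighbours, the squared
    # distance between the node and the centroid of its neighbours.
    bend = 0.0
    for k, nbrs in neighbours.items():
        if len(nbrs) >= 2:
            dim = len(nodes[k])
            centroid = tuple(
                sum(nodes[n][d] for n in nbrs) / len(nbrs) for d in range(dim)
            )
            bend += dist(nodes[k], centroid) ** 2

    return approx + lam * stretch + mu * bend


# Usage: three collinear data points, approximated by a straight and a bent chain.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
edges = [(0, 1), (1, 2)]
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
bent = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]

# The straight chain fits the points exactly and has no bending, so its energy is lower.
print(elastic_energy(points, straight, edges) < elastic_energy(points, bent, edges))
```

Under these assumptions, an optimizer would move node positions (and, in a grammar-based approach, change the edge list) to lower this value; the penalties keep the graph short and smooth while the first term keeps it close to the data.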
