RCDPeaks: memory-efficient density peaks clustering of long molecular dynamics

Journal

BIOINFORMATICS
Volume 38, Issue 7, Pages 1863-1869

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btac021

Funding

  1. Eiffel Scholarship Program of Excellence of Campus France [P744468L]
  2. Project Hubert Curien-Carlos J. Finlay [41814TM]
  3. Fondo Nacional de Desarrollo Cientifico y Tecnologico [CONICYT FONDECYT/INACH/POSTDOCTORADO] [3170107]

Summary: This study introduces DP+, an exact implementation of the Density Peaks clustering algorithm that sharply reduces the memory needed to cluster MD trajectories and outperforms the few existing alternatives designed for MD. Built on DP+, RCDPeaks is a refined variant of Density Peaks that adds automatic parameter selection, screening of candidate centers, and geometrical refinement of the returned clusters.
Motivation: Density Peaks is a widely used clustering algorithm that has previously been applied to Molecular Dynamics (MD) simulations. Its conception of cluster centers as elements displaying both a high density of neighbors and a large distance to other elements of high density fits the nature of a geometrically converged MD simulation particularly well. Despite its theoretical convenience, implementations of Density Peaks carry a quadratic memory complexity that only permits the analysis of relatively short trajectories.

Results: Here, we describe DP+, a novel exact implementation of Density Peaks that drastically reduces RAM consumption in comparison to the few available alternatives designed for MD. Based on DP+, we developed RCDPeaks, a refined variant of the original Density Peaks algorithm. Through the use of DP+, RCDPeaks was able to cluster a one-million-frame trajectory using less than 4.5 GB of RAM, a task that would have taken more than 2 TB and about three times more time with the fastest and least memory-hungry alternative currently available. Other key features of RCDPeaks include the automatic selection of parameters, the screening of center candidates and the geometrical refining of returned clusters.

Availability and implementation: The source code and documentation of RCDPeaks are free and publicly available on GitHub (https://github.com/LQCT/RCDPeaks.git).

Contact: roy_gonzalez@fq.uh.cu or daniel.platero@fq.uh.cu

Supplementary information: Supplementary data are available at Bioinformatics online.
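
The notion of a cluster center described under Motivation (high local density plus a large distance to any denser element) and the quadratic memory cost can be illustrated with a minimal sketch of the naive Density Peaks procedure. This is not DP+ or RCDPeaks and does not use the RCDPeaks API; the function names, the Euclidean metric on toy 2D points, the hard cutoff, and the choice of centers by the product of density and distance are all assumptions made for illustration. The memory helper additionally assumes that pairwise distances are stored once each as 4-byte values, which the abstract does not state.

```python
import numpy as np


def density_peaks(points, cutoff, n_clusters):
    """Naive Density Peaks on a small point set (hypothetical helper).

    Builds the full N x N distance matrix, which is the quadratic-memory
    step that limits naive implementations to short trajectories.
    """
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # rho: number of neighbors closer than the cutoff (self excluded).
    rho = (dist < cutoff).sum(axis=1) - 1

    # Visit elements from densest to sparsest; ties keep index order.
    order = np.argsort(-rho, kind="stable")

    # delta: distance to the nearest element visited earlier (denser).
    delta = np.zeros(n)
    nearest_denser = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest element gets its largest distance by convention.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        nearest_denser[i] = j

    # Centers: largest rho * delta, with ties resolved in density order
    # so that the densest element is always among the centers.
    gamma = rho * delta
    centers = order[np.argsort(-gamma[order], kind="stable")[:n_clusters]]

    # Every other element inherits the label of its nearest denser element.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return rho, delta, centers, labels


def condensed_matrix_bytes(n_frames, bytes_per_value=4):
    """Bytes needed to store each pairwise distance once (assumed layout)."""
    return n_frames * (n_frames - 1) // 2 * bytes_per_value


if __name__ == "__main__":
    toy = np.array([
        [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
        [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1], [5.05, 5.05],
    ])
    rho, delta, centers, labels = density_peaks(toy, cutoff=0.5, n_clusters=2)
    print("centers:", centers, "labels:", labels)
    print("bytes for 1e6 frames:", condensed_matrix_bytes(1_000_000))
```

Under these assumptions, one million frames require roughly 2 × 10^12 bytes for the pairwise distances alone, which is the same order of magnitude as the 2 TB quoted in the Results and shows why avoiding the full matrix matters for long trajectories.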
