Article

Scalable Algorithms for Large Competing Risks Data

Journal

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
Volume 30, Issue 3, Pages 685-693

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/10618600.2020.1841650

Keywords

Broken adaptive ridge; Fine-Gray model; l(0)-regularization; Massive sample size; Model selection/variable selection; Oracle property; Subdistribution hazard

Funding

  1. National Institutes of Health [K23DK103972]
  2. National Institutes of Health [P30CA-16042, UL1TR000124-02, P50CA211015, U19AI135995]


This article presents two orthogonal contributions to scalable sparse regression for competing risks time-to-event data. The first accelerates the broken adaptive ridge (BAR) method through cycBAR, a regression algorithm based on cyclic coordinate-wise updates. The second is a forward-backward scan algorithm that reduces the cost of fitting the Fine-Gray proportional subdistributional hazards (PSH) model to O(n) in the sample size n. Combined, the two algorithms can yield speedups of more than 1000-fold over the original BAR algorithm on large competing risks data.
This article develops two orthogonal contributions to scalable sparse regression for competing risks time-to-event data. First, we study and accelerate the broken adaptive ridge method (BAR), a surrogate l(0)-based iteratively reweighted l(2)-penalization algorithm that achieves sparsity in its limit, in the context of the Fine-Gray (1999) proportional subdistributional hazards (PSH) model. In particular, we derive a new algorithm for BAR regression, named cycBAR, that performs cyclic updates of each coordinate using an explicit thresholding formula. The new cycBAR algorithm effectively avoids fitting multiple reweighted l(2)-penalizations and thus yields impressive speedups over the original BAR algorithm. Second, we address a pivotal computational issue related to fitting the PSH model. Specifically, in current implementations the cost of computing the log-pseudo likelihood and its derivatives grows at the rate of O(n^2) with the sample size n. We propose a novel forward-backward scan algorithm that reduces the computation costs to O(n). The proposed method applies to both unpenalized and penalized estimation for the PSH model and has exhibited drastic speedups over current implementations. Finally, combining the two algorithms can yield speedups of more than 1000-fold over the original BAR algorithm. Illustrations of the impressive scalability of our proposed algorithm for large competing risks data are given using both simulations and data from the United States Renal Data System. Supplementary materials for this article are available online.
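
The idea behind the forward-backward scan can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes observation times are distinct and already sorted in increasing order, that `status` uses the hypothetical coding 0 = censored, 1 = event of interest, 2 = competing event, and that `G[i]` is a precomputed, strictly positive estimate of the censoring survival function at the i-th time. Under the usual Fine-Gray weighting, the risk-set sum at time i adds exp(eta_j) for every subject still under observation, plus G[i] / G[j] * exp(eta_j) for every subject who already had a competing event. The first part is a backward cumulative sum and the second is G[i] times a forward cumulative sum, so both can be obtained in two linear passes.

```python
import math


def risk_sums_naive(status, eta, G):
    """O(n^2) reference: recompute the weighted risk-set sum at every time."""
    n = len(eta)
    out = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if j >= i:
                # Subject j is still under observation at time i (weight 1).
                total += math.exp(eta[j])
            elif status[j] == 2:
                # Subject j had an earlier competing event and stays in the
                # risk set with weight G[i] / G[j].
                total += G[i] / G[j] * math.exp(eta[j])
        out.append(total)
    return out


def risk_sums_scan(status, eta, G):
    """O(n) version: one backward pass plus one forward pass."""
    n = len(eta)
    w = [math.exp(e) for e in eta]

    # Backward scan: back[i] = sum of w[j] over j >= i.
    back = [0.0] * n
    running = 0.0
    for i in range(n - 1, -1, -1):
        running += w[i]
        back[i] = running

    # Forward scan: running holds the sum of w[j] / G[j] over earlier
    # competing events (j < i with status[j] == 2).
    out = [0.0] * n
    running = 0.0
    for i in range(n):
        out[i] = back[i] + G[i] * running
        if status[i] == 2:
            running += w[i] / G[i]
    return out


# Toy data (made up for illustration only).
status = [1, 2, 0, 2, 1]
eta = [0.1, -0.3, 0.5, 0.0, 0.2]
G = [1.0, 1.0, 0.8, 0.8, 0.8]
naive = risk_sums_naive(status, eta, G)
fast = risk_sums_scan(status, eta, G)
```

Both functions return the same sums on the toy data; only the second avoids the nested loop. Handling tied times and the derivatives of the log-pseudo likelihood would need additional bookkeeping that this sketch leaves out.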
