Partial sufficient variable screening with categorical controls

Journal

COMPUTATIONAL STATISTICS & DATA ANALYSIS
Volume 187

Publisher

ELSEVIER
DOI: 10.1016/j.csda.2023.107784

Keywords

Categorical data; Conditional independence; Sufficient dimension reduction; Sure screening; Ultrahigh dimensional data analysis

Variable screening is an important tool for dimension reduction in ultrahigh dimensional data analysis. This study proposes a partial sufficient variable screening method for use in the presence of control variables; it aims to reduce the predictor set without losing regression information. The method screens variables by constraining the reduction of the continuous variables through the subpopulations identified by the categorical variables. Its effectiveness is demonstrated through simulation studies and an application to gene screening for diffuse large-B-cell lymphoma prognosis.
Variable screening, as a fast and effective dimension reduction tool, plays an important role in analyzing ultrahigh dimensional data. Although many real datasets contain both continuous and categorical variables, existing methods are mostly designed for continuous data. Partial sufficient variable screening, which aims to reduce the predictor set of primary interest without loss of regression information in the presence of some control variables, is proposed with theoretical guarantees. Specifically, for regression analyses involving mixed types of predictors, variable screening is approached under the notion of sufficiency by constraining the reduction of the continuous variables through the subpopulations identified by the categorical variables. The effectiveness of the proposed method is demonstrated through simulation studies encompassing a variety of regression and classification models, and through an application to prognostic gene screening for diffuse large-B-cell lymphoma. Published by Elsevier B.V.
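
The idea of screening continuous predictors within the subpopulations defined by a categorical control can be illustrated with a minimal sketch. This is not the paper's statistic, which the abstract does not specify: the squared within-group Pearson correlation used here is a stand-in marginal utility chosen only for illustration, and the function name `partial_screen` and its parameters are hypothetical. Note that a correlation-based utility captures only linear dependence, whereas the paper works under the broader notion of sufficiency.

```python
import numpy as np


def partial_screen(X, y, w, top_k):
    """Rank continuous predictors by a group-weighted marginal utility.

    X: (n, p) array of continuous predictors.
    y: (n,) response.
    w: (n,) labels of a categorical control variable.
    top_k: number of predictors to keep.

    ASSUMPTION: the utility is the squared Pearson correlation computed
    inside each level of w, averaged with weights equal to the level's
    sample proportion. It stands in for the paper's actual statistic.
    """
    n, p = X.shape
    scores = np.zeros(p)
    for level in np.unique(w):
        mask = w == level
        if mask.sum() < 3:
            continue  # too few observations to estimate a correlation
        Xg = X[mask] - X[mask].mean(axis=0)   # center within the subpopulation
        yg = y[mask] - y[mask].mean()
        denom = np.sqrt((Xg ** 2).sum(axis=0) * (yg ** 2).sum())
        corr = np.divide(Xg.T @ yg, denom, out=np.zeros(p), where=denom > 0)
        scores += mask.mean() * corr ** 2     # weight by subpopulation share
    keep = np.argsort(scores)[::-1][:top_k]   # indices of the top_k scores
    return keep, scores
```

Computing the utility within each level of the control, rather than on the pooled sample, lets the sketch retain a predictor whose effect changes sign across subpopulations and would therefore nearly cancel in a pooled marginal correlation.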