Multiple multi-sample testing under arbitrary covariance dependency

Journal

STATISTICS IN MEDICINE
Volume 42, Issue 17, Pages 2944-2961

Publisher

WILEY
DOI: 10.1002/sim.9761

Keywords

false discovery proportion; hyperspectral imaging data; matrix-assisted laser desorption/ionization; multinomial regression; multiple marginal models

Modern high-throughput biomedical devices generate large-scale data, and analyzing high-dimensional datasets is common in biomedical studies. This article proposes a procedure to simultaneously evaluate the strength of associations between a categorical response variable and multiple features. The proposed approach involves multiple testing under arbitrary correlation dependency among test statistics. It offers a trade-off between the expected numbers of true and false findings. The practical application of the method on hyperspectral imaging data obtained through a MALDI instrument is demonstrated.
Modern high-throughput biomedical devices routinely produce data on a large scale, and the analysis of high-dimensional datasets has become commonplace in biomedical studies. However, given thousands or tens of thousands of measured variables in these datasets, extracting meaningful features poses a challenge. In this article, we propose a procedure to evaluate the strength of the associations between a nominal (categorical) response variable and multiple features simultaneously. Specifically, we propose a framework of large-scale multiple testing under arbitrary correlation dependency among test statistics. First, marginal multinomial regressions are performed for each feature individually. Second, we use an approach of multiple marginal models for each baseline-category pair to establish asymptotic joint normality of the stacked vector of the marginal multinomial regression coefficients. Third, we estimate the (limiting) covariance matrix between the estimated coefficients from all marginal models. Finally, our approach approximates the realized false discovery proportion of a thresholding procedure for the marginal p-values for each baseline-category logit pair. The proposed approach offers a sensible trade-off between the expected numbers of true and false findings. Furthermore, we demonstrate a practical application of the method on hyperspectral imaging data. This dataset is obtained by a matrix-assisted laser desorption/ionization (MALDI) instrument. MALDI demonstrates tremendous potential for clinical diagnosis, particularly for cancer research. In our application, the nominal response categories represent cancer (sub-)types.
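
The final step of the abstract, thresholding marginal p-values and judging the resulting false discovery proportion (FDP), can be illustrated with a small sketch. This is not the article's estimator: the code below uses the naive approximation m·t / R(t), which ignores the covariance between test statistics that the proposed method accounts for. The z-statistics are made-up values, and the function names `two_sided_p` and `naive_fdp_estimate` are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a test statistic that is standard normal under the null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def naive_fdp_estimate(p_values, t):
    """Count rejections at threshold t and return (R, m * t / R), capped at 1.

    Assumption: every null is treated as true and dependence between the
    statistics is ignored, so this is only a rough stand-in for a
    dependence-adjusted FDP approximation.
    """
    m = len(p_values)
    rejections = sum(1 for p in p_values if p <= t)
    if rejections == 0:
        return 0, 0.0  # no discoveries, so no false discoveries
    return rejections, min(1.0, m * t / rejections)


# Hypothetical z-statistics, one per feature, for a single baseline-category pair.
z_stats = [4.1, -3.6, 0.3, -1.2, 2.9, 0.8, -0.5, 3.3, 1.1, -0.2]
p_values = [two_sided_p(z) for z in z_stats]

for t in (0.01, 0.05):
    r, fdp_hat = naive_fdp_estimate(p_values, t)
    print(f"t={t}: rejections={r}, naive FDP estimate={fdp_hat:.3f}")
```

Raising the threshold t admits more findings but also raises the estimated share of false ones, which is the trade-off between true and false findings that the abstract refers to.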
