4.2 Review

Machine learning for improving high-dimensional proxy confounder adjustment in healthcare database studies: An overview of the current literature

Journal

PHARMACOEPIDEMIOLOGY AND DRUG SAFETY
Volume 31, Issue 9, Pages 932-943

Publisher

WILEY
DOI: 10.1002/pds.5500

Keywords

causal inference; confounding; machine learning

Funding

  1. International Society of Pharmacoepidemiology


This paper surveys current approaches and recent advancements for high-dimensional proxy confounder adjustment in healthcare database studies, focusing on feature generation, covariate prioritization, selection, and adjustment, as well as diagnostic assessment. While there is a large literature on methods for high-dimensional confounder prioritization/selection, there is relatively little written on best practices for feature generation and diagnostic assessment, indicating particular limitations and challenges in these areas. Machine-learning algorithms are showing promise in supplementing investigator-specified variables to improve confounding control in pharmacoepidemiologic studies, but further research is needed on best practices for feature generation and diagnostic assessment in this context.
Purpose: Supplementing investigator-specified variables with large numbers of empirically identified features that collectively serve as 'proxies' for unspecified or unmeasured factors can often improve confounding control in studies utilizing administrative healthcare databases. Consequently, there has been a recent focus on the development of data-driven methods for high-dimensional proxy confounder adjustment in pharmacoepidemiologic research. In this paper, we survey current approaches and recent advancements for high-dimensional proxy confounder adjustment in healthcare database studies.

Methods: We discuss considerations underpinning three areas for high-dimensional proxy confounder adjustment: (1) feature generation (transforming raw data into covariates, or features, to be used for proxy adjustment); (2) covariate prioritization, selection, and adjustment; and (3) diagnostic assessment. We discuss challenges and avenues of future development within each area.

Results: There is a large literature on methods for high-dimensional confounder prioritization/selection, but relatively little has been written on best practices for feature generation and diagnostic assessment. Consequently, these areas have particular limitations and challenges.

Conclusions: There is a growing body of evidence showing that machine-learning algorithms for high-dimensional proxy confounder adjustment can supplement investigator-specified variables to improve confounding control compared to adjustment based on investigator-specified variables alone. However, more research is needed on best practices for feature generation and diagnostic assessment when applying methods for high-dimensional proxy confounder adjustment in pharmacoepidemiologic studies.
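
To make the three areas named in the abstract more concrete, the following is a minimal sketch in plain Python, not the method of any paper covered by the review. Everything in it is an assumption made for illustration: the toy claims records, the code labels, the exposure indicator, and the function names are hypothetical. Features are generated as binary "ever recorded" indicators, covariates are prioritized by their absolute standardized difference between exposure groups (a deliberate simplification that ignores the outcome), and the same standardized difference doubles as a balance diagnostic. The adjustment step itself (for example, fitting a propensity score model on the selected covariates) is omitted.

```python
from collections import defaultdict
from math import copysign, inf, sqrt

# Hypothetical toy claims data: (patient_id, code) pairs. The codes are made up.
claims = [
    (1, "dx:I10"), (1, "rx:statin"),
    (2, "dx:I10"),
    (3, "dx:I10"), (3, "rx:statin"),
    (4, "dx:E11"),
    (5, "dx:I10"), (5, "rx:statin"),
    (6, "dx:E11"),
    (7, "rx:statin"), (7, "dx:E11"),
    (8, "dx:E11"),
]
# Hypothetical exposure indicator: 1 = treated, 0 = comparator.
exposure = {1: 1, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0, 7: 0, 8: 0}


def generate_features(claims, patients):
    """Area 1: turn raw codes into binary 'ever recorded' indicators."""
    codes = sorted({code for _, code in claims})
    seen = defaultdict(set)
    for pid, code in claims:
        seen[pid].add(code)
    return {pid: {c: int(c in seen[pid]) for c in codes} for pid in patients}


def standardized_difference(features, exposure, code):
    """Area 3: standardized difference of a binary feature between groups."""
    treated = [f[code] for pid, f in features.items() if exposure[pid] == 1]
    control = [f[code] for pid, f in features.items() if exposure[pid] == 0]
    p1 = sum(treated) / len(treated)
    p0 = sum(control) / len(control)
    pooled_sd = sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
    if pooled_sd == 0:
        # Both groups have zero variance: balanced if equal, else unbounded.
        return 0.0 if p1 == p0 else copysign(inf, p1 - p0)
    return (p1 - p0) / pooled_sd


def prioritize(features, exposure, top_k):
    """Area 2: rank features by absolute imbalance and keep the top_k."""
    codes = list(next(iter(features.values())))
    ranked = sorted(
        codes,
        key=lambda c: abs(standardized_difference(features, exposure, c)),
        reverse=True,
    )
    return ranked[:top_k]


features = generate_features(claims, patients=sorted(exposure))
selected = prioritize(features, exposure, top_k=2)

for code in sorted(features[1]):
    smd = standardized_difference(features, exposure, code)
    flag = "selected" if code in selected else "dropped"
    print(f"{code}: standardized difference = {smd:+.3f} ({flag})")
```

In practice, prioritization schemes in this literature typically also account for each feature's relationship with the outcome, and the diagnostic would be recomputed after adjustment (for example, in a weighted or matched sample) rather than on the crude data as it is here.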

