G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes

SCIENTIFIC REPORTS (2021)

Journal

SCIENTIFIC REPORTS

Volume 11, Issue 1

Publisher

NATURE RESEARCH

DOI: 10.1038/s41598-021-81110-0

Funding

French National Research Agency (ANR) [ANR-16-LCV1-0003-01]

Automated Summary

The paper introduces an approach that combines machine learning and G-computation to handle binary outcomes and binary exposures in small samples. In the simulated scenarios, the super learner tended to outperform the other approaches in estimating individual outcome probabilities in the two counterfactual worlds, in terms of both bias and variance, which makes G-computation with the super learner a useful option for drawing causal inferences even from small samples.

Abstract

In clinical research, there is growing interest in propensity score-based methods for estimating causal effects. G-computation (GC) is an alternative because of its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecification. In this paper, we propose an approach that combines machine learning and GC when both the outcome and the exposure status are binary, and that can deal with small samples. Through simulations, we evaluated the performance of several methods: penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner. We considered six scenarios characterised by various sample sizes, numbers of covariates, and relationships between covariates, exposure statuses, and outcomes. We also illustrated these methods in an application estimating the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. In the context of GC, for estimating the individual outcome probabilities in the two counterfactual worlds, the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, GC combined with the super learner was an effective method for drawing causal inferences, even from small sample sizes.
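
To make the idea of predicting outcome probabilities in two counterfactual worlds concrete, here is a minimal G-computation sketch. It is not the authors' implementation. It assumes a single simulated confounder, and it swaps in a plain logistic regression fitted by gradient ascent as the outcome model, where the paper compares penalized regressions, a neural network, a support vector machine, boosted trees, and a super learner. All function names, coefficients, and the simulated data are hypothetical, and the risk difference is only one possible contrast.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n, seed=0):
    """Toy data (assumed, not from the paper): confounder x, exposure a, outcome y."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Exposure depends on x, so x confounds the exposure-outcome relationship.
    a = [1.0 if rng.random() < sigmoid(0.8 * xi) else 0.0 for xi in x]
    y = [1.0 if rng.random() < sigmoid(-0.5 + 1.5 * ai + xi) else 0.0
         for ai, xi in zip(a, x)]
    return a, x, y


def fit_logistic(rows, y, lr=1.0, n_iter=500):
    """Unpenalized logistic regression fitted by batch gradient ascent."""
    n, p = len(rows), len(rows[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for r, yi in zip(rows, y):
            err = yi - sigmoid(sum(b * v for b, v in zip(beta, r)))
            for j in range(p):
                grad[j] += err * r[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def g_computation(a, x, y):
    """Return the mean predicted outcome probability under a=1 and under a=0."""
    # Step 1: fit the outcome model P(Y = 1 | A, X) on the observed data.
    rows = [[1.0, ai, xi] for ai, xi in zip(a, x)]
    b0, b_a, b_x = fit_logistic(rows, y)
    # Step 2: predict for every individual in both counterfactual worlds.
    pred_exposed = [sigmoid(b0 + b_a + b_x * xi) for xi in x]
    pred_unexposed = [sigmoid(b0 + b_x * xi) for xi in x]
    # Step 3: average the individual predictions over the whole sample.
    p1 = sum(pred_exposed) / len(x)
    p0 = sum(pred_unexposed) / len(x)
    return p1, p0


if __name__ == "__main__":
    a, x, y = simulate(500, seed=0)
    p1, p0 = g_computation(a, x, y)
    print("mean P(Y=1) if everyone exposed:  ", round(p1, 3))
    print("mean P(Y=1) if no one exposed:    ", round(p0, 3))
    print("marginal risk difference (p1 - p0):", round(p1 - p0, 3))
```

Any learner that returns individual outcome probabilities could stand in for `fit_logistic`; only steps 2 and 3 are specific to G-computation. In practice, the variance of the resulting contrast would still need to be estimated, for example by bootstrapping the whole procedure.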
