Optimal subset selection for causal inference using machine learning ensembles and particle swarm optimization

COMPLEX & INTELLIGENT SYSTEMS (2021)

Journal

COMPLEX & INTELLIGENT SYSTEMS

Volume 7, Issue 1, Pages 41-59

Publisher

SPRINGER HEIDELBERG

DOI: 10.1007/s40747-020-00169-w

Keywords

Analytics; Evolutionary computing; Swarm optimization; Machine learning

Automated Summary

The study introduces a method for balancing treatment and control samples by minimizing the imbalance in their covariate distributions, which is essential for drawing causal inference. By optimizing AUCs, the proposed approach achieves better balance than existing methods. Particle swarm optimization, an evolutionary optimization technique, shows promising performance in minimizing covariate imbalance.

Abstract

We propose and evaluate a method for optimal construction of synthetic treatment and control samples for the purpose of drawing causal inference. The balance optimization subset selection problem, which formulates the minimization of aggregate imbalance in covariate distributions as a way to reduce bias in data, is a new area of study in operations research. We investigate a novel metric, the cross-validated area under the receiver operating characteristic curve (AUC), as a measure of balance between treatment and control groups. The proposed approach provides direct and automatic balancing of covariate distributions. In addition, the AUC-based approach can detect subtler distributional differences than existing measures, such as simple empirical mean/variance and count-based metrics. Thus, optimizing AUCs achieves greater balance than the existing methods. Using 5 widely used real data sets and 7 synthetic data sets, we show that optimizing samples with existing methods (Chi-square, mean variance differences, Kolmogorov-Smirnov, and Mahalanobis) yields samples that still contain imbalance detectable by machine learning ensembles. We minimize covariate imbalance by minimizing the absolute difference between the maximum cross-validated AUC over M folds and 0.50, using evolutionary optimization. We demonstrate that particle swarm optimization (PSO) outperforms modified cuckoo swarm (MCS) for a gradient-free, non-linear, noisy cost function. To compute AUCs, we use supervised binary classification approaches from the machine learning and credit scoring literature. Our use of superscore ensembles adds to the classifier-based two-sample testing literature. If the mean cross-validated AUC based on machine learning is 0.50, the two groups are indistinguishable and suitable for causal inference.
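To make the balance objective concrete, the following is a minimal sketch of the quantity the abstract describes: the absolute difference between the maximum cross-validated AUC over M folds and 0.50. It is an illustration under stated assumptions, not the authors' implementation. The function names (`auc`, `fold_aucs`, `imbalance`, `sample`) are hypothetical, the data are simulated, and a simple mean-difference linear scorer stands in for the machine learning ensembles used in the paper.

```python
import random


def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def mean_vec(rows):
    """Column-wise mean of a list of equal-length covariate rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fold_aucs(treated, control, m_folds=5, seed=0):
    """Held-out AUC per fold for a stand-in classifier (assumption, not the paper's).

    The scorer projects each unit onto the difference between the training-fold
    treated and control means; the paper uses machine learning ensembles instead.
    """
    rng = random.Random(seed)
    t, c = list(treated), list(control)
    rng.shuffle(t)
    rng.shuffle(c)
    aucs = []
    for k in range(m_folds):
        t_train = [r for i, r in enumerate(t) if i % m_folds != k]
        c_train = [r for i, r in enumerate(c) if i % m_folds != k]
        t_test, c_test = t[k::m_folds], c[k::m_folds]
        w = [a - b for a, b in zip(mean_vec(t_train), mean_vec(c_train))]
        t_scores = [sum(wi * xi for wi, xi in zip(w, r)) for r in t_test]
        c_scores = [sum(wi * xi for wi, xi in zip(w, r)) for r in c_test]
        aucs.append(auc(t_scores, c_scores))
    return aucs


def imbalance(treated, control, m_folds=5, seed=0):
    """Objective from the abstract: |max fold AUC - 0.50|; smaller means better balance."""
    return abs(max(fold_aucs(treated, control, m_folds, seed)) - 0.5)


# Simulated data (assumption): two covariates, the first optionally shifted.
data_rng = random.Random(42)


def sample(n, shift):
    return [[data_rng.gauss(shift, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(n)]


shifted = imbalance(sample(200, 1.0), sample(200, 0.0))  # groups differ in covariate 1
similar = imbalance(sample(200, 0.0), sample(200, 0.0))  # groups share one distribution
print(f"imbalance, shifted groups: {shifted:.3f}")
print(f"imbalance, similar groups: {similar:.3f}")
```

In this sketch, a subset-selection optimizer such as PSO would treat `imbalance` as its cost function and search for the control subset that drives it toward zero. Because the value depends on the fold split and on finite samples, the cost is noisy and has no usable gradient, which is consistent with the abstract's description of a gradient-free, non-linear, noisy cost function.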
