Article

Approximating XGBoost with an interpretable decision tree

Journal

INFORMATION SCIENCES
Volume 572, Issue -, Pages 522-542

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2021.05.055

Keywords

Classification trees; Decision forest; Ensemble learning


The increasing usage of machine-learning models in critical domains has recently stressed the necessity of interpretable machine-learning models. In areas like healthcare and finance, the model consumer must understand the rationale behind the model's output in order to use it when making a decision. For this reason, it is impossible to use black-box models in these scenarios, regardless of their high predictive performance. Decision forests, and in particular Gradient Boosting Decision Trees (GBDT), are examples of this kind of model. GBDT models are considered the state-of-the-art in many classification challenges, reflected by the fact that the majority of Kaggle's recent winners used GBDT methods as part of their solution (such as XGBoost). Despite their superior predictive performance, however, they cannot be used in tasks that require transparency. This paper presents a novel method for transforming a decision forest of any kind into an interpretable decision tree. The method extends the tool-set available to machine-learning practitioners who want to exploit the interpretability of decision trees without significantly impairing the predictive performance gained by GBDT models such as XGBoost. We show in an empirical evaluation that in some cases the generated tree is able to approximate the predictive performance of an XGBoost model while enabling better transparency of the outputs. (c) 2021 Elsevier Inc. All rights reserved.
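The transformation algorithm itself is the paper's contribution and is not reproduced here. A generic way to get a feel for the idea, however, is "surrogate" distillation: fit a single shallow decision tree to the ensemble's predictions rather than to the raw labels, then measure how faithfully the tree mimics the forest. The sketch below assumes scikit-learn is available and uses GradientBoostingClassifier as a stand-in for XGBoost; it illustrates the general approximation setup, not the authors' specific method.

```python
# Hedged sketch: generic surrogate-tree distillation (NOT the paper's algorithm).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Train the black-box forest (a GBDT model, standing in for XGBoost).
forest = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# 2. Fit a shallow, interpretable tree to the *forest's* predictions,
#    so the tree approximates the ensemble's decision function.
surrogate = DecisionTreeClassifier(max_depth=5, random_state=0)
surrogate.fit(X_train, forest.predict(X_train))

# 3. Fidelity: how often the surrogate agrees with the forest on held-out data.
fidelity = accuracy_score(forest.predict(X_test), surrogate.predict(X_test))
print(f"surrogate fidelity to forest: {fidelity:.2f}")
```

A depth-5 tree can then be inspected directly (e.g. with `sklearn.tree.export_text(surrogate)`), which is the transparency benefit the abstract refers to; the paper's method aims to achieve such transparency with less loss of predictive performance than a naive surrogate.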

