Review

Interpretability in the medical field: A systematic mapping and review study

Journal

APPLIED SOFT COMPUTING
Volume 117

Publisher

ELSEVIER
DOI: 10.1016/j.asoc.2021.108391

Keywords

Interpretability; Explainability; XAI; Medicine; Artificial intelligence; Machine learning; Systematic review

Abstract

The field of machine learning (ML) has been growing rapidly, especially in medicine, yet the interpretability of ML models remains a challenge that hinders their adoption by physicians. This study conducted a systematic mapping and review of interpretability techniques applied in the medical field. The results showed an increase in studies on interpretability over the years, dominated by solution proposals and experiment-based evaluations. Diagnosis, oncology, and classification were the most frequently studied medical task, medical discipline, and ML objective, respectively. Artificial neural networks were the most commonly interpreted ML black-box technique, and global interpretability techniques such as rules were the dominant explanation type. The study calls for further research in disciplines beyond diagnosis and classification, broader exploration of local interpretability techniques, and quantitative evaluation with physician involvement to build trust in black-box models in medical environments.
Context: Recently, the machine learning (ML) field has been growing rapidly, mainly owing to the availability of historical datasets and advanced computational power. This growth still faces a set of challenges, such as the interpretability of ML models. In the medical field in particular, interpretability is a real bottleneck to the use of ML by physicians. Numerous interpretability techniques have therefore been proposed and evaluated to help ML gain the trust of its users.

Methods: This review was carried out according to the well-known systematic mapping and review process to analyze the literature on interpretability techniques applied in the medical field with regard to several aspects: publication venues and years, contribution and empirical types, medical and ML disciplines and objectives, ML black-box techniques interpreted, interpretability techniques investigated, their performance and the best-performing techniques, and the datasets used when evaluating interpretability techniques.

Results: A total of 179 articles (1994-2020) were selected from six digital libraries: ScienceDirect, IEEE Xplore, ACM Digital Library, SpringerLink, Wiley, and Google Scholar. The number of studies dealing with interpretability increased over the years, with a dominance of solution proposals and experiment-based empirical studies. Diagnosis, oncology, and classification were the most frequent medical task, discipline, and ML objective studied, respectively. Artificial neural networks were the ML black-box technique most widely investigated for interpretability. Global interpretability techniques focusing on a specific black-box model, such as rules, were the dominant explanation type, and the metrics most often used to evaluate interpretability were accuracy, fidelity, and number of rules. The variety of techniques used by the selected papers did not allow categorization at the technique level, and the large total number of evaluations (671) across the articles raised concerns about subjectivity. Datasets containing numerical and categorical attributes were the most frequently used in the selected studies.

Conclusions: Further effort is needed in disciplines other than diagnosis and classification. Global techniques such as rules are the most used because of their comprehensibility to doctors, but new local techniques should be explored more in the medical field to gain deeper insight into a model's behavior. More experiments and comparisons against existing techniques are encouraged to determine the best-performing techniques. Lastly, quantitative evaluation of interpretability and physicians' involvement in evaluating interpretability techniques are highly recommended to assess how the techniques will perform in real-world scenarios; this can ensure the soundness of the techniques and help black-box models gain trust in medical environments.
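As a concrete illustration of the kind of global, rule-based explanation and the accuracy/fidelity/rule-count metrics the review discusses, the following sketch (not drawn from the paper; the dataset, model, and hyperparameter choices are assumptions for demonstration) trains a scikit-learn neural network as the black box and a shallow decision-tree surrogate that mimics it:

```python
# Minimal sketch, assuming scikit-learn and its bundled breast-cancer dataset:
# a global surrogate explanation of a black-box neural network, scored with
# the metrics named in the review (accuracy, fidelity, number of rules).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# 1. Black-box model: an artificial neural network, the black box most often
#    targeted by interpretability work according to the review.
black_box = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000,
                          random_state=0).fit(X_train_s, y_train)

# 2. Global surrogate: a shallow decision tree trained to mimic the black box's
#    predictions, yielding human-readable if-then rules (one per leaf).
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train_s, black_box.predict(X_train_s))

# 3. Metrics commonly reported for such explanations.
accuracy = accuracy_score(y_test, surrogate.predict(X_test_s))   # vs. ground truth
fidelity = accuracy_score(black_box.predict(X_test_s),           # vs. black-box output
                          surrogate.predict(X_test_s))
n_rules = surrogate.get_n_leaves()                               # rule count

print(f"surrogate accuracy={accuracy:.3f}, fidelity={fidelity:.3f}, rules={n_rules}")
```

High fidelity with a small rule count is what makes such global explanations comprehensible to clinicians; local techniques (e.g., per-patient explanations) would instead explain individual predictions, which is the direction the review recommends exploring further.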
