Article

Boundary sampling to boost mutation testing for deep learning models

Journal

INFORMATION AND SOFTWARE TECHNOLOGY
Volume 130, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.infsof.2020.106413

Keywords

Software testing; Deep learning; Mutation testing; Boundary; Neural network

Funding

  1. National Key R&D Program of China [2018YFB1003901]
  2. National Natural Science Foundation of China [61932012, 61872177, 61832009, 61772263, 61772259]

The study introduces the boundary sample selection (BSS) approach, which selects a smaller, sensitive, representative, and efficient subset of the test dataset to support mutation testing of DL models. The experimental results show that the subsets generated by BSS are smaller, better at observing mutation effects, able to replace the whole test sets to a high degree in terms of mutation score, and achieve better Mean Reciprocal Rank (MRR) values than the whole test sets. BSS can help reduce labeling cost, run mutation testing quickly, and identify killed mutants early.
Context: The widespread application of Deep Learning (DL) models has raised concerns about their reliability. Because DL follows a data-driven programming paradigm, the quality of test datasets is extremely important for an accurate assessment of DL models. Recently, researchers have introduced mutation testing into DL testing: mutation operators are applied to a DL model to generate mutants, and the quality of the test dataset is judged by whether the test data can identify those mutants. However, many factors (e.g., huge labeling effort and high running cost) still hinder the use of mutation testing for DL models.
Objective: We seek an approach for selecting a smaller, sensitive, representative, and efficient subset of the whole test dataset to support current mutation testing for DL models (e.g., by reducing labeling and running cost).
Method: We propose boundary sample selection (BSS), which uses the distance of samples to the decision boundary of a DL model as the indicator for constructing an appropriate subset. To evaluate the performance of BSS, we conduct an extensive empirical study with two widely used datasets, three popular DL models, and 14 up-to-date DL mutation operators.
Results: We observe that (1) the subsets generated by BSS are much smaller (about 3%-20% of the whole test set); (2) under most mutation operators, our subsets outperform the whole test sets (about 9.94-21.63) in observing mutation effects; (3) our subsets can replace the whole test sets to a very high degree (higher than 97%) when considering mutation score; and (4) the MRR values of our subsets are clearly better (about 2.28-13.19 times higher) than those of the whole test sets.
Conclusions: The results show that BSS can help testers save labeling cost, run mutation testing quickly, and identify killed mutants early.
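The selection idea in the Method paragraph can be illustrated with a minimal sketch. The abstract does not say how the distance to the decision boundary is measured, so this sketch assumes a common proxy: the gap between the two largest predicted class probabilities (a small gap suggests the sample lies near a boundary). The function names, the `fraction` parameter, and the MRR convention (1-based rank of the first test sample that kills each mutant, with unkilled mutants contributing 0) are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def boundary_margin(probs):
    """Gap between the two largest class probabilities of each sample.

    Assumption: a small gap is used as a stand-in for "close to the
    decision boundary"; the paper may define distance differently.
    """
    probs = np.asarray(probs, dtype=float)
    top2 = np.sort(probs, axis=1)[:, -2:]  # two largest per row, ascending
    return top2[:, 1] - top2[:, 0]


def select_boundary_samples(probs, fraction=0.1):
    """Return indices of the `fraction` of samples with the smallest margin."""
    margins = boundary_margin(probs)
    k = max(1, int(round(fraction * len(margins))))
    return np.argsort(margins, kind="stable")[:k]


def mean_reciprocal_rank(first_kill_ranks):
    """Mean of 1/rank over mutants; None (never killed) counts as 0.

    Assumption: each rank is the 1-based position of the first test
    sample that kills the mutant.
    """
    reciprocal = [0.0 if r is None else 1.0 / r for r in first_kill_ranks]
    return sum(reciprocal) / len(reciprocal)


if __name__ == "__main__":
    # Hypothetical softmax outputs for four test samples over three classes.
    probs = [
        [0.90, 0.05, 0.05],
        [0.50, 0.45, 0.05],
        [0.34, 0.33, 0.33],
        [0.70, 0.20, 0.10],
    ]
    print(select_boundary_samples(probs, fraction=0.5))
    print(mean_reciprocal_rank([1, 2, None, 4]))
```

With a selection like this, only the chosen subset would need labels before mutation testing is run, which is the labeling-cost saving the abstract refers to. A higher MRR means killing samples appear earlier in the test order.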
