Article

Boundary sampling to boost mutation testing for deep learning models

Journal

INFORMATION AND SOFTWARE TECHNOLOGY
Volume 130

Publisher

ELSEVIER
DOI: 10.1016/j.infsof.2020.106413

Keywords

Software testing; Deep learning; Mutation testing; Boundary; Neural network

Funding

  1. National Key R&D Program of China [2018YFB1003901]
  2. National Natural Science Foundation of China [61932012, 61872177, 61832009, 61772263, 61772259]

The study introduces the boundary sample selection (BSS) approach, which selects a smaller, sensitive, representative, and efficient subset of the test dataset to make mutation testing of DL models more practical. The experimental results show that, compared with the whole test sets, the subsets generated by BSS are smaller, better at observing mutation effects, able to replace the whole test sets to a high degree with respect to mutation score, and achieve better Mean Reciprocal Rank (MRR) values. BSS can help reduce labeling cost, run mutation testing quickly, and identify killed mutants early.
Context: The widespread application of Deep Learning (DL) models has raised concerns about their reliability. Because DL follows a data-driven programming paradigm, the quality of test datasets is crucial for assessing DL models accurately. Recently, researchers have introduced mutation testing into DL testing: mutation operators are applied to DL models to generate mutants, and the quality of the test dataset is judged by whether the test data can identify those mutants. However, many factors (e.g., huge labeling effort and high running cost) still hinder the adoption of mutation testing for DL models.

Objective: We seek an approach for selecting a smaller, sensitive, representative, and efficient subset of the whole test dataset to make current mutation testing more practical for DL models (e.g., by reducing labeling and running cost).

Method: We propose boundary sample selection (BSS), which uses the distance of samples to the decision boundary of DL models as the indicator for constructing an appropriate subset. To evaluate the performance of BSS, we conduct an extensive empirical study with two widely used datasets, three popular DL models, and 14 up-to-date DL mutation operators.

Results: We observe that (1) the subsets generated by BSS are much smaller (about 3%-20% of the whole test set); (2) under most mutation operators, our subsets are superior to the whole test sets (about 9.94-21.63) in observing mutation effects; (3) our subsets can replace the whole test sets to a very high degree (higher than 97%) when considering mutation score; and (4) the MRR values of our proposed subsets are clearly better (about 2.28-13.19 times higher) than those of the whole test sets.

Conclusions: The results show that BSS can help testers save labeling cost, run mutation testing quickly, and identify killed mutants early.
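
The Method paragraph above can be illustrated with a small sketch. The abstract does not say how the distance to the decision boundary is computed, so this example assumes a common proxy: the gap between the two largest predicted class probabilities, where a smaller gap means a sample lies closer to the boundary. The function names `boundary_margin` and `select_boundary_samples`, the `fraction` parameter, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
def boundary_margin(probs):
    """Gap between the two largest class probabilities of one sample.

    Assumption: this top-two margin stands in for the distance to the
    decision boundary; the paper's actual indicator may differ.
    """
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_boundary_samples(prob_rows, fraction=0.1):
    """Return indices of the samples with the smallest margins.

    `fraction` is a hypothetical knob; the abstract reports subsets of
    roughly 3%-20% of the whole test set.
    """
    margins = [boundary_margin(row) for row in prob_rows]
    k = max(1, round(fraction * len(margins)))
    # sorted() is stable, so ties keep their original order.
    order = sorted(range(len(margins)), key=lambda i: margins[i])
    return order[:k]


# Toy softmax outputs for four test samples (made-up values).
probs = [
    [0.98, 0.01, 0.01],  # confident prediction, far from the boundary
    [0.40, 0.35, 0.25],  # ambiguous
    [0.51, 0.48, 0.01],  # most ambiguous
    [0.70, 0.20, 0.10],
]
subset = select_boundary_samples(probs, fraction=0.5)
print(subset)  # [2, 1]
```

In a workflow like the one the abstract describes, only the selected samples would then be labeled and run against each mutant, which is where the savings in labeling and running cost would come from.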
