4.4 Article

Achieving scalable mutation-based generation of whole test suites

Journal

EMPIRICAL SOFTWARE ENGINEERING
Volume 20, Issue 3, Pages 783-812

Publisher

SPRINGER
DOI: 10.1007/s10664-013-9299-z

Keywords

Mutation testing; Test case generation; Search-based testing; Testing classes; Unit testing

Funding

  1. Google Focused Research Award on Test Amplification
  2. Norwegian Research Council

Ask authors/readers for more resources

Without complete formal specification, automatically generated software tests need to be manually checked in order to detect faults. This makes it desirable to produce the strongest possible test set while keeping the number of tests as small as possible. As commonly applied coverage criteria like branch coverage are potentially weak, mutation testing has been proposed as a stronger criterion. However, mutation based test generation is hampered because usually there are simply too many mutants, and too many of these are either trivially killed or equivalent. On such mutants, any effort spent on test generation would per definition be wasted. To overcome this problem, our search-based EvoSuite test generation tool integrates two novel optimizations: First, we avoid redundant test executions on mutants by monitoring state infection conditions, and second we use whole test suite generation to optimize test suites towards killing the highest number of mutants, rather than selecting individual mutants. These optimizations allowed us to apply EvoSuite to a random sample of 100 open source projects, consisting of a total of 8,963 classes and more than two million lines of code, leading to a total of 1,380,302 mutants. The experiment demonstrates that our approach scales well, making mutation testing a viable test criterion for automated test case generation tools, and allowing us to analyze the relationship of branch coverage and mutation testing in detail.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.4
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available