Proceedings Paper

Do Pseudo Test Suites Lead to Inflated Correlation in Measuring Test Effectiveness?

Publisher

IEEE
DOI: 10.1109/ICST.2019.00033

Keywords

test suites; coverage criteria; empirical study

Funding

  1. National 973 Program of China [2015CB352201]
  2. Natural Science Foundation of China [61872008, 61861130363]
  3. National Science Foundation [CCF-1566589, CCF-1763906]
  4. Royal Society Newton Advanced Fellowship [NAF\R1\180142]
  5. Royal Society International Exchanges Cost Share [IE150982]
  6. ERC [741278]

Code coverage is the most widely adopted criterion for measuring test effectiveness in software quality assurance. How well coverage criteria indicate a test suite's effectiveness has been widely studied in prior work. Most of these studies use randomly constructed pseudo test suites to facilitate data collection for correlation analysis, yet no previous work has systematically studied whether pseudo test suites lead to inflated correlation results. This paper investigates this potentially widespread threat through a study of 123 real-world Java projects. Following the typical experimental process for studying coverage criteria, we investigate the correlation between statement/assertion coverage and mutation score using both pseudo and original test suites. In addition to direct correlation analysis, we control for the number of assertions and the test suite size to conduct partial correlation analysis. The results reveal that 1) the correlation between coverage criteria and mutation score derived from pseudo test suites is much higher than that derived from original test suites (by 0.21 to 0.39 in Kendall's tau-b); and 2) contrary to previous reports, statement coverage has a stronger correlation with mutation score than assertion coverage.
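
The analysis steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tooling: the function names (`kendall_tau_b`, `partial_tau`, `pseudo_suites`) are hypothetical, the partial correlation uses the standard first-order formula applied to tau values (an assumption about how a control variable such as suite size might be handled), and pseudo test suites are assumed to be fixed-size random samples of an original suite's tests.

```python
import math
import random
from itertools import combinations


def kendall_tau_b(xs, ys):
    """Kendall's tau-b: rank correlation with a correction for ties."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counted in neither term
        if dx == 0:
            ties_x += 1  # tied only in x
        elif dy == 0:
            ties_y += 1  # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def partial_tau(xs, ys, zs):
    """Correlation of xs and ys while controlling for zs.

    Assumption: the usual first-order partial correlation formula,
    applied to pairwise tau-b values.
    """
    t_xy = kendall_tau_b(xs, ys)
    t_xz = kendall_tau_b(xs, zs)
    t_yz = kendall_tau_b(ys, zs)
    return (t_xy - t_xz * t_yz) / math.sqrt((1 - t_xz**2) * (1 - t_yz**2))


def pseudo_suites(tests, size, count, seed=0):
    """Build `count` pseudo test suites by sampling `size` tests each.

    Assumption: sampling is uniform and without replacement within a suite.
    """
    rng = random.Random(seed)
    return [rng.sample(tests, size) for _ in range(count)]
```

In such a setup, each pseudo suite would be measured for coverage and mutation score, and `kendall_tau_b` would be applied to the two resulting lists; `partial_tau` would take the control variable (for example, suite size or assertion count) as its third argument.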
