Article

Comparison of 8 methods for univariate statistical exclusion of pathological subpopulations for indirect reference intervals and biological variation studies

Journal

CLINICAL BIOCHEMISTRY
Volume 103, Pages 16-24

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.clinbiochem.2022.02.006

Keywords

Biological variation; Reference intervals; Indirect approach; Data mining; Outlier; Outlier exclusion

This study compares the performance of eight statistical methods in identifying and excluding data from pathological subpopulations. The Kosmic method, Vdl method 1, and Tukey's rule perform the best. High proportions and spreads of pathological subpopulations lead to reduced performance in statistical exclusion. It is important for laboratories to use clinical criteria to minimize the proportion of pathological subpopulations before analysis and choose the appropriate statistical method.
Background: Indirect reference intervals and biological variation studies rely heavily on statistical methods to separate pathological and non-pathological subpopulations within the same dataset. In recognition of this, we compare the performance of eight univariate statistical methods for identifying and excluding values that originate from pathological subpopulations. Methods: The eight approaches examined were: Tukey's rule with and without Box-Cox transformation; median absolute deviation; double median absolute deviation; Gaussian mixture models; van der Loo (Vdl) methods 1 and 2; and the Kosmic approach. A total of 256 mixed populations were simulated across four scenarios, including lognormal distributions, by varying the number of pathological populations and their central location, spread and proportion. The performance criterion was a fractional error within +/- 0.05 of the lower and upper limits of the true underlying reference interval. Results: Overall, the Kosmic method stood out, with the highest number of scenarios lying within the acceptable error, followed by Vdl method 1 and Tukey's rule. Kosmic and Vdl method 1 appear to discriminate the non-pathological reference population better when the data are lognormally distributed. When the proportion and spread of pathological subpopulations were high, the performance of statistical exclusion deteriorated considerably. Discussions: It is important that laboratories use a priori defined clinical criteria to minimise the proportion of pathological subpopulations in a dataset prior to analysis. The curated dataset should then be carefully examined so that the appropriate statistical method can be applied.
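
To make the evaluation idea concrete, the following is a minimal sketch of one of the listed methods (Tukey's rule without transformation) applied to a single simulated mixed population, followed by the +/- 0.05 fractional error check. It is not the authors' code. The population parameters (Gaussian means of 100 and 160, SD of 10, 5% pathological proportion), the single-pass 1.5 x IQR fences, the 2.5th and 97.5th percentiles as reference limits, and all function names are assumptions chosen for illustration only.

```python
import random
import statistics

def tukey_filter(values, k=1.5):
    """Keep values inside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR] (single pass)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]

def reference_interval(values):
    """Estimate the central 95% interval as the 2.5th and 97.5th percentiles."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]

def fractional_error(estimate, truth):
    """Signed error of an estimated limit relative to the true limit."""
    return (estimate - truth) / truth

# Hypothetical mixed population: 95% non-pathological, 5% pathological.
rng = random.Random(0)
healthy = [rng.gauss(100.0, 10.0) for _ in range(9500)]
pathological = [rng.gauss(160.0, 10.0) for _ in range(500)]
mixed = healthy + pathological

# True limits of the non-pathological Gaussian: mean +/- 1.96 SD.
true_lo, true_hi = 100.0 - 1.96 * 10.0, 100.0 + 1.96 * 10.0

raw_lo, raw_hi = reference_interval(mixed)
kept = tukey_filter(mixed)
est_lo, est_hi = reference_interval(kept)

err_raw = (fractional_error(raw_lo, true_lo), fractional_error(raw_hi, true_hi))
err_filtered = (fractional_error(est_lo, true_lo), fractional_error(est_hi, true_hi))

print("excluded values:", len(mixed) - len(kept))
print("fractional error without exclusion:", err_raw)
print("fractional error after Tukey's rule:", err_filtered)
print("within +/- 0.05:", all(abs(e) <= 0.05 for e in err_filtered))
```

In this deliberately easy case the pathological values sit far from the reference population, so the fences remove almost all of them. Moving the pathological mean closer, widening its spread, or raising its proportion in the sketch is a simple way to see the kind of deterioration the abstract describes.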
