Article

Testing of the Effect of Missing Data Estimation and Distribution in Morphometric Multivariate Data Analyses

Journal

SYSTEMATIC BIOLOGY
Volume 61, Issue 6, Pages 941-954

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/sysbio/sys047

Keywords

Crocodilia; deformation; fossil; incomplete; morphology; ordination; PCA; Procrustes; shape; taxonomy

Funding

  1. Natural Sciences and Engineering Research Council of Canada (NSERC)
  2. Doris O. and Samuel P. Welles Fund Travel Grant (UCMP)
  3. Dinosaur Research Institute Student Project Grant
  4. University of Toronto

Missing data are an unavoidable problem in biological data sets, and the performance of missing data deletion and estimation techniques in morphometric data sets is poorly understood. Here, a novel method is used to measure the introduced error of multiple techniques on a representative sample. A large sample of extant crocodilian skulls was measured and analyzed with principal component analysis (PCA). Twenty-three different proportions of missing data were introduced into the data set, estimated, analyzed, and compared with the original result using Procrustes superimposition. Previous work on the effects of missing data introduced missing values at random, a pattern that is not biologically realistic. Here, missing data were introduced into the data set using three methodologies: purely at random, as a function of the Euclidean distance between respective measurements (simulating anatomical regions), and as a function of the portion of the sample occupied by each taxon (simulating unequal missing data in rare taxa). Gower's distance was found to be the best-performing non-estimation method, and Bayesian PCA the best-performing estimation method. Specimens of the taxa with small sample sizes and those most morphologically disparate had the highest estimation error. Distribution of missing data had a significant effect on the estimation error for almost all methods and proportions. Taxonomically biased missing data tended to show similar trends to random, but with higher error rates. Anatomically biased missing data showed a much greater deviation from random than taxonomically biased missing data, with magnitudes that depended on the estimation method.
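
The workflow described in the abstract (introduce missing values, estimate them, rerun the PCA, and compare ordinations by Procrustes superimposition) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic data in place of the crocodilian skull measurements, covers only the purely random missing-data scheme, and uses column-mean imputation as a simple stand-in for the estimation methods evaluated in the paper (it is not Bayesian PCA). The function names `pca_scores`, `knock_out_at_random`, `impute_column_means`, and `introduced_error` are hypothetical.

```python
import numpy as np
from scipy.spatial import procrustes


def pca_scores(data, n_components=2):
    """Scores on the first principal components, via SVD of the centered data."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def knock_out_at_random(data, proportion, rng):
    """Copy `data` and set the given proportion of cells to NaN, purely at random."""
    n_missing = int(round(proportion * data.size))
    idx = rng.choice(data.size, size=n_missing, replace=False)
    mask = np.zeros(data.size, dtype=bool)
    mask[idx] = True
    damaged = data.copy()
    damaged[mask.reshape(data.shape)] = np.nan
    return damaged


def impute_column_means(data):
    """Stand-in estimator (an assumption): replace each NaN with its column mean."""
    filled = data.copy()
    col_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled


def introduced_error(original, proportion, rng):
    """Procrustes disparity between the PCA of the original and estimated data."""
    damaged = knock_out_at_random(original, proportion, rng)
    estimated = impute_column_means(damaged)
    _, _, disparity = procrustes(pca_scores(original), pca_scores(estimated))
    return disparity


# Synthetic stand-in for a measurement matrix: 60 specimens x 8 correlated variables.
rng = np.random.default_rng(0)
skulls = rng.normal(size=(60, 8)) @ rng.normal(size=(8, 8))

for proportion in (0.05, 0.20, 0.40):
    print(proportion, introduced_error(skulls, proportion, rng))
```

A disparity near zero means the ordination of the estimated data matches the original closely, and larger values indicate more introduced error. The anatomically and taxonomically biased schemes would replace `knock_out_at_random` with a sampler that weights cells by measurement proximity or by taxon frequency.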
