Article

Automatic specimen identification of Harpacticoids (Crustacea: Copepoda) using Random Forest and MALDI-TOF mass spectra, including a post hoc test for false positive discovery

Journal

METHODS IN ECOLOGY AND EVOLUTION
Volume 9, Issue 6, Pages 1421-1434

Publisher

WILEY
DOI: 10.1111/2041-210X.13000

Keywords

false positive; machine learning tools; MALDI-TOF MS; Meiobenthos; proteomic fingerprint; random forest; species identification

Funding

  1. Land Niedersachsen [IBR B7]

1. Ecological studies require accurate identification of specimens. Identification is very time-consuming when processing plankton, meiobenthos or soil biota samples, because these samples contain large numbers of minute specimens. MALDI-TOF MS, an emerging technique for identifying metazoan species, may offer a solution. As an alternative to vendor-supplied software or clustering approaches, Random Forest (RF) models can be trained on MALDI-TOF data to identify species. However, in a real-world scenario, RF models cannot recognize species that were not included in the training dataset: such specimens are assigned to one of the known species, producing false positives (misidentifications).

2. We produced MALDI-TOF MS spectra for meiofauna species and trained RF models using the MALDI-TOF bins as predictors and species as a multi-level target class. We used the empirical beta distribution of the probability of class assignment in the model to design a post hoc test for false positive discovery. Two strategies increased the final accuracy of species identification: (1) class smoothing, i.e. adding in silico observations of each class, created by bootstrapping the values of each predictor within that class; and (2) adding a null class to the training dataset, created by bootstrapping predictor values and shuffling predictor labels so that the class carries no multivariate signal.

3. We demonstrate that RF is an excellent method for species identification using MALDI-TOF MS data. The models are flexible enough to correctly classify the in silico observations created by class smoothing. Our post hoc test successfully unmasks false positive classifications. Smoothing the classes and adding a null class to the training dataset draws false positives into the null class. In our example, 100% false positive discovery was achieved while maintaining very high overall prediction accuracy.

4. Combining MALDI-TOF MS and RF models is a step towards a fully automatic species identification workflow, which is particularly needed for ecological studies of species-rich communities of small organisms, but also for routine monitoring. The post hoc test for false positive discovery can be applied to any RF multi-level classification model, not only in a biological context.
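The post hoc test described in point 2 can be sketched as follows. The abstract does not spell out the exact procedure, so this is a minimal sketch under stated assumptions: for each species, a beta distribution is fitted by the method of moments to the RF assignment probabilities of that species' training specimens (for example, hypothetically, out-of-bag vote fractions), and a new specimen assigned to that species is flagged as a possible false positive when its assignment probability falls in the lower tail (beta CDF below `alpha`). The function names, the moment-based fit and the default `alpha = 0.05` are illustrative choices, not the authors' code. The beta CDF is evaluated with the standard continued-fraction expansion of the regularized incomplete beta function so that only the Python standard library is needed.

```python
from math import exp, lgamma, log
from statistics import mean, pvariance


def fit_beta_moments(probs):
    """Method-of-moments estimate of Beta(a, b) from probabilities in [0, 1]."""
    m, v = mean(probs), pvariance(probs)
    if v <= 0 or v >= m * (1 - m):
        raise ValueError("sample variance is out of range for a beta fit")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((a - 1 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor is ~1: converged
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b), i.e. the CDF of Beta(a, b) at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1 - x))
    if x < (a + 1) / (a + b + 2):
        return front * _beta_cf(a, b, x) / a
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    return 1.0 - front * _beta_cf(b, a, 1 - x) / b


def flag_false_positive(p_assigned, class_probs, alpha=0.05):
    """Flag a prediction whose assignment probability is unusually low for its class.

    class_probs: assignment probabilities of the training specimens of the
    predicted class (an assumption about which probabilities are used).
    Returns (p_value, flagged).
    """
    a, b = fit_beta_moments(class_probs)
    p_value = beta_cdf(p_assigned, a, b)
    return p_value, p_value < alpha
```

For example, with hypothetical training probabilities `[0.90, 0.95, 0.85, 0.92, 0.88]` for one species, a new specimen assigned to that species with probability 0.50 lies far in the lower tail of the fitted beta distribution and is flagged, whereas one assigned with probability 0.92 is not. A one-sided lower-tail test is used here because an unknown species would be expected to receive a weaker, more divided vote than a genuine member of the class.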

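The two training-set strategies in point 2 (class smoothing and a null class) can also be illustrated. This is a minimal sketch based only on the wording of the abstract: spectra are assumed to be already binned into equal-length lists of intensities; class smoothing draws each bin of a synthetic specimen independently from the values of that bin within its species; and the null class draws each bin from all specimens and then shuffles the bin order, which is one plausible reading of "shuffling predictor labels". The function names, the `"null"` label and the toy data are hypothetical. The augmented data could then be passed to any Random Forest implementation.

```python
import random
from collections import defaultdict


def smooth_classes(X, y, n_per_class, rng):
    """Class smoothing: create n_per_class synthetic specimens per species.

    Each bin (predictor) value is resampled with replacement from the values of
    that bin observed in the same species, independently for every bin.
    """
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    X_new, y_new = [], []
    for label, rows in rows_by_class.items():
        n_bins = len(rows[0])
        for _ in range(n_per_class):
            X_new.append([rng.choice(rows)[j] for j in range(n_bins)])
            y_new.append(label)
    return X_new, y_new


def make_null_class(X, n_obs, rng, label="null"):
    """Null class: bootstrap each bin across all specimens, then shuffle bin order.

    Resampling bins independently breaks the correlations between bins, and the
    shuffle moves values away from their original bins, so the class carries no
    multivariate species signal.
    """
    n_bins = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_bins)]
    X_null = []
    for _ in range(n_obs):
        values = [rng.choice(col) for col in columns]
        rng.shuffle(values)  # assumed reading of "shuffling predictor labels"
        X_null.append(values)
    return X_null, [label] * n_obs


# Toy example: four binned spectra (three bins each) from two hypothetical species.
rng = random.Random(42)
X = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.7, 0.9], [0.1, 0.6, 0.8]]
y = ["sp_A", "sp_A", "sp_B", "sp_B"]
X_smooth, y_smooth = smooth_classes(X, y, n_per_class=5, rng=rng)
X_null, y_null = make_null_class(X, n_obs=5, rng=rng)
X_train, y_train = X + X_smooth + X_null, y + y_smooth + y_null
```

Smoothing in this form keeps each bin's distribution within a species while discarding the covariance between bins; the abstract reports that the RF models still classify such in silico observations correctly.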