Article

Embracing imperfection: Machine-assisted invertebrate classification in real-world datasets

ECOLOGICAL INFORMATICS (2022)

Journal

ECOLOGICAL INFORMATICS

Volume 72, Issue -, Pages -

Publisher

ELSEVIER

DOI: 10.1016/j.ecoinf.2022.101896

Keywords

Machine learning; Computer vision; Image classification; Macroecology; Terrestrial invertebrates

Category

Ecology

Funding

NSERC Discovery grant
NSF REU program
[DEB 1702426]

Summary

This study presents a practical methodology for using machine learning in ecological data acquisition pipelines, training and testing algorithms to classify more than 72,000 terrestrial invertebrate specimens. It addresses two practical problems: inconsistent taxonomic label specificity and the classification of taxa unknown to the model. The results show that complex machine learning methods are not necessarily more accurate than traditional ones, and that including contextual metadata improves accuracy.

Abstract

Despite growing concerns over the health of global invertebrate diversity, terrestrial invertebrate monitoring efforts remain poorly distributed geographically. Machine-assisted classification has been proposed as a way to gather large amounts of data quickly; however, previous studies have often trained and tested their models on unrealistic or idealized datasets. In this study, we describe a practical methodology for including machine learning in ecological data acquisition pipelines. We train and test machine learning algorithms to classify over 72,000 terrestrial invertebrate specimens from morphometric data and contextual metadata. All vouchered specimens were collected in pitfall traps by the National Ecological Observatory Network (NEON) at 45 locations across the United States from 2016 to 2019. Specimens were photographed and classified using two separate machine learning paradigms. In the first, we used a convolutional neural network (CNN; ResNet-50); in the second, we extracted morphometric data as feature vectors using ImageJ and classified specimens with traditional machine learning methods. Issues stemming from inconsistent taxonomic label specificity were resolved by making classifications at the lowest identified taxonomic level (LITL). Taxa with too few specimens to be included in the training dataset were classified using zero-shot classification. When classifying specimens from taxa that were known to and seen by our models, we reached a maximum accuracy of 72.7% at the LITL using eXtreme Gradient Boosting (XGBoost), nearly matching the 72.8% maximum accuracy achieved by the CNN at the LITL. Models trained without contextual metadata underperformed models trained with it. We also classified taxa unknown to the models using zero-shot classification, reaching a maximum accuracy of 65.5% with the ResNet-50, compared with 39.4% with XGBoost.
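The abstract states two things. First, mixed label specificity was handled by classifying at the lowest identified taxonomic level (LITL). Second, taxa too rare to train on were left for zero-shot classification. Below is a minimal sketch of that labeling step, assuming each specimen record stores its identification as a dict mapping rank names to taxon names. The rank list, the `min_count` threshold, the toy records and the function names are all hypothetical illustrations, not the paper's code.

```python
from collections import Counter

# Ranks ordered from most general to most specific. This rank list is an
# illustrative assumption, not the study's exact taxonomy schema.
RANKS = ("order", "family", "genus", "species")


def litl_label(record):
    """Return (rank, taxon) at the lowest identified taxonomic level (LITL).

    Walks down the ranks and stops at the first rank with no identification,
    so a specimen identified only to genus gets a genus-level label.
    Returns None if not even the highest rank is identified.
    """
    label = None
    for rank in RANKS:
        taxon = record.get(rank)
        if not taxon:
            break
        label = (rank, taxon)
    return label


def split_by_support(records, min_count):
    """Split specimens into LITL classes with enough examples to train on
    and rare classes held out for zero-shot evaluation.

    min_count is a hypothetical threshold chosen for illustration.
    """
    labelled = [(r, litl_label(r)) for r in records]
    labelled = [(r, lab) for r, lab in labelled if lab is not None]
    counts = Counter(lab for _, lab in labelled)
    seen, unseen = [], []
    for record, label in labelled:
        target = seen if counts[label] >= min_count else unseen
        target.append((record, label))
    return seen, unseen


# Hypothetical toy records with mixed identification depth.
specimens = [
    {"order": "Coleoptera", "family": "Carabidae", "genus": "Pterostichus",
     "species": "Pterostichus melanarius"},
    {"order": "Coleoptera", "family": "Carabidae", "genus": "Pterostichus",
     "species": ""},
    {"order": "Coleoptera", "family": "Carabidae", "genus": "Pterostichus"},
    {"order": "Araneae", "family": "Lycosidae"},
]

seen, unseen = split_by_support(specimens, min_count=2)
print([label for _, label in seen])
# [('genus', 'Pterostichus'), ('genus', 'Pterostichus')]
print([label for _, label in unseen])
# [('species', 'Pterostichus melanarius'), ('family', 'Lycosidae')]
```

Treating each (rank, taxon) pair as its own class lets specimens identified to different depths share one label space. Under that assumption, a genus-only record never has to be forced into a species class it was never assigned.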
The general methodology outlined here represents a realistic application of machine learning as a tool for ecological studies. We found that more advanced and complex machine learning methods such as convolutional neural networks are not necessarily more accurate than traditional machine learning methods. Hierarchical and LITL classifications allow for flexible taxonomic specificity at the input and output layers. These methods also help address the 'long tail' problem of underrepresented taxa missed by machine learning models. Finally, we encourage researchers to consider more than just morphometric data when training their models, as we have shown that the inclusion of contextual metadata can provide significant improvements to accuracy.
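The abstract reports that models trained without contextual metadata underperformed those trained with it. For a tabular learner such as XGBoost, one common way to add metadata is to append encoded values to each specimen's morphometric feature vector. The sketch below assumes the metadata are a collection site and a day of year. The site codes, the measurement values, the cyclical day-of-year encoding and the helper names are illustrative assumptions, not the paper's feature set.

```python
import math

# Hypothetical subset of NEON site codes used for one-hot encoding.
SITES = ("HARV", "ORNL", "SJER")


def encode_metadata(site, day_of_year, sites=SITES):
    """Encode contextual metadata as numbers a tree ensemble can split on.

    The site becomes a one-hot block; the day of year is mapped onto a
    circle (sin, cos) so that late December sits next to early January.
    Both encodings are illustrative choices, not the study's.
    """
    one_hot = [1.0 if site == s else 0.0 for s in sites]
    angle = 2.0 * math.pi * (day_of_year - 1) / 365.0
    return one_hot + [math.sin(angle), math.cos(angle)]


def build_feature_vector(morphometrics, site, day_of_year, sites=SITES):
    """Append encoded metadata to a specimen's morphometric measurements."""
    return list(morphometrics) + encode_metadata(site, day_of_year, sites)


# Hypothetical ImageJ-style measurements: area, perimeter, major and minor axis.
morpho = [12.4, 15.1, 5.2, 2.9]
with_meta = build_feature_vector(morpho, "ORNL", 1)
without_meta = list(morpho)

print(len(without_meta), len(with_meta))  # 4 9
print(with_meta[4:7])  # [0.0, 1.0, 0.0]
```

One way to run a with/without-metadata comparison like the one the abstract describes is to fit one classifier on vectors built like `with_meta` and another on vectors built like `without_meta`. For example, `xgboost.XGBClassifier.fit(X, y)` accepts a 2-D feature array.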
