Classifying the unknown: Insect identification with deep hierarchical Bayesian learning

METHODS IN ECOLOGY AND EVOLUTION (2023)

Journal

METHODS IN ECOLOGY AND EVOLUTION

Volume 14, Issue 6, Pages 1515-1530

Publisher

WILEY

DOI: 10.1111/2041-210X.14104

Keywords

biodiversity; classification; computer vision; deep learning; machine learning; undescribed species

Automated Summary

Machine learning can be used to create an accurate and efficient method for classifying insect species, including both described and undescribed species. A deep hierarchical Bayesian model is proposed, which can classify samples based on the taxonomic hierarchy of insects. The combination of image and DNA data in the model leads to significant improvement in classification accuracy.

Abstract

Classifying insect species involves a tedious process of identifying distinctive morphological insect characters by taxonomic experts. Machine learning can harness the power of computers to potentially create an accurate and efficient method for performing this task at scale, given that its analytical processing can be more sensitive to subtle physical differences in insects, which experts may not perceive. However, existing machine learning methods are designed to only classify insect samples into described species, thus failing to identify samples from undescribed species. We propose a novel deep hierarchical Bayesian model for insect classification, given the taxonomic hierarchy inherent in insects. This model can classify samples of both described and undescribed species; described samples are assigned a species while undescribed samples are assigned a genus, which is a pivotal advancement over just identifying them as outliers. We demonstrated this proof of concept on a new database containing paired insect image and DNA barcode data from four insect orders, including 1040 species, which far exceeds the number of species used in existing work. A quarter of the species were excluded from the training set to simulate undescribed species. With the proposed classification framework using combined image and DNA data in the model, species classification accuracy for described species was 96.66% and genus classification accuracy for undescribed species was 81.39%. Including both data sources in the model resulted in significant improvement over including image data only (39.11% accuracy for described species and 35.88% genus accuracy for undescribed species), and modest improvement over including DNA data only (73.39% genus accuracy for undescribed species). 
Unlike current machine learning methods, the proposed deep hierarchical Bayesian learning approach can simultaneously classify samples of both described and undescribed species, a functionality that could become instrumental in biodiversity monitoring across the globe. This framework can be customized for any taxonomic classification problem for which image and DNA data can be obtained, thus making it relevant for use across all biological kingdoms.
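
The evaluation protocol described in the abstract (holding out a quarter of the species to simulate undescribed ones, then scoring species-level accuracy on described samples and genus-level accuracy on undescribed samples) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `split_species` and `evaluate`, the random hold-out, and the `(level, label)` prediction format are all assumptions made for illustration. In particular, the sketch does not check that every held-out species still has its genus represented among the remaining species, which a real setup would need for a genus to be assignable.

```python
import random


def split_species(species_to_genus, holdout_fraction=0.25, seed=0):
    """Randomly hold out a fraction of species to stand in for undescribed ones.

    Hypothetical helper: how the authors chose the held-out species is not
    stated in the abstract, so a seeded random split is assumed here.
    """
    rng = random.Random(seed)
    species = sorted(species_to_genus)
    rng.shuffle(species)
    n_holdout = int(len(species) * holdout_fraction)
    described = set(species[n_holdout:])
    undescribed = set(species[:n_holdout])
    return described, undescribed


def evaluate(true_species, predictions, species_to_genus, described):
    """Score predictions given as (level, label) pairs, level in {"species", "genus"}.

    A described sample counts as correct only if its species is predicted.
    An undescribed sample counts as correct only if its genus is predicted.
    """
    hits = {"described": 0, "undescribed": 0}
    totals = {"described": 0, "undescribed": 0}
    for truth, (level, label) in zip(true_species, predictions):
        if truth in described:
            group, target = "described", ("species", truth)
        else:
            group, target = "undescribed", ("genus", species_to_genus[truth])
        totals[group] += 1
        hits[group] += (level, label) == target
    # Return None for a group with no samples rather than dividing by zero.
    return {g: (hits[g] / totals[g] if totals[g] else None) for g in totals}
```

Scoring the two groups separately mirrors how the abstract reports results: one accuracy for described species at the species level and another for undescribed species at the genus level.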