3.8 Article

Computational metadata generation methods for biological specimen image collections

Publisher

SPRINGER
DOI: 10.1007/s00799-022-00342-1

Keywords

Bioinformatics; Metadata; Image analysis; Applied machine learning; Contrast enhancement

Funding

  1. NSF Office of Advanced Cyberinfrastructure (OAC) [1940233, 1940322]
  2. NSF Directorate for Computer and Information Science and Engineering [1940233, 1940322]

Metadata is crucial for applying machine learning to digitized biological specimens. This research extends previous work and improves computational methods to generate accurate metadata for fish specimens, demonstrating the ability of computational methods to enhance digital library services worldwide.
Metadata is a key data source for researchers seeking to apply machine learning (ML) to the vast collections of digitized biological specimens available online. Unfortunately, the associated metadata is often sparse and, at times, erroneous. This paper extends previous research conducted with the Illinois Natural History Survey (INHS) collection (7244 specimen images), which uses computational approaches to analyze image quality and then automatically generates 22 metadata properties representing the image quality and morphological features of the specimens. In the research reported here, we extend our initial work to the University of Wisconsin Zoological Museum (UWZM) collection (4155 specimen images). Further, we enhance our computational methods in four ways: (1) augmenting the training set, (2) applying contrast enhancement, (3) upscaling small objects, and (4) refining our processing logic. Together, these new methods reduced our overall error rate from 4.6% to 1.1%. These enhancements also allowed us to compute an additional set of 17 image-based metadata properties, which provide supplemental features and information that may also be used to analyze and classify the fish specimens. Examples of these new features include convex area, eccentricity, perimeter, and skew. The newly refined process further outperforms humans in time and labor cost as well as accuracy, providing a novel solution for leveraging digitized specimens with ML. This research demonstrates the ability of computational methods to enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide by generating accurate and valuable metadata for those repositories.
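
The abstract names image-based properties such as perimeter and eccentricity without giving their definitions. As a rough illustration only, the sketch below computes three such descriptors from a binary specimen mask in plain Python. The function name `shape_features`, the mask representation (a list of rows of 0/1 values), and the specific definitions used (perimeter as the count of exposed pixel edges, eccentricity from the second central moments of the pixel coordinates) are assumptions made for this example and may differ from the definitions the authors used.

```python
import math


def shape_features(mask):
    """Return area, perimeter, and eccentricity for a binary mask.

    `mask` is a list of rows containing 0 (background) or 1 (specimen).
    This is a hypothetical helper for illustration, not the paper's code.
    """
    pixels = [(r, c) for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    area = len(pixels)
    if area == 0:
        raise ValueError("mask contains no foreground pixels")
    filled = set(pixels)

    # Perimeter (assumed definition): number of pixel edges that touch
    # background or the image border, using 4-connectivity.
    perimeter = sum(
        1
        for r, c in pixels
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
        if (r + dr, c + dc) not in filled
    )

    # Second central moments of the foreground pixel coordinates.
    mean_r = sum(r for r, _ in pixels) / area
    mean_c = sum(c for _, c in pixels) / area
    mu_rr = sum((r - mean_r) ** 2 for r, _ in pixels) / area
    mu_cc = sum((c - mean_c) ** 2 for _, c in pixels) / area
    mu_rc = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / area

    # Eigenvalues of the 2x2 covariance matrix: the variances along the
    # major and minor axes of the best-fitting ellipse.
    half_trace = (mu_rr + mu_cc) / 2
    root = math.sqrt(((mu_rr - mu_cc) / 2) ** 2 + mu_rc ** 2)
    major, minor = half_trace + root, half_trace - root

    if major > 0:
        # Clamp to [0, 1] to guard against floating-point rounding.
        ratio = min(1.0, max(0.0, 1 - minor / major))
        eccentricity = math.sqrt(ratio)
    else:
        eccentricity = 0.0  # a single pixel has no elongation

    return {"area": area, "perimeter": perimeter, "eccentricity": eccentricity}


# Usage: an elongated 2 x 6 rectangle versus a 3 x 3 square.
elongated = shape_features([[1] * 6, [1] * 6])
square = shape_features([[1] * 3 for _ in range(3)])
```

Under these definitions, an elongated mask yields an eccentricity close to 1 and a square mask yields 0, which is the kind of signal that could help separate specimens by body shape.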
