Article

CharaParser for Fine-Grained Semantic Annotation of Organism Morphological Descriptions

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY (2012)

Journal

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY

Volume 63, Issue 4, Pages 738-754

Publisher

WILEY

DOI: 10.1002/asi.22618

Categories

Computer Science, Information Systems; Information Science & Library Science

Funding

National Science Foundation [EF0849982]
Emerging Frontiers
Direct For Biological Sciences [0849982] Funding Source: National Science Foundation

Abstract

Biodiversity information organization is looking beyond the traditional document-level metadata approach and has begun to examine the factual content of textual documents to support more intelligent, semantics-based access. This article reports the development and evaluation of CharaParser, a software application for the semantic annotation of morphological descriptions. CharaParser annotates semistructured morphological descriptions at a fine-grained level: every stated morphological character of an organ is marked up in Extensible Markup Language (XML) format. Because it uses an unsupervised machine learning algorithm and a general-purpose syntactic parser as its key annotation tools, CharaParser requires minimal additional knowledge engineering and appears to perform well across different description collections and/or taxon groups. The system has been formally evaluated on over 1,000 sentences randomly selected from Volume 19 of Flora of North America and Part H of Treatise on Invertebrate Paleontology. CharaParser reaches or exceeds 90% in sentence-wise recall and precision, exceeding other similar systems reported in the literature, and it significantly outperforms a heuristic rule-based system we developed earlier. We also observe early evidence that enriching the lexicon of a syntactic parser with domain terms alone may be sufficient to adapt the parser to the biodiversity domain, a finding that may have significant implications.
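To make "every stated character of an organ is marked up in XML" concrete, the sketch below hand-builds markup for one invented description sentence and shows how a downstream query could read the characters back. It is an illustration under stated assumptions, not CharaParser's method: the element names (`statement`, `structure`, `character`), their attributes, the sample sentence, and the helper functions `build_example_markup` and `characters_of` are all hypothetical and may differ from CharaParser's actual output schema. The annotation step itself (unsupervised learning plus syntactic parsing) is not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_example_markup() -> str:
    """Hand-built markup for one invented sentence.

    Element and attribute names are hypothetical, not CharaParser's schema.
    """
    statement = ET.Element("statement", {"id": "s1"})
    text = ET.SubElement(statement, "text")
    text.text = "Leaves alternate; blades ovate, 2-5 cm."
    # One <structure> per organ; one <character> per stated character.
    leaf = ET.SubElement(statement, "structure", {"id": "o1", "name": "leaf"})
    ET.SubElement(leaf, "character", {"name": "arrangement", "value": "alternate"})
    blade = ET.SubElement(statement, "structure", {"id": "o2", "name": "blade"})
    ET.SubElement(blade, "character", {"name": "shape", "value": "ovate"})
    # A numeric range is stored as from/to/unit attributes (an assumption).
    ET.SubElement(blade, "character",
                  {"name": "length", "from": "2", "to": "5", "unit": "cm"})
    return ET.tostring(statement, encoding="unicode")


def characters_of(xml_text: str, organ: str) -> dict:
    """Collect the character name/value pairs recorded for one organ."""
    root = ET.fromstring(xml_text)
    result = {}
    for structure in root.iter("structure"):
        if structure.get("name") != organ:
            continue
        for char in structure.iter("character"):
            if char.get("value") is not None:
                result[char.get("name")] = char.get("value")
            else:
                # Render a range character as "from-to unit".
                result[char.get("name")] = (
                    f"{char.get('from')}-{char.get('to')} {char.get('unit')}"
                )
    return result


if __name__ == "__main__":
    markup = build_example_markup()
    print(markup)
    print(characters_of(markup, "blade"))
```

Once characters are explicit in the markup, a question such as "which taxa have ovate blades?" becomes an attribute lookup rather than a free-text search. That lookup is the kind of semantic access the abstract motivates.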