Article

Computing on Phenotypic Descriptions for Candidate Gene Discovery and Crop Improvement

PLANT PHENOMICS (2020)

Journal

PLANT PHENOMICS

Volume 2020, Issue -, Pages -

Publisher

AMER ASSOC ADVANCEMENT SCIENCE

DOI: 10.34133/2020/1963251

Categories

Agronomy; Plant Sciences; Remote Sensing

Funding

Iowa State University Presidential Interdisciplinary Research Seed Grant
Iowa State University Plant Sciences Institute Faculty Scholar Award
Predictive Plant Phenomics NSF Research Traineeship [DGE-1545453]
National Science Foundation [OAC-1636865]

Abstract

Many newly observed phenotypes are first described, then experimentally manipulated. These language-based descriptions appear both in the literature and in community datastores. To standardize phenotypic descriptions and enable simple data aggregation and analysis, controlled vocabularies and specific data architectures have been developed. Such simplified descriptions have several advantages over natural language: they can be rigorously defined for a particular context or problem, they can be assigned and interpreted programmatically, and they can be organized in a way that allows for semantic reasoning (inference of implicit facts). Because researchers generally report phenotypes in the literature using natural language, curators have been translating phenotypic descriptions into controlled vocabularies for decades to make the information computable. Unfortunately, this methodology is highly dependent on human curation, which does not scale to the scope of all publications available across all of plant biology. Simultaneously, researchers in other domains have been working to enable computation on natural language. This has resulted in new, automated methods for computing on language that are now available, with early analyses showing great promise. Natural language processing (NLP) coupled with machine learning (ML) allows for the use of unstructured language for direct analysis of phenotypic descriptions. Indeed, we have found that these automated methods can be used to create data structures that perform as well as or better than those generated by human curators on tasks such as predicting gene function and biochemical pathway membership. Here, we describe current and ongoing efforts to provide tools for the plant phenomics community to explore novel predictions that can be generated using these techniques. We also describe how these methods could be used along with mobile speech-to-text tools to collect and analyze in-field spoken phenotypic descriptions for association genetics and breeding applications.
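To make the idea of computing directly on free-text phenotype descriptions concrete, the following is a minimal sketch, not the authors' method. It assumes a simple bag-of-words representation and cosine similarity, one of the most basic NLP approaches; the gene identifiers, the descriptions, and the helper names (`tokenize`, `cosine_similarity`, `rank_similar`) are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two descriptions."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_a * norm_b)


def rank_similar(query_id, descriptions):
    """Rank all other entries by similarity to the query description."""
    query = descriptions[query_id]
    scores = [
        (other_id, cosine_similarity(query, text))
        for other_id, text in descriptions.items()
        if other_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical free-text phenotype descriptions keyed by made-up gene IDs.
descriptions = {
    "gene_a": "dwarf plants with short internodes and dark green leaves",
    "gene_b": "short plants with reduced internodes and dark leaves",
    "gene_c": "kernels are shrunken with pale yellow endosperm",
}

for gene_id, score in rank_similar("gene_a", descriptions):
    print(f"{gene_id}\t{score:.3f}")
```

Under this sketch, genes whose descriptions share more vocabulary rank closer together, which is the intuition behind using text similarity to suggest candidate genes with related function. Real pipelines would likely replace raw word counts with weighting schemes or learned embeddings.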
