Article

A Machine Reading System for Assembling Synthetic Paleontological Databases

Journal

PLOS ONE
Volume 9, Issue 12, Pages -

Publisher

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.pone.0113523

Keywords

-

Funding

  1. NSF EarthCube award [ACI-1343760]
  2. Defense Advanced Research Projects Agency (DARPA) XDATA Program [FA8750-12-2-0335]
  3. DEFT Program [FA8750-13-2-0039]
  4. DARPA's MEMEX program
  5. National Science Foundation (NSF) CAREER Award [IIS-1353606]
  6. Office of Naval Research (ONR) [N000141210041, N000141310129]
  7. Sloan Research Fellowship
  8. Moore Foundation
  9. American Family Insurance
  10. Google
  11. Toshiba
  12. Directorate For Geosciences
  13. ICER [1343760] Funding Source: National Science Foundation

Many aspects of macroevolutionary theory and our understanding of biotic responses to global environmental change derive from literature-based compilations of paleontological data. Existing manually assembled databases are, however, incomplete and difficult to assess and enhance with new data types. Here, we develop and validate the quality of a machine reading system, PaleoDeepDive, that automatically locates and extracts data from heterogeneous text, tables, and figures in publications. PaleoDeepDive performs comparably to humans in several complex data extraction and inference tasks and generates congruent synthetic results that describe the geological history of taxonomic diversity and genus-level rates of origination and extinction. Unlike traditional databases, PaleoDeepDive produces a probabilistic database that systematically improves as information is added. We show that the system can readily accommodate sophisticated data types, such as morphological data in biological illustrations and associated textual descriptions. Our machine reading approach to scientific data integration and synthesis brings within reach many questions that are currently underdetermined and does so in ways that may stimulate entirely new modes of inquiry.
