4.6 Article

DeepDive: Declarative Knowledge Base Construction

Journal

COMMUNICATIONS OF THE ACM
Volume 60, Issue 5, Pages 93-102

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/3060586

Keywords

-

Funding

  1. Defense Advanced Research Projects Agency (DARPA) XDATA program [FA8750-12-2-0335]
  2. Defense Advanced Research Projects Agency (DARPA) DEFT program [FA8750-13-2-0039]
  3. Defense Advanced Research Projects Agency (DARPA) MEMEX program
  4. Defense Advanced Research Projects Agency (DARPA) SIMPLEX program
  5. National Science Foundation (NSF) CAREER Award [IIS-1353606]
  6. Office of Naval Research (ONR) [N000141210041, N000141310129]
  7. National Institutes of Health Grant - National Institute of Biomedical Imaging and Bioengineering (NIBIB) through trans-NIH Big Data to Knowledge (BD2K) initiative [U54EB020405]
  8. Sloan Research Fellowship
  9. Moore Foundation
  10. American Family Insurance
  11. Google
  12. Toshiba

Abstract

The dark data extraction or knowledge base construction (KBC) problem is to populate a relational database with information from unstructured data sources, such as emails, webpages, and PDFs. KBC is a long-standing problem in industry and research that encompasses problems of data extraction, cleaning, and integration. We describe DeepDive, a system that combines database and machine learning ideas to help develop KBC systems. The key idea in DeepDive is to frame traditional extract-transform-load (ETL) style data management problems as a single large statistical inference task that is declaratively defined by the user. DeepDive leverages the effectiveness and efficiency of statistical inference and machine learning for difficult extraction tasks, while not requiring users to directly write any probabilistic inference algorithms. Instead, domain experts interact with DeepDive by defining features or rules about the domain. DeepDive has been successfully applied to domains such as pharmacogenomics, paleobiology, and anti-human-trafficking enforcement, achieving human-caliber quality at machine-caliber scale. We present the applications, abstractions, and techniques used in DeepDive to accelerate the construction of such dark data extraction systems.
