Article

ScaLeKB: scalable learning and inference over large knowledge bases

VLDB JOURNAL (2016)

Journal

VLDB JOURNAL

Volume 25, Issue 6, Pages 893-918

Publisher

SPRINGER

DOI: 10.1007/s00778-016-0444-3

Keywords

Knowledge bases; Databases; Rule mining; Probabilistic reasoning

Categories

Computer Science, Hardware & Architecture; Computer Science, Information Systems

Funding

NSF IIS Award [1526753]
DARPA [FA8750-12-2-0348-2]
National Science Foundation, Directorate for Computer & Information Science & Engineering, Division of Information & Intelligent Systems [1526753]


Abstract

Recent years have seen a drastic rise in the construction of web knowledge bases (e.g., Freebase, YAGO, DBPedia). These knowledge bases store structured information about real-world people, places, organizations, etc. However, due to the limitations of human knowledge, web corpora, and information extraction algorithms, the knowledge bases are still far from complete. To infer the missing knowledge, we propose the Ontological Pathfinding (OP) algorithm to mine first-order inference rules from these web knowledge bases. The OP algorithm scales up via a series of optimization techniques, including a new parallel-rule-mining algorithm, a pruning strategy to eliminate unsound and inefficient rules before applying them, and a novel partitioning algorithm to break the learning task into smaller independent sub-tasks. Combining these techniques, we develop a first rule mining system that scales to Freebase, the largest public knowledge base with 112 million entities and 388 million facts. We mine 36,625 inference rules in 34 h; no existing system achieves this scale. Based on the mining algorithm and the optimizations, we develop an efficient inference engine. As a result, we infer 0.9 billion new facts from Freebase in 17.19 h. We use cross validation to evaluate the inferred facts and estimate a degree of expansion by 0.6 over Freebase, with a precision approaching 1.0. Our approach outperforms state-of-the-art mining algorithms and inference engines in terms of both performance and quality.
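To make the idea of a first-order inference rule concrete, the sketch below applies one chain-shaped rule, nationality(x, z) <- bornIn(x, y), cityIn(y, z), to a tiny in-memory set of facts and scores it by the fraction of its derived facts that the knowledge base already contains. This is only an illustration under stated assumptions: the facts, the predicate names, and the helper functions `apply_chain_rule` and `score_rule` are hypothetical, the confidence measure is the plain "known over derived" ratio rather than whatever metric the paper uses, and none of the paper's parallel mining, pruning, or partitioning techniques are reproduced.

```python
from collections import defaultdict

# Toy knowledge base of (predicate, subject, object) facts.
# All entities and predicates here are made up for illustration.
facts = {
    ("bornIn", "alice", "paris"),
    ("bornIn", "bob", "lyon"),
    ("bornIn", "carol", "rome"),
    ("cityIn", "paris", "france"),
    ("cityIn", "lyon", "france"),
    ("cityIn", "rome", "italy"),
    ("nationality", "alice", "france"),
    ("nationality", "carol", "spain"),
}


def apply_chain_rule(facts, head, body1, body2):
    """Apply head(x, z) <- body1(x, y), body2(y, z); return all derived facts."""
    # Index body2 facts by subject so the join on y is a dictionary lookup.
    body2_by_subject = defaultdict(set)
    for pred, subj, obj in facts:
        if pred == body2:
            body2_by_subject[subj].add(obj)

    derived = set()
    for pred, x, y in facts:
        if pred == body1:
            for z in body2_by_subject.get(y, ()):
                derived.add((head, x, z))
    return derived


def score_rule(facts, derived):
    """Split derived facts into new ones and compute a simple confidence.

    Confidence here is |derived facts already in the KB| / |derived facts|,
    an assumed, simplified measure.
    """
    known = derived & facts
    new = derived - facts
    confidence = len(known) / len(derived) if derived else 0.0
    return new, confidence


derived = apply_chain_rule(facts, "nationality", "bornIn", "cityIn")
new_facts, confidence = score_rule(facts, derived)
print(sorted(new_facts))
print(round(confidence, 3))
```

On this toy data the rule derives three facts, of which one is already present, so two facts are new and the confidence is 1/3. A pruning step of the kind the abstract mentions could, for instance, discard rules whose score falls below a chosen threshold before running inference at scale; the threshold and the scoring function would be design choices not specified here.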
