Article

Deep Learning-based Extraction of Algorithmic Metadata in Full-Text Scholarly Documents

INFORMATION PROCESSING & MANAGEMENT (2020)

Journal

INFORMATION PROCESSING & MANAGEMENT

Volume 57, Issue 6, Pages -

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.ipm.2020.102269

Keywords

Knowledge-based Systems; Algorithmic Metadata; Algorithm Search; Deep Learning; Bi-Directional LSTM; Information Retrieval; Full-text Articles

Categories

Computer Science, Information Systems; Information Science & Library Science

Funding

NRPU Grant - Higher Education Commission of Pakistan [6857]
Thailand Science Research and Innovation (TSRI) [RSA6280105]

Abstract

Advances in search engines for traditional text documents have enabled effective, resource-efficient retrieval of massive amounts of textual information. However, such conventional search methodologies often suffer from poor retrieval accuracy, especially when documents exhibit unique properties that call for specialized, deeper semantic extraction. Recently, AlgorithmSeer, a search engine for algorithms, was proposed; it extracts pseudo-codes and shallow textual metadata from scientific publications and treats them as traditional documents so that conventional search engine methodology can be applied. However, such a system cannot serve user queries that seek algorithm-specific information, such as the datasets on which algorithms operate, the performance of algorithms, and their runtime complexity. In this paper, a set of enhancements to the previously proposed algorithm search engine is presented. Specifically, we propose a set of methods to automatically identify and extract algorithmic pseudo-codes and the sentences that convey related algorithmic metadata using a set of machine-learning techniques. In an experiment with over 93,000 text lines, we introduce 60 novel features, comprising content-based, font-style-based, and structure-based feature groups, to extract algorithmic pseudo-codes. Our proposed pseudo-code extraction method achieves a 93.32% F1-score, outperforming the state-of-the-art techniques by 28%. Additionally, we propose a method to extract algorithm-related sentences using deep neural networks and achieve an accuracy of 78.5%, outperforming a rule-based model and a support vector machine model by 28% and 16%, respectively.
