Article

Extracting and Analyzing Inorganic Material Synthesis Procedures in the Literature

Journal

IEEE ACCESS
Volume 10, Issue -, Pages 31524-31537

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3160201

Keywords

Data mining; Pipelines; Materials science and technology; Inorganic materials; Informatics; Material properties; Sodium; Artificial intelligence; data mining; information extraction; materials; materials science and technology; materials informatics; natural language processing; neural networks


Materials informatics requires large-scale collection and analysis of material synthesis procedures described in the literature. This study constructs a pipeline system that extracts synthesis procedures from text and analyzes them as flow graphs. Evaluation and application of the system confirm the reasonableness of the extracted procedures.
Materials informatics, which designs materials using computational methods, requires large-scale collection and analysis of the material synthesis procedures described in the literature. However, existing studies have not analyzed these procedures at the paragraph level. Moreover, because most synthesis procedures are described in natural language in articles and technical documents, they must first be structured through information extraction into a format that computers can handle. In this study, we therefore construct a pipeline system that extracts synthesis procedures from text in the form of flow graphs and analyzes each procedure as a flow graph rather than as a set of processes. The extraction system extracts entities with a deep learning model and relations between entities with a rule-based extractor from all paragraphs in the literature, then selects the procedures that include valid structures of entities and relations. Evaluation on a benchmark dataset gave micro-averaged F-scores of 0.807, 0.830, and 0.609 for the entity extractor, relation extractor, and pipeline extractor, respectively. We applied the system to a large body of literature and extracted approximately 90,000 flow graphs (procedures) containing approximately 4 million entities and 3 million relations. We performed several analyses of the extracted graphs, including computing statistics and checking frequent subgraphs. These analyses confirmed methods commonly used in materials science; for example, ethanol is often dried by heating at 60 °C, and less-reactive noble gases are rarely included in the products. These results experimentally confirm that the extracted procedures are reasonable.
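The micro-averaged F-scores reported above pool true positives, false positives, and false negatives over all labels before computing precision and recall. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes exact matching of (label, text) pairs; the function name `micro_f1` and the labels `MATERIAL` and `OPERATION` are hypothetical and chosen only for illustration.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F-score over two sets of (label, text) pairs.

    Counts are pooled across all labels, so frequent labels weigh more
    than rare ones. Exact matching is an assumption of this sketch.
    """
    tp = len(gold & predicted)   # items found and correct
    fp = len(predicted - gold)   # items found but not in the gold set
    fn = len(gold - predicted)   # gold items that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for one sentence of a synthesis paragraph.
gold = {("MATERIAL", "ethanol"), ("OPERATION", "dried"), ("OPERATION", "heating")}
predicted = {("MATERIAL", "ethanol"), ("OPERATION", "dried"), ("MATERIAL", "water")}

score = micro_f1(gold, predicted)  # tp=2, fp=1, fn=1, so P = R = F = 2/3
print(round(score, 3))
```

The pipeline score is typically lower than either component score because an error in entity extraction propagates into relation extraction, which is consistent with the 0.609 reported for the full pipeline.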
