A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing

NPJ COMPUTATIONAL MATERIALS (2023)

Journal

NPJ COMPUTATIONAL MATERIALS

Volume 9, Issue 1

Publisher

NATURE PORTFOLIO

DOI: 10.1038/s41524-023-01003-w

Automated Summary

This study used natural language processing methods to extract material property data from the abstracts of polymer literature. Using a pipeline built around the MaterialsBERT language model, the authors obtained around 300,000 material property records and analyzed them for applications such as fuel cells, supercapacitors, and polymer solar cells.

Abstract

The ever-increasing number of materials science articles makes it hard to infer chemistry-structure-property relations from literature. We used natural language processing methods to automatically extract material property data from the abstracts of polymer literature. As a component of our pipeline, we trained MaterialsBERT, a language model, on 2.4 million materials science abstracts; it outperforms other baseline models on three out of five named entity recognition datasets. Using this pipeline, we obtained ~300,000 material property records from ~130,000 abstracts in 60 hours. The extracted data was analyzed for a diverse range of applications such as fuel cells, supercapacitors, and polymer solar cells to recover non-trivial insights. The data extracted through our pipeline is made available and can be used to locate material property data recorded in abstracts. This work demonstrates the feasibility of an automatic pipeline that starts from published literature and ends with extracted material property information.
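
To make the last step of such a pipeline concrete, the following is a minimal sketch of how token-level named entity recognition output could be turned into a material property record. It is an illustration under stated assumptions, not the authors' implementation: the label names (`POLYMER`, `PROP_NAME`, `PROP_VALUE`), the BIO tagging scheme, the helper functions, and the example sentence are all hypothetical, and the tags are hand-written here rather than predicted by MaterialsBERT.

```python
from dataclasses import dataclass


@dataclass
class Span:
    label: str  # entity type, e.g. "POLYMER" (hypothetical label set)
    text: str   # surface text of the entity


def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into labeled entity spans."""
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            # Close any open span, then open a new one.
            # A stray "I-" tag is treated as the start of a span.
            if label is not None:
                spans.append(Span(label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-"):
            words.append(token)
        else:
            # "O" tag: close the open span, if any.
            if label is not None:
                spans.append(Span(label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append(Span(label, " ".join(words)))
    return spans


def to_record(spans):
    """Naive record builder: keep the first span seen for each label."""
    record = {}
    for span in spans:
        record.setdefault(span.label, span.text)
    return record


# Hand-labeled example sentence (assumption: tags would normally come
# from a trained NER model).
tokens = ["Polystyrene", "has", "a", "glass", "transition",
          "temperature", "of", "100", "°C"]
tags = ["B-POLYMER", "O", "O", "B-PROP_NAME", "I-PROP_NAME",
        "I-PROP_NAME", "O", "B-PROP_VALUE", "I-PROP_VALUE"]

record = to_record(decode_bio(tokens, tags))
print(record)
```

Keeping only the first span per label works for a single-material, single-property sentence like this one; abstracts that mention several materials or properties would need a relation step to decide which value belongs to which material.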
