CustFRE: An annotated dataset for extraction of family relations from English text

DATA IN BRIEF (2022)

Journal

DATA IN BRIEF

Volume 41

Publisher

ELSEVIER

DOI: 10.1016/j.dib.2022.107980

Keywords

Natural language processing; Relation classification; Machine learning; Family relations

Automated Summary

Meaningful information extraction is a crucial task, and it requires annotated datasets, which are scarce. This manuscript presents a dataset, CustFRE, for extracting family relations from text, which can be used as a benchmark for evaluating and training family relation extraction systems.

Abstract

Meaningful information extraction is an extremely important and challenging task due to the ever-growing size of data. Training and evaluating automated systems for the task requires annotated datasets, which are rarely available because of the great amount of human effort and time required to annotate data. The dataset described in this manuscript, CustFRE, is meant for systems that learn to extract family relations from text. Sentences mentioning at least two persons were collected from the internet. The texts were first processed using Stanford's NLP pipeline for basic NLP tagging. Next, a team of natural language processing experts annotated the dataset. Every family relation among persons in a text was annotated, and a no_relation label was assigned when no family relation between two persons could be inferred from the text. After annotation, the dataset was verified by an NLP expert for completeness and correctness. CustFRE contains 2,716 annotations in total. Information extraction researchers can use the dataset as a benchmark for evaluating their systems and as training data for family relation extraction systems. © 2022 The Authors. Published by Elsevier Inc.
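The labeling scheme described in the abstract (one label per pair of persons, with no_relation as the fallback) can be illustrated with a minimal Python sketch. The record layout, the function name `label_person_pairs`, the example sentence, and the `sibling` label are all hypothetical and are not taken from the CustFRE release. The sketch also assumes that pairs are unordered, which the abstract does not state.

```python
from itertools import combinations

NO_RELATION = "no_relation"  # fallback label named in the abstract


def label_person_pairs(persons, annotated):
    """Return one label per unordered person pair.

    `annotated` maps (person_a, person_b) tuples to a relation label.
    Pairs missing from it fall back to no_relation.
    This layout is an illustrative assumption, not the CustFRE format.
    """
    labels = {}
    for a, b in combinations(persons, 2):
        # Accept the annotation in either order (assumes undirected pairs).
        labels[(a, b)] = annotated.get((a, b)) or annotated.get((b, a)) or NO_RELATION
    return labels


# Made-up sentence: "Anna visited her brother Ben and his friend Carl."
persons = ["Anna", "Ben", "Carl"]
annotated = {("Anna", "Ben"): "sibling"}  # hypothetical label name
labels = label_person_pairs(persons, annotated)
```

Enumerating every pair makes the negative class explicit, so a classifier trained on such records sees both related and unrelated person pairs from the same sentence.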
