Article, Data Paper

Lexibank, a public repository of standardized wordlists with computed phonological and lexical features

SCIENTIFIC DATA (2022)

Journal

SCIENTIFIC DATA

Volume 9, Issue 1

Publisher

NATURE PORTFOLIO

DOI: 10.1038/s41597-022-01432-0

Category

Multidisciplinary Sciences

Funding

Projekt DEAL

Summary

The past decades have witnessed substantial growth in digital data on the world's languages, leading to an increasing demand for cross-linguistic datasets. However, the lack of standardization in published datasets makes them difficult to compare. This study presents a new approach that improves comparability by converting datasets to Cross-Linguistic Data Formats, and it demonstrates the benefits through the automatic inference of phonological and lexical features.

Abstract

The past decades have seen substantial growth in digital data on the world's languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and human cognition. Unfortunately, most published datasets lack standardization, which makes their comparison difficult. Here, we present a new approach to increase the comparability of cross-linguistic lexical data. We have designed workflows for the computer-assisted lifting of datasets to Cross-Linguistic Data Formats, a collection of standards that make these datasets more Findable, Accessible, Interoperable, and Reusable (FAIR). We test the Lexibank workflow on 100 lexical datasets, from which we derive an aggregated database of wordlists in unified phonetic transcriptions covering more than 2000 language varieties. We illustrate the benefits of our approach by showing how phonological and lexical features can be automatically inferred, complementing and expanding existing cross-linguistic datasets.
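
To make the idea of inferring a phonological feature from standardized wordlists concrete, the following is a minimal sketch rather than the authors' workflow. It assumes a forms table laid out in the style of a Cross-Linguistic Data Formats wordlist, with `Language_ID`, `Parameter_ID`, `Form`, and a space-separated `Segments` column. The sample rows, the language identifiers, and the function name `inventory_sizes` are invented for illustration and are not Lexibank data.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows (NOT real Lexibank data). The column layout is an
# assumption modeled on a CLDF-style forms table with segmented forms.
SAMPLE_FORMS = """ID,Language_ID,Parameter_ID,Form,Segments
1,lang_a,hand,mano,m a n o
2,lang_a,water,akwa,a k w a
3,lang_b,hand,hant,h a n t
4,lang_b,water,vasər,v a s ə r
"""


def inventory_sizes(forms_csv):
    """Count the distinct segments attested per language.

    This is a simple stand-in for one phonological feature that could be
    computed once all forms share a unified, segmented transcription.
    """
    inventories = defaultdict(set)
    for row in csv.DictReader(io.StringIO(forms_csv)):
        # Segments are assumed to be separated by single spaces.
        inventories[row["Language_ID"]].update(row["Segments"].split())
    return {lang: len(segments) for lang, segments in inventories.items()}


print(inventory_sizes(SAMPLE_FORMS))
```

Because every wordlist uses the same columns and transcription conventions, the same function could in principle run unchanged over any number of datasets, which is the practical payoff of standardization the abstract describes.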
