Article, Data Paper

Lexibank, a public repository of standardized wordlists with computed phonological and lexical features

Journal

SCIENTIFIC DATA
Volume 9, Issue 1

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41597-022-01432-0

Funding

  1. Projekt DEAL

The past decades have witnessed substantial growth in digital data on the world's languages, leading to an increasing demand for cross-linguistic datasets. However, the lack of standardization in published datasets makes comparison difficult. This study presents a new approach to improve comparability by converting datasets to Cross-Linguistic Data Formats and demonstrates the benefits through automatic inference of phonological and lexical features.
The past decades have seen substantial growth in digital data on the world's languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and human cognition. Unfortunately, most published datasets lack standardization, which makes them difficult to compare. Here, we present a new approach to increase the comparability of cross-linguistic lexical data. We have designed workflows for the computer-assisted lifting of datasets to Cross-Linguistic Data Formats, a collection of standards that make these datasets more Findable, Accessible, Interoperable, and Reusable (FAIR). We test the Lexibank workflow on 100 lexical datasets, from which we derive an aggregated database of wordlists in unified phonetic transcriptions covering more than 2000 language varieties. We illustrate the benefits of our approach by showing how phonological and lexical features can be automatically inferred, complementing and expanding existing cross-linguistic datasets.
