4.7 Article

Solvable null model for the distribution of word frequencies

Journal

PHYSICAL REVIEW E
Volume 70, Issue 4, Pages -

Publisher

AMER PHYSICAL SOC
DOI: 10.1103/PhysRevE.70.042901

Keywords

-

Funding

  1. Fundacao de Amparo a Pesquisa do Estado de Sao Paulo (FAPESP) [99/09644-9] Funding Source: FAPESP

Zipf's law asserts that in all natural languages the frequency of a word is inversely proportional to its rank. The significance, if any, of this result for language remains a mystery. Here we examine a null hypothesis for the distribution of word frequencies, a so-called discourse-triggered word choice model, which is based on the assumption that the more a word is used, the more likely it is to be used again. We argue that this model is equivalent to the neutral infinite-alleles model of population genetics and so the degeneracy of the different words composing a sample of text is given by the celebrated Ewens sampling formula [Theor. Pop. Biol. 3, 87 (1972)], which we show to produce an exponential distribution of word frequencies.
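The reinforcement rule in the abstract can be illustrated with a small simulation. The abstract does not give the update rule, so the sketch below assumes the standard urn construction of the neutral infinite-alleles model: when n words have been written, a brand-new word appears with probability theta / (theta + n), and otherwise a previously written token is repeated uniformly at random, so that a word is reused in proportion to its current count. The parameter name `theta` and the helpers `simulate_text` and `expected_degeneracy` are hypothetical names chosen for this example. The second function evaluates the textbook expectation under the Ewens sampling formula for the number of distinct words that occur exactly j times in a sample of n words, E[a_j] = (theta / j) * n! / (n - j)! * Gamma(n - j + theta) / Gamma(n + theta).

```python
import math
import random
from collections import Counter


def simulate_text(n_words, theta, seed=0):
    """Generate word ids under an assumed reinforcement (urn) rule.

    With n tokens already written, a new word is introduced with
    probability theta / (theta + n); otherwise a past token is copied
    uniformly at random, which reuses a word in proportion to its count.
    """
    rng = random.Random(seed)
    tokens = []
    n_distinct = 0
    for n in range(n_words):
        if rng.random() < theta / (theta + n):
            tokens.append(n_distinct)  # introduce a new word id
            n_distinct += 1
        else:
            tokens.append(rng.choice(tokens))  # repeat an earlier token
    return tokens


def expected_degeneracy(j, n, theta):
    """Expected number of words occurring exactly j times (Ewens)."""
    log_ratio = (
        math.lgamma(n + 1) - math.lgamma(n - j + 1)
        + math.lgamma(n - j + theta) - math.lgamma(n + theta)
    )
    return theta / j * math.exp(log_ratio)


if __name__ == "__main__":
    n, theta = 2000, 5.0  # illustrative values, not taken from the paper
    tokens = simulate_text(n, theta)
    word_counts = Counter(tokens)               # word id -> frequency
    degeneracy = Counter(word_counts.values())  # frequency -> number of words
    for j in range(1, 6):
        print(j, degeneracy.get(j, 0), round(expected_degeneracy(j, n, theta), 2))
```

A single simulated text fluctuates around the expected values, so averaging the degeneracy counts over many seeds gives a fairer comparison with `expected_degeneracy`.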
