Paper title

The optimality of word lengths. Theoretical foundations and an empirical study

Authors

Sonia Petrini, Antoni Casas-i-Muñoz, Jordi Cluet-i-Martinell, Mengxue Wang, Christian Bentz, Ramon Ferrer-i-Cancho

Abstract

Zipf's law of abbreviation, namely the tendency of more frequent words to be shorter, has been viewed as a manifestation of compression, i.e. the minimization of the length of forms, a universal principle of natural communication. Although the claim that languages are optimized has become trendy, attempts to measure the degree of optimization of languages have been rather scarce. Here we present two optimality scores that are dually normalized, namely, they are normalized with respect to both the minimum and the random baseline. We analyze the theoretical and statistical pros and cons of these and other scores. Harnessing the best score, we quantify for the first time the degree of optimality of word lengths in languages. This indicates that languages are optimized to 62 or 67 percent on average (depending on the source) when word lengths are measured in characters, and to 65 percent on average when word lengths are measured in time. In general, spoken word durations are more optimized than written word lengths in characters. Our work paves the way to measure the degree of optimality of the vocalizations or gestures of other species, and to compare them against written, spoken, or signed human languages.
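To illustrate what "dually normalized" means, here is a minimal sketch that assumes a score of the form (L_r - L) / (L_r - L_min). Here L is the frequency-weighted mean word length, L_min is the value obtained by assigning the shortest lengths to the most frequent words, and L_r is the expected value of L when lengths are randomly permuted across words. Such a score equals 1 at the minimum and 0 at the random baseline, and it is negative when a lexicon does worse than chance. The function names and this exact formula are illustrative assumptions, not necessarily the definitions used in the paper.

```python
import math
from typing import Sequence


def mean_length(freqs: Sequence[float], lengths: Sequence[float]) -> float:
    """Frequency-weighted mean word length: L = sum_i f_i * l_i / sum_i f_i."""
    total = sum(freqs)
    return sum(f * l for f, l in zip(freqs, lengths)) / total


def min_mean_length(freqs: Sequence[float], lengths: Sequence[float]) -> float:
    """L_min: pair the most frequent words with the shortest lengths
    (by the rearrangement inequality, this minimizes L)."""
    return mean_length(sorted(freqs, reverse=True), sorted(lengths))


def random_mean_length(freqs: Sequence[float], lengths: Sequence[float]) -> float:
    """L_r: expected L under a uniformly random permutation of lengths.
    Each word then has the unweighted mean type length as its expected
    length, so L_r does not depend on the frequencies."""
    return sum(lengths) / len(lengths)


def dual_score(freqs: Sequence[float], lengths: Sequence[float]) -> float:
    """Hypothetical dually normalized score (L_r - L) / (L_r - L_min):
    1 at the minimum, 0 at the random baseline, negative below chance."""
    L = mean_length(freqs, lengths)
    L_min = min_mean_length(freqs, lengths)
    L_r = random_mean_length(freqs, lengths)
    if math.isclose(L_r, L_min):
        # Happens e.g. when all lengths or all frequencies are equal.
        raise ValueError("score undefined: L_r equals L_min")
    return (L_r - L) / (L_r - L_min)
```

For example, with frequencies `[50, 30, 20]`, the lengths `[2, 3, 5]` are already optimally arranged, so the score is 1. The reversed assignment `[5, 3, 2]` gives -14/13 (worse than random), and `[3, 2, 5]` gives 7/13, about 0.54. Normalizing against both L_min and L_r makes scores comparable across lexicons whose length distributions differ.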