Article

An Information Theoretic Approach to Symbolic Learning in Synthetic Languages

ENTROPY (2022)

Journal

ENTROPY

Volume 24, Issue 2

Publisher

MDPI

DOI: 10.3390/e24020259

Keywords

information theoretic models; synthetic language; entropy; Zipf-Mandelbrot-Li law; language models; behavior prediction

Category

Physics, Multidisciplinary

Funding

University of Queensland and Trusted Autonomous Systems Defence Cooperative Research Centre [2019002828]


Summary

This paper discusses why identifying probabilistic symbols matters for entropy-based models and synthetic languages, and introduces symbolization methods for that purpose. New symbolization algorithms are proposed and demonstrated on real-world data.

Abstract

An important aspect of using entropy-based models and proposed synthetic languages is the seemingly simple task of knowing how to identify the probabilistic symbols. If the system has discrete features, then this task may be trivial; however, for observed analog behaviors described by continuous values, this raises the question of how we should determine such symbols. This task of symbolization extends the concept of scalar and vector quantization to consider explicit linguistic properties. Unlike previous quantization algorithms, where the aim is primarily data compression and fidelity, the goal in this case is to produce a symbolic output sequence which incorporates some linguistic properties and hence is useful in forming language-based models. Hence, in this paper, we present methods for symbolization which take into account such properties in the form of probabilistic constraints. In particular, we propose new symbolization algorithms which constrain the symbols to have a Zipf-Mandelbrot-Li distribution, which approximates the behavior of language elements. We introduce a novel constrained EM algorithm which is shown to effectively learn to produce symbols which approximate a Zipfian distribution. We demonstrate the efficacy of the proposed approaches on examples using real-world data in different tasks, including the translation of animal behavior into a possible human-understandable language equivalent.
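To make the idea of symbolization under a probabilistic constraint concrete, the following is a minimal sketch and not the paper's constrained EM algorithm. It swaps in a much simpler technique, quantile-based binning, so that the symbol frequencies of a one-dimensional continuous signal approximate a generic Zipf-Mandelbrot form p(r) ∝ 1/(r + β)^α. The function names, the values of `alpha` and `beta`, and the Gaussian test signal are all illustrative assumptions and do not come from the paper.

```python
import numpy as np


def zipf_mandelbrot_probs(num_symbols, alpha=1.0, beta=2.7):
    """Normalized Zipf-Mandelbrot probabilities for ranks 1..num_symbols.

    alpha and beta are assumed illustrative values, not the paper's parameters.
    """
    ranks = np.arange(1, num_symbols + 1)
    weights = 1.0 / (ranks + beta) ** alpha
    return weights / weights.sum()


def symbolize(values, num_symbols, alpha=1.0, beta=2.7):
    """Map continuous values to integer symbols 0..num_symbols-1.

    Hypothetical helper: bin edges are placed at the data quantiles given by
    the cumulative target probabilities, so symbol r occurs with a frequency
    close to the target probability of rank r + 1.
    """
    values = np.asarray(values, dtype=float)
    probs = zipf_mandelbrot_probs(num_symbols, alpha, beta)
    # Interior edges only: num_symbols - 1 cut points split the data range.
    cumulative = np.cumsum(probs)[:-1]
    edges = np.quantile(values, cumulative)
    symbols = np.searchsorted(edges, values, side="right")
    return symbols, probs


# Assumed test signal: 10,000 samples from a standard normal distribution.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
symbols, target = symbolize(x, num_symbols=8)
empirical = np.bincount(symbols, minlength=8) / symbols.size
```

In this sketch the most frequent symbol is assigned to the lowest values of the signal, which is an arbitrary choice. Unlike the approach described in the abstract, it only matches the marginal symbol frequencies and takes no account of quantization fidelity or other linguistic properties.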
