Article

Classification by Compression: Application of Information-Theory Methods for the Identification of Themes of Scientific Texts

Journal

Publisher

ALLERTON PRESS INC
DOI: 10.3103/S0005105517030116

Keywords

classification; thematic classification of texts; information theory; text compression; arXiv.org; CyberLeninka

A method for the automatic classification of scientific texts based on data compression is proposed. The method is implemented and evaluated on data from the arXiv.org archive of scientific texts and from the CyberLeninka scientific electronic library (CyberLeninka.ru). Experiments show that the method correctly identifies the theme of a scientific text with a probability of 75-95%; the accuracy depends on the quality of the source data.
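
The abstract does not specify which compressor or decision rule the authors use, so the following is only a minimal sketch of the general idea behind compression-based classification, not the paper's implementation. It assumes Python's standard `zlib` compressor and a simple rule: a text is assigned to the theme whose reference corpus grows least, in compressed bytes, when the text is appended to it. The function names and the tiny `SAMPLE_CORPORA` dictionary are hypothetical and exist purely for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of data after zlib compression (level 9)."""
    return len(zlib.compress(data, 9))


def classify(document: str, corpora: dict[str, str]) -> str:
    """Pick the theme whose corpus 'explains' the document best.

    For each theme, measure how many extra compressed bytes are needed
    when the document is appended to that theme's reference corpus.
    The theme with the smallest increase is returned.
    """
    doc = document.encode("utf-8")
    best_label, best_cost = None, None
    for label, corpus in corpora.items():
        base = corpus.encode("utf-8")
        cost = compressed_size(base + b"\n" + doc) - compressed_size(base)
        if best_cost is None or cost < best_cost:
            best_label, best_cost = label, cost
    return best_label


# Hypothetical toy corpora; a real experiment would use many full texts per theme.
SAMPLE_CORPORA = {
    "physics": (
        "quantum field theory describes particles as excitations of fields. "
        "the hamiltonian operator governs energy eigenstates of the quantum system. "
    ),
    "biology": (
        "the cell membrane regulates transport of proteins. "
        "gene expression is controlled by transcription factors binding to dna. "
    ),
}

if __name__ == "__main__":
    print(classify("energy eigenstates of the quantum system", SAMPLE_CORPORA))
```

Note that `zlib` only looks back over a 32 KB window, so with large reference corpora a compressor with a longer context would likely be a better fit for this kind of sketch.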
