Article

Information Theory for Biological Sequence Classification: A Novel Feature Extraction Technique Based on Tsallis Entropy

Journal

ENTROPY
Volume 24, Issue 10, Pages -

Publisher

MDPI
DOI: 10.3390/e24101398

Keywords

feature extraction; Tsallis entropy; biological sequence; information theory

Funding

  1. Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior (CAPES) [001]
  2. Google (LARA-2021)
  3. Universidade de Sao Paulo (USP)
  4. Sao Paulo Research Foundation (FAPESP) [2013/07375-0, 2021/08561-8]

In recent years, the exponential growth of sequencing projects has posed new challenges for biological sequence analysis. Machine learning algorithms have been explored for analyzing and classifying biological sequences, although finding suitable numerical representations of those sequences remains difficult. This study proposes a novel Tsallis entropy-based feature extractor for classifying biological sequences and, across five case studies, finds it effective and robust in terms of generalization.
In recent years, accelerated technological advances have driven exponential growth in sequencing projects, greatly increasing the amount of available data and creating new challenges for biological sequence analysis. Consequently, techniques capable of analyzing large amounts of data, such as machine learning (ML) algorithms, have been explored. ML algorithms are being used to analyze and classify biological sequences, despite the intrinsic difficulty of extracting representations of biological sequences that are suitable for them. Extracting numerical features to represent sequences makes it statistically feasible to apply universal concepts from information theory, such as Tsallis and Shannon entropy. In this study, we propose a novel Tsallis entropy-based feature extractor to provide useful information for classifying biological sequences. To assess its relevance, we prepared five case studies: (1) an analysis of the entropic index q; (2) performance testing of the best entropic indices on new datasets; (3) a comparison with Shannon entropy; (4) a comparison with generalized entropies; and (5) an investigation of Tsallis entropy in the context of dimensionality reduction. As a result, our proposal proved to be effective, outperforming Shannon entropy and generalizing robustly, and it is potentially able to capture information in fewer dimensions than methods such as Singular Value Decomposition and Uniform Manifold Approximation and Projection.
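
To make the idea of an entropy-based feature concrete, here is a minimal Python sketch. It uses the standard definition of Tsallis entropy for a discrete distribution p with entropic index q, S_q = (1 - sum_i p_i^q) / (q - 1), which reduces to Shannon entropy (in nats) as q approaches 1. Applying it to overlapping k-mer frequencies, and the function names and the max_k parameter, are assumptions made for illustration; the abstract does not specify the authors' exact pipeline.

```python
from collections import Counter
import math


def kmer_probabilities(sequence, k):
    """Relative frequencies of the overlapping k-mers in `sequence`.

    Hypothetical helper: the k-mer representation is an assumption,
    not a detail stated in the abstract.
    """
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum(p_i ** q)) / (q - 1).

    For q == 1 the formula is undefined, so the Shannon limit
    -sum(p_i * ln p_i) is returned instead.
    """
    probs = [p for p in probs if p > 0]
    if not probs:
        return 0.0  # no observations (e.g. sequence shorter than k)
    if math.isclose(q, 1.0):
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def tsallis_features(sequence, q, max_k=3):
    """One entropy value per k-mer size, k = 1..max_k (illustrative choice)."""
    return [
        tsallis_entropy(kmer_probabilities(sequence, k), q)
        for k in range(1, max_k + 1)
    ]


if __name__ == "__main__":
    # A short toy DNA sequence; real inputs would be full sequences.
    print(tsallis_features("ACGTACGTTTGACA", q=2.0, max_k=3))
```

Each sequence is mapped to a short fixed-length vector that a standard classifier can consume. The entropic index q is the tunable parameter that the first case study examines.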
