Article

ClearF++: Improved Supervised Feature Scoring Using Feature Clustering in Class-Wise Embedding and Reconstruction

Journal

BIOENGINEERING-BASEL
Volume 10, Issue 7, Pages -

Publisher

MDPI
DOI: 10.3390/bioengineering10070824

Keywords

feature selection; feature scoring; information theory; entropy; mutual information (MI); dimension reduction; low-dimensional embedding; reconstruction error; principal component analysis (PCA); clustering

Feature selection methods are crucial for accurate disease classification and for identifying informative biomarkers. ClearF++ addresses the limitations of previous methods by using reconstruction error from low-dimensional embeddings as a proxy for the entropy term in mutual information and by incorporating feature-wise clustering. It outperforms other commonly used methods in prediction accuracy and stability, making it valuable for biomedical data analysis.
Feature selection methods are essential for accurate disease classification and identifying informative biomarkers. While information-theoretic methods have been widely used, they often exhibit limitations such as high computational costs. Our previously proposed method, ClearF, addresses these issues by using reconstruction error from low-dimensional embeddings as a proxy for the entropy term in the mutual information. However, ClearF still has limitations, including a nontransparent bottleneck layer selection process, which can result in unstable feature selection. To address these limitations, we propose ClearF++, which simplifies the bottleneck layer selection and incorporates feature-wise clustering to enhance biomarker detection. We compare its performance with other commonly used methods such as MultiSURF and IFS, as well as ClearF, across multiple benchmark datasets. Our results demonstrate that ClearF++ consistently outperforms these methods in terms of prediction accuracy and stability, even with limited samples. We also observe that employing the Deep Embedded Clustering (DEC) algorithm for feature-wise clustering improves performance, indicating its suitability for handling complex data structures with limited samples. ClearF++ offers an improved biomarker prioritization approach with enhanced prediction performance and faster execution. Its stability and effectiveness with limited samples make it particularly valuable for biomedical data analysis.
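
The idea of using reconstruction error from a low-dimensional embedding as a stand-in for the entropy terms of mutual information can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain linear PCA (computed via SVD) as the embedding, scores each feature as the error of one embedding fitted on all samples minus the class-weighted error of embeddings fitted per class, and omits the feature-wise clustering step entirely. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def pca_reconstruction_error(X, n_components):
    """Per-feature mean squared error after projecting X onto its top principal components."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                # shape (k, n_features)
    X_hat = Xc @ components.T @ components        # project, then reconstruct
    return ((Xc - X_hat) ** 2).mean(axis=0)

def classwise_reconstruction_scores(X, y, n_components=1):
    """Illustrative score (an assumption, not the published formula):
    error of one global embedding minus the class-weighted error of
    class-wise embeddings, loosely mirroring I(X;Y) = H(X) - H(X|Y)."""
    total = pca_reconstruction_error(X, n_components)
    conditional = np.zeros(X.shape[1])
    for label in np.unique(y):
        mask = y == label
        conditional += mask.mean() * pca_reconstruction_error(X[mask], n_components)
    return total - conditional

# Synthetic data (hypothetical): features 0 and 1 shift with the class label,
# features 2 and 3 share a strong latent factor unrelated to the class,
# features 4 and 5 are pure noise.
rng = np.random.default_rng(0)
n_samples = 200
y = np.arange(n_samples) % 2                      # balanced binary labels
shift = 1.5 * (2 * y - 1)                         # -1.5 for class 0, +1.5 for class 1
latent = rng.normal(size=n_samples)

X = 0.1 * rng.normal(size=(n_samples, 6))
X[:, 0] += shift
X[:, 1] += shift
X[:, 2] += 5.0 * latent
X[:, 3] -= 5.0 * latent

scores = classwise_reconstruction_scores(X, y, n_components=1)
ranking = np.argsort(scores)[::-1]                # best-scoring feature first
print(ranking, np.round(scores, 3))
```

In this toy setup the single global component is consumed by the class-independent latent factor, so the class-shifted features are reconstructed poorly by the global embedding but well once the class is known; they should therefore rank highest. A nonlinear embedding or a different number of components would change the numbers, so the sketch only conveys the scoring principle.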
