Article

ClearF++: Improved Supervised Feature Scoring Using Feature Clustering in Class-Wise Embedding and Reconstruction

Journal

BIOENGINEERING-BASEL
Volume 10, Issue 7

Publisher

MDPI
DOI: 10.3390/bioengineering10070824

Keywords

feature selection; feature scoring; information theory; entropy; mutual information (MI); dimension reduction; low-dimensional embedding; reconstruction error; principal component analysis (PCA); clustering

Feature selection methods are crucial for accurate disease classification and for identifying informative biomarkers. ClearF++ addresses the limitations of previous methods by using the reconstruction error from low-dimensional embeddings as a proxy for the entropy term in mutual information and by incorporating feature-wise clustering. It outperforms other commonly used methods in prediction accuracy and stability, which makes it valuable for biomedical data analysis.
Feature selection methods are essential for accurate disease classification and identifying informative biomarkers. While information-theoretic methods have been widely used, they often exhibit limitations such as high computational costs. Our previously proposed method, ClearF, addresses these issues by using reconstruction error from low-dimensional embeddings as a proxy for the entropy term in the mutual information. However, ClearF still has limitations, including a nontransparent bottleneck layer selection process, which can result in unstable feature selection. To address these limitations, we propose ClearF++, which simplifies the bottleneck layer selection and incorporates feature-wise clustering to enhance biomarker detection. We compare its performance with other commonly used methods such as MultiSURF and IFS, as well as ClearF, across multiple benchmark datasets. Our results demonstrate that ClearF++ consistently outperforms these methods in terms of prediction accuracy and stability, even with limited samples. We also observe that employing the Deep Embedded Clustering (DEC) algorithm for feature-wise clustering improves performance, indicating its suitability for handling complex data structures with limited samples. ClearF++ offers an improved biomarker prioritization approach with enhanced prediction performance and faster execution. Its stability and effectiveness with limited samples make it particularly valuable for biomedical data analysis.
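The abstract's central quantity, the reconstruction error from a low-dimensional embedding, can be illustrated with a minimal sketch. It assumes PCA (listed among the keywords) as the embedding and mean squared error as the reconstruction measure. The function names, the `n_components` parameter, and the toy data are hypothetical. This is not the authors' implementation, and it does not reproduce how ClearF++ selects the bottleneck size, clusters features, or combines the errors into a final score.

```python
import numpy as np


def per_feature_reconstruction_error(X, n_components):
    """Mean squared reconstruction error of each feature after a PCA round trip.

    X is an (n_samples, n_features) array. The data are centered, projected
    onto the top `n_components` principal directions, and mapped back.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T            # shape: (n_features, k)
    X_hat = Xc @ V @ V.T               # embed, then reconstruct
    return ((Xc - X_hat) ** 2).mean(axis=0)


def classwise_reconstruction_error(X, y, n_components):
    """Per-feature reconstruction error computed separately within each class."""
    return {
        label: per_feature_reconstruction_error(X[y == label], n_components)
        for label in np.unique(y)
    }


# Toy data (hypothetical): 40 samples, 6 features, two balanced classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = np.array([0] * 20 + [1] * 20)

overall = per_feature_reconstruction_error(X, n_components=2)
by_class = classwise_reconstruction_error(X, y, n_components=2)
```

Here `overall` holds one error value per feature for an embedding fitted on all samples, and `by_class` holds the same quantity for embeddings fitted within each class. The sketch stops at these raw errors; turning them into a feature ranking is the part specified by the paper itself.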
