Article

Missing value estimation of microarray data using Sim-GAN

Journal

KNOWLEDGE AND INFORMATION SYSTEMS
Volume 64, Issue 10, Pages 2661-2687

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s10115-022-01718-0

Keywords

Microarray data; Missing value; Structural similarity; Functional similarity; Semantic similarity; Generative adversarial network

Microarray data analysis is important in cancer studies. However, the complexity of the data extraction process leads to missing values, which disrupt the analysis. This study proposes a novel method, Sim-GAN, which uses a similarity index and a generative adversarial network to estimate missing values. Experimental results show that the proposed method predicts meaningful expression values and outperforms existing techniques.
Microarray data analysis needs the utmost care, as it plays a significant role in cancer studies. Because the data extraction process is highly complex, some relevant information is lost (missing values), which causes a significant, irrecoverable deviation from the actual scenario. Imputing missing values is therefore a crucial preprocessing step in analyzing microarray data. Numerous methods have been designed to address the problem, but they yield unsatisfactory results when the missing rate is high. To estimate the missing expression values and complete the dataset, a novel method based on a similarity index and a generative adversarial network (Sim-GAN) is proposed. First, the raw dataset is divided into two subsets: the target set (genes with missing expression values) and the candidate set (genes without missing values). Next, the similarity index between target genes and candidate genes is computed. Because microarray data reflects several biological factors, three similarity matrices (structural similarity, functional similarity, and semantic similarity) are derived to find a small subset of candidate genes for each target gene. For structural similarity, a novel approach is used that reduces the time complexity to O(1) and handles nonlinearity. The resulting subsets are then fed into a generative adversarial network to compute the missing values of the target genes. The experimental results support the claim that the proposed method performs satisfactorily in producing meaningful expression values. A detailed comparative study based on several statistical (e.g., NRMSE, AUROC) and biological (e.g., CPP, BLCI) metrics confirms that the proposed Sim-GAN outperforms existing missing value estimation techniques.
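
The preprocessing steps described in the abstract (splitting genes into a target set and a candidate set, then picking a small subset of similar candidates per target gene) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names are hypothetical, missing values are marked with `None`, Pearson correlation over the observed columns stands in for the paper's structural, functional, and semantic similarity measures, and the GAN step itself is not shown. The `nrmse` helper uses one common definition (root mean squared error divided by the standard deviation of the true values); the paper may define it differently.

```python
import math


def split_genes(expr):
    """Return (target, candidate) row indices.

    Target rows contain at least one missing value (None);
    candidate rows are complete.
    """
    target, candidate = [], []
    for i, row in enumerate(expr):
        if any(v is None for v in row):
            target.append(i)
        else:
            candidate.append(i)
    return target, candidate


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def top_k_candidates(expr, target_idx, candidate_idx, k):
    """Pick the k candidate genes most correlated with one target gene.

    Only columns observed in the target gene are compared.
    This is a stand-in similarity, not the paper's similarity index.
    """
    row = expr[target_idx]
    observed = [j for j, v in enumerate(row) if v is not None]
    t = [row[j] for j in observed]
    scored = []
    for c in candidate_idx:
        c_vals = [expr[c][j] for j in observed]
        scored.append((pearson(t, c_vals), c))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [c for _, c in scored[:k]]


def nrmse(true_vals, est_vals):
    """RMSE normalized by the standard deviation of the true values (assumed definition)."""
    n = len(true_vals)
    mse = sum((e - t) ** 2 for t, e in zip(true_vals, est_vals)) / n
    mean_t = sum(true_vals) / n
    var_t = sum((t - mean_t) ** 2 for t in true_vals) / n
    return math.sqrt(mse / var_t)


# Toy expression matrix: rows are genes, columns are samples.
expr = [
    [1.0, 2.0, 3.0, 4.0],   # gene 0: complete
    [2.0, None, 6.0, 8.0],  # gene 1: one missing value
    [4.0, 3.0, 2.0, 1.0],   # gene 2: complete
    [1.0, 1.5, 1.0, 1.5],   # gene 3: complete
]
target, candidate = split_genes(expr)
neighbours = top_k_candidates(expr, target[0], candidate, k=2)
```

In a full pipeline, the selected candidate rows for each target gene would be the input to the imputation model, and `nrmse` would be evaluated on entries that were masked artificially so that the true values are known.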
