4.7 Article

Methodology to identify a gene expression signature by merging microarray datasets

Journal

COMPUTERS IN BIOLOGY AND MEDICINE
Volume 159

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.compbiomed.2023.106867

Keywords

Microarray data; Gene expression signature; Random forest; LSVM; Neural network; Heart failure; Autism spectrum disorder


A large number of microarray datasets have been produced to identify differentially expressed genes and gene expression signatures, which can support disease diagnosis, prognosis, and the assessment of therapeutic response. However, most datasets have limited statistical power because of their small sample sizes. To address this issue, we propose a methodology that merges microarray datasets and combines statistical methods with supervised machine learning algorithms to identify gene expression signatures. The methodology was validated on heart failure and autism spectrum disorder datasets, with classification accuracies of approximately 98% and 82%, respectively.
A vast number of microarray datasets have been produced to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in assessing the therapeutic response to drugs. However, most of the available datasets contain only a small number of samples, which leads to low statistical, predictive and generalization power. One way to overcome this problem is to merge several microarray datasets into a single dataset, which is typically a challenging task. Gene expression signatures are usually determined with either statistical methods or supervised machine learning algorithms. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets such as microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. It uses statistical methods to obtain several sets of differentially expressed genes and then uses supervised machine learning algorithms to select the gene expression signature from among them. The methodology was validated in two distinct research applications: one using heart failure microarray datasets and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. The methodology was implemented in R and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
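The overall idea (merge datasets, build several candidate sets of differentially expressed genes, then let a classifier choose among them) can be illustrated with a minimal Python sketch. This is not the MicroGES implementation, which is written in R, and every function name here is hypothetical. It rests on several simplifying assumptions: datasets are merged by per-dataset z-scoring of each gene, candidate sets come from a few Welch t-statistic cutoffs, a nearest-centroid classifier stands in for random forest, LSVM, and neural network models, and the data are synthetic.

```python
import random
import statistics


def zscore_per_gene(dataset):
    """Scale every gene (column) of one dataset to mean 0 and sd 1."""
    scaled_columns = []
    for column in zip(*dataset):
        mean = statistics.fmean(column)
        sd = statistics.pstdev(column) or 1.0  # guard against constant genes
        scaled_columns.append([(value - mean) / sd for value in column])
    return [list(row) for row in zip(*scaled_columns)]


def merge_datasets(datasets):
    """Stack datasets sharing the same gene columns after per-dataset scaling."""
    merged = []
    for dataset in datasets:
        merged.extend(zscore_per_gene(dataset))
    return merged


def welch_t(a, b):
    """Welch t-statistic for two groups of expression values."""
    denom = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / denom if denom else 0.0


def candidate_gene_sets(X, y, thresholds):
    """One candidate gene set per |t| threshold, so no single cutoff is fixed."""
    scores = []
    for g in range(len(X[0])):
        cases = [row[g] for row, label in zip(X, y) if label == 1]
        controls = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(cases, controls)))
    return {t: [g for g, s in enumerate(scores) if s >= t] for t in thresholds}


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`."""
    if not genes:
        return 0.0
    correct = 0
    for i, sample in enumerate(X):
        distances = {}
        for label in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroid = [statistics.fmean([r[g] for r in rows]) for g in genes]
            distances[label] = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
        correct += min(distances, key=distances.get) == y[i]
    return correct / len(X)


def select_signature(X, y, thresholds=(2.0, 3.0, 4.0)):
    """Pick the candidate set with the best classification accuracy."""
    scored = [(loo_accuracy(X, y, genes), genes)
              for genes in candidate_gene_sets(X, y, thresholds).values()]
    best_accuracy, best_genes = max(scored, key=lambda pair: pair[0])
    return best_genes, best_accuracy


def make_toy_dataset(rng, n_per_class, n_genes, informative, batch_shift):
    """Synthetic data only: Gaussian noise plus a class effect on a few genes."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(batch_shift, 1.0) for _ in range(n_genes)]
            for g in informative:
                row[g] += 4.0 * label
            X.append(row)
            y.append(label)
    return X, y


# Two toy "studies" with different baseline intensity (a crude batch effect).
rng = random.Random(0)
X1, y1 = make_toy_dataset(rng, 10, 20, [0, 1, 2], batch_shift=0.0)
X2, y2 = make_toy_dataset(rng, 10, 20, [0, 1, 2], batch_shift=5.0)
X = merge_datasets([X1, X2])
y = y1 + y2
signature, accuracy = select_signature(X, y)
print(signature, accuracy)
```

In this sketch the genes are ranked on all samples before the leave-one-out loop, so the reported accuracy is optimistic. A real analysis would repeat the gene selection inside each resampling fold and would use a dedicated batch-correction method in place of plain z-scoring.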
