4.6 Article

Scalable transcriptomics analysis with Dask: applications in data science and machine learning

Journal

BMC BIOINFORMATICS
Volume 23, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s12859-022-05065-3

Keywords

Machine learning; Scalable data science; Gene expression; Transcriptomics; Data analysis

Funding

  1. Portuguese funding agency, FCT-Fundacao para a Ciencia e a Tecnologia [LA/P/0063/2020]
  2. Portuguese National Network for Advanced Computing [CPCA/A2/2640/2020]
  3. Portuguese Foundation for Science and Technology [SFRH/BD/145707/2019]

This paper reviews the main steps and concepts in machine learning pipelines and scalable data science for gene expression analysis. It discusses the benefits of using the Dask framework and provides case studies to demonstrate its effectiveness in boosting data science applications.
Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science, and machine learning in particular, has many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary.
Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science, including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics.
Results: This review illustrates the role of Dask in boosting data science applications across different case studies. Detailed documentation and code for these procedures are available at https://github.com/martaccmoreno/gexp-ml-dask.
Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures.
