Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously

Related references

Note: Only some of the references are listed.

Article Biochemical Research Methods

Benchmarking atlas-level data integration in single-cell genomics

Malte D. Luecken et al.

Summary: This study benchmarked 68 method and preprocessing combinations on 85 batches of gene expression data, highlighting the importance of highly variable gene selection in improving method performance. When dealing with complex integration tasks, scANVI, Scanorama, scVI, and scGen consistently performed well, while the performance of single-cell ATAC-sequencing integration was strongly influenced by the choice of feature space.

NATURE METHODS (2022)

Article Biochemical Research Methods

meGPS: a multi-omics signature for hepatocellular carcinoma detection integrating methylome and transcriptome data

Qiong Wu et al.

Summary: This study introduces a novel strategy using DNA methylation and RNA expression data to discriminate hepatocellular carcinoma (HCC). Immune genes with negative correlations between expression and promoter methylation are identified as candidates for HCC detection. A methylation GPS (mGPS) and an expression GPS (eGPS) are separately constructed and then assembled into a meGPS, which successfully detects and predicts HCC with reliable performance validated by independent datasets. This study provides potential molecular targets for the detection and therapy of HCC.

BIOINFORMATICS (2022)

Article Biotechnology & Applied Microbiology

Widespread redundancy in -omics profiles of cancer mutation states

Jake Crawford et al.

Summary: This study focuses on predicting cancer mutation status and compares the predictive ability of different -omics readouts. RNA sequencing is found to be the most effective predictor, and other data types also show similar effectiveness for most genes. There is more variability in prediction performance between mutations than between data types for the same mutation. Combining different data types into a single model does not significantly improve predictive ability. Therefore, there are multiple -omics types that can serve as effective readouts for studying cancer mutation function, with gene expression being a reasonable default option.

GENOME BIOLOGY (2022)

Article Multidisciplinary Sciences

Uniform genomic data analysis in the NCI Genomic Data Commons

Zhenyu Zhang et al.

Summary: The goal of the National Cancer Institute's Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and clinical data to support precision medicine through data sharing and collaborative analysis. The initial dataset includes various data types from the NCI TCGA and TARGET projects, and data production started in June 2015 using an OpenStack-based private cloud. The GDC has analyzed more than 50,000 raw sequencing data inputs and generated different data types using the latest human genome reference build GRCh38, which are available for download and exploratory analysis at the GDC Data Portal and Legacy Archive.

NATURE COMMUNICATIONS (2021)

Review Biotechnology & Applied Microbiology

Computational principles and challenges in single-cell data integration

Ricard Argelaguet et al.

Summary: The development of single-cell multimodal assays has provided a powerful tool for investigating cellular heterogeneity in multiple dimensions. Data integration is a key challenge in analyzing single-cell multimodal data, with existing strategies utilizing similar mathematical ideas but having distinct goals and principles.

NATURE BIOTECHNOLOGY (2021)

Article Biochemistry & Molecular Biology

Integrated analysis of multimodal single-cell data

Yuhan Hao et al.

Summary: The study introduces a weighted-nearest neighbor analysis framework to learn the relative utility of each data type in each cell, enabling integrative analysis of multiple modalities. Applied to a CITE-seq dataset, the method constructs a multimodal reference atlas of the circulating immune system and successfully identifies and validates previously unreported lymphoid subpopulations.

Article Biochemistry & Molecular Biology

A flexible, interpretable, and accurate approach for imputing the expression of unmeasured genes

Christopher A. Mancuso et al.

NUCLEIC ACIDS RESEARCH (2020)

Article Biochemistry & Molecular Biology

Comprehensive Integration of Single-Cell Data

Tim Stuart et al.

Article Biochemical Research Methods

Pathway-level information extractor (PLIER) for gene expression data

Weiguang Mao et al.

NATURE METHODS (2019)

Article Biochemistry & Molecular Biology

ArrayExpress update - from bulk to single-cell expression data

Awais Athar et al.

NUCLEIC ACIDS RESEARCH (2019)

Article Biotechnology & Applied Microbiology

Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression

Christoph Hafemeister et al.

GENOME BIOLOGY (2019)

Article Biochemical Research Methods

Conditional generative adversarial network for gene expression inference

Xiaoqian Wang et al.

BIOINFORMATICS (2018)

Article Biotechnology & Applied Microbiology

Integrating single-cell transcriptomic data across different conditions, technologies, and species

Andrew Butler et al.

NATURE BIOTECHNOLOGY (2018)

Article Biochemistry & Molecular Biology

Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines

Kyle Ellrott et al.

CELL SYSTEMS (2018)

Article Computer Science, Interdisciplinary Applications

ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R

Marvin N. Wright et al.

JOURNAL OF STATISTICAL SOFTWARE (2017)

Article Biochemical Research Methods

Gene expression inference with deep learning

Yifei Chen et al.

BIOINFORMATICS (2016)

Editorial Material Medicine, General & Internal

Toward a Shared Vision for Cancer Genomic Data

Robert L. Grossman et al.

NEW ENGLAND JOURNAL OF MEDICINE (2016)

Article Cell Biology

Robust classification of bacterial and viral infections via integrated host gene expression diagnostics

Timothy E. Sweeney et al.

SCIENCE TRANSLATIONAL MEDICINE (2016)

Article Multidisciplinary Sciences

Cross-platform normalization of microarray and RNA-seq data for machine learning applications

Jeffrey A. Thompson et al.

PEERJ (2016)

Article Multidisciplinary Sciences

CrossNorm: a novel normalization strategy for microarray data in cancers

Lixin Cheng et al.

SCIENTIFIC REPORTS (2016)

Article Biotechnology & Applied Microbiology

Spatial reconstruction of single-cell gene expression data

Rahul Satija et al.

NATURE BIOTECHNOLOGY (2015)

Article Genetics & Heredity

Understanding multicellular function and disease with human tissue-specific networks

Casey S. Greene et al.

NATURE GENETICS (2015)

Article Biochemistry & Molecular Biology

The prognostic landscape of genes and infiltrating immune cells across human cancers

Andrew J. Gentles et al.

NATURE MEDICINE (2015)

Article Multidisciplinary Sciences

Probe Region Expression Estimation for RNA-Seq Data for Improved Microarray Comparability

Karolis Uziela et al.

PLOS ONE (2015)

Article Biochemistry & Molecular Biology

ArrayExpress update - simplifying data submissions

Nikolay Kolesnikov et al.

NUCLEIC ACIDS RESEARCH (2015)

Article Biotechnology & Applied Microbiology

voom: precision weights unlock linear model analysis tools for RNA-seq read counts

Charity W. Law et al.

GENOME BIOLOGY (2014)

Article Biochemistry & Molecular Biology

The Somatic Genomic Landscape of Glioblastoma

Cameron W. Brennan et al.

Editorial Material Genetics & Heredity

The Cancer Genome Atlas Pan-Cancer analysis project

John N. Weinstein et al.

NATURE GENETICS (2013)

Article Biochemical Research Methods

Gene-pair expression signatures reveal lineage control

Merja Heinäniemi et al.

NATURE METHODS (2013)

Article Biochemistry & Molecular Biology

NCBI GEO: archive for functional genomics data sets - update

Tanya Barrett et al.

NUCLEIC ACIDS RESEARCH (2013)

Article Mathematical & Computational Biology

Using control genes to correct for unwanted variation in microarray data

Johann A. Gagnon-Bartsch et al.

BIOSTATISTICS (2012)

Article Biotechnology & Applied Microbiology

A single-sample microarray normalization method to facilitate personalized-medicine workflows

Stephen R. Piccolo et al.

GENOMICS (2012)

Article Multidisciplinary Sciences

Comprehensive molecular portraits of human breast tumours

Daniel C. Koboldt et al.

NATURE (2012)

Article Biochemical Research Methods

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Bo Li et al.

BMC BIOINFORMATICS (2011)

Article Computer Science, Interdisciplinary Applications

Regularization Paths for Generalized Linear Models via Coordinate Descent

Jerome Friedman et al.

JOURNAL OF STATISTICAL SOFTWARE (2010)

Article Behavioral Sciences

Rank-Based Inverse Normal Transformations are Increasingly Used, But are They Merited?

T. Mark Beasley et al.

BEHAVIOR GENETICS (2009)

Review Genetics & Heredity

RNA-Seq: a revolutionary tool for transcriptomics

Zhong Wang et al.

NATURE REVIEWS GENETICS (2009)

Article Genetics & Heredity

Capturing heterogeneity in gene expression studies by surrogate variable analysis

Jeffrey T. Leek et al.

PLOS GENETICS (2007)

Article Mathematical & Computational Biology

Adjusting batch effects in microarray expression data using empirical Bayes methods

W. Evan Johnson et al.

BIOSTATISTICS (2007)

Article Biochemistry & Molecular Biology

Gene Expression Omnibus: NCBI gene expression and hybridization array data repository

R Edgar et al.

NUCLEIC ACIDS RESEARCH (2002)