4.6 Article

SMIXnorm: Fast and Accurate RNA-Seq Data Normalization for Formalin-Fixed Paraffin-Embedded Samples

Journal

FRONTIERS IN GENETICS
Volume 12, Issue -, Pages -

Publisher

FRONTIERS MEDIA SA
DOI: 10.3389/fgene.2021.650795

Keywords

RNA-sequencing; normalization; FFPE; formalin-fixed paraffin-embedded samples; archived samples; statistical methods

Funding

  1. NIH [R15GM131390, R35 GM136375, P50CA70907, P30CA142543, 1R01GM115473, 1R01CA172211]
  2. Cancer Prevention and Research Institute of Texas [RP190107, RP180805]

Ask authors/readers for more resources

RNA-seq provides comprehensive quantification of transcriptomic activities in biological samples. This study introduces a new normalization method, SMIXnorm, for FFPE RNA-seq data, which simplifies the computational process, improves efficiency, and maintains superior performance compared to existing methods. Furthermore, a web-based tool, RSeqNorm, is developed to compute normalized RNA-seq data for both FFPE and FF samples, offering users a simple workflow for downstream analysis.
RNA-sequencing (RNA-seq) provides a comprehensive quantification of transcriptomic activities in biological samples. Formalin-Fixed Paraffin-Embedded (FFPE) samples are collected as part of routine clinical procedure, and are the most widely available biological sample format in medical research and patient care. Normalization is an essential step in RNA-seq data analysis. A number of normalization methods, though developed for RNA-seq data from fresh frozen (FF) samples, can be used with FFPE samples as well. The only extant normalization method specifically designed for FFPE RNA-seq data, MlXnorm, which has been shown to outperform the normalization methods, but at the cost of a complex mixture model and a high computational burden. It is therefore important to adapt MlXnorm for simplicity and computational efficiency while maintaining superior performance. Furthermore, it is critical to develop an integrated tool that performs commonly used normalization methods for both FF and FFPE RNA-seq data. We developed a new normalization method for FFPE RNA-seq data, named SMIXnorm, based on a simplified two-component mixture model compared to MlXnorm to facilitate computation. The expression levels of expressed genes are modeled by normal distributions without truncation, and those of non-expressed genes are modeled by zero-inflated Poisson distributions. The maximum likelihood estimates of the model parameters are obtained by a nested Expectation-Maximization algorithm with a less complicated latent variable structure, and closed-form updates are available within each iteration. Real data applications and simulation studies show that SMIXnorm greatly reduces computing time compared to MlXnorm, without sacrificing the performance. More importantly, we developed a web-based tool, RNA-seq Normalization (RSeqNorm), that offers a simple workflow to compute normalized RNA-seq data for both FFPE and FF samples. It includes SMIXnorm and MlXnorm for FFPE RNA-seq data, together with five commonly used normalization methods for FF RNA-seq data. Users can easily upload a raw RNA-seq count matrix and select one of the seven normalization methods to produce a downloadable normalized expression matrix for any downstream analysis. The R package is available at https://github.com/S-YIN/RSEQNORM . The web-based tool, RSeqNorm is available at http://lce.biohpc.swmed.eduirseqnorm with no restriction to use or redistribute.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.6
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available