Skip to main content

Normalization of gene expression data revisited: the three viewpoints of the transcriptome in human skeletal muscle undergoing load-induced hypertrophy and why they matter



The biological relevance and accuracy of gene expression data depend on the adequacy of data normalization. This is both due to its role in resolving and accounting for technical variation and errors, and its defining role in shaping the viewpoint of biological interpretations. Still, the choice of the normalization method is often not explicitly motivated although this choice may be particularly decisive for conclusions in studies involving pronounced cellular plasticity. In this study, we highlight the consequences of using three fundamentally different modes of normalization for interpreting RNA-seq data from human skeletal muscle undergoing exercise-training-induced growth. Briefly, 25 participants conducted 12 weeks of high-load resistance training. Muscle biopsy specimens were sampled from m. vastus lateralis before, after two weeks of training (week 2) and after the intervention (week 12), and were subsequently analyzed using RNA-seq. Transcript counts were modeled as (1) per-library-size, (2) per-total-RNA, and (3) per-sample-size (per-mg-tissue).


Initially, the three modes of transcript modeling led to the identification of three unique sets of stable genes, which displayed differential expression profiles. Specifically, genes showing stable expression across samples in the per-library-size dataset displayed training-associated increases in per-total-RNA and per-sample-size datasets. These gene sets were then used for normalization of the entire dataset, providing transcript abundance estimates corresponding to each of the three biological viewpoints (i.e., per-library-size, per-total-RNA, and per-sample-size). The different normalization modes led to different conclusions, measured as training-associated changes in transcript expression. Briefly, for 27% and 20% of the transcripts, training was associated with changes in expression in per-total-RNA and per-sample-size scenarios, but not in the per-library-size scenario. At week 2, this led to opposite conclusions for 4% of the transcripts between per-library-size and per-sample-size datasets (↑ vs. ↓, respectively).


Scientists should be explicit with their choice of normalization strategies and should interpret the results of gene expression analyses with caution. This is particularly important for data sets involving a limited number of genes or involving growing or differentiating cellular models, where the risk of biased conclusions is pronounced.

Peer Review reports


In gene expression analyses, data normalization can be performed using a multitude of approaches, acting as a significant determinant of the validity and reliability of interpretations [1,2,3,4]. For any data set, available normalization strategies are, at least partially, predetermined by the technique used for data acquisition. Still, normalization always involves a myriad of explicit choices that may profoundly affect analytical outcomes. For example, for studies utilizing quantitative PCR (qPCR), the selection of internal reference genes will largely define downstream analyses and conclusions, and the utilization of non-validated reference genes will lead to substantial bias [4]. Analogous to this, for studies involving RNA sequencing, appropriate library size scaling will determine the comparability of samples in downstream statistical analysis [1, 3]. Overall, data normalization essentially targets sources of technique-specific artifacts and non-biological variation. In addition, it also defines the biological perspective from which data are interpreted [2, 5]. This means that the choice of normalization mode will define the biological output of the experiment. Indeed, transcript abundances can be modeled using either of three distinctly different approaches; abundances relative to the overall mRNA pool (1; i.e., using geometric averaging; per-library-size), abundances relative to the total amount of RNA (2; per-total-RNA), or abundances relative to amounts of tissue or numbers of cells used in the experiment (3; per-sample-size) [2, 5].

While each of these perspectives holds biological merit, providing gene expression data that can be interpreted and compared between samples (e.g., changes from before to after a specific treatment), they do so in different manners. First, the per-library-size approach provides data that assess the relative abundances of transcripts relative to all other transcripts, arguably enabling assessment of transcript expression that compares their ability to compete for slots on ribosomes. Second, the per-total-RNA approach provides data that assess the absolute level of transcripts relative to the entire pool of RNA, enabling assessment of transcript expression that compares their ability to recognize and bind to ribosomes. Third, the per-sample-size approach provides data that assess the overall abundances of transcripts in the biological sample at hand, and thus their content per-cell or per-tissue weight. Consequently, the three different normalization scenarios set the stage for interpretations with different biological perspectives. These differences will be exacerbated in experimental models and designs involving large degrees of cellular perturbations and plasticity, with accompanying changes in the overall mRNA and total RNA expression [5,6,7]. Despite this, the analytical consequence of using a specific normalization strategy is rarely explicitly addressed in the biomedical literature, even though it represents an old and ever-present issue in mRNA-based analyses [2, 5].

The present study aimed to investigate the consequences of using each of three normalization modes (per-library-size, per-total-RNA, and per-sample-size) for transcriptome profiling of RNA-seq data from a highly plastic model of human biology. Briefly, twenty-five human participants conducted twelve weeks of high-load resistance training using a within-participant study design. Each participant performed exercise training with either low or moderate volume, allocated to either leg [8]. Overall, both study protocols led to substantial changes in muscle strength, mass, and phenotype. The latter was evaluated from bilateral muscle biopsies (m. vastus lateralis) sampled at baseline and after two and twelve weeks of training. Biopsy samples showed marked increases in overall total RNA and mRNA abundances, arguably making it an adequate experimental system for the proposed comparison [8, 9]. In the current analyses, the first objective was to identify a subset of gene transcripts that show relative stability within participants across all time points, measured as transcript abundances per-library-size, per-total-RNA, and per-sample-size, respectively. Secondly, we used the resulting reference gene sets to normalize the entire RNA-seq dataset, ultimately providing estimates of transcript abundances corresponding to each of the three perspectives of normalization.


Study overview

Thirty-four participants completed a 12-week progressive resistance training intervention with legs randomly allocated to either low (one set per exercise) or moderate-volume (three sets per exercise) training, as previously described [8]. The training intervention consisted of leg-press, leg-curl, and knee-extension. Bilateral muscle biopsies were obtained before the intervention and after two and twelve weeks of training. Total RNA was extracted from the biopsy material (TRIzol, ThermoFisher Scientific, Oslo, Norway) [8], and samples were selected for analysis based on RNA integrity scores. Twenty-five participants had a complete set of samples with integrity scores \(\ge\) 7 (Average RQI 9, SD: 0.4; Experion Automated Electrophoresis Station using RNA StdSens Assay, Bio-Rad, Norway) and were selected for the RNA-sequencing experiment [9]. A fixed amount of total RNA (1000 ng) was depleted of ribosomal RNA and used for RNA-seq library preparations and subjected to Paired-end sequencing (Illumina HiSeq 3000, Illumina, San Diego, CA USA), as detailed elsewhere [9]. For the present analyses, data from the two legs/volume conditions were treated as biological replicates during data modeling, with interpretations focusing on the effects of resistance training per se rather than on differential effects of the two training volume conditions.

Preprocessing, read alignment, and quantification

Before alignment, Trimmomatic (version 0.39) [10] was used to filter out low-quality reads and remove poor-quality bases and adaptor sequences using default settings. The quality of filtered files was calculated using FastQC (version 0.11.4). After quality filtering, reads were aligned to the human genome and quantified on the level of transcripts using RSEM (version 1.3.1) [11] and GRCh38 release-97 (downloaded from

Identification of stable genes and modeling of transcript counts

The overall assumption of the analyses was that modeling of transcript expression to crude estimates of the three biological denominators mRNA, total RNA, and sample size would be formative for downstream analyses due to incomparable scaling and measurement errors between normalization modes, and thus would affect interpretations (Fig. 1A).

Fig. 1
figure 1

A Overview of data transformation and analyzes used. Raw counts were transformed to represent normalized data per-library-size, per-total-RNA and per-sample-size (tissue mass). Transformed counts were used to identify stable reference genes free from systematic effect and with subsequent ranking by intra-class correlation. Normalization factors comprised of 10 transcripts from each normalization approach was used in differential expression analysis. B Fold-changes of sample references (average of the top ten stable transcripts per normalization mode) ratios with numerators plotted over columns and denominators over rows. Error bars represent 95% CI. C Transcripts identified as differentially up and down-regulated over time (differences from Week 0 to Week 2 and 12 respectively) from generalized linear models with each normalization factor used as a model offset. Percentages represents proportions of all transcripts identified as differentially expressed regardless of normalization approach. Up- and down-regulation determined from false discovery rate-adjusted p values (p < 0.05). Black points represent intersections, e.g., where the same transcript has been identified from in one or more normalization perspective

Initially, we, therefore, identified internal reference genes (stable genes) to create comparable normalization factors between normalization modes for subsequent analyses. Stable genes were selected from a subset of transcripts that showed robust expression across all samples, filtered with the minimum count per sample set to 30. After filtering, 5687 genes remained in the data set for assessment of within-participant stability. For each normalization mode (per-library-size, per-total-RNA and per-sample-size), stable genes were then assessed using transcript counts transformed to counts per million (CPM), calculated as counts per scaled library size (total counts scaled by trimmed mean of M-values [1]), counts per amount of tissue (mg of tissue \(\times {10}^{6}\)) and as counts per total RNA (\(1\times {10}^{6}\), assuming equal total RNA in each reaction), respectively. CPM values were log-transformed before being fitted to linear mixed-effects models on a target-by-target basis. Models were subsequently used to (1) assess the effects of the intervention on transcript abundances over time and to (2) determine the intraclass correlation coefficient (ICC), defined as the amount of variance attributed to between-participant variation relative to the total variance. For each normalization mode, maximal t-values calculated from model coefficients representing study conditions (time and exercise volume) were used to remove transcripts showing indices of the intervention's systematic effects. Transcripts with absolute t-values < 1.5 were subjected to subsequent ranking based on ICC values. The top ten transcripts from each normalization mode were then defined as stable reference genes, which were deemed suitable for calculating normalization denominators. Selected transcripts were scaled (\({x}_{1}/max(x)\)) and averaged per sample to form the sample reference. To compare sample references from each normalization mode, ratios were evaluated over time from estimates obtained from linear mixed-effects models.

Thereafter, the complete set of transcripts (excluding reference genes, filtered with minimum count = 1, n = 12,066) was modeled on a target-by-target basis using negative binomial generalized linear mixed models (GLMM) [9, 12], with normalization mode-specific normalization factors being used as offsets in each model fit to express gene counts per-library-size, per-total-RNA and per-sample-size. Model fits were used to assess the effects of study conditions on relative gene counts. For the sake of this analysis, samples from each leg were considered biological replicates to determine the impact of exercise training per se (time-effects). Differentially expressed genes were defined as significantly different from baseline on a target-by-target basis (p < 0.05 adjusted for false discovery rate, FDR).


The training intervention led to robust increases in muscle mass and strength (on average 4% and 25%, respectively [8]). This was accompanied by an increase in total RNA from baseline to weeks two and twelve (on average 27% and 17%, respectively) and an increase in the sequenced library size, despite a lower amount of tissue being used during library preparations [8, 9]. In the present RNA-seq dataset, the initial modeling of data, providing gene expression estimates relative to per-library-size, per-total-RNA, and per-sample-size, led to the identification of three unique clusters of stable gene transcripts across untrained and resistance-trained muscle specimens. Each of these gene clusters consisted of genes that showed unaltered expression across muscle biopsies sampled from each participant (and as such was not affected by the resistance training). The number of genes in the three clusters varied substantially between modes of modeling (per-library-size, n = 1266; per-total-RNA, n = 90; per-sample-size, n = 18), with per-library-size normalization being associated with higher ICC estimates, suggesting higher degrees of consistency between samples. Based on ICC estimates, the top ten most stable transcripts from each modeling scenario were then identified (Table 1). While there was no overlap between the per-library-size cluster and either of the two other clusters among these transcripts, per-total-RNA and per-samples-size datasets contained two overlapping transcripts (Table 1).

Table 1 Genes selected as stable reference genes from each normalization scenario

Based on the three clusters of top-ten stable genes, we then computed a scaled average of stable transcript expression for each mode of modeling. In these analyses, the stable transcripts identified in the per-library-size dataset showed clear training-associated increases in abundances in per-total-RNA and per-sample-size datasets (Fig. 1B, right panels). Similarly, stable genes identified in the per-sample-size dataset showed decreased expression in the per-total-RNA dataset (Fig. 1B, upper left panel). These differences were most pronounced in samples obtained after two weeks of resistance training (Fig. 1B).

We utilized the three clusters of stable transcripts for normalization of the entire RNA-seq dataset to identify training-induced differentially expressed transcripts per normalization mode on a transcript-by-transcript basis. Figure 1C illustrates the overlap of transcripts identified as differentially expressed (up and down-regulated from baseline) across normalization modes with black filled circles indicating which normalization modes are included in each set. Most transcripts identified as differentially up-regulated (31% and 21% of transcripts identified as differentially expressed from baseline to weeks 2 and 12, respectively) were commonly identified in all three normalization modes (Fig. 1C, first column). In contrast, 27% and 20% of the transcripts identified as differentially expressed, differential expression was only identified in per-total-RNA and per-sample-size scenarios, but not in the per-library-size scenario (Fig. 1C, second column). Furthermore, 15% and 7% of the genes identified as differentially expressed were uniquely identified as up-regulated using the per-sample-size normalization mode (Fig. 1C, third column) whereas most transcripts identified as uniquely down-regulated were found in the per-library-size normalization mode (Fig. 1C, fourth column). At week 2, discrepancies between normalization modes led to transcripts (4%) being identified as up-regulated in the per-sample-size mode but down-regulated in the per-library-size mode.


The present study demonstrates that the choice of normalization modality will affect the outcome of gene expression analyses in models of load-induced skeletal muscle plasticity in humans. The three modes of normalization (per-library-size, per-total-RNA, and per-sample size) were associated with different patterns of training-associated changes in gene expression. In general, per-library-size-based normalization was associated with the underestimation of mRNA abundances compared to the two other approaches. This underestimation was attributed to an overall increase in total RNA and mRNA expression in the muscle samples [8], with per-library-size-based analyses inherently assuming global transcript expression to remain unchanged across samples. Despite this, library size-based normalization remains the point of reference for most transcriptome studies [3, 13], which is also true for studies investigating responses to exercise training (e.g., metamex) [14], even though such treatments typically lead to global-scale changes in transcription [8, 15, 16]. These observations advocate that the choice of normalization mode must be carefully evaluated in any study involving gene expression analyses to ensure adequate biological interpretations [7]. Indeed, for experimental models involving large degrees of cellular perturbations and plasticity, and thus potentially global transcription amplification, per-library-normalization will lead to underestimation of transcript abundances [15].

While the results from the present study indicate the impact of using different normalization strategies, they do not inform us on the rights and wrongs of normalization choices. The correct use of a specific normalization strategy depends on the research questions and model systems under study. For example, in systems where relative transcript abundances are of primary interest, or in models where whole-transcriptome expression remains stable, it seems prudent to use per-library-size normalization [1, 3, 5]. This will provide data with the appropriate biological viewpoint, essentially informing about the competitive ability of transcripts to recognize and associate with ribosomes. In addition, per-library-size normalization arguably leads to more explicit correction of the technical variation occurring during sample preparations (e.g., during RNA extraction). However, as exemplified in the present study, for systems where these assumptions are not met and global changes occur for variables such as total RNA and global mRNA expression, it may lead to biased conclusions [5]. In such studies, absolute transcript abundances provide an alternative and perhaps more suitable outcome. Indeed, cellular growth is likely to be associated with dose-dependent changes in protein accretion and speculatively also transcript abundances, leaving per-total-RNA and per-samples-size normalization as the most beneficial approach for interpreting cellular characteristics.

In most gene expression studies, a limited number of mRNA species is investigated using methods such as qPCR, with normalization generally being performed using geometric averaging of a presumed set of stable genes [4, 17] In such studies, the necessity of complying with the logics of the present study is reinforced. Indeed, they typically involve statistical analyses that do not adjust for the presence of multiple observations, as is the case in RNA-seq experiments. This amplifies the likelihood of detecting differential gene expression patterns between different modes of normalization. For such analyses, particular care is thus needed during data normalization, reiterating the need for selecting stable genes that adequately represent the biological viewpoint of desire (i.e., per-library-size vs. per-total-RNA vs. per-sample-size). For datasets involving resistance-trained human skeletal muscle, the stable gene clusters identified in the current analyses pose as potential candidates. However, their representativeness and stability must be validated separately in any given study, as they are likely to be affected by tweaks in study variables such as treatment protocols (e.g., differences in the resistance training modalities) and the nature of the biological model (e.g., variation in participant age and disease status) [8, 9, 18].


In the present study, we show that the choice of normalization modality (per-library-size vs. per-total-RNA vs. per-sample-size) affects interpretations of transcriptome responses in a human model of load-induced skeletal muscle plasticity. For any gene expression study, data normalization should be conducted and evaluated with care and intent, ensuring the stability and representativity of normalization denominators, and importantly, the biological viewpoint of outcome measures.

Availability of data and materials

The codes and the count data used in analysis during the current study are available from the corresponding author on request. The raw sequence data is not publicly available due to patient confidentiality.



Ribonucleic acid


Generalized linear mixed models


Intraclass correlation coefficient


Counts per million


  1. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106.

    Article  CAS  Google Scholar 

  2. Huggett J, Dheda K, Bustin S, Zumla A. Real-time RT-PCR normalisation; strategies and considerations. Genes Immun. 2005;6(4):279–84.

    Article  CAS  Google Scholar 

  3. Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11(3):R25.

    Article  Google Scholar 

  4. Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, Speleman F. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 2002;3(7):RESEARCH0034.

    Article  Google Scholar 

  5. Coate JE, Doyle JJ. Variation in transcriptome size: are we getting the message? Chromosoma. 2015;124(1):27–43.

    Article  CAS  Google Scholar 

  6. Hansen MC, Nielsen AK, Molin S, Hammer K, Kilstrup M. Changes in rRNA levels during stress invalidates results from mRNA blotting: fluorescence in situ rRNA hybridization permits renormalization for estimation of cellular mRNA levels. J Bacteriol. 2001;183(16):4747–51.

    Article  CAS  Google Scholar 

  7. Loven J, Orlando DA, Sigova AA, Lin CY, Rahl PB, Burge CB, Levens DL, Lee TI, Young RA. Revisiting global gene expression analysis. Cell. 2012;151(3):476–82.

    Article  CAS  Google Scholar 

  8. Hammarstrom D, Ofsteng S, Koll L, Hanestadhaugen M, Hollan I, Apro W, Whist JE, Blomstrand E, Ronnestad BR, Ellefsen S. Benefits of higher resistance-training volume are related to ribosome biogenesis. J Physiol. 2020;598(3):543–65.

    Article  Google Scholar 

  9. Khan Y, Hammarstrom D, Ronnestad BR, Ellefsen S, Ahmad R. Increased biological relevance of transcriptome analyses in human skeletal muscle using a model-specific pipeline. BMC Bioinform. 2020;21(1):548.

    Article  CAS  Google Scholar 

  10. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.

    Article  CAS  Google Scholar 

  11. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinform. 2011;12:323–323.

    Article  CAS  Google Scholar 

  12. Cui S, Ji T, Li J, Cheng J, Qiu J. What if we ignore the random effects when analyzing RNA-seq data in a multifactor experiment. Stat Appl Genet Mol Biol. 2016;15(2):87–105.

    Article  CAS  Google Scholar 

  13. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550–550.

    Article  Google Scholar 

  14. Pillon NJ, Gabriel BM, Dollet L, Smith JAB, Sardon Puig L, Botella J, Bishop DJ, Krook A, Zierath JR. Transcriptomic profiling of skeletal muscle adaptations to exercise and inactivity. Nat Commun. 2020;11(1):470.

    Article  CAS  Google Scholar 

  15. Chaillou T, Malgoyre A, Banzet S, Chapot R, Koulmann N, Pugniere P, Beaudry M, Bigard X, Peinnequin A. Pitfalls in target mRNA quantification for real-time quantitative RT-PCR in overload-induced skeletal muscle hypertrophy. Physiol Genom. 2011;43(4):228–35.

    Article  CAS  Google Scholar 

  16. Figueiredo VC, McCarthy JJ. Regulation of ribosome biogenesis in skeletal muscle hypertrophy. Physiology. 2019;34(1):30–42.

    Article  CAS  Google Scholar 

  17. Andersen CL, Jensen JL, Orntoft TF. Normalization of real-time quantitative reverse transcription-PCR data: a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Res. 2004;64(15):5245–50.

    Article  CAS  Google Scholar 

  18. Brook MS, Wilkinson DJ, Mitchell WK, Lund JN, Phillips BE, Szewczyk NJ, Greenhaff PL, Smith K, Atherton PJ. Synchronous deficits in cumulative muscle protein synthesis and ribosomal biogenesis underlie age-related anabolic resistance to exercise in humans. J Physiol. 2016;594(24):7399–417.

    Article  CAS  Google Scholar 

Download references


Not applicable.


The work presented here was supported by a grant from Innlandet Hospital Trust (Grant Number 150282) and by grants from the Inland University of Applied Sciences.

Author information

Authors and Affiliations



RA, SE, and YK planned the study. DH, RA, and YK performed analyses. All authors provided useful input during the interpretation of data and contributed to drafting and finalizing the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Rafi Ahmad.

Ethics declarations

Ethics approval and consent to participate

All procedures were approved by the local ethics committee at Inland Norway University of Applied Sciences (nr 2013-11-22:2), and the study design was pre-registered at (Identifier: NCT02179307). The first registration was undertaken on 01/07/2014. The study was conducted in accordance with the Declaration of Helsinki. Informed consent was obtained from all participants.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Khan, Y., Hammarström, D., Ellefsen, S. et al. Normalization of gene expression data revisited: the three viewpoints of the transcriptome in human skeletal muscle undergoing load-induced hypertrophy and why they matter. BMC Bioinformatics 23, 241 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • RNA-seq
  • Skeletal muscle
  • Normalization
  • Resistance training