Conventionally used reference genes are not outstanding for normalization of gene expression in human cancer research



The selection of reference genes is essential for quantifying gene expression. In theory, reference genes should be stably expressed and not regulated by experimental or pathological conditions. However, the identification and validation of reference genes for human cancer research remains a critical challenge, because cancerous tissues often exhibit genetic instability and heterogeneity. Recent pan-cancer studies have demonstrated the importance of selecting appropriate reference genes for use as internal controls for the normalization of gene expression; however, no stably expressed consensus reference genes valid across a range of different human cancers have yet been identified.


In the present study, we used large-scale cancer gene expression datasets from The Cancer Genome Atlas (TCGA) database, which contains 10,028 (9,364 cancerous and 664 normal) samples from 32 different cancer types, to confirm that the expression of the most commonly used reference genes is not consistent across a range of cancer types. Furthermore, we identified 38 novel candidate reference genes for the normalization of gene expression, independent of cancer type. These genes were found to be highly expressed and highly connected to relevant gene networks, and to be enriched in transcription-translation regulation processes. The expression stability of the newly identified reference genes across 29 cancerous and matched normal tissues was validated via quantitative reverse transcription PCR (RT-qPCR).


We show that the reference genes most commonly used in current cancer studies may not be appropriate as representative control genes for quantifying cancer-related gene expression levels, and we propose three potential reference genes (HNRNPL, PCBP1, and RER1) that are the most stably expressed across various cancerous and normal human tissues.


To understand how genetic alterations driving tumorigenesis lead to the formation of complex cellular networks and induce biological process variation, recent research into cancer genetics has focused on the identification of molecular differences between cancerous and normal tissues [1, 2]. Recent high-throughput transcriptomic studies [3] have offered the opportunity to explore the molecular complexity of human cancer, and have provided evidence for classifying human cancer data into normal, benign, and malignant classes, based on their gene expression patterns. Nevertheless, the expression levels of candidate cancer genes identified from transcriptome data require experimental verification via molecular methods such as quantitative reverse transcription PCR (RT-qPCR). One of the most important factors ensuring the accuracy of RT-qPCR analyses is the normalization of the target-gene expression level to that of a consistently expressed reference gene. To date, cancer researchers have predominantly used GAPDH and β-actin as internal reference controls, because their mRNA expression levels are established to be high and constant in many different cells and tissues [4, 5]. However, cancerous tissues often exhibit a higher level of gene expression variability than normal tissues, due to tumor heterogeneity, genetic instability, and the fact that genetic alterations in diverse cancer types may differentially affect cellular processes at the transcriptome level. Thus, it is challenging to determine which reference genes would best serve as internal reference controls for a range of different human cancers. Indeed, an increasing number of studies have shown striking expression variability of known reference genes in human cancers, and have subsequently recommended novel reference genes for gene expression studies in each specific human cancer type [6, 7]. These efforts, supported by in silico analysis tools (e.g., geNorm, NormFinder, and BestKeeper [8,9,10]), are ongoing; however, to date, no transcriptome-wide analysis for the identification of the most stably expressed consensus reference genes has been reported.

The primary objective of the present study was to screen for the most stable reference genes for the study of cancer gene expression. We exploited large-scale gene expression data from The Cancer Genome Atlas (TCGA) database, which contains 10,028 (9,364 cancerous and 664 normal) samples from 32 different cancer types. We identified novel reference genes that exhibited both a high expression level and low expression variation across various cancerous and normal tissue types, and then demonstrated the effectiveness of these newly identified reference genes for use in RT-qPCR. Thus, the results of the present study promote a better understanding of gene expression changes in different cancer types, and will be of considerable use in facilitating the normalization of target-gene expression levels in future cancer research.


Data collection and bioinformatics analysis

The overall workflow of the present study is shown in Fig. 1. We downloaded RNA sequencing (RNA-seq) V2 data (level 3) for 34 different cancer types from the TCGA database. The TCGA RNA-seq pipeline has used two distinct measurement methods, RPKM (Reads Per Kilobase per Million mapped reads) [11] and TPM (Transcripts Per Million) [12, 13], to obtain expression levels from RNA-seq data. Given that TPM is established to produce more comparable results across various sample types than RPKM [13, 14], we used TPM-generated data for 32 of the 34 cancer types for further analyses [esophageal carcinoma (ESCA) and stomach adenocarcinoma (STAD) were excluded, since only RPKM-generated data were available for these cancer types]. Unless otherwise stated, all gene expression levels used in our analyses are expressed as transformed (multiplied by 10⁶) normalized read counts (extracted from TCGA files with the extension “rsem.genes.normalized_results”).
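The difference between the two units can be illustrated with a minimal sketch based on the standard definitions of RPKM and TPM. The read counts and gene lengths below are hypothetical, and the functions `rpkm` and `tpm` are illustrative helpers, not part of the TCGA pipeline.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_millions = sum(counts) / 1e6
    return [c / (l / 1e3) / total_millions for c, l in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


# Hypothetical data: three genes measured in two samples.
lengths = [1000, 2000, 500]   # gene lengths in base pairs
sample_a = [100, 200, 50]     # read counts in sample A
sample_b = [100, 200, 500]    # read counts in sample B

# TPM values sum to the same constant in every sample,
# whereas the sum of RPKM values depends on the sample composition.
tpm_sum_a, tpm_sum_b = sum(tpm(sample_a, lengths)), sum(tpm(sample_b, lengths))
rpkm_sum_a, rpkm_sum_b = sum(rpkm(sample_a, lengths)), sum(rpkm(sample_b, lengths))
```

Because every sample's TPM values share the same total, a given TPM value represents the same proportion of the transcript pool in each sample, which is one reason TPM is generally considered more comparable across samples.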

Fig. 1 The overall workflow of the present study

The human protein interaction network data were collected from the Human Protein Reference Database (HPRD, release 9) [15], which includes 30,047 protein entries and 41,327 protein-protein interactions (PPIs). We extracted all binary PPIs from the HPRD, and counted the number of non-redundant interactions for each protein to estimate the size of the protein complex.
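Counting interactions without redundancy can be sketched as follows. This is a minimal illustration that assumes interactions are supplied as unordered protein pairs; the protein names are hypothetical, and skipping self-interactions is an assumption rather than something stated in the text.

```python
from collections import defaultdict


def interaction_degree(pairs):
    """Count distinct interaction partners per protein from binary PPIs."""
    partners = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # assumption: self-interactions are not counted
        # Sets make the count insensitive to duplicates and to pair order.
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(found) for protein, found in partners.items()}


# Hypothetical binary interactions, including a reversed duplicate.
example_pairs = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "C")]
degrees = interaction_degree(example_pairs)
```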

We categorized the selected reference genes according to gene ontology groups using the PANTHER [16] and DAVID [17] tools.

Human specimens

The validity of all matched human cancerous and normal tissues was confirmed via patient clinical diagnosis. In total, 58 matched sample pairs were obtained for analysis, of which the cancerous tissue sample in each was isolated from patient breast (n = 18), colon (n = 12), thyroid (n = 8), lung (n = 8), liver (n = 8), kidney (n = 2), or cervical (n = 2) cancer tissues. All human tissue was trimmed to 0.5 cm² immediately after removal from the patient and stored in 5 volumes of RNAlater solution (ThermoFisher Scientific, USA) at −80 °C. Samples were used for experiments within 3 years of storage. All human specimens and data used were provided by the Biobank of Chungnam University Hospital (Korea Biobank Network).

RNA preparation and RT-qPCR

Total RNA was extracted using an eCube Tissue RNA Mini Kit (PhileKorea, Korea) according to the manufacturer’s instructions, and reverse-transcribed using M-MLV reverse transcriptase (Promega, USA) with random hexamers. RT-qPCR was performed with SYBR Green fluorescent dye (GENET BIO, Korea) on the AriaMx PCR System (Agilent, USA). All reactions were run under identical cycling conditions, comprising 40 cycles of amplification with denaturation (95 °C, 20 s), annealing (58 °C, 20 s), and elongation (72 °C, 20 s). The specificity of the products generated by each primer set was confirmed by both gel electrophoresis and a melting curve analysis (Additional file 1: Table S1 and Additional file 2: Figure S1).

Results and discussion

Commonly used reference genes exhibit a high level of expression variation in both tumorous and normal tissue samples

To assess the gene expression variability within human cancerous and normal tissues, we collected gene expression data from the TCGA database, which contains 10,028 (9,364 cancerous and 664 normal) samples isolated from 32 different cancer types. We used TPM-generated data to calculate the coefficient of variation (CV, calculated as the standard deviation divided by the mean) of target gene expression levels across the analyzed samples. We initially evaluated the gene expression variability of commonly used reference genes (Table 1) [18], and found that all 12 analyzed genes exhibited a CV value greater than 45% (Table 1). Most (23/31, 74%; Tables 2 and 3) of the experimentally selected reference genes expressed in cancer tissues exhibited a similar level of gene expression variability. We repeated this process to analyze cancerous and normal samples separately, so as to eliminate potential error caused by sample size bias (since 9,364 cancerous, but only 664 normal tissue samples were analyzed). This second analysis showed the same trend in the cancer and normal groups, whereby all 12 commonly used reference genes and 74% (23/31) of the experimentally selected reference genes exhibited a CV value greater than 45% in both groups (Additional file 3: Table S2). These results suggest that the reference genes most commonly used in current cancer studies may not be appropriate to serve as representative reference genes, and thus, their use may lead to erroneous quantification of cancer-related gene expression levels.
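The CV calculation described above can be sketched in a few lines. The expression values and gene names are hypothetical, and the use of the sample standard deviation (rather than the population form) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV (standard deviation / mean) as a percentage."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return 100.0 * stdev(values) / avg


# Hypothetical TPM values for two genes across five samples.
expression = {
    "GENE_STABLE": [100.0, 110.0, 90.0, 105.0, 95.0],
    "GENE_VARIABLE": [10.0, 200.0, 50.0, 400.0, 5.0],
}
cv_by_gene = {gene: coefficient_of_variation(v) for gene, v in expression.items()}

# Genes whose CV exceeds the 45% level discussed in the text.
highly_variable = [gene for gene, cv in cv_by_gene.items() if cv > 45.0]
```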

Table 1 List of commonly used reference genes and their gene expression variability in 10,028 analyzed samples from the TCGA database
Table 2 List of experimentally selected reference genes
Table 3 Gene expression variability of experimentally selected reference genes in 10,028 analyzed samples from the TCGA database

Selection of novel reference gene candidates from the TCGA database

Because genetic alterations in diverse cancer types may differentially affect cellular processes at the transcriptome level, we investigated whether reference genes defined by analysis of a single type of cancerous tissue could be applied to other cancer types. To this end, we calculated and compared CV values for nine cancer types in the TCGA database that each had > 40 samples with matched normal tissue samples (BRCA, COAD, HNSC, LUAD, LUSC, LIHC, PRAD, THCA, and KIRC; Additional file 4: Figure S2). Among the 20 top-ranked (by CV) genes from each cancer type, none (1) was included in the list of commonly used reference genes or (2) was found in more than 50% (5 out of 9) of cancer types (Fig. 2 and Additional file 5: Table S3), indicating that suitable reference genes depend on the cancer type.
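The cross-cancer comparison can be sketched as a ranking followed by a recurrence count. This illustration assumes that "top-ranked by CV" means the lowest CV values; the gene names, CV values, and the reduced list size (`n=2` instead of 20) are hypothetical.

```python
from collections import Counter


def top_ranked(cv_by_gene, n=20):
    """Return the n genes with the lowest CV (assumed to be the top-ranked)."""
    return sorted(cv_by_gene, key=cv_by_gene.get)[:n]


def genes_in_majority(cv_by_cancer, n=20):
    """Genes appearing in the top-n list of more than half of the cancer types."""
    counts = Counter()
    for cv_by_gene in cv_by_cancer.values():
        counts.update(top_ranked(cv_by_gene, n))
    threshold = len(cv_by_cancer) / 2
    return sorted(gene for gene, k in counts.items() if k > threshold)


# Hypothetical CV values (%) for three genes in three cancer types.
cv_by_cancer = {
    "BRCA": {"G1": 10.0, "G2": 20.0, "G3": 30.0},
    "COAD": {"G1": 12.0, "G2": 40.0, "G3": 15.0},
    "LUAD": {"G1": 9.0, "G2": 5.0, "G3": 50.0},
}
shared = genes_in_majority(cv_by_cancer, n=2)
```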

Fig. 2 Distribution of the coefficient of variation (CV) of gene expression levels in nine cancer data sets. Red indicates the 20 top-ranked (by CV) genes. Green and blue indicate commonly used and experimentally selected reference genes (Tables 1 and 2), respectively

To identify novel genes suitable as internal controls for the normalization of target gene expression in cancer research, we selected genes that (1) exhibited stable expression levels across both cancerous and normal tissue samples, (2) had a CV value < 35%, (3) had a minimum TPM > 0, and (4) had an average TPM ≥ 1 across all tissue samples. From the 10,028 analyzed samples from the 32 different cancer types, we identified 38 candidate novel cancer-research reference genes (Fig. 3a, Additional file 1: Table S4). We subsequently evaluated whether these newly identified reference genes had the same functional characteristics as the previously established, commonly used reference genes. We found the average expression level of the newly identified reference genes to be significantly higher than that of the other genes (115.06 versus 42.93; P < 0.0413, using an empirical permutation test with 10,000 replications). This result is consistent with previously reported expression levels for the established reference genes [4]. Next, we determined that, as expected [4, 5, 19], the newly identified reference genes were significantly enriched in functional categories associated with transcription-translation processes, such as polyA-RNA, ribonucleoprotein, and RNA-binding (FDR < 5%, Fig. 3b). The established reference genes have previously been shown to act as the ‘hubs’ of highly connected protein-protein interaction (PPI) networks [20,21,22]. In the present study, we observed that the newly identified reference genes had a greater number of PPI network-interaction partners than the other genes (8.42 versus 3.67; P < 0.0185, using an empirical permutation test with 10,000 replications), indicating their functional importance for biological systems.
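The numeric selection criteria (CV < 35%, minimum TPM > 0, and average TPM ≥ 1) can be expressed as a simple filter. The sketch below is illustrative only: the gene names and TPM values are hypothetical, and the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def is_candidate(tpm_values, max_cv=35.0, min_mean_tpm=1.0):
    """Apply the three numeric criteria to one gene's TPM values."""
    if min(tpm_values) <= 0:        # minimum TPM must be > 0
        return False
    avg = mean(tpm_values)
    if avg < min_mean_tpm:          # average TPM must be >= 1
        return False
    cv = 100.0 * stdev(tpm_values) / avg  # assumption: sample SD
    return cv < max_cv              # CV must be < 35%


# Hypothetical TPM values across five samples.
tpm_by_gene = {
    "GENE_A": [50.0, 55.0, 45.0, 52.0, 48.0],   # stable and expressed
    "GENE_B": [0.0, 30.0, 60.0, 90.0, 20.0],    # not detected in one sample
    "GENE_C": [0.2, 0.3, 0.25, 0.2, 0.3],       # stable but too lowly expressed
    "GENE_D": [5.0, 100.0, 20.0, 300.0, 1.0],   # expressed but highly variable
}
candidates = [gene for gene, values in tpm_by_gene.items() if is_candidate(values)]
```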

Fig. 3 a Distribution of the coefficient of variation (CV) of gene expression levels in the analyzed cancerous and normal tissues. Red indicates newly identified reference genes that have a CV value < 35%. Blue indicates commonly used reference genes (Table 1). b Gene Ontology (GO) analysis of the newly identified reference genes
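The text reports empirical permutation tests with 10,000 replications for the expression-level and PPI comparisons but does not detail the procedure. One plausible reading, sketched here as an assumption, is to draw random gene sets of the same size as the candidate set from all genes and ask how often their mean matches or exceeds the observed mean. All values and names below are hypothetical.

```python
import random


def permutation_p_value(candidate_values, other_values, n_replications=10_000, seed=0):
    """One-sided empirical P value for the candidate set having a higher mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    k = len(candidate_values)
    observed = sum(candidate_values) / k
    pooled = list(candidate_values) + list(other_values)
    hits = 0
    for _ in range(n_replications):
        resampled = rng.sample(pooled, k)  # random gene set of the same size
        if sum(resampled) / k >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_replications + 1)


# Hypothetical per-gene values (e.g. mean expression or number of PPI partners).
candidate_genes = [100.0, 120.0, 110.0]
other_genes = [float(v) for v in range(1, 21)]
p_value = permutation_p_value(candidate_genes, other_genes)
```

Fixing the random seed makes the estimate reproducible; in practice the P value would be reported together with the number of replications, as in the text.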

RT-qPCR validation of the newly identified reference genes in human cancer tissues

We next sought to confirm the validity of the newly identified candidates as reference genes for the normalization of RT-qPCR expression data in the context of human cancer. We therefore compared the RT-qPCR results for two commonly used reference genes (GAPDH and β-actin) with those for the 11 most highly expressed of the newly identified reference genes (PCBP1, HNRNPC, HNRNPL, EMC4, SNX17, MRPL43, IST1, FAM32A, PFDN1, RNF10, and RER1) across 29 patient samples, including breast, colon, liver, lung, and thyroid cancer types. Each human tissue was immersed in RNAlater solution immediately after removal from the patient and stored at −80 °C to minimize RNA degradation. In addition, 2 μg of total RNA extracted from each tissue was electrophoresed on a 1.5% denaturing agarose gel, and only RNA with a confirmed 28S/18S ratio of > 2 was used in the experiment. The specificity of the products generated by each primer set was confirmed by both gel electrophoresis and a melting curve analysis (Additional file 1: Table S1 and Additional file 2: Figure S1).

Since optimal reference genes for cancer-transcriptome analysis should exhibit a low level of expression variability between cancerous and normal tissue samples, we isolated total RNA from the cancerous and normal samples of each patient and compared their CT values (where CT is the “Cycle Threshold”, defined as the number of cycles required for the fluorescence signal to exceed the background level, and is inversely correlated with the amount of target nucleic acid in the sample). Of the 11 newly identified genes, HNRNPL (ΔCT = 0.37), PCBP1 (ΔCT = 0.42), PFDN1 (ΔCT = 0.46), and RER1 (ΔCT = 0.48) were found to have a lower average CT difference (ΔCT = CT [cancer] - CT [normal]) between cancerous and normal tissue samples than β-actin (ΔCT = 0.58) and/or GAPDH (ΔCT = 0.60), suggesting their suitability for use as consensus reference genes for gene expression studies in human cancer (Fig. 4). To ensure the reliability and robustness of these results, we checked whether these reference genes also had lower ΔCT values than β-actin and/or GAPDH in each cancer sample type. HNRNPL had a ΔCT value lower than that of both β-actin and GAPDH in four (breast, colon, liver, and lung) of five cancer sample types. Similarly, PCBP1 and RER1 had lower ΔCT values than β-actin and GAPDH in all cancer sample types except liver cancer tissue, and PFDN1 exhibited a lower ΔCT value than β-actin and GAPDH in two cancer sample types (breast and lung, Fig. 4).
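The ΔCT comparison can be sketched as follows, using the definition ΔCT = CT [cancer] - CT [normal] averaged over matched pairs. The CT values are hypothetical. The optional `absolute` flag is an assumption added for illustration: the text does not state whether absolute differences were averaged, although doing so prevents shifts in opposite directions from cancelling out.

```python
def mean_delta_ct(pairs, absolute=False):
    """Average CT difference over matched (cancer CT, normal CT) pairs."""
    deltas = [cancer_ct - normal_ct for cancer_ct, normal_ct in pairs]
    if absolute:
        # Assumption: use magnitudes so opposite shifts do not cancel.
        deltas = [abs(d) for d in deltas]
    return sum(deltas) / len(deltas)


# Hypothetical CT values for one gene in three matched patient samples.
ct_pairs = [(20.5, 20.0), (21.0, 20.8), (19.9, 20.1)]
signed_delta = mean_delta_ct(ct_pairs)
absolute_delta = mean_delta_ct(ct_pairs, absolute=True)
```

A gene with a smaller average ΔCT changes less between cancerous and normal tissue, which is the property used above to rank the candidate reference genes against GAPDH and β-actin.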

Fig. 4 Validation of the gene expression variability of the novel reference genes by RT-qPCR. RT-qPCR analyses for two commonly used reference genes (GAPDH and β-actin, light blue-colored box), and 11 newly identified reference genes (PCBP1, HNRNPC, HNRNPL, EMC4, SNX17, MRPL43, IST1, FAM32A, PFDN1, RNF10, and RER1) that were highly ranked among the 38 analyzed genes according to their expression levels (indicated by their calculated CV values). ΔCT indicates the average difference in CT value between cancerous and normal tissue samples (i.e., CT [cancer] - CT [normal]). Newly identified reference genes whose ΔCT value was found to be lower than that of both β-actin and GAPDH in all samples are highlighted (red-colored box). *Samples from seven types of cancerous tissues, including breast (n = 9), colon (n = 6), liver (n = 4), lung (n = 4), thyroid (n = 4), kidney (n = 1), and cervical (n = 1), were combined. Note that kidney and cervical tissues are not separately represented in the box plot


In summary, cancer is a disease characterized by complex molecular networks, in which highly heterogeneous and multifocal tumor cells cooperate with host cells within their microenvironment. Recent gene expression studies have been conducted to investigate the intricate interplay of gene expression patterns that regulate cancer invasion and metastasis at the transcriptional level; however, their accurate quantification of gene expression level is dependent upon the selection and use of reliable and appropriate reference genes for the normalization of target gene expression levels. Thus, in the present study, we performed in silico bioinformatics analyses and experimental validation to identify HNRNPL, PCBP1 and RER1 as novel candidate reference genes, whose expression is predominantly consistent, independent of cancer type, stage, and treatment status, and of patient age and gender. Although a larger sample size and more cancer types are needed for more reliable results, these novel reference genes will be invaluable for diagnosis and the prediction of patient prognosis, in a wide range of human cancers.



BRCA: Breast invasive carcinoma
COAD: Colon adenocarcinoma
CV: Coefficient of variation
FDR: False discovery rate
HNSC: Head and neck squamous cell carcinoma
IRB: Institutional review board
KIRC: Kidney renal clear cell carcinoma
LIHC: Liver hepatocellular carcinoma
LUAD: Lung adenocarcinoma
LUSC: Lung squamous cell carcinoma
M-MLV: Moloney murine leukemia virus
PRAD: Prostate adenocarcinoma
RPKM: Reads per kilobase per million mapped reads
RT-qPCR: Quantitative reverse transcription PCR
TCGA: The Cancer Genome Atlas
THCA: Thyroid carcinoma
TPM: Transcripts per million


  1. Hornberg JJ, Bruggeman FJ, Westerhoff HV, Lankelma J. Cancer: a systems biology disease. Biosystems. 2006;83(2–3):81–90.

  2. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton MR. A census of human cancer genes. Nat Rev Cancer. 2004;4(3):177–83.

  3. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM. The cancer genome atlas pan-cancer analysis project. Nat Genet. 2013;45(10):1113–20.

  4. Zhu J, He F, Hu S, Yu J. On the nature of human housekeeping genes. Trends Genet. 2008;24(10):481–4.

  5. Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet. 2013;29(10):569–74.

  6. Sharan RN, Vaiphei ST, Nongrum S, Keppen J, Ksoo M. Consensus reference gene(s) for gene expression studies in human cancers: end of the tunnel visible? Cell Oncol (Dordr). 2015;38(6):419–31.

  7. Jacob F, Guertler R, Naim S, Nixdorf S, Fedier A, Hacker NF, Heinzelmann-Schwarz V. Careful selection of reference genes is required for reliable performance of RT-qPCR in human normal and cancer cell lines. PLoS One. 2013;8(3):e59180.

  8. Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, Speleman F. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 2002;3(7):RESEARCH0034.

  9. Andersen CL, Jensen JL, Orntoft TF. Normalization of real-time quantitative reverse transcription-PCR data: a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Res. 2004;64(15):5245–50.

  10. Pfaffl MW, Tichopad A, Prgomet C, Neuvians TP. Determination of stable housekeeping genes, differentially regulated target genes and sample integrity: BestKeeper--excel-based tool using pair-wise correlations. Biotechnol Lett. 2004;26(6):509–15.

  11. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621–8.

  12. Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, He X, Mieczkowski P, Grimm SA, Perou CM, et al. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 2010;38(18):e178.

  13. Li B, Ruotti V, Stewart RM, Thomson JA, Dewey CN. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics. 2010;26(4):493–500.

  14. Wagner GP, Kin K, Lynch VJ. Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory Biosci. 2012;131(4):281–5.

  15. Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, et al. Human protein reference database--2009 update. Nucleic Acids Res. 2009;37(Database):D767–72.

  16. Thomas PD, Campbell MJ, Kejariwal A, Mi H, Karlak B, Daverman R, Diemer K, Muruganujan A, Narechania A. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. 2003;13(9):2129–41.

  17. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44–57.

  18. de Jonge HJ, Fehrmann RS, de Bont ES, Hofstra RM, Gerbens F, Kamps WA, de Vries EG, van der Zee AG, te Meerman GJ, ter Elst A. Evidence based selection of housekeeping genes. PLoS One. 2007;2(9):e898.

  19. Eisenberg E, Levanon EY. Human housekeeping genes are compact. Trends Genet. 2003;19(7):362–5.

  20. Lin WH, Liu WC, Hwang MJ. Topological and organizational properties of the products of house-keeping and tissue-specific genes in protein-protein interaction networks. BMC Syst Biol. 2009;3:32.

  21. Jeong H, Mason SP, Barabasi AL, Oltvai ZN. Lethality and centrality in protein networks. Nature. 2001;411(6833):41–2.

  22. Alemu EY, Carl JW Jr, Corrada Bravo H, Hannenhalli S. Determinants of expression variability. Nucleic Acids Res. 2014;42(6):3503–14.

  23. Lyng MB, Laenkholm AV, Pallisgaard N, Ditzel HJ. Identification of genes for normalization of real-time RT-PCR data in breast carcinomas. BMC Cancer. 2008;8:20.

  24. McNeill RE, Miller N, Kerin MJ. Evaluation and validation of candidate endogenous control genes for real-time quantitative PCR studies of breast cancer. BMC Mol Biol. 2007;8:107.

  25. Gur-Dedeoglu B, Konu O, Bozkurt B, Ergul G, Seckin S, Yulug IG. Identification of endogenous reference genes for qRT-PCR analysis in Normal matched breast tumor tissues. Oncol Res Featuring Preclinical Clin Cancer Ther. 2009;17(8):353–65.

  26. Maltseva DV, Khaustova NA, Fedotov NN, Matveeva EO, Lebedev AE, Shkurnikov MU, Galatenko VV, Schumacher U, Tonevitsky AG. High-throughput identification of reference genes for research and clinical RT-qPCR analysis of breast cancer samples. J Clin Bioinformatics. 2013;3(1):13.

  27. Kheirelseid EAH, Chang KH, Newell J, Kerin MJ, Miller N. Identification of endogenous control genes for normalisation of real-time quantitative PCR data in colorectal cancer. BMC Mol Biol. 2010;11(1):12.

  28. Sørby LA, Andersen SN, Bukholm IRK, Jacobsen MB. Evaluation of suitable reference genes for normalization of real-time reverse transcription PCR analysis in colon cancer. J Exp Clin Cancer Res. 2010;29(1):144.

  29. Kim S, Kim T. Selection of optimal internal controls for gene expression profiling of liver disease. Biotechniques. 2003;35(3):456–458, 460.

  30. Gao Q, Wang XY, Fan J, Qiu SJ, Zhou J, Shi YH, Xiao YS, Xu Y, Huang XW, Sun J. Selection of reference genes for real-time PCR in human hepatocellular carcinoma tissues. J Cancer Res Clin Oncol. 2008;134(9):979–86.

  31. Fu LY, Jia HL, Dong QZ, Wu JC, Zhao Y, Zhou HJ, Ren N, Ye QH, Qin LX. Suitable reference genes for real-time PCR in human HBV-related hepatocellular carcinoma with different clinical prognoses. BMC Cancer. 2009;9:49.

  32. Liu S, Zhu P, Zhang L, Ding S, Zheng S, Wang Y, Lu F. Selection of reference genes for RT-qPCR analysis in tumor tissues from male hepatocellular carcinoma patients with hepatitis B infection and cirrhosis. Cancer Biomark. 2013;13(5):345–9.

  33. Cicinnati VR, Shen Q, Sotiropoulos GC, Radtke A, Gerken G, Beckebaum S. Validation of putative reference genes for gene expression studies in human hepatocellular carcinoma using real-time quantitative RT-PCR. BMC Cancer. 2008;8:350.

  34. Gresner P, Gromadzinska J, Wasowicz W. Reference genes for gene expression studies on non-small cell lung cancer. Acta Biochim Pol. 2009;56(2):307–16.

  35. Sharungbam GD, Schwager C, Chiblak S, Brons S, Hlatky L, Haberer T, Debus J, Abdollahi A. Identification of stable endogenous control genes for transcriptional profiling of photon, proton and carbon-ion irradiated cells. Radiat Oncol. 2012;7:70.

  36. Saviozzi S, Cordero F, Lo Iacono M, Novello S, Scagliotti GV, Calogero RA. Selection of suitable reference genes for accurate normalization of gene expression profile studies in non-small cell lung cancer. BMC Cancer. 2006;6:200.

  37. Zhan C, Zhang Y, Ma J, Wang L, Jiang W, Shi Y, Wang Q. Identification of reference genes for qRT-PCR in human lung squamous-cell carcinoma by RNA-Seq. Acta Biochim Biophys Sin Shanghai. 2014;46(4):330–7.

  38. Jung M, Ramankulov A, Roigas J, Johannsen M, Ringsdorf M, Kristiansen G, Jung K. In search of suitable reference genes for gene expression studies of human renal cell carcinoma by real-time PCR. BMC Mol Biol. 2007;8:47.

  39. Dupasquier S, Delmarcelle AS, Marbaix E, Cosyns JP, Courtoy PJ, Pierreux CE. Validation of housekeeping gene and impact on normalized gene expression in clear cell renal cell carcinoma: critical reassessment of YBX3/ZONAB/CSDA expression. BMC Mol Biol. 2014;15:9.

  40. Ohl F, Jung M, Xu C, Stephan C, Rabien A, Burkhardt M, Nitsche A, Kristiansen G, Loening SA, Radonic A, et al. Gene expression studies in prostate cancer tissue: which reference gene should be selected for normalization? J Mol Med (Berl). 2005;83(12):1014–24.

  41. Souza AF, Brum IS, Neto BS, Berger M, Branchini G. Reference gene for primary culture of prostate cancer cells. Mol Biol Rep. 2013;40(4):2955–62.

  42. Weber R, Bertoni AP, Bessestil LW, Brasil BM, Brum LS, Furlanetto TW. Validation of reference genes for normalization gene expression in reverse transcription quantitative PCR in human normal thyroid and goiter tissue. Biomed Res Int. 2014;2014:198582.

  43. Lallemant B, Evrard A, Combescure C, Chapuis H, Chambon G, Raynal C, Reynaud C, Sabra O, Joubert D, Hollande F, et al. Reference gene selection for head and neck squamous cell carcinoma gene expression studies. BMC Mol Biol. 2009;10:78.



The authors are grateful for the valuable comments and suggestions of the reviewers.


This work was supported by research grants from the Bio-Synergy Research Project (NRF-2015M3A9C4075820) of the Ministry of Science, ICT and Future Planning through the National Research Foundation to C.P. This study was part of the project titled “Development of the methods for controlling and managing the marine ecosystem disturbing and harmful organisms (MEDHO)”, funded by the Ministry of Oceans and Fisheries of the Republic of Korea to C.P. Publication costs are funded by the project titled “Research center for fishery resource management based on the information and communication technology (ICT)”, funded by the Ministry of Oceans and Fisheries of the Republic of Korea to C.P.

Availability of data and materials

All data for this study were downloaded from the public TCGA database and are included in the tables, supplementary tables, and additional files.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 20 Supplement 10, 2019: Proceedings of the 12th International Workshop on Data and Text Mining in Biomedical Informatics (DTMBIO 2018). The full contents of the supplement are available online.

Author information

CP and JJ designed research. CP, KKK, and SYC contributed to the research coordination. JJ, SC, JO, and SGL performed research. CP, KKK, SYC, JJ, and SC analyzed data. CP, JJ, and KKK wrote the paper. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Song Yi Choi, Kee K. Kim or Chungoo Park.

Ethics declarations

Ethics approval and consent to participate

The Institutional Review Board (IRB) of Chungnam National University approved the use of human tissues in the present study (IRB number 2016–08-032). All utilized human specimens and data were provided by the Biobank of Chungnam University Hospital (Korea Biobank Network).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Table S1. Primers used for quantitative analysis of gene expression. Table S4. Gene expression variability of newly identified reference genes. (DOCX 28 kb)

Additional file 2:

Figure S1. qPCR electrophoresis result and melting curve analysis of our reference genes. (A) Agarose gel electrophoresis showing specific reverse transcription PCR products of the expected size for each gene. (B) Melting curves generated for all genes. (TIFF 34570 kb)

Additional file 3:

Table S2. Gene expression variability of commonly used and experimentally selected reference genes in each cancerous and normal group. (XLSX 24 kb)

Additional file 4:

Figure S2. Nine cancer types. Nine cancer types from TCGA comprising both cancerous and matched normal data with > 40 samples. (TIFF 3075 kb)

Additional file 5:

Table S3. Top 20 candidate reference genes in each cancer type. (XLSX 65 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Jo, J., Choi, S., Oh, J. et al. Conventionally used reference genes are not outstanding for normalization of gene expression in human cancer research. BMC Bioinformatics 20 (Suppl 10), 245 (2019).
