- Methodology article
- Open Access
Genome-wide prediction of transcriptional regulatory elements of human promoters using gene expression and promoter analysis data
© Seon-Young and Kim; licensee BioMed Central Ltd. 2006
Received: 09 November 2005
Accepted: 04 July 2006
Published: 04 July 2006
A complete understanding of the regulatory mechanisms of gene expression is the next important issue of genomics. Many bioinformaticians have developed methods and algorithms for predicting transcriptional regulatory mechanisms from sequence, gene expression, and binding data. However, most of these studies used yeast, which has much simpler regulatory networks than human and for which many genome-wide binding data and gene expression data under diverse conditions are available. Studies of genome-wide transcriptional networks in human currently lag behind those in yeast.
We report herein a new method that combines gene expression data analysis with promoter analysis to infer transcriptional regulatory elements of human genes. The Z scores from the application of gene set analysis with gene sets of transcription factor binding sites (TFBSs) were successfully used to represent the activity of TFBSs in a given microarray data set. A significant correlation between the Z scores of gene sets of TFBSs and individual genes across multiple conditions permitted successful identification of many known human transcriptional regulatory elements of genes as well as the prediction of numerous putative TFBSs of many genes which will constitute a good starting point for further experiments. Using Z scores of gene sets of TFBSs produced better predictions than the use of mRNA levels of a transcription factor itself, suggesting that the Z scores of gene sets of TFBSs better represent diverse mechanisms for changing the activity of transcription factors in the cell. In addition, cis-regulatory modules, combinations of co-acting TFBSs, were readily identified by our analysis.
By a strategic combination of gene set level analysis of gene expression data sets and promoter analysis, we were able to identify and predict many transcriptional regulatory elements of human genes. We conclude that this approach will aid in decoding some of the important transcriptional regulatory elements of human genes.
With the genome sequences of many organisms completed, revealing the regulatory mechanisms of gene expression is an important aspect of genomics. Recent innovative technologies such as microarrays and chromatin immunoprecipitation combined with chip (ChIP-chip), together with the whole genome sequencing of many organisms, are producing enormous amounts of data that are useful in elucidating the transcriptional regulatory mechanisms of genes. Whole genome sequences provide information on the cis-acting regulatory elements of each gene. Gene expression data provide information on how the expression of each gene changes in a given condition, and the combination of chromatin immunoprecipitation (ChIP) with chip technology provides genome-wide binding information concerning a transcription factor.
Many bioinformaticians have developed methods and algorithms for predicting transcriptional regulatory mechanisms from sequence data and gene expression data [3–6, 8–12]. In one branch, a comparative sequence analysis of noncoding regulatory elements has helped to find new regulatory elements within many genes. New motifs have been discovered from evolutionarily conserved regions, from a list of co-regulated genes, or from a list of functionally related genes [15, 16]. Others have developed diverse algorithms that combine diverse sources of data to predict transcriptional regulatory mechanisms. To mention a few, Bussemaker et al. used a linear model to combine gene expression data with putative regulatory motifs and predicted significant regulatory elements. Beer et al. used probabilistic modeling in conjunction with diverse gene expression data and showed that regulatory elements can successfully predict the expression of certain genes. Bar-Joseph et al. and Gao et al. combined binding data with gene expression data to identify regulatory networks [7, 9]. Others have inferred transcriptional elements by correlating the amount of a transcription factor itself with that of its target genes [5, 8, 10].
However, most of the above-mentioned studies involved yeast, which has much simpler regulatory networks than human and for which many genome-wide binding data and gene expression data under diverse conditions are available [2, 8, 9, 12]. Studies of genome-wide transcriptional networks in human are far behind those in yeast. A few studies have reported tools that aid researchers in identifying putative transcriptional regulatory elements from a given gene expression study, but these tools are not suitable for a meta-analysis of many gene expression studies [11, 17].
Here, we report on a new computational method in which gene expression data analysis is combined with promoter analysis to infer the transcriptional regulatory elements of human genes. Our method is similar to Gao et al.'s approach in its use of correlation across multiple conditions, but differs in that it uses the composite expression of genes having the same predicted TFBSs rather than binding data, which are available for only a few transcription factors in human. The method, although simple in concept and calculation, successfully identified many known TFBSs of genes and predicted many putative TFBSs that are worthy of further study.
Analysis in one dataset
Z-scores of TFBSs in an HIV-1 gp120 treated macrophage data set.
Identification of TFBSs involved in tissue specific gene expression
Analysis in multiple data sets
Selection of the optimal overall similarity cut-off value for each MatInspector position weight matrix (PWM)
Estimation of the accuracy of TFBSs prediction by comparing with known transcription regulatory elements in TRED database
Our method predicted many TFBSs for each gene (8845 genes in the U95A data set and 12803 genes in the U133A data set). The overall prediction rate was 21.4% (U95A) and 21.5% (U133A) among a total of 190 × 8845 (U95A) and 190 × 12803 (U133A) TFBS-gene combinations at a false discovery rate (q-value) of 0.05. We estimated the accuracy of the TFBS predictions by comparing them with known transcriptional regulatory elements in the TRED database, which contains gene transcriptional regulation information including TFBSs with available experimental evidence. Among the four levels of experimental evidence (known, likely, maybe, and predicted) in the TRED database, we used only 'known' evidence, which was validated by a literature search.
Accuracy of the prediction of TFBSs: comparison with known TFBSs of genes.
Previous studies have suggested that gene Y contains a binding site for transcription factor X if gene expression changes of transcription factor X and gene expression changes of gene Y are significantly correlated with each other over multiple data sets [5, 10, 26]. Our method is different from previous studies in that the Z scores of gene sets of TFBSs rather than changes in gene expression of each TF were used. To determine how our method performs compared with previous methods, we evaluated the performance of the method for predicting transcriptional regulatory elements from a correlation between the gene expression changes of each TF and each gene using the same data sets and a statistical testing procedure. We found that 25.8% of the known TFBSs among 1366 genes were predicted for the U95 data set and 26.8% of known TFBSs among 1450 genes were predicted for the U133 data set (Table 3).
Evaluation by comparing two independent predictions from two different data sets
Evaluation by comparing two independent predictions using two different data sets
U95A vs. U133A
Examples of a genome-wide prediction of TFBSs: NFκB
In this study, a computational approach is proposed for predicting the transcriptional regulatory elements of individual human genes using both gene expression data sets and promoter sequences in a genome-wide manner. Our approach uses our recently developed tool, parametric analysis of gene set enrichment, which produces a Z score that is useful in the analysis of multiple gene expression data sets.
An important issue encountered in predicting TFBSs from a promoter sequence with position weight matrices is the selection of an optimal cut-off value for matrix similarity. Previous researchers, although recognizing its importance, did not systematically select optimal cut-off values for each TFBS, but merely applied two or three different levels of cut-off values (for example, 0.8, 0.85, and 0.9) to all TFBSs. When we varied the cut-off values from 0.7 to 1.0 in increments of 0.02, we found that the optimal cut-off value for each TFBS varied widely from 0.72 to 1.00, showing the importance of a systematic approach in the selection of optimal cut-off values (Figure 3 and Additional file 2). However, a few points are worth mentioning. First, there may be a concern that using the most stringent cut-off value would lead to a smaller number of genes in a gene set and, as a result, predicted TFBSs would be restricted only to those which were used to create the initial gene sets (a circularity problem). We tested whether such a problem actually occurs but found that many TFBSs that were not included in the initial gene set were predicted even when the most stringent cut-off value was used (data not shown). Second, in cooperative binding, which is prevalent in many cooperating transcriptional modules, one factor can have an especially weak binding site that escapes any type of statistical detection. When the most stringent cut-off value is used, our method is likely to miss this weak binding site. Thus, it may be helpful to try a few less stringent cut-off values to avoid missing weak binding sites and so reduce false negative predictions.
Another important point when transcriptional regulatory networks are inferred from gene expression data is that many transcription factors (TFs) are regulated by posttranscriptional as well as transcriptional mechanisms. Thus, some TFs exert their altered activity on target genes through changes in the amount of their mRNA, while other TFs utilize other mechanisms such as nuclear translocation, phosphorylation, proteolytic degradation, or interaction with small ligands. Therefore, recent studies that have focused on TFBSs themselves rather than on TFs have enjoyed great success. To determine which method is better at identifying human transcriptional regulatory networks, we compared two measures of transcriptional activity: the Z scores of gene sets of TFBSs and the amount of TF mRNA itself. Our results showed better performance for the Z scores of gene sets of TFBSs than for TF mRNA levels (Table 3), suggesting that Z scores of gene sets of TFBSs might reflect the diverse mechanisms by which TF activity changes in the cell and might be better suited to inferring transcriptional networks than the amount of TF mRNA.
We computationally validated our prediction of TFBSs by observing the number of experimentally known TFBSs in the TRED database that could be predicted by our method. While we used the TRED database in this work, we should mention that more complete, literature-based, but commercial databases (for example, Genomatix Suite) are available. The validation of predicted TFBSs using the TRED database showed a successful prediction rate of 43.1% (U95A) and 43.9% (U133A) at a false discovery rate (q-value) of 0.05. This corresponds to a false negative rate of 56.9% and 56.1%, respectively. The second validation analysis (Figure 4, Table 4) showed that our method for predicting TFBSs from gene expression data was able to extract real signals from noise irrespective of the data set used. The two data sets (U95A and U133A) we used were from different platforms, have different gene contents, and, above all, involved different experimental conditions, yet their TFBS predictions were highly correlated with each other. This suggests that it is possible to consistently infer transcriptional regulatory elements, irrespective of the data sets used. This also suggests that cells use a limited number of transcription regulatory elements to adjust themselves to diverse environmental conditions. The combinatorial nature of transcription factors is one way to ensure an effective adaptation to diverse conditions, and is utilized in many genes. Many researchers have applied the combinatorial nature of transcription factors to the computational prediction of transcriptional networks with great success [12, 27]. We plan to incorporate combinatorial analysis into our method and expect this to improve it further.
Many genes are regulated by different TFBSs under different conditions. With enough data sets covering diverse conditions, our approach should identify the different TFBSs that regulate gene expression under different conditions. We tested whether our approach was able to identify different TFBSs under different conditions for a few selected genes and did observe this phenomenon (data not shown). At present, we have not systematically analyzed the two data sets (U95A and U133A) to identify such condition-dependent TFBSs because the data sets included in this study did not cover sufficiently diverse experimental conditions, but we think that identifying condition-dependent TFBSs is important work that should be carried out when enough data sets are available.
We understand that a successful prediction rate of 43.1% and 43.9% is far from satisfactory, but considering several limitations in our approach, the method is promising. To mention a few limitations, we restricted our analysis of promoter sequence to 1200 bp (between -1000 bp and +200 bp relative to the transcription start site) of a gene, but many regulatory elements in human genes, in contrast to yeast genes, reside outside this proximal promoter region. The second limitation is that the gene expression data sets currently available are not sufficient to cover the diversity of conditions needed. We used 127 and 138 conditions on the two platforms, while yeast researchers are able to use more than 1,000 conditions in a computational study.
A correlation between the Z scores of gene sets of TFBSs, produced by gene set analysis, and the fold changes in gene expression across multiple conditions permitted successful identification of many functionally important TFBSs of human genes. We successfully identified many known TFBSs of human genes and predicted numerous TFBSs of genes that are worthy of further study. We also showed that the Z scores of gene sets of TFBSs better represented changes in the activity of TFs in a cell than transcription factor mRNA itself. In a single gene expression data set, our method was able to identify transcription regulatory elements that caused the gene expression changes that are observed for many genes. Elucidating the regulatory elements of entire genomes is the next important task in genomics and requires innovations in both experimental techniques and computational methods. We hope our approach will aid in decoding the important transcriptional regulatory elements of genes by strategically combining gene expression data with genomic sequence data.
Promoter and prediction of TFBS
Human promoter sequences for which transcription start sites are accurately known were downloaded from the DBTSS (Database of Transcriptional Start Sites), which contains upstream sequences at -1000 to +200 relative to the transcription start site. TFBSs were predicted using the MatInspector program with position weight matrices (PWMs) of the TransFac database (ver. 3.0) included in the MatInspector program. We set the MatInspector core similarity to 0.75 and varied the overall matrix similarity from 0.7 to 1.0 in 0.02 increments for each TFBS. We used the TRED (Transcriptional Regulatory Element Database) database to obtain a list of known transcriptional regulatory elements for genes.
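The cut-off scan described above can be illustrated with a minimal sketch. It assumes that a PWM scanner has already produced, for each matrix, the best overall matrix similarity found in each promoter; the `best_score` table, the gene names, and both helper functions are hypothetical and are not part of MatInspector. The sketch only shows how one gene set per matrix and cut-off could be built; how the optimal cut-off is then chosen is not covered here.

```python
# Hypothetical input: best overall matrix similarity per (matrix, gene).
# In practice these scores would come from a PWM scanner such as MatInspector.
best_score = {
    "NFKB": {"geneA": 0.95, "geneB": 0.81, "geneC": 0.73},
    "E2F": {"geneA": 0.78, "geneD": 0.99},
}


def cutoff_grid(start=0.70, stop=1.00, step=0.02):
    """Cut-offs from start to stop inclusive, rounded to avoid float drift."""
    n_steps = round((stop - start) / step)
    return [round(start + i * step, 2) for i in range(n_steps + 1)]


def gene_set(scores, cutoff):
    """Genes whose best match to one matrix scores at or above the cut-off."""
    return {gene for gene, score in scores.items() if score >= cutoff}


grid = cutoff_grid()

# One gene set per matrix and per cut-off value.
sets_by_cutoff = {
    matrix: {cutoff: gene_set(scores, cutoff) for cutoff in grid}
    for matrix, scores in best_score.items()
}

for cutoff in (0.7, 0.8, 0.9):
    print(cutoff, sorted(sets_by_cutoff["NFKB"][cutoff]))
```

Raising the cut-off can only shrink a gene set, which is why the most stringent values leave few genes per matrix.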
Gene expression data sets and data analysis
The microarray data sets used in this study were downloaded from the Gene Expression Omnibus (GEO) website. We used only data sets calculated using the MAS5 (Microarray Suite 5) algorithm to ensure the same processing of all data sets. The list of data sets is given in Additional file 1. The microarray data set describing the gene expression changes of macrophages treated with HIV-1 gp120 was generously provided by Dr. Cicala. We used data sets GDS181, GDS422-6, GDS596, GDS1985-8, and GDS1096 to study gene expression in normal human tissues. Each data set was analyzed as follows. First, each sample within a gene expression data set (GDS) was normalized by the global mean of each sample to obtain a global mean of 1000. Signal values lower than 100 were then increased to 100 and the log base 2 was taken. All subsequent calculations were done using log2-transformed values. For a gene with multiple probes, we took the mean value of the multiple probes. Our data sets encompassed various experimental conditions including a comparison between two groups (for example, tumor vs. normal), a comparison among multiple conditions, and time course experiments. We calculated the log2-transformed fold change values between two groups. When one data set had multiple experimental conditions, each condition was regarded as a separate data set in calculating the log2-transformed fold change values. We chose to analyze data sets of the Affymetrix U95A or U133A platforms because many data sets are available for those two platforms.
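The per-sample processing steps above can be sketched as follows. The function names, the probe-to-gene mapping, and the toy signal values are hypothetical, and taking the fold change as the difference between group means of log2 values is an assumption, since the text does not spell out how samples within a group were combined.

```python
import math
from statistics import mean


def preprocess_sample(signals, target_mean=1000.0, floor=100.0):
    """Scale a sample to a global mean of target_mean, floor low values, take log2."""
    scale = target_mean / mean(signals.values())
    return {probe: math.log2(max(value * scale, floor))
            for probe, value in signals.items()}


def collapse_probes(log_values, probe_to_gene):
    """Average log2 values over all probes that map to the same gene."""
    by_gene = {}
    for probe, value in log_values.items():
        by_gene.setdefault(probe_to_gene[probe], []).append(value)
    return {gene: mean(values) for gene, values in by_gene.items()}


def log2_fold_change(group_a, group_b):
    """Per-gene difference of mean log2 expression (assumed definition)."""
    genes = group_a[0].keys()
    return {g: mean(s[g] for s in group_a) - mean(s[g] for s in group_b)
            for g in genes}


# Toy data: two probes for gene G1, one probe for gene G2.
probe_to_gene = {"p1": "G1", "p2": "G1", "p3": "G2"}
treated_raw = [{"p1": 4000.0, "p2": 1000.0, "p3": 1000.0}]
control_raw = [{"p1": 1000.0, "p2": 1000.0, "p3": 1000.0}]

treated = [collapse_probes(preprocess_sample(s), probe_to_gene) for s in treated_raw]
control = [collapse_probes(preprocess_sample(s), probe_to_gene) for s in control_raw]
fold_change = log2_fold_change(treated, control)
print(fold_change)
```

Flooring after scaling, as in the text, keeps very low signals from producing large spurious fold changes once the logarithm is taken.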
Parametric analysis of gene set enrichment
One hundred ninety different gene sets for TFBSs were constructed from the predicted TFBSs for 14776 human promoters. We calculated the composite expression of genes having the same predicted TFBS (hereafter referred to as the Z score) for each TFBS in each microarray data set using gene set analysis. The Z score in our analysis is defined as

$$Z = \frac{(X - \mu)\,\sqrt{n}}{\delta}$$

where X is the mean of the fold change values of genes having the same predicted TFBS, μ the mean of the fold change values of total genes in a data set, δ the standard deviation of the fold change values of total genes in a data set, and n the size of the gene set. The Z score serves as a measure of how far the composite expression of genes having the same predicted TFBS deviates from the mean of the fold change values of the total genes in a given data set. The correlation between Z scores and fold change values among multiple microarray data sets was calculated using Pearson's correlation coefficient. The significance of each correlation coefficient was inferred from a t-test using the following formula.
When the number of samples is n and Pearson's correlation coefficient is r:

$$t = \frac{r\,\sqrt{n - 2}}{\sqrt{1 - r^{2}}}$$

The statistical significance of the t-value is evaluated using the t-test with n-2 degrees of freedom [9, 35]. One possible concern in our approach is that the Z score and the fold change of a gene that is itself included in the calculation of that Z score are, strictly speaking, not independent variables. However, because each gene set is large (see Additional file 2), we consider that this lack of independence is not a serious practical concern. Java Treeview was used to visually represent the matrix of t-scores over all TFBSs and genes. The method of false discovery rates was used to adjust p values for multiple hypothesis testing. The adjusted q values were calculated using the qvalue package of the Bioconductor project.
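As an illustration of the two statistics in this section, the sketch below computes a gene set Z score and the t statistic for a Pearson correlation. It takes Z = (X − μ)·√n / δ as the form implied by the definitions above and uses the standard t = r·√(n − 2) / √(1 − r²). The function names and toy values are hypothetical, and the use of the population standard deviation for δ is an assumption. Converting t to a p value would require a t distribution with n − 2 degrees of freedom, which is left out to keep the sketch dependency-free.

```python
import math
from statistics import mean, pstdev


def gene_set_z_score(fold_changes, gene_set):
    """Z = (X - mu) * sqrt(n) / delta for one gene set in one data set."""
    members = [fold_changes[g] for g in gene_set if g in fold_changes]
    mu = mean(fold_changes.values())
    delta = pstdev(fold_changes.values())  # assumption: population SD
    return (mean(members) - mu) * math.sqrt(len(members)) / delta


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for Pearson's r over n samples (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Toy data set: four genes, two of which carry the hypothetical TFBS.
fold_changes = {"g1": 2.0, "g2": 1.0, "g3": -1.0, "g4": -2.0}
z = gene_set_z_score(fold_changes, {"g1", "g2"})

# Toy profiles across five conditions: Z scores of one TFBS gene set
# and log2 fold changes of one gene.
z_across_conditions = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_across_conditions = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(z_across_conditions, gene_across_conditions)
t = correlation_t(r, len(z_across_conditions))
print(z, r, t)
```

In the full analysis this pair of steps would be repeated for every TFBS gene set and every gene, and the resulting p values adjusted for multiple testing.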
Validation of predicted TFBSs
We validated our predicted TFBSs in two ways. First, we calculated the number of known transcriptional regulatory elements of genes in the TRED database that could be successfully predicted by our method. Second, we used two independent gene expression data sets (U95A and U133A) to predict the TFBSs, and compared the extent to which the two sets of predictions were correlated with each other.
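The first validation amounts to asking what fraction of known TFBS-gene pairs is recovered among the predictions. A minimal sketch is given below; the function name and the toy pairs are hypothetical and do not come from TRED.

```python
def recovery_rate(predicted_pairs, known_pairs):
    """Fraction of known (TFBS, gene) pairs that also appear among the predictions."""
    known = set(known_pairs)
    if not known:
        raise ValueError("no known pairs to evaluate against")
    return len(known & set(predicted_pairs)) / len(known)


# Toy example: four known pairs, two of which are predicted.
known = [("NFKB", "IL8"), ("NFKB", "TNF"), ("E2F", "CCNE1"), ("E2F", "MYC")]
predicted = [("NFKB", "IL8"), ("E2F", "CCNE1"), ("SP1", "MYC")]

rate = recovery_rate(predicted, known)
false_negative_rate = 1.0 - rate
print(rate, false_negative_rate)
```

The false negative rate is simply the complement of this fraction. Predicted pairs that are absent from the known list are not counted as errors, because the known list is incomplete.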
The authors thank Dr. Young-Il Yeom for his valuable comments on the manuscript, Drs. C. Cicala (NIAID) and R. Lempicki (NIAID) for providing their microarray data, and the anonymous reviewers for their thoughtful comments. This work was supported by grant FG05-21-01 (Y.S.K) of the 21C Frontier Functional Human Genome Project from the Ministry of Science & Technology of Korea.
- Consortium EP: The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 2004, 306(5696):636–640. 10.1126/science.1105136
- Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 2002, 298(5594):799–804. 10.1126/science.1075090
- Siggia ED: Computational methods for transcriptional regulation. Curr Opin Genet Dev 2005, 15(2):214–221. 10.1016/j.gde.2005.02.004
- Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nat Genet 1999, 22(3):281–285. 10.1038/10343
- Birnbaum K, Benfey PN, Shasha DE: cis element/transcription factor analysis (cis/TF): a method for discovering transcription factor/cis element relationships. Genome Res 2001, 11(9):1567–1573. 10.1101/gr.158301
- Bussemaker HJ, Li H, Siggia ED: Regulatory element detection using correlation with expression. Nat Genet 2001, 27(2):167–171. 10.1038/84792
- Bar-Joseph Z, Gerber GK, Lee TI, Rinaldi NJ, Yoo JY, Robert F, Gordon DB, Fraenkel E, Jaakkola TS, Young RA, Gifford DK: Computational discovery of gene modules and regulatory networks. Nat Biotechnol 2003, 21(11):1337–1342. 10.1038/nbt890
- Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N: Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 2003, 34(2):166–176.
- Gao F, Foat BC, Bussemaker HJ: Defining transcriptional networks through integrative modeling of mRNA expression and transcription factor binding data. BMC Bioinformatics 2004, 5(1):31. 10.1186/1471-2105-5-31
- Haverty PM, Hansen U, Weng Z: Computational inference of transcriptional regulatory networks from expression profiling and transcription factor binding site identification. Nucleic Acids Res 2004, 32(1):179–188. 10.1093/nar/gkh183
- Ho Sui SJ, Mortimer JR, Arenillas DJ, Brumm J, Walsh CJ, Kennedy BP, Wasserman WW: oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes. Nucleic Acids Res 2005, 33(10):3154–3164. 10.1093/nar/gki624
- Beer MA, Tavazoie S: Predicting gene expression from sequence. Cell 2004, 117(2):185–198. 10.1016/S0092-8674(04)00304-6
- Wasserman WW, Palumbo M, Thompson W, Fickett JW, Lawrence CE: Human-mouse genome comparisons to locate regulatory sites. Nat Genet 2000, 26(2):225–228. 10.1038/79965
- Roth FP, Hughes JD, Estep PW, Church GM: Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat Biotechnol 1998, 16(10):939–945. 10.1038/nbt1098-939
- Elkon R, Linhart C, Sharan R, Shamir R, Shiloh Y: Genome-wide in silico identification of transcriptional regulators controlling the cell cycle in human cells. Genome Res 2003, 13(5):773–780. 10.1101/gr.947203
- Liu R, McEachin RC, States DJ: Computationally identifying novel NF-kappa B-regulated immune genes in the human genome. Genome Res 2003, 13(4):654–661. 10.1101/gr.911803
- Cole SW, Yan W, Galic Z, Arevalo J, Zack JA: Expression-based monitoring of transcription factor activity: the TELiS database. Bioinformatics 2005, 21(6):803–810. 10.1093/bioinformatics/bti038
- Kim SY, Volsky DJ: PAGE: Parametric Analysis of Gene set Enrichment. BMC Bioinformatics 2005, 6(1):144. 10.1186/1471-2105-6-144
- Mootha VK, Handschin C, Arlow D, Xie X, St Pierre J, Sihag S, Yang W, Altshuler D, Puigserver P, Patterson N, Willy PJ, Schulman IG, Heyman RA, Lander ES, Spiegelman BM: Erralpha and Gabpa/b specify PGC-1alpha-dependent oxidative phosphorylation gene expression that is altered in diabetic muscle. Proc Natl Acad Sci U S A 2004, 101(17):6570–6575. 10.1073/pnas.0401401101
- Cicala C, Arthos J, Selig SM, Dennis G Jr, Hosack DA, Van Ryk D, Spangler ML, Steenbeke TD, Khazanie P, Gupta N, Yang J, Daucher M, Lempicki RA, Fauci AS: HIV envelope induces a cascade of cell signals in non-proliferating target cells that favor virus replication. Proc Natl Acad Sci U S A 2002, 99(14):9380–9385. 10.1073/pnas.142287999
- Choe W, Volsky DJ, Potash MJ: Induction of rapid and extensive beta-chemokine synthesis in macrophages by human immunodeficiency virus type 1 and gp120, independently of their coreceptor phenotype. J Virol 2001, 75(22):10738–10745. 10.1128/JVI.75.22.10738-10745.2001
- Choe W, Volsky DJ, Potash MJ: Activation of NF-kappaB by R5 and X4 human immunodeficiency virus type 1 induces macrophage inflammatory protein 1alpha and tumor necrosis factor alpha in macrophages. J Virol 2002, 76(10):5274–5277. 10.1128/JVI.76.10.5274-5277.2002
- Hoffmann E, Dittrich-Breiholz O, Holtmann H, Kracht M: Multiple control of interleukin-8 gene expression. J Leukoc Biol 2002, 72(5):847–855.
- DeGregori J, Kowalik T, Nevins JR: Cellular targets for activation by the E2F1 transcription factor include DNA synthesis- and G1/S-regulatory genes. Mol Cell Biol 1995, 15(8):4215–4224.
- Zhao F, Xuan Z, Liu L, Zhang MQ: TRED: a Transcriptional Regulatory Element Database and a platform for in silico gene regulation studies. Nucleic Acids Res 2005, 33(Database):D103–107. 10.1093/nar/gki004
- Zhu Z, Pilpel Y, Church GM: Computational identification of transcription factor binding sites via a transcription-factor-centric clustering (TFCC) algorithm. J Mol Biol 2002, 318(1):71–81. 10.1016/S0022-2836(02)00026-8
- Pilpel Y, Sudarsanam P, Church GM: Identifying regulatory networks by combinatorial analysis of promoter elements. Nat Genet 2001, 29(2):153–159. 10.1038/ng724
- Tanay A, Sharan R, Kupiec M, Shamir R: Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data. Proc Natl Acad Sci U S A 2004, 101(9):2981–2986. 10.1073/pnas.0308661100
- Suzuki Y, Yamashita R, Nakai K, Sugano S: DBTSS: DataBase of human Transcriptional Start Sites and full-length cDNAs. Nucleic Acids Res 2002, 30(1):328–331. 10.1093/nar/30.1.328
- Quandt K, Frech K, Karas H, Wingender E, Werner T: MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res 1995, 23(23):4878–4884.
- TRED database [http://rulai.cshl.edu/cgi-bin/TRED/tred.cgi?process=home]
- GEO (Gene Expression Omnibus) [http://www.ncbi.nlm.nih.gov/projects/geo/]
- Affymetrix: Microarray Suite User Guide. Santa Clara, CA; 2001.
- Bailey NTJ: Statistical Methods in Biology. 3rd edition. Cambridge: Cambridge University Press; 1995.
- Saldanha AJ: Java Treeview – extensible visualization of microarray data. Bioinformatics 2004, 20(17):3246–3248. 10.1093/bioinformatics/bth349
- Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 2003, 100(16):9440–9445. 10.1073/pnas.1530509100
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.