- Open Access
Correlation analysis reveals the emergence of coherence in the gene expression dynamics following system perturbation
- Nicola Neretti†1, 2,
- Daniel Remondini†2, 7,
- Marc Tatar3,
- John M Sedivy4,
- Michela Pierini2,
- Dawn Mazzatti5,
- Jonathan Powell5,
- Claudio Franceschi2, 6 and
- Gastrone C Castellani1, 2, 7Email author
© Neretti et al; licensee BioMed Central Ltd. 2007
Published: 8 March 2007
Time course gene expression experiments are a popular means to infer co-expression. Many methods have been proposed to cluster genes or to build networks based on similarity measures of their expression dynamics. In this paper we apply a correlation based approach to network reconstruction to three datasets of time series gene expression following system perturbation: 1) Conditional, Tamoxifen dependent, activation of the cMyc proto-oncogene in rat fibroblast; 2) Genomic response to nutrition changes in D. melanogaster; 3) Patterns of gene activity as a consequence of ageing occurring over a life-span time series (25y–90y) sampled from T-cells of human donors.
We show that the three datasets undergo similar transitions from an "uncorrelated" regime to a positively or negatively correlated one that is symptomatic of a shift from a "ground" or "basal" state to a "polarized" state.
In addition, we show that a similar transition is conserved at the pathway level, and that this information can be used for the construction of "meta-networks" where it is possible to assess new relations among functionally distant sets of molecular functions.
Time series of gene expression data from high throughput experiments have been used to infer networks of co-expressed genes. By following the changes in expression at the genomic level, it is possible to identify groups of genes with a similar expression pattern. Most of the techniques currently used in functional genomics have been adapted from machine learning and statistical inference. Some of them generate networks of genes; others simply generate clusters of genes. Examples of the latter are algorithms based on Self-Organizing Maps (SOM) [1–4], Phylogenic-Type Trees [5–10], agglomerative clustering and partitional clustering. For network determination, techniques have been developed based on differential equations , Bayesian networks , hybrid petri networks , Boolean regulatory networks [14, 15].
Relevance networks [16, 17] are a popular method for the analysis of time series of expression levels. The basic idea is to construct a network of similarity of the genes' expression patterns. Several similarity measures have been utilized, such as correlation and mutual information. This technique can then represent multiple connections between genes, and capture negative as well as positive correlations. Once the matrix containing the similarity measure for all pairs of genes has been computed, a threshold is used to define the links in the network. Network validation can be obtained by permutation testing, i.e. by randomly shuffling the time points independently for each gene.
A similar approach has been applied to metabolic networks [18, 19]. The authors computed metabolite correlations to infer changes in regulation using samples from different physiological states. On the other hand, we focused on the analysis of time series of expression levels for genes that were selected for differential expression between treatment and control.
An alternative approach is offered by Graphical Gaussian Models (GGM) that use partial correlation as a measure of independence between two genes. Partial correlations are related to the inverse of the correlation matrix, and in GGMs missing edges indicate conditional independence. One of the biggest problems with GGMs is that the number of genes is usually much larger than the number of samples (e.g. time points), such that the correlation matrix is usually singular and cannot be inverted. Different approaches have been proposed to circumvent this problem: restrict the number of genes analyzed to less than the number of samples [20–22]; use partial correlation coefficients of limited order [23–25]; approach the matrix inversion as an ill-posed inverse problem through regularization methods (usually via empirical Bayes, such as variance reduction) [26, 27].
Although co-expression is not a direct indication of co-regulation, it is a very useful tool that can be used to interpret the effect of a perturbation in eliciting different phenotypes when combined with an ontology analysis. Here, we use a correlation based method to generate co-expression networks on three different datasets and we characterize the change in its structural properties induced by the system's perturbation. We then use a correlation-based analysis to infer how the perturbation affects the system at the level of metabolic and signalling pathways.
We considered three datasets of time course gene expression arrays: 1) Conditional, Tamoxifen dependent, activation of the cMyc proto-oncogene in rat fibroblast; 2) Genomic response to nutrition changes in D. melanogaster; 3) Patterns of gene activity as a consequence of ageing occurring over a life-span time series (25y–90y) sampled from T-cells of human donors.
Due to the heterogeneous nature and properties of the datasets, we selected the significant genes according to different criteria.
For the cMyc dataset we applied a two way ANOVA (time and treatment as variability factors) to identify genes whose expression pattern was significantly affected by cMyc activation. With this method we identified a set of 1,191 significant genes out of a total of 8,799 .
The D. melanogaster diet arrays provide a higher resolution dataset and genes were selected via GeneTrace, which looks for change points in the time series of the expression ratios between the two cohorts. GeneTrace identified 3,519 genes with significant change point. These results showed that physiological response to nutrient uptake involves a rapid change in transcriptional profile at a global scale, and that most of the changes are small (81% of the 3,519 ratios smaller than 1.5 fold).
For the human aging dataset we applied one way ANOVA (with donors' age as variability factor) with a P value < 0.01 that resulted in a set of 768 probesets selected out of a total of 14,688 probesets.
Correlation network analysis
When cMyc is activated by Tamoxifen, the activity profile of the probesets clearly changes into a strongly correlated regime. These findings are reflected in the histograms of the correlation coefficients for the N- control and T- treatment data sets (Figure 1) and in the main parameters of the connectivity distributions obtained from the corresponding adjacency matrices (Table 1). The adjacency matrix characterizing the network was obtained by considering only the correlation coefficients whose absolute value exceeded a threshold fixed between 0.95 and 0.99 (we remark that the lower threshold value was higher than the value requested for a P < 0.05 statistical significance of the correlation coefficients). The results shown in this paper were obtained for a threshold equal to 0.98, but similar results held for the [0.95–0.99] interval. These coefficients were set equal to 1, producing a symmetric adjacency matrix a lr . For each gene connectivity degree k was defined as the total number of genes it was connected to, i.e.
Network parameters. Comparison of principal network parameters for the N and T datasets
Standard deviation, σ(k)
Clustering coefficient c(k)
For the T-treatment dataset the number of coefficients close to +1 or -1 increases significantly. This finding indicates that many of the 1,191 genes, whose expression levels over time were affected by tamoxifen stimulation, became either strongly correlated or anti-correlated.
The role of noise was taken in to account by analyzing the correlation coefficient distribution obtained from a dataset of randomly generated vectors of the same size than the experimental datasets (e.g. for the c-Myc dataset: a 5 × 1191 array of values sampled from the Standardized Gaussian Distribution ). The resulting distribution strictly resambles that obtained with the no Tamoxifen dataset.
List of pathways for D. melanogaster dataset. List of pathways (from the KEGG database) that include at least 20% of genes whose expression pattern has changed significantly upon re-feeding in the D. melanogaster dataset
Pentose phosphate pathway
Fructose and mannose metabolism
Ascorbate and aldarate metabolism
Fatty acid biosynthesis (path 2)
Fatty acid metabolism
Bile acid biosynthesis
Alanine and aspartate metabolism
Glycine, serine and threonine metabolism
Valine, leucine and isoleucine degradation
Valine, leucine and isoleucine biosynthesis
Arginine and proline metabolism
Starch and sucrose metabolism
Inositol phosphate metabolism
1- and 2- Methylnaphthalene degradation
Benzoate degradation via CoA ligation
One carbon pool by folate
Limonene and pinene degradation
Aminoacyl- tRNA biosynthesis
Phosphatidylinositol signaling system
In all three data sets different and specific genes are affected by the perturbations but give rise to a common correlation profile (Figures 1, 2, 3). This suggests that, despite their specificity, the global changes in gene expression follow a pattern that is largely independent from the data sets. This results could be explained by assuming that temporal changes in gene expression in biological complex systems is scale-independent and characterized by changes in an initial triggering core of genes (specific to the perturbation and the cell type) followed by a propagation to other genes.
In particular, in the cMyc dataset we found that the correlation method identifies a list of genes containing many of the genes found by O'Connell et al. , as well as many genes that, to our knowledge, have not been identified before. This indicates the possibility that the cMyc regulatory network may be much larger than currently described. To rule out the presence of undetectable correlations between genes in the unperturbed state (i.e. when cMyc is not active), we performed a one-way analysis of variance on the unperturbed dataset to detect genes whose expression level changes significantly over time. Less than 3% of the genes was selected for differential expression over time, compared to more than 10% for the perturbed dataset (P < 0.05).
For the D. melanogaster dataset our analysis showed that refeeding induces a change in the expression patterns of thousands of genes. Since this analysis was performed at saturation, we estimate there are few false positives and few undetected changes. The analysis of the correlation between expression patterns reveals that many genes respond to re-feeding in a coordinated manner. In fact, upon re-feeding, the activity profile of the genes clearly changes to a stronglycorrelated/anticorrelated regime.
List of pathways for cMyc dataset. List of pathways (from KEGG database) that include at least 5% of genes whose expression pattern has changed significantly upon cMyc activation
Citrate cycle (TCA cycle)
Fructose and mannose metabolism
Fatty acid biosynthesis (path 2)
Fatty acid metabolism
Basal transcription factors
Synthesis and degradation of ketone bodies
MAPK signaling pathway
Calcium signaling pathway
Valine, leucine and isoleucine degradation
Phosphatidylinositol signaling system
Wnt signaling pathway
Toll-like receptor signaling pathway
Jak-STAT signaling pathway
Prostaglandin and leukotriene metabolism
Regulation of actin cytoskeleton
Insulin signaling pathway
1- and 2-Methylnaphthalene degradation
Glyoxylate and dicarboxylate metabolism
Amyotrophic lateral sclerosis (ALS)
Reductive carboxylate cycle (CO2 fixation)
Limonene and pinene degradation
Overall, our analysis revealed the emergence of positive links between MAPK, FA and GJ in following cMyc activation, an interesting insight given that the relation between electrical cell properties and the signaling system is, without a doubt, an important step for tumor establishment and progression. The increase in the anticorrelation of MAPK, FA and GJ with CS and INS/IGF may be interpreted as a new kind of control by CS and INS/IGF of the other three pathways.
Three different systems sharing the time series experimental design after a perturbation (cMyc conditional activation, refeeding after caloric restriction and effect of time in young and old donors) have been examined.
At the genomic scale, we showed that in all three datasets the correlation regime between genes selected by statistical significance analysis follow a similar behavior, which is compatible with the emergence of coherence in the gene expression dynamics following cell perturbation. We also showed that this behavior is conserved at the pathways scale for pathways that have been significantly affected by the perturbation.
In particular, our analysis revealed the emergence of links between a core set of pathways in the cMyc dataset which may play an important role for the comprehension of the early phenotypical changes following cMyc activation.
This method was successful in identifying changes in gene expression profiles related to the acute response to a perturbation both in model systems and in humans as well as in revealing the centrality and importance of selected pathways by its multiscale generalization.
1) cMyc dataset
Two gene expression data sets were extracted from the set of microarray experiments based on genetically engineered rat cell lines . The first data set (N data set) contains the gene expression data for the c -myc-/- MycER cell line treated with vehicle (ethanol) only. The second data set (T data set) contains the gene expression data collected after the addition of tamoxifen. Binding of tamoxifen to the estrogen receptor domain elicits a conformational change that allows the fusion protein to migrate to the nucleus and act as a transcription factor. Samples were harvested at five time points after the addition of tamoxifen to the culture medium: 1, 2, 4, 8, and 16 h. The entire experiment was repeated on three separate occasions, providing three independent measurements for each gene and each time point. Expression profiling was done by using the Affymetrix platform and U34A Gene Chips.
2) D. melanogaster diet dataset
Newly enclosed virgin females were maintained in yeast-free media until 4 days old and then transferred either to media with yeast (Y-treatment) or to control media without yeast (NY-control). Samples were collected every hour for the following 12 hours. Four additional samples were collected prior to refeeding and arbitrarily designated -4, -3, -2, and -1. Synchronization of the physiological state was obtained by imposing diet restriction in third instars. Affymetric gene chips were used to measure mRNA abundance at each hour for both Y-treatment and NY-controls. A time-ordered sequence of expression ratios was then computed for each gene in the array .
3) Human aging dataset
T-cells were extracted from peripheral blood of 20 healthy male human donors. Donors were age-stratified into 4 groups of 5 subjects each: 25–35 years old, 40–50 years old, 55–65 years old and 70–80 years old respectively. Custom-made microarrays (Unilever Labs, Colworth UK) with about 19000 human probes were performed in duplicate (dye-swap) for each subject.
1) cMyc dataset
A full factorial ANOVA was applied to each of the 8,799 probesets to identify those that significantly changed in their expression level in time between the two conditions (data set N versus data set T). Probesets with a P value corresponding to the change-in-treatment factor < 0.05 were considered to be significantly affected by the treatment (i.e., activation of Myc by tamoxifen). A total of 1,191 genes were selected using this criterion.
2) D. melanogaster diet dataset
When considering high resolution time series, change-point analysis  is a powerful tool for detecting subtle changes; it is robust to outliers and can control the overall error rate. The Tatar group has extended this technique to statistically detect changes for time-series microarray data (GeneTrace ). For each gene's time-series, the algorithm returns an estimated change-point (CUSUM estimator) and the two-tailed probability of this statistic from a sample of 10,000 random permutations across all of the ordered expression ratios. GeneTrace identified 3,519 genes with significant change point (significance threshold = 0.005, expected false positive inferences 70 out of 14,000).
3) Human aging dataset
One-way ANOVA was applied to each of the 14,688 probesets to identify those that significantly change their expression level in time. With a P value of 0.01, 768 genes were selected for further analysis.
The similarity measure for the expression dynamics of two genes within the same data set is given by the correlation between the two expression-level time series. For a given data set, if x lj is the expression level of a gene with label l at time j, then the similarity between two genes with labels l and r, respectively, is given by:
where μl and μr are the averages in time of the expression levels for the two genes, and σl and σr are their standard deviations.
Time reshuffling was used to test the time sequence dependence of the results obtained by the two techniques. By randomly shuffling the time series for each gene separately, time relationships between expression levels are broken, but the mean and standard deviation for each gene are unaltered. Properties of the gene network that truly depend on the expression level dynamics should be significantly affected by a random shuffling in time.
We ranked pathways based on the percentage of genes selected by the significance analysis in each one of them. To account for the different number of measured genes in different pathways, we computed the 95% confidence limits for the population proportions (p) of significant genes in each pathway. Using a relationship between the F distribution and the binomial distribution , the confidence interval can be computed for the binomial parameter p [35–37]. The lower confidence limit L1 and the upper confidence limit L2 become:
where n is the total number of genes measured in a pathway, and X is the number of significant genes in that pathway.
We selected a 20% threshold for the Drosophila array and 5% threshold for the rat array. In fact, the Drosophila array contained 14,688 probesets corresponding to more than 90% of the fly's genome. Hence the analysis was done at saturation. The rat array used in the cMyc experiments contained 8,799 probesets corresponding to roughly 1/3 of the rat's genome. Requiring a 20% change in the cMyc experiment would result in a very small list of pathways. However, it is well known that cMyc affects the expression of a large pool of genes (directly or indirectly). Our interpretation is that the same cMyc experiment done at saturation would reflect this at the pathway level as well, i.e. many pathways would more than 20% of the genes with a significant change in expression profile over time.
Links between selected pathways were generated by computing the correlation of the gene expression time series between genes in different pathways and generating a block-correlation matrix with entries that are respectively the correlation of the gene expression time series between the genes in pathways P i and P j .
This work was supported by an Italian INFN (Istituto Nazionale di Fisica Nucleare) FB11 grant, Italian MIUR FIRB RBAU01RRAZ, LITBIO and ITALBIONET grant, Italian "ex 60%" MURST grant and a US NIH/NIGMS R01 GM41690-17 grant.
This article has been published as part of BMC Bioinformatics Volume 8, Supplement 1, 2007: Italian Society of Bioinformatics (BITS): Annual Meeting 2006. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/8?issue=S1.
- Soukas A, Cohen P, Socci ND, Friedman JM: Leptin-specific patterns of gene expression in white adipose tissue. Genes Dev 2000, 14(8):963–980.PubMed CentralPubMedGoogle Scholar
- Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR: Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 1999, 96(6):2907–2912. 10.1073/pnas.96.6.2907PubMed CentralView ArticlePubMedGoogle Scholar
- Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nat Genet 1999, 22(3):281–285. 10.1038/10343View ArticlePubMedGoogle Scholar
- Toronen P, Kolehmainen M, Wong G, Castren E: Analysis on gene expression data using self-organizing maps. FEBS letters 1999, 451(2):142–146. 10.1016/S0014-5793(99)00524-4View ArticlePubMedGoogle Scholar
- Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, et al.: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000, 403(6769):503–511. 10.1038/35000501View ArticlePubMedGoogle Scholar
- Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M, Radmacher M, Simon R, Yakhini Z, Ben-Dor A, et al.: Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 2000, 406(6795):536–540. 10.1038/35020115View ArticlePubMedGoogle Scholar
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998, 95(25):14863–14868. 10.1073/pnas.95.25.14863PubMed CentralView ArticlePubMedGoogle Scholar
- Ewing RM, Ben Kahla A, Poirot O, Lopez F, Audic S, Claverie JM: Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression. Genome Res 1999, 9(10):950–959. 10.1101/gr.9.10.950PubMed CentralView ArticlePubMedGoogle Scholar
- Ross DT, Scherf U, Eisen MB, Perou CM, Rees C, Spellman P, Iyer V, Jeffrey SS, Van de Rijn M, Waltham M, et al.: Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet 2000, 24(3):227–235. 10.1038/73432View ArticlePubMedGoogle Scholar
- Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 1998, 9(12):3273–3297.PubMed CentralView ArticlePubMedGoogle Scholar
- Chen T, He HL, Church GM: Modeling gene expression with differential equations. Pac Symp Biocomput 1999, 29–40.Google Scholar
- Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. J Comput Biol 2000, 7(3–4):601–620. 10.1089/106652700750050961View ArticlePubMedGoogle Scholar
- Matsuno H, Doi A, Nagasaki M, Miyano S: Hybrid Petri net representation of gene regulatory network. Pac Symp Biocomput 2000, 341–352.Google Scholar
- Akutsu T, Miyano S, Kuhara S: Inferring qualitative relations in genetic networks and metabolic pathways. Bioinformatics 2000, 16(8):727–734. 10.1093/bioinformatics/16.8.727View ArticlePubMedGoogle Scholar
- Szallasi Z, Liang S: Modeling the normal and neoplastic cell cycle with "realistic Boolean genetic networks": their application for understanding carcinogenesis and assessing therapeutic strategies. Pac Symp Biocomput 1998, 66–76.Google Scholar
- Butte AJ, Kohane IS: Unsupervised knowledge discovery in medical databases using relevance networks. Proc AMIA Symp 1999, 711–715.Google Scholar
- Butte AJ, Tamayo P, Slonim D, Golub TR, Kohane IS: Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc Natl Acad Sci USA 2000, 97(22):12182–12186. 10.1073/pnas.220392197PubMed CentralView ArticlePubMedGoogle Scholar
- Camacho D, de la Fuente A, Mendes P: The origin of correlations in metabolomics data. Metabolomics 2005, 1(1):53–63. 10.1007/s11306-005-1107-3View ArticleGoogle Scholar
- Martins AM, Camacho D, Shuman J, Sha W, Mendes P, Shulaev V: A Systems Biology Study of Two Distinct Growth Phases of Saccharomyces cerevisiae Cultures. Current Genomics 2004, 5: 649–663. 10.2174/1389202043348643View ArticleGoogle Scholar
- Kishino H, Waddell PJ: Correspondence analysis of genes and tissue types and finding genetic links from microarray data. Genome Inform Ser Workshop Genome Inform 2000, 11: 83–95.PubMedGoogle Scholar
- Toh H, Horimoto K: Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling. Bioinformatics 2002, 18: 287–297. 10.1093/bioinformatics/18.2.287View ArticlePubMedGoogle Scholar
- Waddell PJ, Kishino H: Cluster inference methods and graphical models evaluated on NCI60 microarray gene expression data. Genome Inform Ser Workshop Genome Inform 2000, 11: 129–140.PubMedGoogle Scholar
- de la Fuente A, Bing N, Hoeschele I, Mendes P: Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics 2004, 20(18):3565–3574. 10.1093/bioinformatics/bth445View ArticlePubMedGoogle Scholar
- Magwene PM, Kim J: Estimating genomic coexpression networks using first-order conditional independence. Genome Biol 2004, 5(12):R100. 10.1186/gb-2004-5-12-r100PubMed CentralView ArticlePubMedGoogle Scholar
- Wille A, Zimmermann P, Vranova E, Furholz A, Laule O, Bleuler S, Hennig L, Prelic A, von Rohr P, Thiele L, et al.: Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biol 2004, 5(11):R92. 10.1186/gb-2004-5-11-r92PubMed CentralView ArticlePubMedGoogle Scholar
- Dobra A, Hans C, Jones B, Nevins JR, West M: Sparse graphical models for exploring gene expression data. J Multiv Analysis 2004, 90: 196–212. 10.1016/j.jmva.2004.02.009View ArticleGoogle Scholar
- Schafer J, Strimmer K: An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics 2005, 21(6):754–764. 10.1093/bioinformatics/bti062View ArticlePubMedGoogle Scholar
- Remondini D, O'Connell B, Intrator N, Sedivy JM, Neretti N, Castellani GC, Cooper LN: Targeting c-Myc-activated genes with a correlation method: detection of global changes in large gene expression network dynamics. Proc Natl Acad Sci USA 2005, 102(19):6902–6906. 10.1073/pnas.0502081102PubMed CentralView ArticlePubMedGoogle Scholar
- Remondini D, Neretti N, Sedivy J, Franceschi C, Milanesi L, Tieri P, Castellani GC: Networks from gene expression time series: characterization of correlation patterns. International Journal of Bifurcation and Chaos 2007., 17:Google Scholar
- Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000, 28(1):27–30. 10.1093/nar/28.1.27PubMed CentralView ArticlePubMedGoogle Scholar
- O'Connell BC, Cheung AF, Simkevich CP, Tam W, Ren X, Mateyak MK, Sedivy JM: A large scale genetic analysis of c-Myc-regulated gene expression patterns. J Biol Chem 2003, 278(14):12563–12573. 10.1074/jbc.M210462200View ArticlePubMedGoogle Scholar
- Gershman B, Hang L, Puig O, Tatar M, Garofalo RS: High resolution dynamics of the transcriptional response to nutrition in Drosophila: a key role for dFOXO. Physiol Genomics 2006.Google Scholar
- Taylor WA: Change-Point Analysis: a powerful new tool for detecting changes.[http://www.variation.com/cpa/tech/changepoint.html]
- Fisher RA, Yates F: Statistical tables for biological, agricultural, and medical research. Volume 3. 6th edition. New York: Hafner; 1963.Google Scholar
- Bliss CI: Statistics in biology. Volume 1. New York: McGraw-Hill; 1967:199–201.Google Scholar
- Brownlee KA: Statistical theory and methodology in science and engineering. 2nd edition. New York: John Wiley; 1965:148–149.Google Scholar
- Zar JH: Biostatistical analysis. 4th edition. Upper Saddle River, NJ: Prentice Hall; 1999:527–530.Google Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.