Use of genomic DNA control features and predicted operon structure in microarray data analysis: ArrayLeaRNA – a Bayesian approach
© Pin and Reuter; licensee BioMed Central Ltd. 2007
Received: 09 January 2007
Accepted: 19 November 2007
Published: 19 November 2007
Microarrays are widely used for the study of gene expression; however, deciding whether observed differences in expression are significant remains a challenge.
A computing tool (ArrayLeaRNA) has been developed for gene expression analysis. It implements a Bayesian approach based on the Gumbel distribution. The approach uses printed genomic DNA control features both for normalization and for estimating the parameters of the Bayesian model, and it incorporates prior knowledge from predicted operon structure. The method is compared with two other approaches: the classical LOWESS normalization followed by a two-fold cut-off criterion, and the OpWise method (Price, et al. 2006. BMC Bioinformatics. 7, 19), a published Bayesian approach that also uses predicted operon structure. The three methods were compared on experimental datasets with prior knowledge of gene expression. With ArrayLeaRNA, data are normalized according to the genomic features, which reflect the behaviour of equally transcribed genes, and the statistical significance of a difference in expression is assessed against the variability of the equally transcribed genes. The operon information helps to classify genes with low-confidence measurements.
ArrayLeaRNA is implemented in Visual Basic and freely available as an Excel add-in at http://www.ifr.ac.uk/safety/ArrayLeaRNA/
We have introduced a novel Bayesian model and demonstrated that it is a robust method for analysing microarray expression profiles. ArrayLeaRNA showed a considerable improvement in data normalization, in the estimation of the experimental variability intrinsic to each hybridization and in the establishment of a clear boundary between non-changing and differentially expressed genes. The method is applicable to data derived from hybridizations of labelled cDNA samples as well as from hybridizations of labelled cDNA with genomic DNA and can be used for the analysis of datasets where differentially regulated genes predominate.
DNA microarrays are a well-established means of monitoring genome-wide patterns of gene expression. The first level of analysis requires determining whether observed differences in expression are significant. Data analysis techniques are actively being developed for this purpose, including classical ANOVA methods [2, 3] and Bayesian approaches based on both Gaussian [4–9] and non-Gaussian [10, 11] models. Classical statistical inference runs into several problems because of the lack of replication and the large number of genes tested simultaneously (multiple testing). None of the existing methods estimates the variability of the measurements from equally expressed genes. Our goal is to introduce a new Bayesian approach for transcriptome analysis, ArrayLeaRNA, based on the intrinsic variability of the equally expressed genes, estimated from genomic DNA features printed on the microarray slide, and on the predicted transcriptional organisation of operons. The model underlying ArrayLeaRNA assumes that the log ratios follow a Gumbel distribution; this gives an asymmetric posterior distribution with a very steep tail, which discriminates more sharply than the posterior obtained from the Gaussian model.
To test ArrayLeaRNA, we compared it with two other analysis approaches: a constant two-fold cut-off value, i.e. a two-fold change between intensities, and OpWise. We chose these two methods because the two-fold cut-off is common practice in some commercial analysis packages, and OpWise is a Bayesian approach based on a Gaussian model, like other published approaches [4–9]. OpWise also incorporates predicted operon structure to account for systematic microarray errors and to decide whether expression is significantly different. The performance of these analysis approaches is illustrated using three experimental hybridization datasets with prior knowledge of the expected transcription patterns.
We introduce the use of genomic DNA features printed as serial dilutions on the microarray slide and demonstrate that the measurements from these features are equivalent to those of genes that are equally expressed under different experimental conditions. The normalization approach presented in this paper, based on these genomic control features, was compared with LOWESS normalization, which uses LOWESS non-parametric regression to describe the relationship between the difference (M) and the average (A) of the logarithms of the intensities. We use the genomic controls not only for data normalization but also, in data analysis, for estimating the parameters of the Bayesian model. The Bayesian model also includes information on the transcription of the predicted operon to help assign genes with low-confidence measurements correctly. ArrayLeaRNA is implemented in a new, user-friendly software tool that is freely available.
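For comparison, the LOWESS normalization step can be sketched as below. This is a minimal illustration, not the implementation used in the study: it assumes a single pass of locally weighted linear regression with tricube weights and no robustness iterations (Cleveland's full procedure adds them), and the span `frac` and the function names are hypothetical choices.

```python
import math


def lowess_fit(xs, ys, frac=0.3):
    """Single-pass LOWESS: local linear fits with tricube weights, no robustness steps."""
    n = len(xs)
    k = min(n, max(2, int(math.ceil(frac * n))))  # points in each local neighbourhood
    fitted = []
    for x0 in xs:
        # Half-width of the neighbourhood: distance to the k-th nearest point.
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [max(0.0, 1.0 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def lowess_normalize(i1, i2, frac=0.3):
    """Return M - LOWESS(M ~ A), with M = log2 I1 - log2 I2 and A = (log2 I1 + log2 I2) / 2."""
    m_vals = [math.log2(p) - math.log2(q) for p, q in zip(i1, i2)]
    a_vals = [0.5 * (math.log2(p) + math.log2(q)) for p, q in zip(i1, i2)]
    fit = lowess_fit(a_vals, m_vals, frac)
    return [m - f for m, f in zip(m_vals, fit)]
```

Unlike the genomic-control approach, this correction is driven by all features on the slide, so it implicitly assumes that most genes are equally expressed.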
Description of the analysis
Hybridization datasets analyzed in this study
(Datasets are available at )
Dataset I is the result of a microarray hybridization of cDNA obtained from C. jejuni strains 11168 and 81116 and labelled with Cy3 and Cy5, respectively. There are 6 replicates for each ORF and ca. 8–10 replicates for each genomic DNA control feature (100, 250, 500, 1000, 3000 and 5000 ng) from each strain.
Dataset II was obtained from a hybridization of cDNA made from two replicated cultures of S. pneumoniae TIGR 4. The samples of cDNA were differentially labelled with Cy3 and Cy5. The dataset contains ca. 4 replicated measurements from each ORF and ca. 15 replicated measurements of each genomic DNA feature (10, 50, 100, 500 and 1000 ng).
Dataset III was obtained from two independent hybridizations. Each hybridization was carried out by mixing genomic DNA with cDNA both obtained from E. coli. The mixture was hybridized to the microarray slide. Dataset III was constructed by combining the fluorescence intensities measured from the cDNA sample in each hybridization. The dataset consisted of one measurement of each ORF and ca. 15 replicated measurements of each genomic DNA feature (25, 75, 250, 750 and 2250 ng).
Dataset IV was generated from Dataset I. The log ratios of the differentially expressed genes were made positive, so that 178 genes were up-regulated in sample 1 and none in sample 2. A set of 178 equally transcribed genes was then selected at random. This gives an asymmetric gene expression dataset of 356 genes, of which 178 are up-regulated in sample 1 and 178 are equally transcribed; therefore, the mean ratio of the whole dataset is very different from the mean ratio of the equally transcribed genes.
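The construction of Dataset IV can be sketched as follows. This is a hypothetical illustration: "made positive" is taken here to mean replacing each log ratio by its absolute value, and the function name, arguments and seed handling are assumptions, not part of the original tool.

```python
import random
from statistics import mean


def build_asymmetric_dataset(log_ratios, is_de, n_equal, seed=0):
    """Hypothetical sketch of the Dataset IV construction.

    log_ratios: dict gene -> log2 ratio (sample 1 / sample 2) from Dataset I
    is_de: dict gene -> True if the gene is differentially expressed
    n_equal: number of equally transcribed genes drawn at random
    """
    rng = random.Random(seed)
    # Make every differentially expressed gene appear up-regulated in sample 1.
    up = {g: abs(r) for g, r in log_ratios.items() if is_de[g]}
    # Draw the equally transcribed genes without replacement.
    equal_pool = sorted(g for g in log_ratios if not is_de[g])
    chosen = rng.sample(equal_pool, n_equal)
    equal = {g: log_ratios[g] for g in chosen}
    return up, equal
```

In the paper both sets contain 178 genes; the mean log ratio of the combined set is then pulled well away from that of the equally transcribed genes, which is what makes the dataset a test of normalization.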
Standardization of the hybridization datasets
Data standardization was based on the genomic DNA features printed on the microarray slides at different concentrations. We reasoned that equal amounts of each differentially labelled cDNA sample will hybridize to the genomic control features, which can therefore be used as a reference to standardize the whole hybridization dataset. To do this, the relationship between the intensities measured on the genomic features is described as:
$$\log_2 I_2 = \alpha_g + \beta_g \log_2 I_1$$
where $I_1$ and $I_2$ are the fluorescence intensities of the two differentially labelled samples hybridized on one slide, in the case of cDNA vs cDNA hybridizations. For cDNA vs genomic DNA hybridizations, $I_1$ and $I_2$ are the intensities of the cDNA samples measured on two different slides.
The standardization of the intensities was based on the parameters $\alpha_g$ and $\beta_g$, estimated specifically for each microarray hybridization. The intensities measured from both ORFs and genomic features were corrected as follows:

$$\log_2 I_1' = \beta_g \log_2 I_1 + \alpha_g$$
$$\log_2 I_2' = \log_2 I_2$$
The value of $2^{\alpha_g}$, the constant of proportionality between intensities, should in theory be 1, i.e. $\alpha_g = 0$, because equal amounts of differentially labelled cDNA should hybridize to the genomic features. A value of $\beta_g$ different from 1 indicates a non-linear relationship between the intensities. To avoid unnecessary non-linear data transformation, $\beta_g$ was allowed to differ from 1 only if an F test rejected the null hypothesis $\beta_g = 1$.
After standardization, the mean log ratio of the intensities measured from the genomic features was zero. Accordingly, the log ratios measured from ORFs whose transcripts are present in equal amounts in both hybridized samples are also expected to be centred on zero.
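The standardization step can be sketched as below. This is a minimal sketch under stated assumptions: the fit regresses $\log_2 I_2$ on $\log_2 I_1$ over the genomic features, the orientation under which the correction $\log_2 I_1' = \beta_g \log_2 I_1 + \alpha_g$ maps sample 1 onto the scale of sample 2; the F test compares the full fit with the restricted fit $\beta_g = 1$; and `f_crit = 4.0` is an assumed critical value (roughly the 5% point of $F(1, \nu)$ for $\nu$ around 60), not a value taken from the text. Function names are hypothetical.

```python
import math
from statistics import mean


def fit_genomic_standardization(log_i1, log_i2, f_crit=4.0):
    """Fit log2 I2 = alpha_g + beta_g * log2 I1 on the genomic control features.

    beta_g is kept only if an F(1, n - 2) test rejects beta_g = 1 (statistic
    compared with the assumed f_crit); otherwise beta_g = 1 and alpha_g is
    the mean log difference. Returns (alpha_g, beta_g).
    """
    n = len(log_i1)
    mx, my = mean(log_i1), mean(log_i2)
    sxx = sum((x - mx) ** 2 for x in log_i1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_i1, log_i2))
    beta_full = sxy / sxx
    alpha_full = my - beta_full * mx
    rss_full = sum((y - alpha_full - beta_full * x) ** 2 for x, y in zip(log_i1, log_i2))
    # Restricted model (beta_g = 1): least squares gives alpha_g = mean(y - x).
    alpha_restr = my - mx
    rss_restr = sum((y - alpha_restr - x) ** 2 for x, y in zip(log_i1, log_i2))
    if rss_full == 0.0:
        f_stat = 0.0 if rss_restr == 0.0 else math.inf
    else:
        f_stat = (rss_restr - rss_full) / (rss_full / (n - 2))
    if f_stat > f_crit:
        return alpha_full, beta_full
    return alpha_restr, 1.0


def standardize_sample1(log_i1, alpha_g, beta_g):
    """Apply log2 I1' = beta_g * log2 I1 + alpha_g; log2 I2 is left unchanged."""
    return [beta_g * x + alpha_g for x in log_i1]
```

Because both least-squares fits leave residuals with zero mean, the standardized genomic log ratios average to zero, matching the property stated above.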
Bayesian inference test
We denote the logarithm of the ratio between intensities as $R = \log_2 I_1 - \log_2 I_2$, where $I_1$ and $I_2$ are the fluorescence intensities measured for two differentially labelled cDNA samples. Samples 1 and 2 are hybridized either on one slide, in the case of cDNA-cDNA hybridizations, or on two slides, in the case of cDNA-DNA hybridizations.
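The log ratio R is modelled with a Gumbel distribution,

$$f(R \mid A, b) = \frac{1}{b}\exp\!\left(-\frac{R - A}{b}\right)\exp\!\left[-\exp\!\left(-\frac{R - A}{b}\right)\right],$$

where A and b are the location and scale parameters, respectively. After an experiment, there is uncertainty about the location parameter A, which has a prior distribution $\xi_A^{\text{prior}}$. The parameter b is treated as invariant and known from the genomic control features. When b is known, a conjugate family of distributions for A exists (see the Appendix).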
where is described in the Appendix (Eq A3) and $a_0$ is the centred value of the parameter A for genes equally expressed in both samples; $p_0$ and $p_1$ quantify the information on the transcription of the operon and satisfy $p_0 + p_1 = 1$; is a normalizing constant such that .
where as shown in the Appendix (Eqs A4, A5 and A6). A gene can then be declared differentially expressed if its posterior probability is smaller than a predefined cut-off. Throughout this paper, we use 0.01 as the cut-off value.
The estimation of the parameters was carried out with the ratios measured from the genomic control features printed on the same slide as the dataset to be analyzed.
The parameter b of the Gumbel distribution was estimated by the method of moments as where $sd_g$ represents the standard deviation of the ratios measured from the genomic features. In nine experimental hybridizations, the standard deviation of the ratios of equally transcribed genes was 1.6- to 2.4-fold greater when measured on ORF features than when measured on genomic features. For this reason, b was estimated assuming that the standard deviation of the ratios of equally transcribed genes is twice that of the ratios measured from genomic features.
After data standardization, the ratios of equally transcribed genes are expected to be centred on 0. Thus, the centred value, a0, of the hyperparameter A is estimated by the method of moments as
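The two moment estimates can be sketched from standard Gumbel properties: for the density in the model, the standard deviation is $\pi b/\sqrt{6}$ and the mean is $A + \gamma_E b$, with $\gamma_E$ the Euler-Mascheroni constant. The sketch below assumes these relations are the ones behind the elided formulas; the function names are hypothetical, and the factor 2 reflects the assumption, stated above, that ORF ratios are about twice as variable as genomic ones.

```python
import math
from statistics import stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_scale(genomic_log_ratios, sd_inflation=2.0):
    """Method-of-moments Gumbel scale b from the genomic-feature log ratios.

    A Gumbel variable has sd = pi * b / sqrt(6). The genomic sd is first
    multiplied by sd_inflation (2 in the text) to approximate the sd of
    equally transcribed ORFs.
    """
    sd_g = stdev(genomic_log_ratios)
    return math.sqrt(6.0) * sd_inflation * sd_g / math.pi


def centred_location(b):
    """Location a0 that gives a zero mean: the Gumbel mean is A + EULER_GAMMA * b."""
    return -EULER_GAMMA * b
```

With `a0 = centred_location(b)`, draws from the Gumbel model for equally transcribed genes average to zero, consistent with the standardization.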
As indicated in the Appendix, the transformation has a gamma distribution with shape parameter α and scale parameter 1/β. Accordingly, the ratios measured from the genomic features, $r_g$, are transformed into . From the transformed ratios, the maximum likelihood estimates of the parameters α and β of a gamma distribution are obtained as described by Mielke [15]. For the gamma function we used the approximation derived by Lanczos [16].
To estimate the expected value and variance of A (see Appendix), the digamma function was approximated using formula 6.3.16 (p. 259) of Abramowitz and Stegun [17].
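The prior estimation can be sketched as follows. This is an illustration under assumptions, not the ArrayLeaRNA code: the text cites Mielke's iterative procedure and formula 6.3.16 of Abramowitz and Stegun, whereas this sketch swaps in Newton's method started from a closed-form approximation, and the asymptotic digamma series (Abramowitz and Stegun 6.3.18) with a recurrence. It also assumes that the elided transform of the genomic ratios is $\exp(r_g/b)$, the same map that makes the gamma prior conjugate in the Appendix. All function names are hypothetical.

```python
import math
from statistics import mean


def digamma(x):
    """psi(x) for x > 0: recurrence psi(x) = psi(x + 1) - 1/x up to x >= 6,
    then the asymptotic series (Abramowitz and Stegun 6.3.18)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def trigamma(x):
    """psi'(x) for x > 0, using the same recurrence-plus-asymptotic scheme."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))


def fit_gamma_mle(samples, tol=1e-12, max_iter=50):
    """Maximum likelihood (shape alpha, rate beta) of a gamma distribution.

    Newton's method on ln(alpha) - psi(alpha) = ln(mean) - mean(ln x), started
    from a closed-form approximation; samples must be positive and not all equal.
    """
    m = mean(samples)
    s = math.log(m) - mean(math.log(v) for v in samples)
    alpha = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    for _ in range(max_iter):
        step = (math.log(alpha) - digamma(alpha) - s) / (1.0 / alpha - trigamma(alpha))
        alpha -= step
        if abs(step) < tol * alpha:
            break
    return alpha, alpha / m  # beta is a rate, i.e. the gamma scale is 1/beta


def prior_for_location(genomic_log_ratios, b):
    """Hypothetical: fit the gamma prior of exp(A/b) to exp(r_g/b) and return
    (alpha, beta, prior E(A) = b*(psi(alpha) - ln beta), prior Var(A) = b^2 * psi'(alpha))."""
    alpha, beta = fit_gamma_mle([math.exp(r / b) for r in genomic_log_ratios])
    return alpha, beta, b * (digamma(alpha) - math.log(beta)), b * b * trigamma(alpha)
```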
Estimates from the normalized datasets for the ArrayLeaRNA approach
Parameters and statistics
Average of $2^R$ from genomic features
Standard deviation of R from genomic features
Parameter b of the Gumbel distribution for R
Parameter α of the prior distribution for A
Parameter β of the prior distribution for A
Prior expected value for A
Prior variance for A
Parameter α of the posterior distribution for A
Parameter β of the posterior distribution for A
Posterior expected value for A
Posterior variance for A
where $N_0$ and $N_1$ depend on the classification of the gene under study. When the gene under study is equally transcribed, $N_0$ = Card(genes equally transcribed) and $N_1$ = |Card(genes up-regulated) − Card(genes down-regulated)|, where Card(·) denotes the cardinal number of a set of genes. If some genes of the operon are up-regulated and others are down-regulated in the same sample, they cancel each other out and do not decrease the probability of equal transcription in both samples; hence, $N_1$ is calculated as the absolute value of the difference between the numbers of up- and down-regulated genes. If the gene under study is classified as up-regulated, $N_0$ = Card(genes equally transcribed) + Card(genes down-regulated) and $N_1$ = Card(genes up-regulated). If the gene under study is down-regulated, $N_0$ = Card(genes equally transcribed) + Card(genes up-regulated) and $N_1$ = Card(genes down-regulated). T is the number of genes of the operon involved in the estimation, i.e. $T = N_0 + N_1 + 1$. Therefore, $p_1$ and $p_0$ are based only on those genes of an operon that were successfully analysed. The operon composition is taken from [18–20]. The analysis of the dataset is iterated several times; five iterations have proved sufficient. In each iteration, genes are reclassified and the values of $p_0$ and $p_1$ are updated.
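The counting rules for $N_0$, $N_1$ and T can be written down directly. This sketch assumes the counts run over the other successfully analysed genes of the operon, with the gene under study contributing the extra 1 in T; the class labels and function name are hypothetical, and the code stops at the counts rather than computing $p_0$ and $p_1$.

```python
def operon_counts(gene_class, other_classes):
    """Return (N0, N1, T) for the gene under study.

    gene_class: "equal", "up" or "down" for the gene under study.
    other_classes: classes of the other successfully analysed genes in its operon.
    """
    n_equal = other_classes.count("equal")
    n_up = other_classes.count("up")
    n_down = other_classes.count("down")
    if gene_class == "equal":
        # Up- and down-regulated neighbours cancel each other out.
        n0, n1 = n_equal, abs(n_up - n_down)
    elif gene_class == "up":
        n0, n1 = n_equal + n_down, n_up
    elif gene_class == "down":
        n0, n1 = n_equal + n_up, n_down
    else:
        raise ValueError(f"unknown class: {gene_class!r}")
    return n0, n1, n0 + n1 + 1  # T also counts the gene under study
```

In the iterative scheme described above, these counts would be recomputed after each reclassification pass.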
We demonstrate a Bayesian method of microarray data analysis based on using internal positive controls for normalisation and as a basis of the Bayesian model, plus using predicted operon structure to improve assignments of differentially expressed genes. The method has been tested on microarray data obtained from three different bacterial organisms.
The OpWise method was applied using the computing tool provided by its authors. The cut-off values for the posterior probability of equal expression were 0.99 and 0.01 for OpWise. This is equivalent to the cut-off chosen for ArrayLeaRNA, a posterior probability of 0.01 associated with the absolute value of the ratio between intensities.
In dataset I, the 138 ORFs unique to strain 81116 could not be analysed because the operon composition is not available for this strain. Of the features unique to strain NCTC 11168, 26 were misclassified as equally expressed, and some of these showed large ratios (Fig 2). The misclassified features with large ratios were those with few replicate measurements left after poor-quality measurements had been discarded. For dataset I, OpWise was too conservative and failed to classify clearly differentially expressed genes (Fig 1). OpWise is a Bayesian approach based on a Gaussian model that can be expressed in terms of the t distribution, and its posterior probability can also be formulated without the operon information. When the model underlying OpWise was applied without operon information, only 33 genes, with ratios greater than 4.3, were classified as differentially expressed, and the number of features unique to strain NCTC 11168 that were misclassified as equally expressed increased to 41. Therefore, for this dataset, the operon information decreased the number of misclassified genes. In dataset II, OpWise misclassified 114 genes that showed equal expression in both samples; in this case, the model without operon information performed better and did not misclassify any gene. Dataset III could not be analysed with OpWise because there was only one replicate of each ORF.
When the two-fold cut-off was applied to dataset I, 440 genes showed ratios greater than the arbitrarily chosen two-fold value; this is considerably higher than the 315 and 86 genes found to be differentially expressed by ArrayLeaRNA and OpWise, respectively. Fifty-five features unique to 81116 and 17 features unique to 11168 were incorrectly classified, but, as mentioned above and shown in Fig 2, these features exhibited values typical of identically expressed genes. The two-fold approach did not misclassify any gene in dataset II but misclassified 615 genes in dataset III. The variability of dataset II was relatively small because the mixture of differentially labelled cDNA was hybridized on a single slide and each ORF had ca. 4 replicates. This contrasts with the set-up of the experiment from which dataset III was derived, in which the labelled cDNA samples were mixed with labelled genomic DNA and hybridized on different slides, and the final dataset contained only one ratio per ORF. The variability of dataset III was therefore greater, and 615 of its genes showed ratios larger than 2. An arbitrarily chosen constant cut-off is not an advisable analysis technique in any case.
In Dataset IV, ArrayLeaRNA was more accurate than the other two approaches both in the non-transformed dataset and in the dataset normalized with the genomic controls. ArrayLeaRNA misclassified 19 genes in the non-transformed dataset and only 1 in the dataset normalized according to the genomic controls. Some equally transcribed genes showed ratios slightly greater than 2; therefore, the two-fold approach misclassified 39 genes in the non-transformed dataset and 15 after genomic normalization. OpWise misclassified 105 and 80 genes in the non-transformed and normalized datasets, respectively, and it misclassified a large number of genes up-regulated in sample 1 as equally transcribed. The two-fold cut-off approach, with its narrower boundaries, mostly misclassified equally transcribed genes as up-regulated (Fig 3).
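The two-fold criterion itself reduces to a fixed threshold of 1 on the absolute log2 ratio, as in this minimal sketch (the function name and the class labels are hypothetical). Nothing in the rule depends on the variability of a given hybridization, which is why a noisy dataset such as dataset III produces many calls.

```python
import math


def two_fold_call(i1, i2, fold=2.0):
    """Classify a gene by a constant fold-change cut-off on R = log2 I1 - log2 I2."""
    r = math.log2(i1) - math.log2(i2)
    cut = math.log2(fold)
    if r > cut:
        return "up"      # higher in sample 1
    if r < -cut:
        return "down"    # higher in sample 2
    return "equal"
```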
We present a Bayesian microarray analysis tool which takes advantage of genomic DNA control features and predicted operon structure to provide an accurate and informed analysis of transcriptomic data.
The estimation of the model parameters relies on the ratios measured from genomic DNA features of the appropriate strain(s) printed on the microarray slide; this is the scenario for which ArrayLeaRNA has been designed. If genomic DNA features are not printed on the slide, the computing tool implementing ArrayLeaRNA allows the user to input the ratio at the boundary between equally and differentially expressed genes. The model parameters are then estimated so as to obtain approximately the desired boundary. An alternative solution could be to use genes shown to have consistently non-changing expression patterns instead of printed genomic DNA.
The model underlying ArrayLeaRNA differs from the Gaussian models developed by [4–8]. The approaches of [4, 5, 7] model the expression measurements with normal distributions parameterized by means and variances, using conjugate prior distributions and assuming dependence between means and variances. As an alternative to a full Bayesian treatment, Baldi and Long [4] suggested an intermediate solution, a regularized t-test in which the variance is replaced by the posterior mean of the variance of the model. modified the Gaussian model by introducing a Bernoulli random variable indicating whether the gene is differentially expressed. The parameter p of the Bernoulli distribution describes the proportion of differentially expressed genes and is estimated by iterating the analysis. The effect of this parameter on the posterior probability is similar to that of $p_0$ or $p_1$, introduced in ArrayLeaRNA to quantify the operon information. The model underlying OpWise [8] is also based on the above Gaussian model but introduces a new error component, called systematic error or bias, estimated from the measurements of genes belonging to the same operon. The posterior probabilities are estimated for single genes both without and with operon information; in the latter case, as a mixture of the posterior probabilities of all possible operon compositions combining the gene under study with its two adjacent genes.
The scale parameter of the Gumbel distribution is estimated from the standard deviation of the ratios measured from the genomic control features. This provides an estimate of the variability of equally expressed genes that is intrinsic to the experimental hybridization. It avoids both the uncertainty of estimates based on few replicates and the problem that replicated measurements, whether on the same or replicated slides and from the same or replicated experiments, may not reflect the variability of the set of features with equal amounts of hybridized transcripts from each sample. Thus the genomic DNA features printed on the microarray slide offer a significant advantage not only for data normalization but also for determining whether differences in expression are significant, based on a robust estimate of the variability of equally transcribed genes. Moreover, printed genomic DNA offers distinct advantages over other types of features designed as non-cross-hybridizing controls (e.g. yeast ORFs in bacterial microarrays) used in combination with exogenously added cDNA. Such controls account for only part of the experimental variability, whereas printed genomic DNA reflects the variability arising from the experimental hybridization of the labelled cDNA prepared from the RNA under study.
We have introduced a Bayesian model based on the Gumbel distribution, in combination with printed genomic DNA controls and predicted operon information, and demonstrated that it is a robust method for analysing microarray expression profiles. The method is applicable to data derived from hybridizations of labelled cDNA samples as well as from hybridizations of labelled cDNA with genomic DNA. The method can equally be applied to datasets where differentially regulated genes predominate. The method we introduce performed better than two existing methods (OpWise and the two-fold cut-off) when analysing the experimental datasets presented in this work.
Campylobacter jejuni strains NCTC 11168 and 81116 (NCTC 11828) were grown at 37°C under microaerophilic conditions (10% CO2, 5% O2, 85% N2; relative humidity 80%) on Skirrow agar plates or in Brucella broth using a MACS-MG-1000 controlled atmosphere workstation (DW Scientific, UK).
Streptococcus pneumoniae JNR7/87 (also called TIGR4) was grown at 37°C in tryptone soy broth or on tryptone soy agar plates supplemented with 5% horse blood.
Escherichia coli K-12 strain MG1655 was grown at 25°C in Luria-Bertani broth (10 g/l Tryptone, 5 g/l yeast extract and 10 g/l NaCl; pH 7.2) with 0.2% glucose.
Construction of DNA microarrays
Internal DNA fragments corresponding to unique segments of each open reading frame (ORF) annotated in the genome of the strain were PCR-amplified using gene-specific primers. DNA probes and various concentrations of chromosomal DNA were spotted on GAPS II slides (Corning) using an in-house Stanford-designed arrayer and the recommended software and protocols.
The following three DNA microarrays were used:
1. A microarray representing six replicates of all ORFs from C. jejuni NCTC 11168 and 138 ORFs unique to strain 81116. Of the C. jejuni NCTC 11168 ORFs, 134 are absent from strain 81116.
2. A microarray representing four replicates of all open reading frames from S. pneumoniae TIGR4.
3. A microarray representing one replicate of all open reading frames from E. coli K-12 MG1655.
All the arrays contained ca. 100 features of serially diluted chromosomal DNA (ca. 15–20 replicates of each dilution) isolated from the reference strain(s) used to construct the array. These features are referred to as genomic controls and are used in data standardization and in data analysis.
RNA and DNA purification and microarray hybridizations
RNA was purified from S. pneumoniae, E. coli and C. jejuni as described previously. RNA quality and quantity were checked using the Agilent 2100 Bioanalyzer. DNA was isolated from bacteria using the QIAgen DNeasy™ method (QIAgen).
cDNA was prepared from RNA using Stratascript RT (Stratagene) and labelled with Cy3-dCTP and Cy5-dCTP (Amersham). Labelled cDNA and DNA were purified using the QIAquick PCR purification kit (QIAgen). Differentially labelled cDNA or cDNA and DNA were mixed and hybridized on a microarray slide at 62°C overnight. Following hybridization, microarray slides were washed and scanned using an Axon GenePix 4000A microarray laser scanner (Axon Instruments, CA) and the feature data generated using GenePix Pro software (Molecular Devices). The fluorescence intensity was defined as the median of the foreground intensities in each feature with the median background subtracted.
Availability and requirements
ArrayLeaRNA is implemented in Visual Basic and freely available as an Excel add-in at http://www.ifr.ac.uk/safety/ArrayLeaRNA/. The user requires Excel 2000 or a later version installed on their computer.
Conjugate family of distributions for the location parameter of the Gumbel distribution
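The Gumbel density of a log ratio r is

$$f(r \mid A, b) = \frac{1}{b}\exp\!\left(-\frac{r - A}{b}\right)\exp\!\left[-\exp\!\left(-\frac{r - A}{b}\right)\right],$$

where A and b are the location and scale parameters, respectively.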
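Writing out the likelihood of n log ratios shows where the sufficient statistic and the gamma conjugacy come from. The derivation below assumes the maximum-type Gumbel density $f(r \mid A, b) = b^{-1}\exp(-(r - A)/b)\exp(-\exp(-(r - A)/b))$ and introduces the illustrative symbol $\theta = e^{A/b}$, with a gamma prior on $\theta$ of shape α and rate β (scale 1/β):

$$\prod_{i=1}^{n} f(r_i \mid A, b) = b^{-n}\, e^{-\sum_{i} r_i / b}\;\theta^{\,n} \exp(-\theta k), \qquad \theta = e^{A/b}, \quad k = \sum_{i=1}^{n} e^{-r_i/b}$$

$$p(\theta \mid r_1, \dots, r_n) \propto \theta^{\alpha + n - 1} e^{-(\beta + k)\theta}, \qquad E(A) = b\,E(\ln\theta) = b\,[\psi(\alpha + n) - \ln(\beta + k)], \qquad \operatorname{Var}(A) = b^{2}\,\psi'(\alpha + n)$$

By the factorization theorem, k is sufficient for A when b is known, and setting n = 0, k = 0 recovers the prior moments $b(\psi(\alpha) - \ln\beta)$ and $b^2\psi'(\alpha)$.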
It can be shown that is a sufficient statistic for the Gumbel distribution
where is the gamma function. Notice that is distributed according to a gamma distribution with shape parameter α and scale parameter 1/β. It can be shown that, a priori, the expected value of A is $E(A) = b\,(\psi(\alpha) - \ln \beta)$, where is the digamma function, and its variance is .
which belongs to the same family as the prior distribution, with parameters $\alpha' = \alpha + n$ and $\beta' = \beta + k$. A posteriori, the expected value of A is $E(A) = b\,(\psi(\alpha + n) - \ln(\beta + k))$ and its variance is .
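The posterior update can be sketched in a few lines. This sketch assumes the maximum-type Gumbel parameterization, a gamma prior (shape α, rate β) on $e^{A/b}$, and the sufficient statistic $k = \sum_i e^{-r_i/b}$; the variance uses the trigamma function, $b^2\psi'(\alpha')$, which follows from the same gamma model. The helper functions are re-implemented here (recurrence plus asymptotic series) so that the block is self-contained; all names are hypothetical.

```python
import math


def _digamma(x):
    # Recurrence up to x >= 6, then the asymptotic series (A&S 6.3.18).
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def _trigamma(x):
    # Recurrence up to x >= 6, then the asymptotic series for psi'.
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return acc + inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))


def posterior_location(alpha, beta, b, log_ratios):
    """Conjugate update for the Gumbel location A with known scale b.

    Prior: exp(A/b) ~ Gamma(shape=alpha, rate=beta). Returns
    (alpha', beta', E(A), Var(A)) with alpha' = alpha + n, beta' = beta + k
    and k = sum(exp(-r/b)).
    """
    k = sum(math.exp(-r / b) for r in log_ratios)
    alpha_post = alpha + len(log_ratios)
    beta_post = beta + k
    mean_a = b * (_digamma(alpha_post) - math.log(beta_post))
    var_a = b * b * _trigamma(alpha_post)
    return alpha_post, beta_post, mean_a, var_a
```

With many observations the prior is swamped and the posterior mean of A approaches the true location, which is a convenient sanity check on the update.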
We gratefully acknowledge the support from the BBSRC core strategic grants 41213A and 42254A. We thank the laboratory of Molecular Microbiology at IFR, especially Mathew Rolfe and Bruce Pearson for their kind help with the microarray work. We thank Gary Barker for helpful discussions on Bayesian inference.
- DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic control of gene expression on a genomic scale. Science. 1997, 278 (5338): 680-686. 10.1126/science.278.5338.680.
- Wolfinger RD, Gibson G, Wolfinger ED, Bennett L, Hamadeh H, Bushel P, Afshari C, Paules RS: Assessing gene significance from cDNA microarray expression data via mixed models. J Comput Biol. 2001, 8 (6): 625-637. 10.1089/106652701753307520.
- Kerr MK, Martin M, Churchill GA: Analysis of variance for gene expression microarray data. J Comput Biol. 2000, 7 (6): 819-837. 10.1089/10665270050514954.
- Baldi P, Long AD: A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics. 2001, 17 (6): 509-519. 10.1093/bioinformatics/17.6.509.
- Lonnstedt I, Speed TP: Replicated microarray data. Statistica Sinica. 2002, 31-46.
- Gottardo R, Pannucci JA, Kuske CR, Brettin T: Statistical analysis of microarray data: a Bayesian approach. Biostatistics. 2003, 4 (4): 597-620. 10.1093/biostatistics/4.4.597.
- Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004, 3 (1): Article3.
- Price MN, Arkin AP, Alm EJ: OpWise: operons aid the identification of differentially expressed genes in bacterial microarray experiments. BMC Bioinformatics. 2006, 7: 19. 10.1186/1471-2105-7-19.
- Fox RJ, Dimmic MW: A two-sample Bayesian t-test for microarray data. BMC Bioinformatics. 2006, 7: 126. 10.1186/1471-2105-7-126.
- Newton MA, Kendziorski CM, Richmond CS, Blattner FR, Tsui KW: On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J Comput Biol. 2001, 8 (1): 37-52. 10.1089/106652701300099074.
- Townsend JP, Hartl DL: Bayesian analysis of gene expression levels: statistical quantification of relative mRNA level across multiple strains or treatments. Genome Biol. 2002, 3 (12): RESEARCH0071. 10.1186/gb-2002-3-12-research0071.
- Luu P, Yang YH, Dudoit S, Speed TP: Normalization of cDNA microarray data. SPIE BIOS 2001. 2001.
- Cleveland W: Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association. 1979, 74: 829-836. 10.2307/2286407.
- ArrayLeaRNA [www.ifr.ac.uk/safety/ArrayLeaRNA].
- Mielke PW: Simple iterative procedures for two-parameter gamma distribution maximum likelihood estimates. Journal of Applied Meteorology. 1976, 15 (15): 181-182. 10.1175/1520-0450(1976)015<0181:SIPFTP>2.0.CO;2.
- Lanczos C: A precision approximation of the gamma function. SIAM Journal on Numerical Analysis, Series B, volume 1. 1964.
- Abramowitz M, Stegun IA: Handbook of Mathematical Functions. 1972, New York, Dover Publications.
- MicrobesOnline [www.microbesonline.org].
- Alm EJ, Huang KH, Price MN, Koche RP, Keller K, Dubchak IL, Arkin AP: The MicrobesOnline Web site for comparative genomics. Genome Res. 2005, 15 (7): 1015-1022. 10.1101/gr.3844805.
- Price MN, Huang KH, Alm EJ, Arkin AP: A novel method for accurate operon predictions in all sequenced prokaryotes. Nucleic Acids Res. 2005, 33 (3): 880-892. 10.1093/nar/gki232.
- Pin C, Reuter M, Pearson B, Friis L, Overweg K, Baranyi J, Wells J: Comparison of different approaches for comparative genetic analysis using microarray hybridization. Appl Microbiol Biotechnol. 2006, 72 (4): 852-859. 10.1007/s00253-006-0536-x.
- Townsend JP: Resolution of large and small differences in gene expression using models for the Bayesian analysis of gene expression levels and spotted DNA microarrays. BMC Bioinformatics. 2004, 5: 54. 10.1186/1471-2105-5-54.
- Stanford University [http://cmgm.stanford.edu/pbrown/mguide/index.htm].
- Dagkessamanskaia A, Moscoso M, Henard V, Guiral S, Overweg K, Reuter M, Martin B, Wells JM, Claverys JP: Interconnection of competence, stress and CiaR regulons in Streptococcus pneumoniae: competence triggers stationary phase autolysis of ciaR mutant cells. Mol Microbiol. 2004, 51: 1071-1086. 10.1111/j.1365-2958.2003.03892.x.
- Anjum MF, Lucchini S, Thompson A, Hinton JC, Woodward MJ: Comparative genomic indexing reveals the phylogenomics of Escherichia coli pathogens. Infect Immun. 2003, 71 (8): 4674-4683. 10.1128/IAI.71.8.4674-4683.2003.
- Mohedano ML, Overweg K, de la Fuente A, Reuter M, Altabe S, Mulholland F, de Mendoza D, Lopez P, Wells JM: Evidence that the essential response regulator YycF in Streptococcus pneumoniae modulates expression of fatty acid biosynthesis genes and alters membrane composition. J Bacteriol. 2005, 187 (7): 2357-2367. 10.1128/JB.187.7.2357-2367.2005.
- Eriksson S, Lucchini S, Thompson A, Rhen M, Hinton JC: Unravelling the biology of macrophage infection by gene expression profiling of intracellular Salmonella enterica. Mol Microbiol. 2003, 47 (1): 103-118. 10.1046/j.1365-2958.2003.03313.x.
- Holmes K, Mulholland F, Pearson BM, Pin C, McNicholl-Kennedy J, Ketley JM, Wells JM: Campylobacter jejuni gene expression in response to iron limitation and the role of Fur. Microbiology. 2005, 151 (Pt 1): 243-257. 10.1099/mic.0.27412-0.
- Agilent Technologies. [http://www.agilent.com/chem/labonachip]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.