Toxicogenomic analysis incorporating operon-transcriptional coupling and toxicant concentration-expression response: analysis of MX-treated Salmonella

Background Deficiencies in microarray technology cause unwanted variation in the hybridization signal, obscuring the true measurements of intracellular transcript levels. Here we describe a general method that can improve microarray analysis of toxicant-exposed cells that uses the intrinsic power of transcriptional coupling and toxicant concentration-expression response data. To illustrate this approach, we characterized changes in global gene expression induced in Salmonella typhimurium TA100 by 3-chloro-4-(dichloromethyl)-5-hydroxy-2(5H)-furanone (MX), the primary mutagen in chlorinated drinking water. We used the co-expression of genes within an operon and the monotonic increases or decreases in gene expression relative to increasing toxicant concentration to augment our identification of differentially expressed genes beyond Bayesian-t analysis. Results Operon analysis increased the number of altered genes by 95% from the list identified by a Bayesian t-test of control to the highest concentration of MX. Monotonic analysis added 46% more genes. A functional analysis of the resulting 448 differentially expressed genes yielded functional changes beyond what would be expected from only the mutagenic properties of MX. In addition to gene-expression changes in DNA-damage response, MX induced changes in expression of genes involved in membrane transport and porphyrin metabolism, among other biological processes. The disruption of porphyrin metabolism might be attributable to the structural similarity of MX, which is a chlorinated furanone, to ligands indigenous to the porphyrin metabolism pathway. Interestingly, our results indicate that the lexA regulon in Salmonella, which partially mediates the response to DNA damage, may contain only 60% of the genes present in this regulon in E. coli. In addition, nanH was found to be highly induced by MX and contains a putative lexA regulatory motif in its regulatory region, suggesting that it may be regulated by lexA. Conclusion Operon and monotonic analyses improved the determination of differentially expressed genes beyond that of Bayesian-t analysis, showing that MX alters cellular metabolism involving pathways other than DNA damage. Because co-expression of similarly functioning genes also occurs in eukaryotes, this method has general applicability for improving analysis of toxicogenomic data.


Background
Deficiencies in microarray technology can produce undesired variation in the hybridization signal, obscuring a clear measurement of intracellular transcript levels. In order to overcome this problem, we applied two analytical techniques in addition to the typically used t-test to discern differentially expressed genes. We used (a) an operon analysis that assumes if one gene in an operon is differentially expressed, then all genes in that operon are differentially expressed and (b) an analysis relating monotonic-expression response to increasing concentrations of MX.
Many bacterial genes are grouped into multi-gene transcriptional units or operons, resulting in coordinate transcriptional regulation. This has been exploited previously to evaluate different statistical tools available for microarray analyses [1,2]. Other investigators have estimated expression levels by borrowing information from genes within the same operon [3] or have estimated systematic error to increase confidence in significance calls [4]. These studies focused on improving significance calls for individual genes. Our analysis differs from these by identifying changes to functional pathways due to co-expression of genes within an operon. Although previous studies have used monotonic increases in toxicant concentrationgene expression response to identify genes affected by toxicant exposure [5], we have combined this analysis with the operon analysis to construct a list of differentially expressed genes. The resulting list of genes was then analyzed for functional and KEGG pathway representation.
To examine the usefulness of this approach, we have evaluated global gene expression in Salmonella typhimurium TA100 by 3-chloro-4-(dichloromethyl)-5-hydroxy-2(5H)furanone (MX), the primary mutagen in chlorinated drinking water. MX is a chlorinated furanone that accounts for 20-60% of the mutagenic activity of chlorinated drinking water and is a multi-site carcinogen in rats [6]. MX and its structural analogues have been given a high or high-moderate rating for priority concern regarding their potential carcinogenicity in drinking water [7]. Although MX is not a regulated drinking water disinfection by-product, it has a relatively high mean cancer potency estimate of 2.3 (mg/kg-d) -1 [6]. It is a potent direct-acting mutagen in Salmonella that induces primarily GC to TA base substitutions in bacteria and mammalian cells, presumably as a consequence of replication past unrepaired abasic sites resulting from unstable MX adducts on guanine [6].
Most studies of global gene expression in cells exposed to a toxicant in vitro have measured cytotoxicity as the relevant biological endpoint and were conducted at only one concentration of the toxicant [8]. In addition to cytotoxic-ity, global gene expression could be determined under conditions of mutagenesis, which generally require relatively high survival to permit viable mutants to be grown under selective conditions. Also, mutagenicity experiments are generally performed under a range of concentrations of the mutagen in order to generate a mutagenicity dose-response curve. Characterization of global gene expression in cells exposed in vitro under conditions of mutagenesis could help reveal the pleiotropic effects of mutagens.
To our knowledge, only a few studies have determined global gene expression in cells exposed in vitro under conditions of mutagenesis [9][10][11][12], and all of these were performed in mammalian cells in vitro. Along with changes in expression of DNA repair and metabolism genes, these studies identified other pathways affected by the mutagens tested. Other studies have shown that mutagens alter gene expression in a variety of pathways beyond those involved in mutagenesis per se [13][14][15]. However, to date, no such study has been performed in the Salmonella (Ames) mutagenicity assay, which is the assay used most widely in genetic toxicology. Although Porwollik et al. [16] did evaluate global gene expression in Salmonella LT2, which is the parent strain of the Ames strains, they did not report survival or perform mutagenesis after hydrogen peroxide treatment of the cells. They found that hydrogen peroxide induced expression of sulA, which is a gene involved in the inhibition of cell division. This somewhat parallels the finding described above in mouse lymphoma cells [9] in which hydrogen peroxide induced genes involved in apoptosis. Consistent with this was the finding in Salmonella that hydrogen peroxide depressed expression of genes involved in cell replication and protein synthesis [16]. Thus, in addition to showing the value of the bioinformatic analyses described here, our study also extends the limited literature on the changes in global gene expression in cells treated under conditions of mutagenesis.

Principal component analyses (PCA)
Microarray data were first analyzed as described previously [16]. CyberT, which is an unpaired Bayesian t-test [17], was used to identify those genes whose expression was significantly different (p < 0.05) between the control and the highest MX concentration. To determine similarity of expression for biological replicates, those genes were then analyzed by Cluster 3.0 [18] for principal components at all concentrations. In our case, samples have similar PCA #1 values. This is often found in genomic studies and is attributed to biological variation. PCA #2 segregated each concentration of MX, and biological replicates had similar PCA#2 values (Fig. 2). Thus, gene expression changed coordinately with increasing concentrations of MX.

Operon analysis
The RegulonDB [19] identifies operons and their gene constituents in E. coli. To map the Salmonella genes on our microarray to E. coli genes in the RegulonDB, one author (M. McClelland) blasted a sliding window of 100 bp from the arrayed sequences to the E. coli genome and recorded the highest or "best" percentage hit for each arrayed sequence. The median of all "best hits" was approximately 85%. Using the RegulonDB [19] and this ortholog map, all of the genes were assigned to their respective operons. All the genes in an operon were considered differentially expressed when at least one gene in the operon was determined to be differentially expressed at the highest concentration of MX by CyberT analysis (p < 0.05). There were 54 operons with at least one differentially expressed gene.
To validate this approach, we evaluated the distribution of p-values for all the genes in these 54 operons ( Table 1) and found that the p-values for these genes were shifted significantly to lower values (0-0.2) as determined by a hyper-geometric test (p-value of 1.7 × 10 -11 ). To further support this determination of gene co-expression within an operon, we analyzed, in detail the distribution of p-values and the uniformity of fold change within these operons (Tables 2 and 3). Fifty-one percent (87/169) of the genes with p-values <0.05 clustered in our identified 54 operons or 6% (54/866) of the total number of operons. Ninety-six percent of the 54 operons had genes whose expression moved in one direction.

Monotonic analysis
A final list of differentially expressed genes was then constructed by augmenting the previous list with monotonically changing genes as described in the Methods section. Monotonically changing genes are those whose expression steadily increased or decreased with each increase in concentration of MX. The analysis of monotonically changing transcript expression relative to MX concentration added 153 more genes to the list. The p-value distribution for these monotonically changing genes was also heavily weighted toward low p-values (Table 1). However, the operons containing monotonically changing genes did not show enrichment for genes with low p-values as did the operons identified by the CyberT analysis.
For illustrative purposes, the 10 genes whose expressions were increased the most at the highest concentration of MX compared to the control and that showed a monotonic increase in expression are shown in Fig. 3. Seven of these 10 genes are involved in the SOS response. Fig. 4 similarly depicts the 10 genes whose expressions were Principal component analysis of differentially expressed genes in MX-treated Salmonella TA100 Figure 2 Principal component analysis of differentially expressed genes in MX-treated Salmonella TA100. PCA#1 accounts for 31% of the variation in the data and PCA#2 accounts for 16%.
Average mutagenicity of MX in Salmonella TA100 from four experiments Figure 1 Average mutagenicity of MX in Salmonella TA100 from four experiments. The average % survival of log-phase cells was 95, 97, and 73% at 1.15-, 2.3-, and 4.6-μM MX, respectively. monotonically decreased the most; phsABC is the thiosulfate reductase gene.

LexA regulon analysis
The SOS response is the primary transcriptional response to DNA damage, and this response is mediated through the activation of the LexA regulon. The gene constituents of the LexA regulon have been studied intensely in E. coli [20,21]. Based on bioinformatic analysis, the gene constituents of the LexA regulon have been extended to S. typhimurium [22]. Table 4 contains the response of these LexA genes to the indicated concentrations of MX. The listed pvalues were computed from the Cyber-T, Bayesian t-test, and the genes were ordered based on these p-values. Among those genes exhibiting significant expression (p < 0.05) in response to MX treatment (Table 4), most genes coded for proteins involved in DNA repair and replication [21]. Our finding that only 19 of the 33 genes of the E. coli LexA regulon were significantly expressed (p < 0.05) in Salmonella due to MX treatment suggests that the remaining 14 genes are not part of the LexA regulon of Salmonella.

Functional analyses
Functional and pathway analyses were based on differentially expressed genes at all concentrations of MX determined from CyberT, operon analysis, and the analysis of monotonically changing gene expression. CyberT identified 169 altered genes, the operon analysis added 161 genes, and the monotonic analyses added 153 genes, for a total of 483 genes.
KEGG pathway analyses (Table 5) indicated that MX altered the expression of genes involved in cellular membrane and porphyrin metabolism, similar to the functions identified by the TIGR analyses (Table 6). Furthermore, the pathways and functions identified as altered (p < 0.05) after applying operon and monotonic analyses (483 genes) differed considerably from those identified by only CyberT analysis (169 genes) (Tables 5 and 6). In addition to the well-known effects of MX on DNA [6], our KEGG and TIGR analyses indicate that the pathway most altered by MX was porphyrin metabolism. In addition, MX also altered membrane function, specifically flagellar assembly and Type III secretion, two processes that have been shown recently to be co-regulated [23].

Discussion
In an attempt to overcome the inherent noise from twodye hybridization that obscures the statistical identification of differentially expressed genes, and to obtain a comprehensive list of genes whose expression changes were related to MX treatment, we applied two additional determinations of altered gene expression: an operon analysis and a monotonic analysis. Operon analysis increased the number of altered genes by a factor of 1.95. The Bayesian t-test p-value distribution for the genes that were added by operon analysis was heavily weighted towards low p-values as demonstrated by a hypergeomet- Table 1

Range of p-values
All genes in the genome* All genes in selected operons †  (169) brought the total genes considered for analysis by the operon method to 312, which was the denominator used to calculate the percentages in this column. A Monte Carlo simulation, repeated 1500 times using a sample size of 312, randomly selected from 4253 p-values, and sorted into the p-value bins shown in column 1, produced a p-value distribution the same to the nearest percentage point as that shown in the second column, i.e., a distribution essentially identical to that found for all genes in the genome. ‡ There were 161 genes residing in the selected operons that did not have p-values < 0.05. Eleven percent of these genes had missing data and, thus, did not appear in the table. Thus, 143 was the denominator used to calculate the percentages in this column. A Monte Carlo simulation repeated 1500 times using a sample size of 143 produced a p-value distribution the same to the nearest percentage point as that shown in the second column, i.e., a distribution essentially identical to that found for all genes in the genome. § The monotonic analysis alone identified 309 genes as being differentially expressed. Of these, 153 were uniquely identified by the monotonic analysis; the other 156 had been identified by either the CyberT and/or operon analyses. Thus, the monotonic analysis added an additional 153 genes for analysis, and this number was the denominator used to calculate the percentages shown in this column. A total of 448 differentially expressed genes were subjected to functional analysis. Two broad categories for this functional response were those genes whose expression was altered in response to DNA damage and those whose expression was altered by other mechanisms. Among the former were genes involved directly in DNA damage repair and prophage excision. For those genes altered by other mechanisms, two predominant functions were (a) molecular transport and (b) porphyrin, heme, and cobalamin metabolism. Below we discuss these two functions as well as the possible role of nanH in the LexA regulon and concept of steric coupling to explain the downregulation of porphyrin metabolism genes by MX.

nanH and the LexA regulon
Bacteria partially regulate their response to DNA damage via the LexA regulon, which contains 33 genes in E. coli [21]. In Salmonella, we found that the expression of 19 of these genes was induced by MX (p < 0.05), and all showed a monotonic increase in expression with increasing MX concentration. One of the strongest responses to MX was exhibited by nanH, which codes for a neuraminidase/sialidase associated with pathogenicity in Salmonella.
Although pathogenicity genes are not typically associated with the response to DNA damage, Benson et al. [22] identified a number of pathogenicity determinants induced as part of the SOS response. Because the nanH transcriptional response so closely matched the response of other genes in the LexA regulon, we investigated the potential for nanH membership in this regulon.
Modulation of the SOS response is facilitated by the differential affinity of LexA for the promotors of SOS response genes, which allows some genes to be fully induced at a lower level of DNA damage than others. The standard LexA binding site has a 16-bp palindromic repeat motif (CTGTN 8 ACAG) within 3-171 bp of the transcription site of the regulated gene [21]. For nanH there is a 21-bp palindromic sequence, CTGCTATATGT-TATATAGCAG, where the middle three bp (underlined) do not participate in the palindrome. This potential regulatory sequence ends 20 bases prior to the nanH transcription start site, suggesting, along with the transcriptional evidence, that nanH may be regulated by LexA.

Transport genes
Expression of a number of genes involved in transport processes was also affected by treatment with MX. Transporters serve numerous functions in bacteria, including uptake of nutrients, transport of proteins and peptides to the cell surface, transport of ions to regulate osmolarity, cell signaling, elimination of toxins, and secretion of virulence factors into host cells. Two well-studied prokaryotic transport systems are the Type III Secretion system and the ABC Transport system. In Salmonella, the Type III system seems to be involved mainly with host-cell interactions through secretion of virulence proteins, whereas the ABC Transport system supports a wide range of functions, including host-cell interactions and physiological maintenance of the bacteria cell itself. In this study, MX induced expression changes in genes involved in both the Type III secretion and the ABC Transport systems. In general, Type III Secretion system genes were down-regulated, whereas the ABC Transport system genes were either down-or upregulated depending on their transport function. Thus, the expression of genes encoding proteins involved in amino acid transport was down-regulated, whereas that of genes encoding proteins involved in ion transport was upregulated.
Not surprisingly, our results show that the potent mutagen MX activates DNA repair genes in Salmonella. In mam-  Table 3     thiH-thiG-STM4161-thiF-thiE-thiC 2535 dinF-STM4239-yjbJ malian systems MX-induced DNA damage seems to be efficiently repaired [24,25]. However, considering the other genes whose expression MX alters, it is unlikely that mutagenesis alone accounts for the carcinogenicity of MX. Our findings that MX alters the transcription of genes involved in other cellular processes, particularly membrane transport functions, and the recent report that MX is a potent inhibitor of gap-junction intercellular communication in rat cells in vitro [26,27] support the contention that MX disrupts cellular pathways that are important for differentiation, growth, and apoptosis. Indeed, MX reduced the expression of connexin43 protein in rat cells in vitro [26]. Thus, besides directly mutating key genes, MX also may alter expression of key genes, such as observed here for membrane transport and metabolism. These combined effects could contribute to the carcinogenic mechanism of MX.

Steric coupling: MX and pyrrole
In order to explain the down-regulation of porphyrin metabolism by MX, we considered the possibility that a toxicant might disrupt a pathway and alter gene expression if it is structurally related, but not identical, to ligands in that pathway. MX contains a furan, which is a 5-membered ring containing an oxygen; this is structurally similar to a pyrrole, which contains a nitrogen instead of an oxygen in the ring (Fig. 5). Our data show that MX downregulated genes involved in the metabolism of porphyrin, including its derivatives heme and cobalamin, both of which have pyrrole as a base structural unit. Heme is a component of red blood cells; however, in rats MX caused leukemia, which is a disease of the white blood cells [6]. MX also caused thyroid tumors in rats [6], and thyroxine contains two benzene rings connected by an oxygen atom, a structural element of MX.
These findings raise the interesting notion that some of the changes in gene expression resulting from toxicant treatment may be due to the structural similarity of the toxicant to effectors involved in normal cellular metabolism. The number of different small molecules within our own bodies may be just a few thousand [28], and these are processed by a select group of proteins to which they are linked sterically [29]. To our knowledge, consideration of the steric features of the toxicant has not been included in the interpretation of toxicogenomic data. However, our study suggests that such consideration may be beneficial Monotonically decreasing gene expression in MX-treated Sal-monella TA100 Figure 4 Monotonically decreasing gene expression in MX-treated Salmonella TA100. The x-axis represents the concentration of MX at 4 concentrations, 0-, 1.15-, 2.3-, and 4.6-μM MX, respectively. The log-scaled y-axis represents expression values that are the mean of 12 background-corrected intensities (4 biological replicates and 3 technical replicates for each biological replicate) normalized to the DNA reference.
Monotonically increasing gene expression in MX-treated Sal-monella TA100 Figure 3 Monotonically increasing gene expression in MX-treated Salmonella TA100. The x-axis represents the concentration of MX at 4 concentrations, 0-, 1.15-, 2.3-, and 4.6-μM MX, respectively. The log-scaled y-axis represents expression values that are the mean of 12 background-corrected intensities (4 biological replicates and 3 technical replicates for each biological replicate) normalized to the DNA reference. Double asterisks represent those genes that are known members of the LexA regulon.
to understanding why particular pathways are perturbed by a toxicant.
To distinguish between the possible modes of action of MX, we have initiated additional experiments with structural congeners of MX that have varying mutagenic potencies, different physical-chemical properties, and induce different mutation spectra-but are structurally similar to MX [30]. Such a study should indicate what components of the transcriptional response are due to which structural features of the mutagens and could segregate the DNA damage response from other transcriptional changes.

Conclusion
The co-expression of similarly functioning genes has been demonstrated recently to be common not only to prokaryotes but also to eukaryotes from yeast to humans [31]. Thus, because transcriptional coupling exists in all species, the methodology described here is applicable to essentially all toxicogenomic assays, regardless of species. This methodology potentially identifies transcriptional impacts to cellular functions that might otherwise be overlooked. Combining transcriptional coupling in operons with the monotonic analysis can produce an analysis of toxicogenomic data that is more robust than that produced by CyberT analysis alone.
Such an analysis indicates that the drinking-water mutagen MX alters a variety of functions, including transporter activities and porphyrin metabolism. This latter effect of MX may be due to steric interference by MX, which is a furanone that bears structural similiarity to pyrrole. Our results also suggest that the lexA regulon of Salmonella may contain only 60% of the genes present in this regulon in E. coli. MX strongly induced expression of nanH, which contains a putative lexA regulatory motif in its regulatory region, suggesting that nanH may be a previously unrecognized member of the lexA regulon.

Survival and mutagenesis
For survival, a sample was diluted 10 -6 , and 100 μl of this dilution were plated in triplicate onto VBME medium supplemented with excess biotin and histidine [33]. For mutagenesis, 50 μl of the treated cells were plated in duplicate onto VBME supplemented with excess biotin and trace histidine [33]. Plates were incubated for 3 days, and the colonies were counted by an automatic colony counter. Immediately after sampling for survival and mutagenesis, RNA was extracted and purified using the Institute of Food Research Microarray Facility protocol [34]. Mutagenesis and microarray results were the average of four independent experiments.

Microarray procedures
Array fabrication, DNA and RNA labeling, and hybridizations for all biological samples were performed as described [16]. Briefly, we used glass slides arrayed in triplicate with PCR products from the genome of S almonella LT2 plus the genes from the pKM101 plasmid, i.e., each gene was spotted three times per slide, for a total of 4366 genes. RNA was isolated from treated and control cells and converted to cDNA, which was competitively hybridized with genomic DNA from control TA100 cells. For each treatment condition, two biological samples were hybridized reciprocally such that the genomic DNA was labeled with Cy3 and the cDNA with Cy5, and then vice versa. This was repeated for two more biological samples, yielding four competitive hybridizations per treatment condition. The slides were scanned on a GenePix 4000B Axon Scanner (Axon Instruments, Inc., Union City, CA), and signal intensities were quantified using the GenePix TM Pro 4.1 software package (Axon Instruments, Inc., Union City, CA). Consequently, the intensity data obtained for expression analysis for each gene were derived from 12 separate measurements (4 independent biological samples × 3 technical replicates per sample). These intensity data have been archived in the Gene Expression Omnibus [35], accession number GSE7034.

Initial data analyses and concentration-related changes
Data were first analyzed as described previously [16]. Briefly, spot intensities were quantified, the local background intensities were subtracted, and the ratios of the contribution of each spot to total signal in each channel were calculated. Because each gene was spotted three times per slide, the average of the three replicate set of genes/slide constituted the measure of expression of a gene for each experiment.

Montonic analysis
Montonically changing genes were those whose expression fulfilled the following conditions: (a) exhibited a progressive increase or decrease in expression across all 3 concentrations of MX and (b) changed by ≥ 50% between the control and the highest concentration of MX.

Functional analysis
The functions of the differentially expressed genes identified above were ascribed to each gene using The Institute for Genomic Research (TIGR) Comprehensive Microbial Resource [36]. KEGG pathway annotations were obtained from Kyoto Encyclopedia for Genes and Genomes (KEGG) [37]. Pathway p-values for gene constituent overrepresentation were calculated by a hypergeometric test in Excel. aided in data analysis. DMD conceived of the study, participated in its design and coordination, and helped write the manuscript. All authors read and approved the final manuscript.