Motif-guided sparse decomposition of gene expression data for regulatory module identification
© Gong et al; licensee BioMed Central Ltd. 2011
Received: 2 November 2010
Accepted: 22 March 2011
Published: 22 March 2011
Genes work coordinately as gene modules or gene networks. Various computational approaches have been proposed to find gene modules based on gene expression data; for example, gene clustering is a popular method for grouping genes with similar gene expression patterns. However, traditional gene clustering often yields unsatisfactory results for regulatory module identification because the resulting gene clusters are co-expressed but not necessarily co-regulated.
We propose a novel approach, motif-guided sparse decomposition (mSD), to identify gene regulatory modules by integrating gene expression data and DNA sequence motif information. The mSD approach is implemented as a two-step algorithm comprising estimates of (1) transcription factor activity and (2) the strength of the predicted gene regulation event(s). Specifically, a motif-guided clustering method is first developed to estimate the transcription factor activity of a gene module; sparse component analysis is then applied to estimate the regulation strength, and so predict the target genes of the transcription factors. The mSD approach was first tested for its improved performance in finding regulatory modules using simulated and real yeast data, revealing functionally distinct gene modules enriched with biologically validated transcription factors. We then demonstrated the efficacy of the mSD approach on breast cancer cell line data and uncovered several important gene regulatory modules related to endocrine therapy of breast cancer.
We have developed a new integrated strategy, namely motif-guided sparse decomposition (mSD) of gene expression data, for regulatory module identification. The mSD method features a novel motif-guided clustering method for transcription factor activity estimation by finding a balance between co-regulation and co-expression. The mSD method further utilizes a sparse decomposition method for regulation strength estimation. The experimental results show that such a motif-guided strategy can provide context-specific regulatory modules in both yeast and breast cancer studies.
Transcriptional gene regulation is a complex process that uses a network of interactions to . A central problem remains the accurate identification of transcriptional modules or gene sub-networks involved in the regulation of critical biological processes . For cancer research, these sub-networks can help provide a signature of the disease that is potentially useful for diagnosis, or suggests novel targets for drug intervention. The biomedical research literature and several specific databases contain sequence information, gene expression profiling data, and small scale biological experiments that allow investigators to reconstruct gene regulatory networks and explore the direct effects of transcription factors on gene expression.
Recently, the bioinformatics community has explored various computational approaches for transcriptional module identification [3–7]. These approaches can be classified into two major categories. The first category uses clustering methods to explore the similarity in gene expression patterns to form gene modules. The second approach uses projection methods to infer latent (hidden) components with which to group genes into modules. A growing literature documents attempts to reconstruct gene networks by applying clustering methods [8, 9] and their more sophisticated variants such as statistical regression  and Bayesian networks . While this line of work is important to help formulate hypotheses, there are many limitations on using clustering methods for regulatory module inference. One common challenge is detecting the interactions between transcription factors and their target genes based on gene expression data alone. For regulatory module identification, it is critical to distinguish 'co-regulation' from 'co-expression', and to understand the relationship between co-regulation and co-expression. Generally, genes with highly homologous regulatory sequences (co-regulation) should have a similar expression pattern (co-expression). However, the reverse is likely not true; co-expressed genes must not necessarily exhibit common regulatory sequences . Traditional clustering analysis often returns clusters lacking shared regulatory sequences, thus making the biological relevance of these clusters relatively low for the identification of regulatory mechanisms.
A group of projection methods from the second category, including principle component analysis (PCA), independent component analysis (ICA), and non-negative matrix factorization (NMF) [13–15], have also been extensively applied for transcriptional module identification. These methods decompose gene expression data into components that are constrained to be mutually uncorrelated or independent, and then cluster genes based on their loading in the components. Since these methods do not cluster genes based on their expression similarity, they are better equipped to find co-regulated gene modules. One major difficulty using such projection approaches is that the components usually represent the joint effects of many underlying transcription factors. Thus, the components do not correspond to individual known transcription factors (TFs), making the biological interpretation of the components very difficult.
To overcome the above-mentioned shortcomings, several integrative methods have been proposed that integrate TF-gene interaction data with gene expression data. For instance, network component analysis (NCA) has been recently developed to successfully estimate the TF activities of regulatory networks using both ChIP-on-chip and gene expression data . Note that NCA heavily relies on ChIP-on-chip data for network connectivity information with which to define regulatory modules. Thus, the NCA scheme is not readily applicable to many biological studies where adequate network connectivity information is not available (due to lack of adequate ChIP-on-chip data). To deal with this difficulty, Sabatti and James  were among the first to use motif information as the initial network topology, subsequently adopting a Bayesian algorithm to reconstruct regulatory modules. While theoretically elegant, this approach needs to estimate the posterior probability, a joint distribution of network topology and transcription factor activity. Even using the Gibbs sampling technique, it is a formidable task to estimate the joint distribution when the number of samples is limited.
We now propose a novel approach, namely motif-guided sparse decomposition (mSD), to identify co-regulated transcriptional modules by integrating motif information and gene expression data. The mSD method is a Bayesian-principled method without the need to estimate the joint distribution. Instead, a two-step approach is used to first estimate transcription factor activity and then regulation strength on the target genes. A motif-guided clustering method is developed to help estimate transcription factor activity by taking into account both co-expression and co-regulation. A sparse decomposition step is then applied to estimate the regulation strength of predicted regulatory networks. To evaluate the performance of the proposed approach, we applied the mSD method to simulated and real yeast cell cycle data, showing an improved performance in identifying three kinds of coherent modules associated with known cell cycle transcription factors. We then applied our approach to a molecular profiling study of estrogen dependence in breast cancer cells, with the goal of recovering condition-specific transcriptional modules related to estrogen action. The results demonstrated that our approach effectively finds important condition-specific regulatory modules that are functionally relevant to estrogen signaling pathways.
Latent variable model
where x pg is defined as the logarithm of the expression ratio of gene g between data sample p and control sample, a pt the activity level of TF t in sample p and s tg the regulation strength of TF t onto gene g. The log-ratios of gene expression X ∈ Rm×N,(N >> 1) are expressed as a linear combination of log-ratios of TF activity (A ∈ Rm×n) weighted by their regulation strength (S ∈ R n×N ). Note that m is the number of samples, N is the number of genes, and n is the number of TFs.
In general, the number of TFs is much smaller than the number of transcribed genes (n << N) and most genes are regulated only by a small number of TFs. Hence, the matrix S that describes the regulation strength between the TFs and their regulated genes is sparse. Further, the number of TFs (n) is usually greater than the number of samples (m), i.e., n > m , such that Equation (1) represents an underdetermined linear system (ULS). To obtain a sparse solution to this ULS, we develop a two-stage approach to estimate transcription factor activity (A) and regulation strength (S) sequentially.
Transcription factor activity estimation
A generic approach for transcription factor activity estimation is to use a clustering method to find representative genes whose expression profiles (columns of X) can be utilized to estimate A. For a theoretical justification of the identifiability of A, please refer to Section 1.1 in the supplementary material. Many clustering techniques have been proposed to cluster gene expression data, such as k-means clustering  and self-organizing maps , which are designed to find gene expression patterns by grouping genes with similar expression profiles. Very recently, an affinity propagation (AP) algorithm has been proposed for data clustering that shows an improved performance . Based on an ad hoc pair-wise similarity function between data points, AP seeks to identify each cluster by one of its elements, the so-called exemplar. AP takes as input a collection of real-valued similarities between data points, where the similarity s(i, k) indicates how well data point k is suited to be the exemplar for data point i. The goal is to maximize the similarity s(i, k) or equivalently, to minimize the Euclidean distance , d(i, k) = ||x i - x k ||2, where x i and x k are two column vectors of gene i and gene k, respectively, in X.
However, direct application of the AP clustering technique to gene expression data will only give rise to co-expressed gene clusters. To identify gene regulatory modules, we need a clustering technique to integrate motif information and gene expression data, aiming to find co-regulated gene clusters with co-expressed patterns. We here propose a motif-guided clustering method to find a group of genes that not only is of similar expression pattern but also shares a common set of binding motifs as much as possible.
Motif-guided gene clustering with a joint similarity measure
where λ is a trade-off parameter that controls the contribution from two different information sources: motif information and gene expression data. When incorporated into an AP clustering method, the first term in Eq. (3) is used to find a group of genes with similar expression pattern, while the second term estimates those genes that should share a common set of TFs.
Ideally, the clustering result will generate a better representation of the transcription factor activity that underlies a co-regulated group of genes. However, both motif information and gene expression data are noisy because the binding motif is a very short DNA sequence  and there is often a low signal-to-noise ratio in gene expression measurements . The impact of the noises can be clearly observed in two extreme cases: (1) the gene cluster resulting from (noisy) motif information alone will show a noisy expression pattern; (2) the cluster resulting from gene expression data alone will often gain little support in terms of being regulated by a shared set of motifs. Therefore, it is important to understand the contribution of each data source and assign its proper weight. The trade-off parameter λ in Eq. (3) is used to alleviate the effects of noise. In the following section, we will design an entropy-based measure, in conjunction with a non-uniformity measure, to help find the optimal value for the trade-off parameter λ.
Determination of the trade-off parameter
Conceptually, when motifs are randomly distributed (with an assumed uniform distribution) among the clusters, the mean entropy reaches its maximum; conversely, when motifs are uniquely distributed for each cluster (cluster-specific), the mean entropy reaches its minimum.
Where is the variance of gene expression pattern for cluster j(j = 1,..., j), the maximum variance for all clusters, and w j is the weight of cluster j defined as the proportion of genes to the entire gene population.
Theoretically, the cost function C(λ) is a U-shaped function; when λ reaches its optimal value, the cost function C(λ) reaches its minimum. In other words, by minimizing C(λ) we can find the optimal value of λ to take advantage of both the motif information and gene expression data, while alleviating the noise impact on gene clustering.
We can extend this cost function to a weighted form by using a trade-off parameter μ: C(μ, λ) = μH(λ)+(1-μ)NonU(λ), where 0 ≤ μ ≤ 1. By controlling μ we can obtain different sets of gene clusters with different degrees of motif occupancy and similarity in expression pattern. To determine an appropriate parameter λ, we use a simplified version of the cost function C(μ, λ): C(λ) = H(λ)+NonU(λ) (which is equivalent to the case of μ = 0.5), to help find an appropriate balance between motif occupancy and expression pattern for regualtory module identification. A simplified assumption here is that it is equally important to consider both co-regulation (measured by the entropy for motif occupancy) and co-expression (measured by non-uniformity of expression pattern) for regulatory module identification. Nevertheless, we use C(μ, λ) to examine the robustness of parameter λ for the microarray data analyzed in this paper, ensuring that the selected parameter λ is not sensitive to a particular choice of parameter μ.
Regulation strength estimation
Initialize source S with a matrix W, which comes from either Chip-on-chip data or TF-gene binding strength matrix searched from TRANSFAC .
Iterate for every column of S (which is corresponding to each gene)
If sparseness constraints on the current column of S (denote s g ) apply, project s g to be desired sparse by making its L 1 norm larger than a predefined sparseness threshold, while having the L 2 norm unchanged. (For the definition of sparseness, please refer to .)
In the projected space, detect approximately which TFs are "active"; the term "active" is used to refer to the TFs with "considerably nonzero" strengths.
Notice that a major step in the above algorithm (Step (2a)) requires a projection operator that enforces sparseness by explicitly setting both L1 and L2 norms. This operator, fortunately, has been found by Hoyer  to incorporate sparseness constraint in the context of non-negative matrix factorization (NMF). We use this projection operator in the SCA approach to find the closest (in the Euclidean sense) sparse vector s g with a desired L1 and L2 norm. The cost function in Step (2c) is designed to minimize the regulation strength of "inactive" TFs, while letting the regulation strength of "active" TFs to change freely in order to fulfil the imposed constraint x g = As g . This can also be viewed as a form of projection into an active subspace , resulting in an elegant mathematical approach to obtain the solution to a Karush-Kuhn-Tucker (KKT) system (for more details, please see Section S2 in the supplementary material).
Results and Discussion
Synthetic and real yeast data
To validate the proposed integrative approach, we applied mSD to synthetic and real yeast cell cycle data for regulatory module identification, and then compared its performance with those of other approaches including FastNCA  and sparse decomposition . For the synthetic data set, we used a network generator, SynTReN, to produce a benchmark gene expression data set based on a synthetic S. cerevisiae transcriptional regulatory network. SynTReN generated 15 samples of expression data with a set of 345 genes in different conditions. The genome-wide location data (ChIP-on-chip data)  were then used to provide the binding information and these data were integrated with the gene expression data to extract transcription factor activity and estimate regulation strength.
To evaluate the performance of the mSD approach, we compared its performance with those of other similar methods, including FastNCA  and sparse decomposition (SD) . Performances were measured by Receiver Operating Characteristic (ROC) analysis and the area under the ROC curve (AUC). The ROC curve measures the sensitivity and specificity of a method by calculating true-positive (TP) rate against false-positive (FP) rate. To generate a ROC curve, we first ranked the target genes for each TF according to their connection strengths in S, and then we calculated the true and false positive rates by running down the ranked gene list one at a time. To investigate the impact of noise on the respective performances of mSD and FastNCA, the binding information was obtained from the ChIP-on-chip data with different cut-off p-values (0.01, 0.05 and 0.1); a large cut-off p-value results in a high false positive rate in binding information (a high noise level).
AUCs of mSD, SD and FastNCA methods, respectively, under different cut-off p-values
cut-off p-value = 0.1
cut-off p-value = 0.05
cut-off p-value = 0.01
To further evaluate our algorithm, we applied the mSD approach to a cell cycle data set obtained under the condition of arrest of a cdc 15 temperature-sensitive mutant . As a pre-processing step, we employed KNNimpute  to fill in missing values and then identified 800 cell cycle-related genes as the gene subpopulation to test the mSD approach. For the mSD approach, we set the trade-off parameter λ in Eq. (3) as 0.08 for this experiment, since the cost function, C(λ) (Eq. (8)), reached its minimum at λ = 0.08 (see Additional file 1, Figure S4 in the supplementary material for the C(λ) curve). The modified cost function C(μ, λ) can also be found in Additional file 1, Figure S5 in the supplementary material, which supports the robustness of the selected parameter λ with respect to parameter μ. Since there is no ground truth of target genes available for this experiment, we used the functional enrichment of regulatory modules to compare the performance of mSD with that of another method, COGRIM . COGRIM is derived from a Bayesian hierarchical model and implemented using the Gibbs sampling technique. COGRIM can help infer the activation or inhibition of TFs acting on their target genes, with an integration of microarray gene expression data, ChIP-on-chip data, and motif information. The top GO enrichment p-values were transformed to negative logarithm values and averaged over all identified modules. The averaged enrichment score for the mSD method is 3.900, which is slightly better than the score for COGRIM (3.894), demonstrating that the mSD method can help identify functionally coherent gene clusters associated with specific TFs.
Breast cancer cell line data
We then applied the mSD approach to breast cancer cell line data to help understand estrogen signaling and action in breast cancer cells. Greater than 70% of invasive breast cancers diagnosed each year in the U.S. express detectable levels of estrogen receptor alpha (ER, ER+) . The most potent natural ligand for ER is 17β-estradiol, which can regulate the proliferation of breast cancer cells and alter their cytoarchitectural and phenotypic properties [37, 38]. Antiestrogens, such as Tamoxifen and Fulvestrant, are widely used in the treatment of these breast cancers and they produce a significant survival benefit for some patients. However, half of these cancers will recur, and recurrent metastatic breast cancer remains an incurable disease. It is, therefore, clinically and biologically important to understand what transcriptional programs regulate these recurrence events [39, 40].
To gain insights into the transcriptional programs that drive tumor recurrence, we have collected and acquired breast cancer cell line data in estrogen-induced and estrogen-deprived conditions, respectively. The estrogen induced data set is a time course microarray data set obtained from the ER+, estrogen-dependent breast cancer cell line MCF-7, treated with 17β-estradiol (E2) . The estrogen-deprived data set consists of a series of breast cancer variants that closely reflect clinical phenotypes of endocrine sensitive tumors . The breast cancer variants are also derived from the MCF-7 cell line, including MIII cells and LCC1 cells. MIII cells were derived directly from MCF-7 and became estrogen independent and proliferate aggressively after six months of selection in vivo in ovariectomized athymic mice. LCC1 cells were derived from MIII following further selection in vivo. Both cell lines remain ER+ and exhibit an estrogen-independent but antiestrogen sensitive phenotype [39, 40].
Twenty six breast cancer and ER-related transcription factors
The motif information was obtained from the TRANSFAC database  and ChIP-on-chip experiments . All human promoter DNA sequences were obtained from the UCSC Genome database ; we searched 5,000 bp upstream from the transcription start site (TSS). With all vertebrate position weight matrices (PWMs) provided by the TRANSFAC 11.1 Professional Database , the Match™  algorithm was used to generate a gene-motif binding strength matrix with cut offs that minimize the false-positive rate.
Target genes of ETF (V$ETF_Q6) in both E2-induced and ER-deprived conditions
Probe Set ID
HEAT SHOCK 70 KDA PROTEIN 9B (MORTALIN-2)
PLECTIN 1, INTERMEDIATE FILAMENT BINDING PROTEIN 500 KDA
EUKARYOTIC TRANSLATION TERMINATION FACTOR 1
INTERFERON INDUCED TRANSMEMBRANE PROTEIN 1 (9-27)
ADDUCIN 3 (GAMMA)
EGF-CONTAINING FIBULIN-LIKE EXTRACELLULAR MATRIX PROTEIN 1
FERM, RHOGEF (ARHGEF) AND PLECKSTRIN DOMAIN PROTEIN 1 (CHONDROCYTE-DERIVED)
EPIDERMAL GROWTH FACTOR RECEPTOR (ERYTHROBLASTIC LEUKEMIA VIRAL (V-ERB-B) ONCOGENE HOMOLOG, AVIAN)
SOLUTE CARRIER FAMILY 39 (ZINC TRANSPORTER), MEMBER 6
SOLUTE CARRIER FAMILY 16 (MONOCARBOXYLIC ACID TRANSPORTERS), MEMBER 1
FIBRONECTIN TYPE III DOMAIN CONTAINING 3A
PROTEIN PHOSPHATASE 3 (FORMERLY 2B), CATALYTIC SUBUNIT, ALPHA ISOFORM (CALCINEURIN A ALPHA)
HIV-1 TAT SPECIFIC FACTOR 1
PROGRAMMED CELL DEATH 4 (NEOPLASTIC TRANSFORMATION INHIBITOR)
SERINE PEPTIDASE INHIBITOR, KUNITZ TYPE 1
HCF-BINDING TRANSCRIPTION FACTOR ZHANGFEI
PHD FINGER PROTEIN 21A
ENHANCER OF ZESTE HOMOLOG 2 (DROSOPHILA)
PRA1 DOMAIN FAMILY, MEMBER 2
CENTROSOMAL PROTEIN 57 KDA
INOSITOL POLYPHOSPHATE-5-PHOSPHATASE F
WD REPEAT DOMAIN 47
UBIQUITIN SPECIFIC PEPTIDASE 46
B-CELL CLL/LYMPHOMA 9
MYOSIN VA (HEAVY POLYPEPTIDE 12, MYOXIN)
WD REPEAT DOMAIN, PHOSPHOINOSITIDE INTERACTING 2
INTEGRIN, BETA 4
CYCLIN-DEPENDENT KINASE 5, REGULATORY SUBUNIT 1 (P35)
ENOYL-COENZYME A, HYDRATASE/3-HYDROXYACYL COENZYME A DEHYDROGENASE
INHIBIN, BETA B (ACTIVIN AB BETA POLYPEPTIDE)
POTASSIUM INTERMEDIATE/SMALL CONDUCTANCE CALCIUM- ACTIVATED CHANNEL, SUBFAMILY N, MEMBER 1
PROTEIN TYROSINE PHOSPHATASE TYPE IVA, MEMBER 3
SOLUTE CARRIER FAMILY 16 (MONOCARBOXYLIC ACID TRANSPORTERS), MEMBER 6
TUMOR NECROSIS FACTOR, ALPHA-INDUCED PROTEIN 8
NUCLEOSOME ASSEMBLY PROTEIN 1-LIKE 1
P21 (CDKN1A)-ACTIVATED KINASE 2
MISSHAPEN-LIKE KINASE 1 (ZEBRAFISH)
CDC42 EFFECTOR PROTEIN (RHO GTPASE BINDING) 3
TUMOR NECROSIS FACTOR RECEPTOR SUPERFAMILY, MEMBER 14 (HERPESVIRUS ENTRY MEDIATOR)
SRY (SEX DETERMINING REGION Y)-BOX 13
SOLUTE CARRIER FAMILY 16 (MONOCARBOXYLIC ACID TRANSPORTERS), MEMBER 1
ACHAETE-SCUTE COMPLEX-LIKE 1 (DROSOPHILA)
INTEGRIN, ALPHA X (COMPLEMENT COMPONENT 3 RECEPTOR 4 SUBUNIT)
VASCULAR ENDOTHELIAL GROWTH FACTOR
SOLUTE CARRIER FAMILY 6 (NEUROTRANSMITTER TRANSPORTER, CREATINE), MEMBER 8
PRE-B-CELL LEUKEMIA TRANSCRIPTION FACTOR 2
VASCULAR ENDOTHELIAL GROWTH FACTOR
TRINUCLEOTIDE REPEAT CONTAINING 12
DNAJ (HSP40) HOMOLOG, SUBFAMILY C, MEMBER 13
PROGRAMMED CELL DEATH 4 (NEOPLASTIC TRANSFORMATION INHIBITOR)
NON-METASTATIC CELLS 4, PROTEIN EXPRESSED IN
ZINC FINGER CCCH-TYPE, ANTIVIRAL 1
FERRITIN, LIGHT POLYPEPTIDE
DOPEY FAMILY MEMBER 1
SPLICING FACTOR, ARGININE/SERINE-RICH 14
HEAT SHOCK TRANSCRIPTION FACTOR 1
EUKARYOTIC TRANSLATION INITIATION FACTOR 5A
CD47 ANTIGEN (RH-RELATED ANTIGEN, INTEGRIN-ASSOCIATED SIGNAL TRANSDUCER)
SERINE HYDROXYMETHYLTRANSFERASE 2 (MITOCHONDRIAL)
SERINE HYDROXYMETHYLTRANSFERASE 2 (MITOCHONDRIAL)
ROD1 REGULATOR OF DIFFERENTIATION 1 (S. POMBE)
TUBEROUS SCLEROSIS 2
NGFI-A BINDING PROTEIN 2 (EGR1 BINDING PROTEIN 2)
FATTY ACID DESATURASE 3
MCM5 MINICHROMOSOME MAINTENANCE DEFICIENT 5, CELL DIVISION CYCLE 46 (S. CEREVISIAE)
SIMILAR TO RIKEN CDNA A730055C05 GENE
CHROMOSOME 11 OPEN READING FRAME 23
VAV 3 ONCOGENE
MITOCHONDRIAL RIBOSOMAL PROTEIN L2
NUCLEOLAR COMPLEX ASSOCIATED 3 HOMOLOG (S. CEREVISIAE)
INTEGRIN BETA 1 BINDING PROTEIN (MELUSIN) 2
POTASSIUM INTERMEDIATE/SMALL CONDUCTANCE CALCIUM- ACTIVATED CHANNEL, SUBFAMILY N, MEMBER 2
RAB33B, MEMBER RAS ONCOGENE FAMILY
INTERLEUKIN 17 RECEPTOR C
HYPOTHETICAL PROTEIN PRO2176
GUANINE NUCLEOTIDE BINDING PROTEIN-LIKE 3 (NUCLEOLAR)-LIKE
Figure 3(b) shows the gene expression pattern of EGFR and its direct neighbors under estrogen-deprived conditions. As we can see from the figure, the expression level of CBL was largely suppressed in the estrogen-deprived condition. Since CBL can promote the ubiquitination and degradation of activated EGFR , we hypothesize that EGFR expression is increased in LCC1 cells due to both the activation of ETF and the downregulation of CBL. Studies to explore these predictions are currently in progress.
Overexpression and/or activation of the ErbB receptors (ErbB1 = EGFR) may also promote proliferation, motility, adhesion, and differentiation . Recent evidence has shown that increased growth factor (GF) signaling augments the ligand (estrogen)-independent activity of ER , which may partially explain the activity of ER (V$ER_Q6) in LCC1 cells as seen in Figure 2(b). In addition, the PLC-Gamma (PLCG1) and the JAK-STAT pathways are known to enhance the transcription of genes that regulate cell proliferation. This could contribute to the induced activity of STAT (V$STAT_Q6) (see Figure 2(b)), since one of the important signaling events activated by EGFR involves tyrosine phosphorylation of STAT. Stimulation of EGFR may induce tyrosine phosphorylation of STAT1, STAT3 and STAT5, initiating complex formation of these STATs with JAK1 and JAK2. JAKs are essential mediators of the interaction between EGFR and the STATs, which then translocate to the nucleus to stimulate gene transcription [60, 61]. Importantly, we have recently shown that EGFR signaling through p130Cas and the tyrosine kinase c-Src leads to phosphorylation of STAT5B, and that this signal transduction pathway induces Tamoxifen resistance in MCF-7 breast cancer cells .
It is also important to validate the identified target genes by biological experiments such as other breast cancer cell line data and ChIP-on-chip experiments. While many estrogen target genes have been identified through expression microarray studies , the results from ChIP-on-chip experiments are not currently complete. Nonetheless, our list of ER target genes includes the following known direct targets: TFF1, GREB1 [64, 65]; VAMP3 [65, 66]; PRKCSH, PLEC1, NT5C2, C19ORF2, TMOD3, and FLJ11286 . Furthermore, Cicatiello et al. have recently performed a comprehensive genome-wide analysis to investigate ERα target genes by chromatin immunoprecipitation coupled to massively parallel sequencing and expression data . Comparing our gene list with their ChIP-seq and expression data showed that we find family members or isoforms of CLIC3, ELF3, RAB31, FKBP4, IGFBP4, and SLC25A19 within their ChIP-seq data. Several genes (CDT1, IGFBP5, YARS, IPO4, EPS8L1, GPR137) appear in both our target gene list and their list of genes responsive to 17β-estradiol. Currently, we are investigating several other transcription factors with biological experiments including ChIP-on-chip experiments.
To provide further statistical evidence in support of the identified ER target genes, we conducted several additional analyses including statistical significance analysis, false discovery rate (FDR) calculation, gene set enrichment analysis, and motif enrichment analysis. For these statistical analyses, we selected two recently published genomic analyses of transcription factor binding of estrogen-regulated promoters as a benchmark [63, 67]; we acknowledge the incompleteness of ChIP-on-chip data for ER target genes across multiple cellualr contexts. Firstly, a statistically significant enrichment of ER target genes can be observed in our ER target gene list, as supported by the statistical significance (p- value = 3.59×10-06) calulated based on the assumption of a hyper-geometric distribution in a comparison with the ChIP-on-chip benchmark target genes. A low false positive rate is evident (FDR = 9.72×10-09) for the ER target gene list identified by mSD.
To calculate the FDR, we first ranked all the genes according to their computed binding strength in matrix S to $ER_Q6 binding site; we then selected a 'negative' set of genes with no binding connection with $ER_Q6 in position weight matrix (PWM) to form a null distribution of the binding strength. As in the mSD approach, we assumed that the binding strength of target genes regulated by a transcription factor roughly follows a Gamma distribution, since most transcription factors likely regulate relatively few target genes. Thus, we calculated the p-value for each gene by selecting the strongest binding strength when compared with those obtained from the null distribution. To properly determine a cut-off threshold of the binding strength, we also controlled the FDR for multiple tests based on the total number of genes in the experiments . We used the Benjamini-Hochberg procedure  to compute the false discovery rate as follows. Letting p k represent the corrected p- value computed for gene k, r k the rank of gene k sorted by the p-values, and G the total number of genes in the experiment, we calculated the false discovery rate for gene k as FDR k =Gp k /r k . For our identified ER taget gene list, we obtaned a low FDR (FDR = 9.72×10-09) corresponding to a binding strength cutoff of 0.7.
We also used a Kolmogorov-Smirnov (KS) test to evaluate the enrichment of ER target genes . We first ordered all the genes in our experiments according to their computed binding strength in matrix S. We then formed the distribution of the target gene set within this ordered list by the KS nonparametric rank statistic as described below . First, we denote n the total number of genes in the ordered ER target list, x the number of overlapped genes between our inferred target genes and the ChIP-on-chip benchmark data, and y the number of non-ovarlepped genes. Second, we let V(i) = y, if gene i is included in the overlapped genes; V(i) = -x, if not; note that we have from this configuration. Finally, we define the KS rank statistic as follows: to conduct this statistical test based on a permutation test . For our ER target gene list, the KS score (KS_score = 208) is significantly higher than the scores in the null distribution based on 10,000 randomly selected gene sets of the same size as the inferred ER target genes (with a statistical significance of p-value = 0.0099; see Fig. S9 in the supplementary material).
We evaluated the enrichment of ER binding sites in the promoters of target genes identified by the mSD approach using TRANSFAC . A motif enrichment analysis procedure was used based on a permutation test , which can be summarized as follows. Given a gene set S extracted by any computational method such as the mSD approach, a statistic to measure the enrichment of a specific motif f is defined as , where m is the motif binding score as defined by both matrix similarity score and core similarity score [29, 72]. To calculate the statistical significance (p-value), we need to form a null distribution. The null hypothesis is that the gene set is randomly generated from the gene population and there is no significant enrichment of the motif f. We randomly select gene sets with same size of S from the baseline gene population, and repeat B times to generate the corresponding null statistic enrichment score , for b = 1,..., B. The null hypothesis distribution is assumed to be symmetric in this study. The p-value can be obtained for each gene set by calculating the probability that a null gene set has a larger statistic than the observed statistic. Mathematically, the p-value can be calculated by
Traditional clustering methods have been widely used for gene module identification by searching for similar patterns in gene expression data. Clustering methods on gene expression data alone can only provide co-expressed gene modules. The expression pattern of genes in the same cluster may be correlated for reasons other than co-regulation. To identify gene regulatory modules, it is important to incorporate transcription factor binding information based either on ChIP-on-chip data or on motif information. The proposed method, namely, motif-guided sparse decomposition (mSD), is an integrated approach to combine gene expression data and binding information for regulatory module identification.
The main challenge is that the level of noise is high in both of the data types to be integrated. If a simple integration strategy is used, the method will result in many false positive target genes due to noise. Two strategies were developed in our mSD approach to mitigate the effects of noise impact on target gene identification. Firstly, an affinity propagation (AP) clustering method  is used to estimate transcription factor activity by clustering gene expression data in conjunction with binding information. Secondly, a sparse component analysis (SCA) method  is applied to estimate regulation strength by exploiting the constraint that most genes are regulated by only a few transcription factors. Since a gene cluster formed using an AP method reflects a similar pattern (from the gene expression data) and a shared regulator (from the binding information), the transcription factor activity (TFA) estimated from the cluster is a better starting point for regulatory module identification. Using a SCA method and the improved TFA estimates further refines the gene cluster by estimating the regulation strength of a particular transcription factor.
The mSD approach has been developed and implemented as follows. Binding motif information is initially used to define potential target genes, providing prior knowledge of the regulatory network topology. A sparse latent variable model is then used to integrate gene expression data and identify which of the potential target genes are actually activated by transcription factors. The mSD approach was implemented as a two-step algorithm to perform (1) transcription factor activity estimation, and (2) regulation strength estimation. In the first step, we start to integrate binding motif information and gene expression data to identify co-regulated gene clusters. A motif-guided gene cluster method was developed and used to find the gene clusters, based on a joint similarity measure from both gene expression data and motif information. To limit the impact of noise on gene clustering performance, the contribution of each data type to clustering is quantified. The optimal trade-off between data sources can then be determined by minimizing a cost function taking into account the frequency of motif occupancy and non-uniformity of expression pattern. Subsequently, we use a sparse decomposition method for regulation strength estimation.
Unlike the NCA method  that assumes the network topology derived from ChIP-on-chip data or motif information is known without error, we consider both network configuration and connection strength estimation as integrative components of the decomposition method. The use of prior knowledge of binding motif-information provides a solid starting point. As in Sabatti's work , we also incorporate a sparse constraint to achieve a biologically meaningful representation of regulatory networks. The experimental results on synthetic and real yeast data have demonstrated that our method can effectively identify the target genes of transcription factors. The application of mSD to breast cancer cell line data further revealed condition-specific regulatory modules associated with estrogen signaling and action in breast cancer, which are consistent with known gene functions in this cellular context.
The current work represents an important step toward integrating available biological information for reconstructing complex biological networks. This goal will be better accomplished by incorporating an analysis of the synergistic effect of regulators into the proposed method. Combinatorial analysis may help discover the complex interplay between different regulators in order to assemble a complete map of regulatory networks for complex biological systems.
This study is supported by the National Institutes of Health under Grants (CA139246, CA149147, CA109872, CA149653, NS29525, EB000830 and CA096483) and the Department of Defense under Grant (BC030280). We thank Alan Zwart for his work in the acquisition of breast cancer cell line microarray data. We also thank the reviewers for their invaluable suggestions that lead to many improvements in the manuscript.
- Vaquerizas JM, Kummerfeld SK, Teichmann SA, Luscombe NM: A census of human transcription factors: function, expression and evolution. Nat Rev Genet 2009, 10(4):252–263. 10.1038/nrg2538View ArticlePubMedGoogle Scholar
- Neame E: Gene networks: Network analysis gets dynamic. Nat Rev Genet 2008, 9(12):897–897. 10.1038/nrg2496View ArticleGoogle Scholar
- Clements M, Someren EPv, Knijnenburg TA, Reinders MJT: Integration of Known Transcription Factor Binding Site Information and Gene Expression Data to Advance from Co-Expression to Co-Regulation. Genomics, Proteomics & Bioinformatics 2007, 5(2):86–101.View ArticleGoogle Scholar
- Joung J-G, Shin D, Seong RH, Zhang B-T: Identification of regulatory modules by coclustering latent variable models: stem cell differentiation. Bioinformatics 2006, 22(16):2005–2011. 10.1093/bioinformatics/btl343View ArticlePubMedGoogle Scholar
- Yang YL, Suen J, Brynildsen MP, Galbraith SJ, Liao JC: Inferring yeast cell cycle regulators and interactions using transcription factor activities. BMC Genomics 2005, 6: 90. 10.1186/1471-2164-6-90PubMed CentralView ArticlePubMedGoogle Scholar
- Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N: Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 2003, 34(2):166–176. 10.1038/ng1165View ArticlePubMedGoogle Scholar
- Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, et al.: Transcriptional Regulatory Networks in Saccharomyces cerevisiae. Science 2002, 298(5594):799–804. 10.1126/science.1075090View ArticlePubMedGoogle Scholar
- Dembele D, Kastner P: Fuzzy C-means method for clustering microarray data. Bioinformatics 2003, 19(8):973–980. 10.1093/bioinformatics/btg119View ArticlePubMedGoogle Scholar
- D'Haeseleer P, Liang S, Somogyi R: Genetic network inference: from co-expression clustering to reverse engineering. Bioinformatics 2000, 16(8):707–726.View ArticlePubMedGoogle Scholar
- Yeung MKS, Tegnér J, Collins JJ: Reverse engineering gene networks using singular value decomposition and robust regression. Proceedings of the National Academy of Sciences of the United States of America 2002, 99(9):6163–6168. 10.1073/pnas.092576199PubMed CentralView ArticlePubMedGoogle Scholar
- Nachman I, Regev A, Friedman N: Inferring quantitative models of regulatory networks from expression data. Bioinformatics 2004, 20(suppl_1):i248–256. 10.1093/bioinformatics/bth941View ArticlePubMedGoogle Scholar
- Latchman DS: Transcription Factors as Potential Targets for Therapeutic Drugs. Current Pharmaceutical Biotechnology 2000, 1: 57–61. 10.2174/1389201003379022View ArticlePubMedGoogle Scholar
- Yeung KY, Ruzzo WL: Principal component analysis for clustering gene expression data. Bioinformatics 2001, 17(9):763–774. 10.1093/bioinformatics/17.9.763View ArticlePubMedGoogle Scholar
- Lee S-I, Batzoglou S: Application of independent component analysis to microarrays. Genome Biology 2003, 4(11):R76. 10.1186/gb-2003-4-11-r76PubMed CentralView ArticlePubMedGoogle Scholar
- Brunet J-P, Tamayo P, Golub TR, Mesirov JP: Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the National Academy of Sciences of the United States of America 2004, 101(12):4164–4169. 10.1073/pnas.0308531101PubMed CentralView ArticlePubMedGoogle Scholar
- Liao JC, Boscolo R, Yang YL, Tran LM, Sabatti C, Roychowdhury VP: Network component analysis: reconstruction of regulatory signals in biological systems. Proc Natl Acad Sci USA 2003, 100: 15522–15527. 10.1073/pnas.2136632100PubMed CentralView ArticlePubMedGoogle Scholar
- Sabatti C, James GM: Bayesian sparse hidden components analysis for transcription regulation networks. Bioinformatics 2006, 22(6):739–746. 10.1093/bioinformatics/btk017View ArticlePubMedGoogle Scholar
- Zhou XJ, Kao M-CJ, Huang H, Wong A, Nunez-Iglesias J, Primig M, Aparicio OM, Finch CE, Morgan TE, Wong WH: Functional annotation and network reconstruction through cross-platform integration of microarray data. Nat Biotech 2005, 23(2):238–243. 10.1038/nbt1058View ArticleGoogle Scholar
- Georgiev P, Theis F, Cichocki A: Sparse component analysis and blind source separation of underdetermined mixtures. Neural Networks, IEEE Transactions on 2005, 16(4):992–996. 10.1109/TNN.2005.849840View ArticleGoogle Scholar
- Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nat Genet 1999, 22(3):281–285. 10.1038/10343View ArticlePubMedGoogle Scholar
- Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR: Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proceedings of the National Academy of Sciences of the United States of America 1999, 96(6):2907–2912. 10.1073/pnas.96.6.2907PubMed CentralView ArticlePubMedGoogle Scholar
- Frey BJ, Dueck D: Clustering by Passing Messages Between Data Points. Science 2007, 315(5814):972–976. 10.1126/science.1136800View ArticlePubMedGoogle Scholar
- Ben-Gal I, Shani A, Gohr A, Grau J, Arviv S, Shmilovici A, Posch S, Grosse I: Identification of transcription factor binding sites with variable-order Bayesian networks. Bioinformatics 2005, 21(11):2657–2666. 10.1093/bioinformatics/bti410View ArticlePubMedGoogle Scholar
- Jin VX, Rabinovich A, Squazzo SL, Green R, Farnham PJ: A computational genomics approach to identify cis-regulatory modules from chromatin immunoprecipitation microarray data--A case study using E2F1. Genome Res 2006, 16(12):1585–1595. 10.1101/gr.5520206PubMed CentralView ArticlePubMedGoogle Scholar
- Tu Y, Stolovitzky G, Klein U: Quantitative noise analysis for gene expression microarray experiments. Proceedings of the National Academy of Sciences of the United States of America 2002, 99(22):14031–14036. 10.1073/pnas.222164199PubMed CentralView ArticlePubMedGoogle Scholar
- Kundaje A, Kundaje A, Middendorf M, Feng G, Wiggins CAWC, Leslie CALC: Combining sequence and time series expression data to learn transcriptional modules. Computational Biology and Bioinformatics, IEEE/ACM Transactions on 2005, 2(3):194–202. 10.1109/TCBB.2005.34View ArticleGoogle Scholar
- Cover TM, Thomas JA: Elements of Information Theory. 2nd edition. Wiley-Interscience; 2006.Google Scholar
- Levine MD, Nazif AM: Dynamic Measurement of Computer Generated Image Segmentations. Pattern Analysis and Machine Intelligence, IEEE Transactions on 1985, PAMI-7(2):155–164. 10.1109/TPAMI.1985.4767640View ArticleGoogle Scholar
- Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, et al.: TRANSFAC(R) and its module TRANSCompel(R): transcriptional gene regulation in eukaryotes. Nucl Acids Res 2006, 34(suppl_1):D108–110. 10.1093/nar/gkj143PubMed CentralView ArticlePubMedGoogle Scholar
- Hoyer PO: Non-negative Matrix Factorization with Sparseness Constraints. J Mach Learn Res 2004, 5: 1457–1469.Google Scholar
- Arash Ali A, Massoud B-Z, Christian J: A Fast Method for Sparse Component Analysis Based on Iterative Detection-Estimation. AIP Conference Proceedings 2006, 872(1):123–130.Google Scholar
- Chang C, Ding Z, Hung YS, Fung PCW: Fast network component analysis (FastNCA) for gene regulatory network reconstruction from microarray data. Bioinformatics 2008, 24(11):1349–1358. 10.1093/bioinformatics/btn131View ArticlePubMedGoogle Scholar
- Van den Bulcke T, Van Leemput K, Naudts B, van Remortel P, Ma H, Verschoren A, De Moor B, Marchal K: SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms. BMC Bioinformatics 2006, 7(1):43. 10.1186/1471-2105-7-43PubMed CentralView ArticlePubMedGoogle Scholar
- Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization. Mol Biol Cell 1998, 9(12):3273–3297.PubMed CentralView ArticlePubMedGoogle Scholar
- Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB: Missing value estimation methods for DNA microarrays. Bioinformatics 2001, 17(6):520–525. 10.1093/bioinformatics/17.6.520View ArticlePubMedGoogle Scholar
- Chen G, Jensen S, Stoeckert C: Clustering of genes into regulons using integrated modeling-COGRIM. Genome Biology 2007, 8(1):R4. 10.1186/gb-2007-8-1-r4PubMed CentralView ArticlePubMedGoogle Scholar
- Jemal A, Siegel R, Ward E, Hao Y, Xu J, Thun MJ: Cancer statistics, 2009. CA Cancer J Clin 2009, 59(4):225–249. 10.3322/caac.20006View ArticlePubMedGoogle Scholar
- Musgrove EA, Sutherland RL: Biological determinants of endocrine resistance in breast cancer. Nat Rev Cancer 2009, 9(9):631–643. 10.1038/nrc2713View ArticlePubMedGoogle Scholar
- Clarke R, Liu MC, Bouker KB, Gu Z, Lee RY, Zhu Y, Skaar TC, Gomez B, O'Brien K, Wang Y, et al.: Antiestrogen resistance in breast cancer and the role of estrogen receptor signaling. Oncogene 2003, 22(47):7316–7339. 10.1038/sj.onc.1206937View ArticlePubMedGoogle Scholar
- Clarke R, Shajahan AN, Riggins RB, Cho Y, Crawford A, Xuan J, Wang Y, Zwart A, Nehra R, Liu MC: Gene network signaling in hormone responsiveness modifies apoptosis and autophagy in breast cancer cells. J Steroid Biochem Mol Biol 2009, 114(1–2):8–20. 10.1016/j.jsbmb.2008.12.023PubMed CentralView ArticlePubMedGoogle Scholar
- Creighton C, Cordero K, Larios J, Miller R, Johnson M, Chinnaiyan A, Lippman M, Rae J: Genes regulated by estrogen in breast tumor cells in vitro are similarly regulated in vivo in tumor xenografts and human breast tumors. Genome Biology 2006, 7(4):R28. 10.1186/gb-2006-7-4-r28PubMed CentralView ArticlePubMedGoogle Scholar
- Bjornstrom L, Sjoberg M: Mechanisms of Estrogen Receptor Signaling: Convergence of Genomic and Nongenomic Actions on Target Genes. Mol Endocrinol 2005, 19(4):833–842. 10.1210/me.2004-0486View ArticlePubMedGoogle Scholar
- Carroll JS, Meyer CA, Song J, Li W, Geistlinger TR, Eeckhoute J, Brodsky AS, Keeton EK, Fertuck KC, Hall GF, et al.: Genome-wide analysis of estrogen receptor binding sites. Nat Genet 2006, 38(11):1289–1297. 10.1038/ng1901View ArticlePubMedGoogle Scholar
- Chen CC, Lee WR, Safe S: Egr-1 is activated by 17beta-estradiol in MCF-7 cells by mitogen-activated protein kinase-dependent phosphorylation of ELK-1. J Cell Biochem 2004, 93(5):1063–1074. 10.1002/jcb.20257View ArticlePubMedGoogle Scholar
- Gu Z, Lee RY, Skaar TC, Bouker KB, Welch JN, Lu J, Liu A, Zhu Y, Davis N, Leonessa F, et al.: Association of interferon regulatory factor-1, nucleophosmin, nuclear factor-kappaB, and cyclic AMP response element binding with acquired resistance to Faslodex (ICI 182,780). Cancer Res 2002, 62(12):3428–3437.PubMedGoogle Scholar
- Kageyama R, Merlino GT, Pastan I: A transcription factor active on the epidermal growth factor receptor gene. Proc Natl Acad Sci USA 1988, 85(14):5016–5020. 10.1073/pnas.85.14.5016PubMed CentralView ArticlePubMedGoogle Scholar
- Niida A, Smith A, Imoto S, Tsutsumi S, Aburatani H, Zhang M, Akiyama T: Integrative bioinformatics analysis of transcriptional regulatory programs in breast cancer cells. BMC Bioinformatics 2008, 9(1):404. 10.1186/1471-2105-9-404PubMed CentralView ArticlePubMedGoogle Scholar
- Gasco M, Shami S, Crook T: The p53 pathway in breast cancer. Breast Cancer Res 2002, 4(2):70–76. 10.1186/bcr426PubMed CentralView ArticlePubMedGoogle Scholar
- Jansen-Durr P, Meichle A, Steiner P, Pagano M, Finke K, Botz J, Wessbecher J, Draetta G, Eilers M: Differential modulation of cyclin gene expression by MYC. Proc Natl Acad Sci USA 1993, 90(8):3685–3689. 10.1073/pnas.90.8.3685PubMed CentralView ArticlePubMedGoogle Scholar
- Zwicker J, Lucibello FC, Wolfraim LA, Gross C, Truss M, Engeland K, Muller R: Cell cycle regulation of the cyclin A, cdc25C and cdc2 genes is based on a common mechanism of transcriptional repression. Embo J 1995, 14(18):4514–4522.PubMed CentralPubMedGoogle Scholar
- Dedera DA, Waller EK, LeBrun DP, Sen-Majumdar A, Stevens ME, Barsh GS, Cleary ML: Chimeric homeobox gene E2A-PBX1 induces proliferation, apoptosis, and malignant lymphomas in transgenic mice. Cell 1993, 74(5):833–843. 10.1016/0092-8674(93)90463-ZView ArticlePubMedGoogle Scholar
- Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu YT, Roskin KM, Schwartz M, Sugnet CW, Thomas DJ, et al.: The UCSC Genome Browser Database. Nucleic Acids Res 2003, 31(1):51–54. 10.1093/nar/gkg129PubMed CentralView ArticlePubMedGoogle Scholar
- Kel AE, Gossling E, Reuter I, Cheremushkin E, Kel-Margoulis OV, Wingender E: MATCHTM: a tool for searching transcription factor binding sites in DNA sequences. Nucl Acids Res 2003, 31(13):3576–3579. 10.1093/nar/gkg585PubMed CentralView ArticlePubMedGoogle Scholar
- Abell K, Watson CJ: The Jak/Stat Pathway: A Novel Way to Regulate PI3K Activity. Cell cycle 2005, 4(7):4. 10.4161/cc.4.7.1837View ArticleGoogle Scholar
- Moggs JG, Orphanides G: Estrogen receptors: orchestrators of pleiotropic cellular responses. EMBO reports 2001, 2(9):7. 10.1093/embo-reports/kve185View ArticleGoogle Scholar
- Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, et al.: STRING 8--a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res 2009, (37 Database):D412–416. 10.1093/nar/gkn760
- Muthuswamy SK, Gilman M, Brugge JS: Controlled dimerization of ErbB receptors provides evidence for differential signaling by homo- and heterodimers. Mol Cell Biol 1999, 19(10):6845–6857.PubMed CentralPubMedGoogle Scholar
- Yarden Y, Sliwkowski MX: Untangling the ErbB signalling network. Nat Rev Mol Cell Biol 2001, 2(2):127–137. 10.1038/35052073View ArticlePubMedGoogle Scholar
- Nicholson RI, McClelland RA, Robertson JF, Gee JM: Involvement of steroid hormone and growth factor cross-talk in endocrine response in breast cancer. Endocr Relat Cancer 1999, 6(3):373–387. 10.1677/erc.0.0060373View ArticlePubMedGoogle Scholar
- Alvarez JV, Greulich H, Sellers WR, Meyerson M, Frank DA: Signal transducer and activator of transcription 3 is required for the oncogenic effects of non-small-cell lung cancer-associated mutations of the epidermal growth factor receptor. Cancer Res 2006, 66(6):3162–3168. 10.1158/0008-5472.CAN-05-3757View ArticlePubMedGoogle Scholar
- Smith KD, Wells A, Lauffenburger DA: Multiple signaling pathways mediate compaction of collagen matrices by EGF-stimulated fibroblasts. Exp Cell Res 2006, 312(11):1970–1982. 10.1016/j.yexcr.2006.02.022View ArticlePubMedGoogle Scholar
- Riggins RB, Thomas KS, Ta HQ, Wen J, Davis RJ, Schuh NR, Donelan SS, Owen KA, Gibson MA, Shupnik MA, et al.: Physical and functional interactions between Cas and c-Src induce tamoxifen resistance of breast cancer cells through pathways involving epidermal growth factor receptor and signal transducer and activator of transcription 5b. Cancer Res 2006, 66(14):7007–7015. 10.1158/0008-5472.CAN-05-3952View ArticlePubMedGoogle Scholar
- Kininis M, Chen BS, Diehl AG, Isaacs GD, Zhang T, Siepel AC, Clark AG, Kraus WL: Genomic analyses of transcription factor binding, histone acetylation, and gene expression reveal mechanistically distinct classes of estrogen-regulated promoters. Mol Cell Biol 2007, 27(14):5090–5104. 10.1128/MCB.00083-07PubMed CentralView ArticlePubMedGoogle Scholar
- Reid G, Metivier R, Lin CY, Denger S, Ibberson D, Ivacevic T, Brand H, Benes V, Liu ET, Gannon F: Multiple mechanisms induce transcriptional silencing of a subset of genes, including oestrogen receptor alpha, in response to deacetylase inhibition by valproic acid and trichostatin A. Oncogene 2005, 24(31):4894–4907. 10.1038/sj.onc.1208662View ArticlePubMedGoogle Scholar
- Carroll JS, Liu XS, Brodsky AS, Li W, Meyer CA, Szary AJ, Eeckhoute J, Shao W, Hestermann EV, Geistlinger TR, et al.: Chromosome-wide mapping of estrogen receptor binding reveals long-range regulation requiring the forkhead protein FoxA1. Cell 2005, 122(1):33–43. 10.1016/j.cell.2005.05.008View ArticlePubMedGoogle Scholar
- Lin CY, Strom A, Vega VB, Kong SL, Yeo AL, Thomsen JS, Chan WC, Doray B, Bangarusamy DK, Ramasamy A, et al.: Discovery of estrogen receptor alpha target genes and response elements in breast tumor cells. Genome Biol 2004, 5(9):R66. 10.1186/gb-2004-5-9-r66PubMed CentralView ArticlePubMedGoogle Scholar
- Cicatiello L, Mutarelli M, Grober OM, Paris O, Ferraro L, Ravo M, Tarallo R, Luo S, Schroth GP, Seifert M, et al.: Estrogen receptor alpha controls a gene network in luminal-like breast cancer cells comprising multiple transcription factors and microRNAs. Am J Pathol 2010, 176(5):2113–2130. 10.2353/ajpath.2010.090837PubMed CentralView ArticlePubMedGoogle Scholar
- Shaffer JP: Multiple Hypothesis Testing. Ann Rev Psych 1995, 46: 561–584. 10.1146/annurev.ps.46.020195.003021View ArticleGoogle Scholar
- Benjamini Y, Hochberg Y: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Statist Soc B 1995, 57(1):289–300.Google Scholar
- Marsaglia G, Tsang WW, Wang J: Evaluating Kolmogorov's Distribution. Journal of Statistical Software 2003, 8(18):1–4.View ArticleGoogle Scholar
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al.: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005, 102(43):15545–15550. 10.1073/pnas.0506580102PubMed CentralView ArticlePubMedGoogle Scholar
- Chen L, Xuan J, Wang C, Shih Ie M, Wang Y, Zhang Z, Hoffman E, Clarke R: Knowledge-guided multi-scale independent component analysis for biomarker identification. BMC Bioinformatics 2008, 9: 416. 10.1186/1471-2105-9-416PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.