Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data
© Zhang et al; licensee BioMed Central Ltd. 2008
Received: 11 December 2007
Accepted: 21 April 2008
Published: 21 April 2008
Integrating data from multiple global assays and curated databases is essential to understand the spatio-temporal interactions within cells. Different experiments measure cellular processes at various widths and depths, while databases contain biological information based on established facts or published data. Integrating these complementary datasets helps infer a mutually consistent transcriptional regulatory network (TRN) with strong similarity to the structure of the underlying genetic regulatory modules. Decomposing the TRN into a small set of recurring regulatory patterns, called network motifs (NM), facilitates the inference. Identifying NMs defined by specific transcription factors (TF) establishes the framework structure of a TRN and allows the inference of TF-target gene relationship. This paper introduces a computational framework for utilizing data from multiple sources to infer TF-target gene relationships on the basis of NMs. The data include time course gene expression profiles, genome-wide location analysis data, binding sequence data, and gene ontology (GO) information.
The proposed computational framework was tested using gene expression data associated with cell cycle progression in yeast. Among 800 cell cycle related genes, 85 were identified as candidate TFs and classified into four previously defined NMs. The NMs for a subset of TFs are obtained from literature. Support vector machine (SVM) classifiers were used to estimate NMs for the remaining TFs. The potential downstream target genes for the TFs were clustered into 34 biologically significant groups. The relationships between TFs and potential target gene clusters were examined by training recurrent neural networks whose topologies mimic the NMs to which the TFs are classified. The identified relationships between TFs and gene clusters were evaluated using the following biological validation and statistical analyses: (1) Gene set enrichment analysis (GSEA) to evaluate the clustering results; (2) Leave-one-out cross-validation (LOOCV) to ensure that the SVM classifiers assign TFs to NM categories with high confidence; (3) Binding site enrichment analysis (BSEA) to determine enrichment of the gene clusters for the cognate binding sites of their predicted TFs; (4) Comparison with previously reported results in the literatures to confirm the inferred regulations.
The major contribution of this study is the development of a computational framework to assist the inference of TRN by integrating heterogeneous data from multiple sources and by decomposing a TRN into NM-based modules. The inference capability of the proposed framework is verified statistically (e.g., LOOCV) and biologically (e.g., GSEA, BSEA, and literature validation). The proposed framework is useful for inferring small NM-based modules of TF-target gene relationships that can serve as a basis for generating new testable hypotheses.
Enormous amount of data has been generated by the use of high-throughput analytical methods in biology during the last two decades. However, the inherited properties of these data create significant problems in their analysis and interpretation. Standard statistical approaches are not powerful enough to dissect data with thousands of variables (i.e., semi-global or global gene expression data) and limited sample sizes (i.e., several to hundred samples in one experiment). These properties are typical in microarray and proteomic datasets  as well as other high dimensional data where a comparison is made to biological samples that tend to be limited in number, thus suffering from curse of dimensionality .
A number of researches have explored the integration of multiple data sources (e.g., time course expression data and sequence motifs) for TRN inference [6–9]. A typical approach for exploiting two or more data sources uses one type of data to validate the results generated independently from the other (i.e., without data fusion). For example, cluster analysis of gene expression data followed by the identification of consensus sequence motifs in the promoters of genes within each cluster . The underlying assumption behind this approach is that genes co-expressed under varying experimental conditions are likely to be co-regulated by the same TF or sets of TFs. Holmes et al.  constructed a joint likelihood score based on consensus sequence motif and gene expression data and used this score to perform clustering. Segal et al.  built relational probabilistic models by incorporating gene expression and functional category information as input variables. Gene expression data and gene ontology (GO) data were combined for TRN discovery in B cell . Computational methodologies that allow systematic integration of data from multiple resources are needed to fully utilize the complementary information available in those resources.
Another way to reduce the complexity of the TRN inference problem is to decompose it into simple units of commonly used network structures. TRN is a network of interactions between TFs and the genes they regulate, governing many of the biological activities in cells. Breaking down the TRN into simplest units of commonly used network architectures helps in understanding complex biological networks. Such patterns of local interconnections are called network motifs (NM) . Since the establishment of the first NM in Escherichia coli , similar NMs have also been found in eukaryotes including yeast , plants, and animals [16–18], suggesting that the general structure of NMs are evolutionarily conserved. One well known family of NMs is the feed-forward loop (FFL) , which appears in hundreds of gene systems in E. coli [14, 20] and yeast [15, 21], as well as in other organisms [13, 16–18, 22, 23]. A comprehensive review on NM theory and experimental approaches is currently available . Knowledge of the NMs to which a given TF belongs facilitates the identification of downstream target gene clusters. In yeast, a genome-wide location analysis was carried out for 106 TFs and five NMs were considered significant: autoregulation, FFL, single input module, multi-input module and regulator cascade. The first four NMs are transcriptionally related, while the last one reflects the signalling pathway activities beyond transcriptional regulation.
In this study, we developed a computational framework that integrates information from time course gene expression experiment, genomic location analysis, binding site sequence, and GO category information to infer the relationship between TFs and their potential target genes based on known and predicted NMs. This was accomplished through a three-step approach outlined in the following. First, we applied cluster analysis of time course gene expression profiles to reduce dimensionality and use the GO category information to determine biologically meaningful clusters, upon which a model of the regulatory module is built. This step enables us to address the scalability problem that is faced by researchers in inferring TRNs from time course gene expression data with limited time points. Second, we trained support vector machines (SVMs) to classify TFs into different NMs based on their time course gene expression profiles, location analysis data, and target binding site sequences. The resulting SVM classifiers were utilized to predict NMs for TFs with unknown NMs. Finally, we used recurrent neural network (RNN) models that mimic the topology of NMs to identify gene clusters that may be regulated by a TF, thereby inferring the regulatory relationships between the TFs and gene clusters. A hybrid of genetic algorithm and particle swarm optimization (GA-PSO) methods was applied to train the RNN models. We tested the proposed computational framework using changes in gene expression associated with cell cycle progression in yeast , genomic location data , binding site sequences , and corresponding GO category information .
Clustering genes into groups with enrichment for biological functions
Gene set enrichment analysis (GSEA) for clusters generated by FCM with the optimal fuzziness value.
# of genes in cluster
Enriched Functional Category
Total Genes in the category
steroid metabolic process
steroid biosynthetic process
cytokinesis, completion of separation
cell separation during cytokinesis
mitotic cell cycle
dolichyl-phosphate-mannose-protein mannosyltransferase activity
protein amino acid O-linked glycosylation
primary active transmembrane transporter activity
cellular bud neck
site of polarized growth
DNA helicase activity
ribonucleoside-diphosphate reductase activity
cell cycle process
transmembrane transporter activity
leading strand elongation
DNA metabolic process
L-serine ammonia-lyase activity
site of polarized growth
pentose transmembrane transporter activity
fructose transmembrane transporter activity
mitotic sister chromatid cohesion
response to pheromone
conjugation with cellular fusion
amine transmembrane transporter activity
polyamine transmembrane transporter activity
sulfur metabolic process
methionine metabolic process
energy reserve metabolic process
cellular carbohydrate metabolic process
DNA replication preinitiation complex
structural constituent of cell wall
DNA-dependent DNA replication
Predicting NMs for TFs
Among the 800 cell cycle related genes, 85 have been identified to have TF-related functions based on their GO annotation. Out of these, 14 TFs have known NMs. A list of 85 TFs is presented in Additional file 2. We used data from 106 TFs to train SVM classifiers with time course gene expression profile and binding site sequence data as inputs to classify the TFs into four NMs. We retrieved the binding site sequence data for the TFs from the TRANSFAC database . For TFs with unknown binding site sequences, we used the discovered binding site sequences described by Harbison et al. .
Performance of SVM classifiers evaluated via LOOCV.
Gene expression data
Gene expression data and binding site sequence information
Inferring TF-target gene relationships in yeast
Recurrent neural network (RNN) models that mimic the topology of the known/predicted NMs were constructed to identify the relationships between TFs and putative gene clusters. The RNN models were trained to select for all 85 TFs the downstream targets from the 34 gene clusters.
The experimental results of GA-PSO with RNN.
We evaluated the predicted TF-target gene relationships for the following eight well known cell cycle related TFs: SWI4, SWI5, FKH1, NDD1, ACE2, KAR4, MET28 and RAP1. Among these, the first five have NM assignments, while the last three were assigned to different NMs by the SVM classifiers. Since the "true" gene regulatory network was not available, the accuracy of putative regulatory relationship was determined by searching known gene connections in databases. Based on the results of the NM module prediction, we collected literature evidences from SGD  and BIND  databases. We examined the inferred relationships for each of the eight TFs. An inferred relationship is assumed to be biologically significant if the TFs are correlated with the biological functions associated with the critical downstream cluster(s). Figure 3 lists the significant relationships; the eight TFs yielded an average precision of 82.9%. We calculated the precision as TP/(TP+FP), where TP and FP denote true positive and false positive, respectively. Network motifs for four of these TFs were identified in Chiang et al.  together with other four TFs. The eight TFs in  yielded an average precision of 80.1%.
The main goal of this study was to infer the components and underlying mechanism of gene regulation in yeast based on the combined constraints from multiple information sources. Our method effectively utilizes genomic location analysis for the establishment of NM for each TF. Target genes are grouped into biologically meaningful clusters and are represented by the average expression profiles of the genes in the cluster. Cluster analysis coupled with the idea of categorizing TFs into pre-defined NMs increased the robustness of our analysis not only in terms of obtaining meaningful modules, but also in terms of addressing the scalability problem. Some genes are very important in biological processes, thus are regulated through multiple pathways as shown by the presence of several distinct binding site sequences. Our proposed method allows the representation of a gene in different regulatory NMs since a TF can be assigned to more than one NM. This is different from previous approaches where only a single model is used for TRN inference [33, 34].
Compared to previous methods that aimed at global TRN inference, the TF-target gene relationships inferred in this study are expected to correspond more closely to biologically meaningful regulatory systems and naturally lend themselves to optimum experimental design methods. For example, the results presented in Figure 3 can be verified from previous biological evidences. For example, FKH1 is a gene whose protein product is a fork head family protein with a role in the expression of G2/M phase genes. It negatively regulates transcriptional elongation, and regulates donor preference during switching. To further investigate the possibilities that the predicted downstream gene clusters are truly regulated by FKH1, we applied the motif discovery tool, WebMOTIFS  to find shared motifs in these gene clusters. The results revealed that a motif called Fork_head, GTAAACAA, is identified as the most significant motif among these gene clusters . This finding strongly supports our NM inference results. The details of the binding site enrichment analysis (BSEA) results are shown in Additional file 3. Another example is the FFL involving SWI5, GAT3 and Gene Cluster 10. SWI5 has been identified as the upstream regulator of GAT3 [7, 15, 27]. Genes in cluster 10 are mostly involved in DNA helicase activity and mitotic recombination, both of which are important biological steps in the regulation of cell cycle. Although no biological evidences have shown that SWI5 and GAT3 are involved in these processes, there are significant numbers of genes in cluster 10 which are characterized (according to yeastract.com) as genes regulated by both TFs (24 for GAT3 and 23 for SWI5 out of 44 genes in cluster 10, respectively).
Compared to Chiang et al. , the first improvement of our approach is that instead of predicting the TF and individual downstream genes, we group genes into biologically functional clusters and discover the relationships between TFs and gene clusters. Through clustering, we were able to integrate the GO information, reduce the computational complexity, and established insights into new interactions. If a gene cluster is involved in the NM of one TF, and most genes have evidence that they are regulated by this TF, it is most likely that the genes left in this cluster are under the regulatory control of the TF. Furthermore, the intermediate result analysis such as GSEA and motif discovery analysis employed in our method ensure that every step in the data integration contributes to the final NM inference.
Reconstruction of TRNs is one of the major challenges in post genomic era. The study presented here addressed two important issues in TRN inference: (1) the development of analysis methods that utilizes multiple types of data and (2) network analysis on the NM level. A data integration approach is proposed to effectively infer the underlying mechanism and pattern of gene regulation using yeast as model on the basis of combined constraints arising from multiple biological data sources, including time course gene expression data, location analysis data, binding site sequence data and GO category information. This computational framework allows us to fully exploit the partial constraints that can be inferred from each data source. First, to reduce the inference dimensionalities, the genes are grouped into clusters by FCM, where the optimal fuzziness value is determined by statistical properties of gene expression data and the optimal cluster number is identified by integrating the GO category information. Then, the known NM information from location data analysis together with the binding site information is used to train SVM classifiers. TFs without NM assignment are predicted by the classifiers. LOOCV is used to build the SVM classifiers with high confidence. Once the NM(s) for a TF is identified, the hybrid GA-PSO algorithm is applied to search for target gene clusters that may be regulated by the TF. This search is guided by the successful training of a RNN model that mimics the regulatory NM(s) assigned to the TF. This has been demonstrated on eight well-studied yeast cell cycle dependent TFs. The upstream BSEA indicates that the proposed method has the potential to identify the underlying regulatory relationships between TFs and their downstream genes on the NM level. We conducted a thorough evaluation of our approach by applying it to a well studied process in yeast (regulation of cell cycle progression). Although we limited our analysis to gene regulatory program at the transcriptional level, we believe that our model is expandable to other biological network inference as more types of high-through data become available such as protein-protein interaction data (yeast two-hybrid) and in vivo (yeast one-hybrid) and in vitro (chromatin immunoprecipitation) protein-DNA interaction data. We anticipate that this approach will serve as a novel method for analyzing multi-source data on the NM level.
The data sources used in this study involve two information levels: (1) The location analysis data, binding site sequences, and GO category information characterize the physical interactions at factor-gene binding level; (2) The time course expression data characterize the functional interactions at transcriptional regulation level. The goal is to discern dependencies between the gene expression profiles and the physical (molecular interaction) mechanisms revealed by complementary data sources (e.g., location data and binding site sequences).
The genome-wide location analysis is a genomic scale assay  measuring the in vivo abundance of TFs that bind to intergenic regions of the DNA. Unlike the expression data, location analysis provides direct evidence about the physical processes underlying gene regulation. Available data from location analysis experiments of 106 TFs, representing ~52% of the total TFs encoded by yeast genome were used in this study to determine transcriptional NMs.
The DNA sequence motifs that define transcription factor binding sites (TFBSs), were extracted from TRANSFAC database . Additional information for other TFs were obtained from recent data as described by Harbison et al. .
GO information was used as the source of gene annotations from already validated biological evidences . Three GO categories (biological process, molecular function, and cellular component) were used as a basis to determine/evaluate the optimal number of gene clusters.
Gene expression profiling represents a high-throughput data source, where expression levels for thousands of genes are measured simultaneously. Models such as Bayesian networks  or probabilistic relational models  have been used to capture the interactions among the measured expression levels. The limited number of time points and the large number of genes present a challenge in inferring TRNs from time course gene expression data. The yeast (S. cerevisiae) cell cycle data are based on the changes in gene expression in terms of transcript abundance at six stages (cln3, clb2, alpha, cdc15, cdc28, and elu) . A total of 800 genes were identified as cell cycle-regulated based on cluster analysis . In this study, we chose the cdc15 expression data set for 800 genes, because this set has the largest number of time points (24).
Inputs and outputs of the proposed three-step approach
Cluster genes into biologically relevant groups
- Gene expression profiles
- GO category information
Categorize TFs into different NMs
Gene expression profile of a TF
Predicted NM(s) for the TF
- Encoded binding site sequences for the TF
Infer NM-based TF-target relationship via RNN
- Gene expression profile of a TF
Identified cluster centers in the known or predicted NMs (i.e., identified TF-target gene relationship).
- Gene expression profiles of cluster centers (metagenes)
- Known or predicted NM(s) of the TF
From the time course gene expression data, 800 genes are identified as being cell cycle-regulated based on an analysis that combines a Fourier algorithm and a correlation algorithm . These genes are functionally annotated based on information from GO. Missing values in the data are imputed using K nearest neighbour (KNN) imputation . Following that, the expression profile of each gene was standardized between 0 and 1.
Encoding of nucleic acids by numeric values.
Cluster genes into biologically relevant groups
We use cluster analysis to assign genes into functional groups and use the resulting cluster nodes as metagenes. Clustering is a widely used technique in microarray data analysis. An underlying assumption is that genes with similar expression profiles are more likely to have similar biological functions . Common clustering algorithms such as hierarchical clustering, k-means clustering, and self-organized maps have been used to analyze gene expression data [9, 44, 45]. These are called hard clustering because each gene is assigned to exactly one cluster. Microarray data involve substantial amount of noise due to biological and experimental factors. In this study, we utilize a soft clustering approach using FCM, which has been demonstrated to be resilient to noise; genes with high membership values cluster together in spite of the noise in the gene expression data .
To characterize the optimal gene clusters, we utilize FuncAssociate  that determines GO terms that are over-represented among the genes associated with a given cluster relative to what would be expected for randomly chosen sets of genes of the same size. FuncAssociate computes a one-tailed Fisher's exact test whose categories are "belongs/does not belong to cluster" and "is annotated/is not annotated with GO term". This computation is performed for all the GO terms for which annotations are available, and the P-values obtained are corrected for multiple hypotheses by comparing raw P-values against those obtained from 1000 simulated runs using randomized queries (resampling), as described in detail in Berriz et al. . The definition of the 'universe' of all genes used by FuncAssociate corresponds to the set of all genes used in FCM clustering. Once the cluster analysis is completed, a set of genes in a cluster is considered as a metagene in the subsequent analyses.
Categorize TFs into different NMs
Since GO has the most detailed gene annotation, we first search for genes with GO functional annotation terms related to transcription such as "transcriptional regulator activity", "DNA binding", etc. These genes are treated as potential TFs and also verified by comparison with the TRANSFAC database. The TF list in TRANSFAC only contains known TFs. The detailed annotation of GO provides a larger list containing not only the confirmed but also the potential TFs. These TFs are assigned to different kinds of motifs by their characteristic of regulation functions. This is based on the assumption that some TFs play crucial roles in some specific motifs. Unlike to most previous TRN inference approaches, where a single large network is sought, our method focuses on inferring target cluster gene(s) regulated by a particular TF. This is accomplished by assigning likely NM(s) to each TF based on prior biological knowledge collected from literatures that report on results from traditional experiments or large-scale genomic location analysis data .
Assemble positive set from genome-wide location data. Sample n TFs randomly from the whole TF set to construct the negative set (n = number of TFs in positive set).
Leave the first TF out as a test TF; the remaining TFs serve as a training set.
Build SVM classifiers using the training set.
Use trained SVM classifiers to determine the NM(s) for the TF left out in Step 2.
Replace the left out TF and leave the next TF out as a test TF.
Repeat Steps 3–5 until each TF is used as a test TF.
Summarize the prediction error for the left out TFs.
Repeat steps 1–7 100 times.
Calculate the mean of the predicted error in 100 runs.
The final SVM classifiers are trained by using all TFs with known NMs as a positive set and an equal number of randomly selected TFs as a negative set. The NM(s) for a TF with unknown NM(s) is determined using these classifiers.
Infer NM-based TF-target relationship via RNN
where x i is the gene expression level of the i th gene (1 ≤ i ≤ N, N is the number of genes in the model), φ(.) is a activation function introduces nonlinearity to the model (e.g. sigmoid function), w ij represents the effect of j th gene on the i th gene, b i denotes the bias for the i th term, and τ is the decay rate parameter. A negative value of w ij represents the inhibition of the j th gene on the i th gene, whereas a positive value of w ij represents the activation control of the j th gene on the i th gene. If w ij is zero, then it means that j th gene has no influence on the i th gene.
Figures 8A and 8B show the architecture of a RNN that can simulate the mathematical relationship in Eq. (2). As illustrated in Figures 8A and 8B, the output of each neuron is fed back to its input after a unit delay and is connected to other neurons . It can be used as a simple form of a NM, where each entity (e.g. TF or gene cluster) in the network is considered as a neuron. The RNN can model not only the interactions between entities but also entity self-regulation. In this study, we consider four RNN models (Figure 8C), each of which has an architectural layout that mimics the corresponding NM in Figure 5.
where x i (t) and are the true and predicted values (expression levels) for the i th neuron (entity) at time t. The goal is to determine the structure and weights of a RNN that minimize this cost function.
Assign the NM to the TF it belongs to. If the NM is unknown, use SVM to predict the NM(s).
Use the following GA-PSO algorithm to build a RNN model that mimics the NM to identify the downstream gene clusters.
2.1. Generate combinations of M gene clusters to represent the target genes that may be regulated by the TF. Each combination is a vector/chromosome. The initial set of combinations is composed of the initial population of chromosomes.
2.2. Use the PSO algorithm to train a RNN model for each chromosome, where the input is the TF and the outputs are gene clusters. The goal is to determine the optimized parameters of the RNN that maps the measured expression profiles of the TF to the gene clusters.
2.3. For each chromosome, calculate the RMSE between the predicted output of the RNN and measured expression profiles for the target gene clusters.
2.4. Apply GA operators (reproduction, cross-over, mutation) based on the RMSE calculated in Step 2.3 as a fitness value. This will generate new vectors/chromosomes altering the choice of output gene cluster combinations.
2.5. Repeat steps 2.1 – 2.4 until stop criteria are met. The stopping criteria are numbers of generations or minimum RMSE, depending on which one is met first.
Repeat Steps 1 and 2 for each TF.
When the process is completed, regulatory NMs are constructed between TFs and their regulated gene clusters.
GA and PSO parameter settings. The table presents the GA and PSO parameter settings used in this study.
Search space range
Acceleration constants c1
Acceleration constants c2
Size of swarm
List of abbreviations
- Transcriptional regulatory network:
- network motif:
- transcription factor:
- fuzzy c-means:
- gene set enrichment analysis:
- binding site enrichment analysis:
- support vector machine:
- leave-one-out cross-validation:
- recurrent neural network:
- genetic algorithm:
- particle swarm optimization:
- gene ontology:
- root mean squared error:
- feed-forward loop:
- transcription factor binding site:
- transcriptional start site:
- Bubitzky W, Granzow M, Berrar DP: Fundamentals of Data Mining in Genomics and Proteomics. New York, Springer; 2007.View ArticleGoogle Scholar
- Wit E, McClure J: Statistics for Microarrays: Design, Analysis and Inference. John Wiley & Sons; 2006.Google Scholar
- Walhout AJ: Unraveling transcription regulatory networks by protein-DNA and protein-protein interaction mapping. Genome Res 2006, 16(12):1445–1454.View ArticlePubMedGoogle Scholar
- Blais A, Dynlacht BD: Constructing transcriptional regulatory networks. Genes Dev 2005, 19(13):1499–1511.View ArticlePubMedGoogle Scholar
- Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, Volkert TL, Wilson CJ, Bell SP, Young RA: Genome-wide location and function of DNA binding proteins. Science 2000, 290(5500):2306–2309.View ArticlePubMedGoogle Scholar
- Hartemink AJ, Gifford DK, Jaakkola TS, Young RA: Combining location and expression data for principled discovery of genetic regulatory network models. Pac Symp Biocomput 2002, 437–449.Google Scholar
- Simon I, Barnett J, Hannett N, Harbison CT, Rinaldi NJ, Volkert TL, Wyrick JJ, Zeitlinger J, Gifford DK, Jaakkola TS, Young RA: Serial regulation of transcriptional regulators in the yeast cell cycle. Cell 2001, 106(6):697–708.View ArticlePubMedGoogle Scholar
- Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 1998, 9(12):3273–3297.PubMed CentralView ArticlePubMedGoogle Scholar
- Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nat Genet 1999, 22(3):281–285.View ArticlePubMedGoogle Scholar
- Holmes I, Bruno WJ: Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics 2001, 17(9):803–820.View ArticlePubMedGoogle Scholar
- Segal E, Taskar B, Gasch A, Friedman N, Koller D: Rich probabilistic models for gene expression. Bioinformatics 2001, 17 Suppl 1: S243–52.View ArticlePubMedGoogle Scholar
- Tuncay K, Ensman L, Sun J, Haidar AA, Stanley F, Trelinski M, Ortoleva P: Transcriptional regulatory networks via gene ontology and expression data. In Silico Biol 2007, 7(1):21–34.PubMedGoogle Scholar
- Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, Sheffer M, Alon U: Superfamilies of evolved and designed networks. Science 2004, 303(5663):1538–1542.View ArticlePubMedGoogle Scholar
- Shen-Orr SS, Milo R, Mangan S, Alon U: Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 2002, 31(1):64–68.View ArticlePubMedGoogle Scholar
- Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 2002, 298(5594):799–804.View ArticlePubMedGoogle Scholar
- Odom DT, Zizlsperger N, Gordon DB, Bell GW, Rinaldi NJ, Murray HL, Volkert TL, Schreiber J, Rolfe PA, Gifford DK, Fraenkel E, Bell GI, Young RA: Control of pancreas and liver gene expression by HNF transcription factors. Science 2004, 303(5662):1378–1381.PubMed CentralView ArticlePubMedGoogle Scholar
- Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Murray HL, Jenner RG, Gifford DK, Melton DA, Jaenisch R, Young RA: Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 2005, 122(6):947–956.PubMed CentralView ArticlePubMedGoogle Scholar
- Swiers G, Patient R, Loose M: Genetic regulatory networks programming hematopoietic stem cells and erythroid lineage specification. Dev Biol 2006, 294(2):525–540.View ArticlePubMedGoogle Scholar
- Mangan S, Alon U: Structure and function of the feed-forward loop network motif. Proc Natl Acad Sci U S A 2003, 100(21):11980–11985.PubMed CentralView ArticlePubMedGoogle Scholar
- Mangan S, Zaslaver A, Alon U: The coherent feedforward loop serves as a sign-sensitive delay element in transcription networks. J Mol Biol 2003, 334(2):197–204.View ArticlePubMedGoogle Scholar
- Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U: Network motifs: simple building blocks of complex networks. Science 2002, 298(5594):824–827.View ArticlePubMedGoogle Scholar
- Saddic LA, Huvermann B, Bezhani S, Su Y, Winter CM, Kwon CS, Collum RP, Wagner D: The LEAFY target LMI1 is a meristem identity regulator and acts together with LEAFY to regulate expression of CAULIFLOWER. Development 2006, 133(9):1673–1682.View ArticlePubMedGoogle Scholar
- Iranfar N, Fuller D, Loomis WF: Transcriptional regulation of post-aggregation genes in Dictyostelium by a feed-forward loop involving GBF and LagC. Dev Biol 2006, 290(2):460–469.View ArticlePubMedGoogle Scholar
- Alon U: Network motifs: theory and experimental approaches. Nat Rev Genet 2007, 8(6):450–461.View ArticlePubMedGoogle Scholar
- Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, Kloos DU, Land S, Lewicki-Potapov B, Michael H, Munch R, Reuter I, Rotert S, Saxel H, Scheer M, Thiele S, Wingender E: TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 2003, 31(1):374–378.PubMed CentralView ArticlePubMedGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25(1):25–29.PubMed CentralView ArticlePubMedGoogle Scholar
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, Jennings EG, Zeitlinger J, Pokholok DK, Kellis M, Rolfe PA, Takusagawa KT, Lander ES, Gifford DK, Fraenkel E, Young RA: Transcriptional regulatory code of a eukaryotic genome. Nature 2004, 431(7004):99–104.PubMed CentralView ArticlePubMedGoogle Scholar
- Wingender E, Chen X, Fricke E, Geffers R, Hehl R, Liebich I, Krull M, Matys V, Michael H, Ohnhauser R, Pruss M, Schacherer F, Thiele S, Urbach S: The TRANSFAC system on gene expression regulation. Nucleic Acids Res 2001, 29(1):281–283.PubMed CentralView ArticlePubMedGoogle Scholar
- Ressom HW, Zhang Y, Xuan J, Wang J, Clarke R: Inferring network interactions using recurrent neural networks and swarm intelligence. In Proceedings of the 28th IEEE Engineering in Medicine and Biology Society Annual International Conference, New York City, NY. New York City, NY , IEEE; 2006:4241–4244.Google Scholar
- Cherry JM, Ball C, Weng S, Juvik G, Schmidt R, Adler C, Dunn B, Dwight S, Riles L, Mortimer RK, Botstein D: Genetic and physical maps of Saccharomyces cerevisiae. Nature 1997, 387(6632 Suppl):67–73.PubMed CentralPubMedGoogle Scholar
- Bader GD, Betel D, Hogue CW: BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res 2003, 31(1):248–250.PubMed CentralView ArticlePubMedGoogle Scholar
- Chiang JH, Chao SY: Modeling human cancer-related regulatory modules by GA-RNN hybrid algorithms. BMC Bioinformatics 2007, 8: 91.PubMed CentralView ArticlePubMedGoogle Scholar
- Keedwell E, Narayanan A: Discovering gene networks with a neural-genetic hybrid. IEEE/ACM Trans Comput Biol Bioinform 2005, 2(3):231–242.View ArticlePubMedGoogle Scholar
- Weaver DC, Workman CT, Stormo GD: Modeling regulatory networks with weight matrices. Pac Symp Biocomput 1999, 112–123.Google Scholar
- Romer KA, Kayombya GR, Fraenkel E: WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches. Nucleic Acids Res 2007, 35(Web Server issue):W217–20.PubMed CentralView ArticlePubMedGoogle Scholar
- Weigel D, Jackle H: The fork head domain: a novel DNA binding motif of eukaryotic transcription factors? Cell 1990, 63(3):455–456.View ArticlePubMedGoogle Scholar
- Friedman N, Linial M, Nachman I, Pe'er D: Using Bayesian networks to analyze expression data. J Comput Biol 2000, 7: 601–620.View ArticlePubMedGoogle Scholar
- Ressom H, Reynolds R, Varghese RS: Increasing the efficiency of fuzzy logic-based gene expression data analysis. Physiol Genomics 2003, 13(2):107–117.View ArticlePubMedGoogle Scholar
- Yeung KY, Medvedovic M, Bumgarner RE: From co-expression to co-regulation: how many microarray experiments do we need? Genome Biol 2004, 5(7):R48.PubMed CentralView ArticlePubMedGoogle Scholar
- Berriz GF, King OD, Bryant B, Sander C, Roth FP: Characterizing gene sets with FuncAssociate. Bioinformatics 2003, 19(18):2502–2504.View ArticlePubMedGoogle Scholar
- De Hoon MJ, Imoto S, Miyano S: Statistical analysis of a small set of time-ordered gene expression data using linear splines. Bioinformatics 2002, 18(11):1477–1485.View ArticlePubMedGoogle Scholar
- Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB: Missing value estimation methods for DNA microarrays. Bioinformatics 2001, 17(6):520–525.View ArticlePubMedGoogle Scholar
- Gibbons FD, Roth FP: Judging the quality of gene expression-based clustering methods using gene annotation. Genome Res 2002, 12(10):1574–1581.PubMed CentralView ArticlePubMedGoogle Scholar
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998, 95(25):14863–14868.PubMed CentralView ArticlePubMedGoogle Scholar
- Toronen P, Kolehmainen M, Wong G, Castren E: Analysis of gene expression data using self-organizing maps. FEBS Lett 1999, 451(2):142–146.View ArticlePubMedGoogle Scholar
- Dembele D, Kastner P: Fuzzy C-means method for clustering microarray data. Bioinformatics 2003, 19(8):973–980.View ArticlePubMedGoogle Scholar
- Xie XL, Beni G: A Validity Measure for Fuzzy Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 1991, 13(8):841–847.View ArticleGoogle Scholar
- Tibshirani R, Walther G, Hastie T: Estimating the number of clusters in a data set via the gap statistic. J Royal Statist Soc B 2001, 63: 411–423.View ArticleGoogle Scholar
- Ressom H, Wang D, Natarajan P: Adaptive double self-organizing maps for clustering gene expression profiles. Neural Netw 2003, 16(5–6):633–640.View ArticlePubMedGoogle Scholar
- Ressom HW, Zhang Y, Xuan J, Wang Y, Clarke R: Inferring Network Interactions using Recurrent Neural Networks and Swarm Intelligence. Conf Proc IEEE Eng Med Biol Soc 2006, 1: 4241–4244.View ArticlePubMedGoogle Scholar
- Xu R, Wunsch DC: Gene regulatory networks inference with recurrent neural network models: 31 July-4 Aug. 2005, 1: 286–291.Google Scholar
- D'Haeseleer P, Wen X, Fuhrman S, Somogyi R: Linear modeling of mRNA expression levels during CNS development and injury. Pac Symp Biocomput 1999, 41–52.Google Scholar
- Wahde M, Hertz J: Modeling genetic regulatory dynamics in neural development. J Comput Biol 2001, 8(4):429–442.View ArticlePubMedGoogle Scholar
- Ma J, Zhao Y, Ahalt S: OSU SVM Classifier Matlab Toolbox. 2002.Google Scholar
- Birge B: PSOt - a particle swarm optimization toolbox for use with Matlab. 2003, 182–186.Google Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.