Skip to main content
Figure 4 | BMC Bioinformatics

Figure 4

From: Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data

Figure 4

The proposed computational framework. The framework of the proposed method is composed of three parts. First, gene expression profiles are clustered into biologically meaningful groups by FCM; GO category information of genes is used to determine the optimal cluster number. To evaluate the gene clusters, gene set enrichment analysis (GSEA) is performed on the optimal clusters. This analysis revealed that 28 out of 34 optimal clusters were enriched in certain biological categories (P-value < 0.001) (Table 1). In NM assignment part, SVM classifiers are built to classify TFs into known NM categories. For a given TF, its time course gene expression profile and binding site sequences are used as inputs to SVM classifiers to predict its corresponding NM(s). Positive training data sets include TFs with known NMs from location data analysis. Negative training data sets include TFs randomly chosen from TF pools (same size as positive ones). After the gene clusters are formed and TFs are assigned to NM categories, the relationships between TFs and gene clusters are inferred by training recurrent neural networks (RNNs) that mimic the topologies of the NMs that TFs are assigned to. Since the NM inference only includes small number of TFs and gene clusters, the computational complexity is reduced compared to the global TRN inference problem (inferring TRN on gene level by including all genes in one data set). Finally, the inferred TF-target gene relationships are validated by BSEA and literature results.

Back to article page