Figure 8 | BMC Bioinformatics

Figure 8

From: Meta-analysis discovery of tissue-specific DNA sequence motifs from mammalian gene expression data

Known TFs' binding site motifs identified in tissue-specific gene expression clusters. Gene clusters are arranged in columns named according to the tissue type where the majority of the genes are up-regulated. The column labeled "Wasserman" corresponds to the 40 validated human skeletal muscle CRMs [15]; the column labeled "Skeletal muscle – only human expr" corresponds to a skeletal muscle expression cluster identified from the GNF data without considering the expression patterns of any homologous mouse genes (however, RepeatMasked, noncoding sequence conserved between human and mouse was still examined by MultiFinder). Each row represents a known TFBS motif obtained from the TRANSFAC Professional 7.4 database [69,70]. A listing of the TRANSFAC TFBS matrix accession numbers for each of the TFBS motif names shown here and all others that we considered is provided in Additional Data File 2. The Mef2, Myf, Sp1, SRF, and Tef motifs were taken from Philippakis et al. [35]. Shown for each expression cluster are the nonredundant motifs from five separate MultiFinder runs for both the input sequence set and the matched randoms; a correlation coefficient cutoff of 0.6 was used in the merging of highly similar motifs discovered by MultiFinder (see Methods). 
The following color scheme indicates whether a gene encoding a TF is expressed above the detection threshold (here, AD >= 200) and whether a motif matching that TF's binding site motif was found by MultiFinder. Black and gray boxes denote TFs whose binding site motifs we did not find: black boxes denote TFs that were not expressed above the detection threshold in the tissue cluster (AD < 200), and gray boxes denote those that were expressed at an AD value of at least 200. Yellow boxes denote TFs that were expressed below the detection threshold, but for which matches to the corresponding DNA binding site motifs were found by MultiFinder and passed the block filter. Green boxes denote TFs that were expressed below the detection threshold, but for which matches to their binding site motifs were found by MultiFinder and failed the block filter. Orange boxes denote TFs that were expressed above the detection threshold and were found by MultiFinder, but that failed the block filtering screen. Red boxes denote TFs that were expressed above the detection threshold, were found by MultiFinder, and passed the block filtering screen. For the yellow, green, orange, and red boxes, solid colored boxes denote the discovered motifs whose group specificity scores were lower (i.e., more significant) than the geometric mean of the block-filtered motifs resulting from the size-matched randomly selected sets of genes, while stippled boxes denote the discovered motifs whose group specificity scores were equal to or greater (i.e., less significant) than the geometric mean resulting from the size-matched randomly selected sets of genes. For the skeletal muscle CRMs ("Wasserman"), the size-matched randoms were chosen so that they also came from the same genomic regions upstream of the transcriptional start site as the Wasserman CRMs; in other words, the randoms for the 1 kb upstream Wasserman sequences are all within 1 kb of the transcriptional start site. 
Similarly, since the examined Wasserman sequences were conserved and RepeatMasked, so too were the corresponding size-matched randoms.
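
The color scheme above amounts to a small decision rule, which can be sketched in Python as follows. This is only an illustration of the caption's wording, not code from the study: the function names `box_style` and `geometric_mean`, the argument names, and the return format are all hypothetical, and the sketch assumes that group specificity scores are positive numbers so that a geometric mean is defined.

```python
from math import exp, log

AD_THRESHOLD = 200  # detection threshold stated in the caption (AD >= 200)


def geometric_mean(values):
    """Geometric mean of positive numbers (assumption: scores are > 0)."""
    return exp(sum(log(v) for v in values) / len(values))


def box_style(ad_value, motif_found, passed_block_filter=False,
              score=None, random_scores=None):
    """Return (color, fill) for one TF in one expression cluster.

    Hypothetical helper: the color follows the caption's rules, and the fill
    compares the motif's group specificity score with the geometric mean of
    the scores from the size-matched random gene sets.
    """
    expressed = ad_value >= AD_THRESHOLD

    # Motif not found by MultiFinder: black or gray, no fill pattern.
    if not motif_found:
        return ("gray" if expressed else "black", None)

    # Motif found: color depends on expression and the block filter.
    if expressed:
        color = "red" if passed_block_filter else "orange"
    else:
        color = "yellow" if passed_block_filter else "green"

    # Lower score means more significant; equal or greater is stippled.
    fill = "solid" if score < geometric_mean(random_scores) else "stippled"
    return (color, fill)
```

For example, under these assumptions a TF with an AD value of 500 whose motif was found, passed the block filter, and scored below the random-set geometric mean would be drawn as a solid red box, while a TF with an AD value of 150 and no discovered motif would be drawn as a black box.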
