- Methodology Article
- Open Access
Reconstruction of novel transcription factor regulons through inference of their binding sites
© Elmas et al. 2015
- Received: 1 February 2015
- Accepted: 24 July 2015
- Published: 21 September 2015
In most sequenced organisms the number of known regulatory genes (e.g., transcription factors (TFs)) vastly exceeds the number of experimentally-verified regulons that could be associated with them. At present, identification of TF regulons is mostly done through comparative genomics approaches. Such methods could miss organism-specific regulatory interactions and often require expensive and time-consuming experimental techniques to generate the underlying data.
In this work, we present an efficient algorithm that aims to identify a given transcription factor’s regulon through inference of its unknown binding sites, based on the discovery of its binding motif. The proposed approach relies on computational methods that utilize gene expression data sets and knockout fitness data sets, which are available or may be straightforwardly obtained for many organisms. We computationally constructed the profiles of putative regulons for the TFs LexA, PurR and Fur in E. coli K12 and identified their binding motifs. Comparisons with an experimentally-verified database showed high recovery rates of the known regulon members, and indicated good predictions for the newly found genes with high biological significance. The proposed approach is also applicable to novel organisms for predicting unknown regulons of the transcriptional regulators. Results for the hypothetical protein Dde0289 in D. alaskensis include the discovery of a Fis-type TF binding motif.
The proposed motif-based regulon inference approach can discover the organism-specific regulatory interactions on a single genome, which may be missed by current comparative genomics techniques due to their limitations.
- Transcription factor
- Regulon identification
- Motif discovery
- Sequential Monte Carlo filtering
In most sequenced genomes a significant proportion (3–6 %) of all genes are known to encode transcription factors, essential DNA-binding components that regulate the transcriptional activity of target genes. The promoter regions where TFs specifically bind on the genome are usually located in intergenic sites. Extensive sequencing of the genomes of various organisms has revealed that intergenic regions are largely conserved across different species, often among moderately distant relatives. This is the main intuition behind comparative genomics approaches, where one aims to reconstruct regulatory networks by exploiting evolutionary conservation of regulatory features. The assumption is that if a TF-encoding gene is preserved in a set of closely-related species, the respective target genes that are regulated via cognate TF binding sites (TFBSs) also tend to be preserved. Such regulatory elements as TFBSs and their target genes identified for each genome constitute the “regulon” of the given TF.
Although most known regulators abide by evolutionary conservation, many TF-encoding genes can be organism-specific for various reasons, and their orthologs may not exist in closely-related species. In particular, horizontal gene transfer can explain the occurrence of nonconserved regulatory members. The intuition that “true sites occur upstream of orthologous genes, false sites are scattered at random” can thereby miss organism-specific interactions by treating them as false predictions. Hence, comparative genomics approaches have limitations, and alternative techniques are needed to identify organism-specific regulatory interactions.
In this work, we attack this problem from a more general computational perspective by aiming at single-genome TF regulon reconstruction, which makes our approach also suitable for novel organisms. We demonstrate our results for the TFs LexA, PurR and Fur in the model bacterium Escherichia coli K12 by comparing them to their respective regulons in the manually curated RegPrecise database. The extended predictions, which are not captured by RegPrecise, are presented with annotations provided by the GO and Protein Interactions (http://www.pir.uniprot.org) databases. Putative regulon genes reported with high biological significance have expanded the known regulons of LexA, PurR and Fur. Furthermore, the results for the novel genome Desulfovibrio alaskensis revealed a Fis-type motif for the hypothetical regulator Dde0289.
Motif-based inference of novel regulons of transcription factors
The cis-acting regulatory elements of genes are usually located in the upstream regions of their coding sequences, where gene expression is controlled by sequence-specific binding of the TFs. Co-expressed genes that have similar TF binding patterns in their regulatory regions can be good candidates for a putative regulon. The binding preference (motif) of a TF can be described by a matrix that represents the frequency of nucleotides observed at each position of the known binding sites. Among others, the position weight matrix (PWM) is a well-suited representation of motifs for statistical evaluation of the corresponding binding sites, and it is also a more sensitive metric for TFBS recognition.
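To make the PWM representation concrete, the following minimal sketch builds per-position base frequencies and log-odds weights from a set of aligned sites. The function name `build_pwm`, the pseudocount of 1, the uniform background and the toy sites are all assumptions made for illustration; they are not taken from the implementation used in this work.

```python
import math

def build_pwm(sites, background=None, pseudocount=1.0):
    """Return (freqs, log_odds) for equal-length aligned binding sites.

    freqs[i][b]    : smoothed frequency of base b at position i
    log_odds[i][b] : log2(freqs[i][b] / background[b])
    """
    alphabet = "ACGT"
    if background is None:
        # Assumption: uniform background; a genome-specific one is preferable.
        background = {b: 0.25 for b in alphabet}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    freqs, log_odds = [], []
    for pos in range(width):
        # The pseudocount keeps every frequency strictly positive.
        counts = {b: pseudocount for b in alphabet}
        for site in sites:
            counts[site[pos].upper()] += 1
        total = sum(counts.values())
        column = {b: counts[b] / total for b in alphabet}
        freqs.append(column)
        log_odds.append({b: math.log2(column[b] / background[b]) for b in alphabet})
    return freqs, log_odds

# Toy aligned sites (illustrative only, not real binding-site data).
sites = ["CTGTAT", "CTGTAC", "CTGGAT", "CTGTAT"]
freqs, log_odds = build_pwm(sites)
```

Positions where one base dominates receive a large positive weight for that base and negative weights for the others, which is what makes the matrix usable for scoring candidate sites.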
Recently, more complex models have been introduced for modeling TF-DNA binding affinities. It has been shown that DNA structural features can be calculated from the nucleotide sequences in motif databases, and it was later proposed that certain 3D DNA shape information can be derived from high-throughput approaches. Epigenetic factors (methylation, histone modification, etc.) have also been considered in TF binding, through the investigation of certain location- and cell-type-specific relationships between epigenetic modifications and binding affinities. Although these studies expand the knowledge for modeling TF binding affinity, the proposed methods may not be readily employed in every genome. In this study, we focused on a more general regulon recovery approach based on the discovery of sequence motifs, which can be broadly applied using only the genome sequence and corresponding gene expression data sets.
Using the proposed approaches, we estimated the binding motifs of the given transcription factors and reconstructed their putative binding sites and regulons. We compared our results (i.e., estimated motifs, binding sites and regulated genes) with the RegPrecise database, which is manually curated by an approach most relevant to our work. We used the well-studied transcription factor regulons LexA, PurR, and Fur in the model organism E. coli to validate our predictions. LexA is a repressor protein that, under non-stress conditions, represses SOS response genes, which are involved in repairing DNA damage. The manually curated LexA regulon in the RegPrecise database consists of 30 genes that are regulated by 26 operons. PurR is an important repressor for the transcriptional regulation of purine metabolism. Its regulon includes genes participating in the biosynthesis of purine/pyrimidine nucleotides. FUR consists of a family of TFs including the metal ion-dependent regulators Fur, Mur, Zur, and Nur, which are responsible for homeostasis of metal ions in the organism. For each studied TF we used relevant gene expression assays and, when available, a knockout fitness data set. We also applied our approach to the novel genome D. alaskensis to predict the binding behavior of one of its hypothetical regulators, Dde0289. In fact, we discovered a rare type of binding motif which is structurally weak and not readily detected by most sequence-based motif finding tools. We further validated this prediction by applying the same approach to the estimated motif’s main presumed regulator (Fis), which has been annotated in E. coli. The details of the applications and results are described as follows.
Predictions in the model organism E. coli K12 validate our approach’s sensitivity
To assess our approach’s performance we reconstructed the regulons of the LexA, PurR and Fur transcription factors, which are among the well-studied regulons in the Escherichia coli K12 strain. In the literature, these transcription factor motifs are known to have palindromic sequence symmetry. The informative sites in the LexA motif are conserved in the middle of each half-site sequence, while for PurR and Fur they are mostly scattered across the motif and appear more informative near the motif half-site. These structures can also be seen in the estimated motifs obtained by the proposed approach (Figs. 3, 4 and 5).
The best hits of the estimated (BAMBI2b) vs true (RegPrecise) LexA motifs in SwissRegulon database
Novel binding sites for LexA regulon
Functional enrichment in reconstructed LexA regulon
aklB, ada, dinJ
DNA repair (SP-PIR)
DNA damage (SP-PIR)
Cellular response to stress (GO)
DNA repair (GO)
Response to DNA damage stimulus (GO)
The best hits of the estimated (BAMBI2b) vs true (RegPrecise) PurR motifs in SwissRegulon database
Novel binding sites for PurR regulon
GO annotations for this putative regulon suggested a cluster of genes with significant functional enrichment. The genes nudB, cadA, cadB, cysZ, gltF, gltS, rhtC, and ygeW were members of this cluster and are annotated by the GO term nitrogen compound biosynthetic process (see Additional file 3).
The best hits of the estimated (BAMBI2b) vs true (RegPrecise) Fur motifs in SwissRegulon database
Functional enrichment in reconstructed Fur regulon
Iron transport (SP-PIR)
Ion transport (SP-PIR)
Since the performance of our approach depends on the complexity of the TF’s regulatory network, we expect better performance for relatively smaller regulons. It can be seen from Fig. 6 that the recovery rates are lower for the larger regulons Crp and Fnr. This is in accord with our expectations, since the Crp and Fnr family TFs are among the 7 global regulators that control 50 % of all regulated genes in E. coli.
Results for the hypothetical proteins indicate good predictions for non-generic TF binding motifs
We used our approach to predict the motifs of hypothetical proteins of the novel organism Desulfovibrio alaskensis. It is an anaerobic sulfate-reducing bacterium that is notable for its ability to produce hydrogen sulfide, a chemically reactive product toxic to plants, animals and humans. Dde0289 is one of the hypothetical DNA-binding proteins in D. alaskensis, and is annotated as a Sigma-54-dependent transcriptional activator. It is presumed in the literature to belong to the Fis-type helix-turn-helix family.
Although such a structure is not generic in TF motifs, it is known that intrinsic sequence-dependent protein-DNA conformations can result in high-affinity binding events. A recent study proposes that A-tracts are the preferred Fis-binding sites in E. coli, and in particular that A6-tracts provide the strongest binding signals. A6-tracts are known to induce intrinsic curvature in segments of DNA, which enhances the local region’s exposure to the transcription machinery.
The best hits of the estimated (BAMBI2b) vs true (RegulonDB) Fis motifs in SwissRegulon database
Refining the methods
We applied certain constraints to refine our predictions. Knockout fitness data sets are integrated to refine the estimation of the high-confidence gene set by adding biological relevance. Integrating appropriately selected growth conditions often had a positive effect in eliminating false positive genes. In motif discovery, we imposed the two-block motif structure to account for palindromic or inverted/direct-repeat symmetry patterns of the TFBSs. We also used a heuristic, based on the co-expression of local genes, to define an optimal order in which data are fed to the motif discovery problem.
Further constraints can be imposed in the reconstruction program to limit the extent of predictions, such as searching for genes only in the downstream direction on the strand to which the TF putatively binds. On the other hand, some restrictions could be relaxed to refine the results in particular cases. For instance, one can obtain different estimates by performing multiple runs of BAMBI2b with scrambled data order, in particular when the co-expression patterns are not very discriminative. If a reference motif is available, e.g., when reconstructing or expanding known regulons, one can use it directly as the prior PWM (θ) in the proposed motif discovery algorithm BAMBI2b (see Additional file 1 for more detail). Such procedures will likely reduce the negative effect of false positive sequences, which can degrade the high-confidence set.
In this paper, we proposed a computational method to predict TF regulons of a single organism without relying on phylogenetic footprinting techniques. The proposed approach requires gene expression (and knockout fitness) experiments for the organism of interest, and thereby can be suitable for predicting novel TF regulons. In particular, we aimed at bacterial transcription factors by using a two-block motif model to represent the binding sites and minimizing an information-theoretic dissimilarity measure between the TFBS cores. The presented results for LexA, PurR and Fur TFs in the model organism E. coli showed high recovery rates for their experimentally-verified regulons. Possible extensions as additional TFBSs for the known regulon genes and new putatively regulated genes showing high biological significance are noted. Experiments with a novel organism D. alaskensis also showed intuitive predictions for the hypothetical regulators. In particular, we observed that our approach is sensitive enough to discover rare TF binding events by recovering structurally low-probability motifs. In the light of results reported, we conclude that a motif-based regulon inference approach can discover the organism-specific regulatory interactions on a single organism, which may be missed by current comparative genomics techniques due to their limitations.
Co-regulated gene set estimation
For a given TF, we first estimate a group of putatively co-regulated genes in which we seek coherent expression patterns with the given TF’s coding gene. The gene expression data sets serve as the problem input. This computational problem has been addressed by numerous clustering algorithms [30, 31]. Recently, biclustering methods have gained more attention for their superiority in representing co-regulation in high-dimensional data sets by grouping the genes simultaneously with an appropriate set of conditions. Since generic clustering algorithms classify genes into different functional groups by considering all data points (conditions) at once, they often fail to capture true interactions if the genes exhibit similar behavior under only some but not all conditions.
In this work, we use a biclustering algorithm to select an optimal group among the annotated genes by simultaneously choosing a subset of experiments that best captures the group’s co-expression. The optimization algorithm looks for linear coherency of the data points (i.e., genes’ expressions) with the given TF’s coding gene. This model assumes linear dependency of expression between the co-regulated gene pairs. Although such simplifications may not reflect the real underlying relationships, they often yield effective results by capturing the zeroth- and first-order interactions. When the coherency in high-dimensional expression data becomes indiscriminate, we employ the genome-wide “knockout fitness” data as further biological evidence. The latter monitors the organism-level responses (fitness, survival rate) by exposing knockout/knockdown mutant strain libraries of genes to various experimental stress conditions, thereby providing the biclustering algorithm with systems-level insight.
Filtering out uninformative genes
After a high-confidence gene set is found we supply their upstream sequences for motif discovery. It is known that adjacent genes are often co-regulated in local complexes (i.e., operons) and their expression is controlled through only a few sites; hence the occurrence of binding sites upstream of such genes could be very sparse. So, the great majority of genes in an estimated high-confidence set may in fact belong to a few operons, depending on the TF. In such cases, the upstream sequences not containing a cognate binding site will likely degrade the discovery of the true motif. We used a correlation-based filtering algorithm to detect those sequences that more likely contain a regulatory site of the underlying motif. Given a set of genes, by comparing each gene’s expression coherency with the TF’s coding gene, the algorithm iteratively selects those that strictly follow the co-expression pattern of the adjacent (preceding) gene. (The details, i.e., Algorithm 1, are given in Additional file 1.)
For the motif finding problem, we employ a Bayesian algorithm (BAMBI) for discovering motifs of an unknown length and unknown number of instances in a given set of sequences. Estimating such unknown quantities as the number, length, and locations of the motif instances in each sequence is cast as a probabilistic inference problem through the use of a hidden Markov model (HMM) framework. A computationally efficient sequential Monte Carlo algorithm is employed with a sampling procedure for constructing the posterior distributions of the hidden variables.
We modified this algorithm to capture TF motifs in particular, i.e., by exploiting intrinsic sequence properties such as base conservation and spatial similarity observed in transcription factor binding sites. We called the newly proposed algorithm “BAMBI2b”. Since in most TF binding sites the bases contribute variably to the affinity of the TF-DNA binding complex, defining a suitable model tailored for TFs is crucial for motif discovery. We employ a “two-block” motif model to represent the TFBS’s conserved (core) segments by a pair of “blocks” where the information content is mostly concentrated. The length and location of such segments within the motif are not known a priori and they are estimated within the Bayesian framework. On the other hand, most TF binding sites are known to have certain sequence similarities, where the half sites (mostly cores) are (i) Watson-Crick complements (palindromic symmetry), (ii) identical sequences (direct-repeat symmetry), or (iii) reversed sequences (inverted-repeat symmetry). We use an information-theoretic measure in order to estimate the correct symmetry type from the TFBS cores. The algorithm seeks motif instances that minimize the sequence dissimilarity between the PWM’s corresponding blocks, thereby maximizing the intrinsic symmetry conformation. (We represent the core dissimilarity as an averaged cross-entropy distance between the base probabilities of the motif blocks; see Additional file 1 for more detail.)
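As an illustration of how a symmetry type might be chosen from two motif blocks, the sketch below computes a position-averaged cross-entropy between the first block and a transformed copy of the second, and picks the transformation with the smallest value. The exact measure used by BAMBI2b is defined in Additional file 1; the formula here, the function names, the equal block lengths, and the reading of palindromic symmetry as the reverse complement are assumptions.

```python
import math

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def cross_entropy(p, q, eps=1e-9):
    """H(p, q) = -sum_b p[b] * log2 q[b] for two base-probability columns."""
    return -sum(p[b] * math.log2(max(q[b], eps)) for b in BASES)

def transform(block, symmetry):
    """Map the second block onto the frame of the first for a symmetry type."""
    if symmetry == "direct":        # identical half sites
        return list(block)
    if symmetry == "inverted":      # reversed half sites
        return list(reversed(block))
    if symmetry == "palindromic":   # assumed here: reverse complement
        return [{COMPLEMENT[b]: col[b] for b in BASES} for col in reversed(block)]
    raise ValueError("unknown symmetry: " + symmetry)

def block_dissimilarity(block1, block2, symmetry):
    """Average cross-entropy between block1 and the transformed block2."""
    other = transform(block2, symmetry)
    return sum(cross_entropy(p, q) for p, q in zip(block1, other)) / len(block1)

def best_symmetry(block1, block2):
    """Return the symmetry type with minimal dissimilarity, plus all scores."""
    scores = {s: block_dissimilarity(block1, block2, s)
              for s in ("palindromic", "direct", "inverted")}
    return min(scores, key=scores.get), scores

def column(base, p=0.85):
    """Toy PWM column concentrated on one base (illustrative helper)."""
    return {b: (p if b == base else (1 - p) / 3) for b in BASES}

# Toy blocks: "ACAG" is the reverse complement of "CTGT".
block1 = [column(b) for b in "CTGT"]
block2 = [column(b) for b in "ACAG"]
symmetry, scores = best_symmetry(block1, block2)
```

Because cross-entropy is not symmetric in its arguments, a symmetrized variant could equally be used; the choice made here is only one possibility.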
Once the motif is established, we scan the entire genome with it for TFBS prediction. For each query sequence a binding score is calculated by a statistical significance metric using the motif’s PWM and the background nucleotide distribution. For this, we used a site recognition method which evaluates a possible binding site by two metrics, i.e., a likelihood-ratio (raw) score that quantifies the degree of the motif’s preference for the respective site, and a p-value that indicates the probability of obtaining this score (or a greater score) merely by chance. After setting a sufficient p-value threshold (0.001) and defining an intuitive log-likelihood ratio score threshold (e.g., such that the algorithm will recover the majority of the known TFBSs), we eliminate the structurally weak binding sites from our putative TFBS list, and check the remaining sites for a nearby gene. Binding sequences that are located upstream of a gene’s 5′ start site are paired with those genes, and they together constitute the putative operons of the TF.
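The two metrics can be sketched as follows. The log-likelihood ratio is the standard PWM score against an i.i.d. background; the p-value is computed here by exhaustively enumerating all k-mers, which is only practical for short motifs and is an assumption standing in for the site recognition method actually used. The function names, the score threshold and the toy motif and sequence are likewise illustrative.

```python
import itertools
import math

def llr_score(site, pwm, background):
    """Log2 likelihood ratio of a site under the PWM vs. the background."""
    return sum(math.log2(pwm[i][b] / background[b]) for i, b in enumerate(site))

def score_pvalue(score, pwm, background):
    """P(score' >= score) for a random background k-mer, by full enumeration.

    Cost grows as 4**width, so this is only suitable for short motifs.
    """
    p = 0.0
    for kmer in itertools.product("ACGT", repeat=len(pwm)):
        if llr_score(kmer, pwm, background) >= score - 1e-12:
            p += math.prod(background[b] for b in kmer)
    return p

def scan(sequence, pwm, background, min_score, max_p=0.001):
    """Return (start, site, score, p_value) for windows passing both cutoffs."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        score = llr_score(site, pwm, background)
        if score >= min_score:
            p_value = score_pvalue(score, pwm, background)
            if p_value <= max_p:
                hits.append((start, site, score, p_value))
    return hits

# Toy motif concentrated on "CTGTAT"; all probabilities are strictly positive.
background = {b: 0.25 for b in "ACGT"}
pwm = [{b: (0.85 if b == c else 0.05) for b in "ACGT"} for c in "CTGTAT"]
hits = scan("GGACTGTATCC", pwm, background, min_score=5.0)
```

The sketch scans one strand only and assumes the sequence contains only A, C, G and T; a genome-wide scan would also score the reverse complement of each window.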
We allow bi-directional search to identify target genes on both the forward and complementary strands. For example, if a TFBS is predicted to bind on the positive strand, we look for target genes via (i) the site’s 5′–3′ direction on the positive strand and (ii) the complementary site’s 5′–3′ direction on the negative strand. Each time a gene is found, the program optionally checks whether the adjacent genes are predicted to be in the same operon by using the operon prediction database, and if so the program includes them in the putative regulon.
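A minimal sketch of the bi-directional search is given below, assuming genes are described by their leftmost and rightmost coordinates and strand. The `Gene` record, the `max_dist` cutoff of 300 bp and the toy coordinates are hypothetical; the operon extension step and intragenic sites are not covered.

```python
from typing import List, NamedTuple

class Gene(NamedTuple):
    name: str
    start: int   # leftmost coordinate on the forward strand
    end: int     # rightmost coordinate on the forward strand
    strand: str  # "+" or "-"

def target_genes(site_start: int, site_end: int, genes: List[Gene],
                 max_dist: int = 300) -> List[Gene]:
    """Return the nearest downstream gene on each strand for a binding site.

    On the "+" strand the 5' end of a gene is gene.start, so candidates lie
    to the right of the site; on the "-" strand the 5' end is gene.end, so
    candidates lie to the left of the site.
    """
    targets = []
    forward = [g for g in genes
               if g.strand == "+" and 0 <= g.start - site_end <= max_dist]
    if forward:
        targets.append(min(forward, key=lambda g: g.start - site_end))
    reverse = [g for g in genes
               if g.strand == "-" and 0 <= site_start - g.end <= max_dist]
    if reverse:
        targets.append(min(reverse, key=lambda g: site_start - g.end))
    return targets

# Toy annotation (hypothetical gene names and coordinates).
genes = [
    Gene("geneA", 100, 400, "-"),
    Gene("geneB", 600, 900, "+"),
    Gene("geneC", 950, 1200, "+"),
    Gene("geneD", 2000, 2500, "+"),
]
targets = target_genes(480, 500, genes)
```

A site located between two divergently transcribed genes, as in this toy example, is paired with both of them.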
TFBSs falling within the intragenic regions are often ignored in comparative genomics approaches due to ortholog-dependent reconstruction. Here, we allow the algorithm to look for such binding sites within the coding regions or open reading frames. As a result, this significantly improves the recovery of experimentally-verified binding sites and increases novel predictions.
We used Tomtom to quantify the similarity between TF motifs. It calculates statistical measures between the given query motifs and a database of known motifs. In this study, we used SwissRegulon’s motif database for E. coli TFs. For each motif, we displayed the results for the best hit obtained by Tomtom with its default settings.
The data sets supporting the results of this article are included within the article (and its additional files).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Chua G, Morris Q, Sopko R, Robinson M, Ryan O, Chan E, et al. Identifying transcription factor functions and targets by phenotypic activation. Proc Natl Acad Sci USA. 2006; 103:12045–50.
- Gelfand M, Novichkov P, Novichkov E, Mironov A. Comparative analysis of regulatory patterns in bacterial genomes. Brief Bioinf. 2000; 1:357–71.
- Price M, Dehal P, Arkin A. Horizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli. Genome Biol. 2008;9. doi:10.1186/gb-2008-9-1-r4.
- Kazakov A, Rodionov D, Price M, Arkin A, Dubchak I, Novichkov P. Transcription factor family-based reconstruction of singleton regulons and study of the CRP/FNR, ArsR, and GntR families in Desulfovibrionales genomes. J Bacteriol. 2013; 195:29–38.
- Novichkov P, Laikova O, Novichkova E, Gelfand M, Arkin A, Dubchak I, et al. RegPrecise: a database of curated genomic inferences of transcriptional regulatory interactions in prokaryotes. Nucleic Acids Res. 2010; 38:111–8.
- Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000; 25:25–9.
- Frith M, Fu Y, Yu L, Chen J, Hansen U, Weng Z. Detection of functional DNA motifs via statistical over-representation. Nucleic Acids Res. 2004; 32:1372–81.
- D’haeseleer P. How does DNA sequence motif discovery work? Nat Biotechnol. 2006; 24:959–61.
- Yang L, Zhou T, Dror I, Mathelier A, Wasserman WW, Gordan R, et al. TFBSshape: a motif database for DNA shape features of transcription factor binding sites. Nucleic Acids Res. 2014; 42(D1):148–55. doi:10.1093/nar/gkt1087.
- Zhou T, Shen N, Yang L, Abe N, Horton J, Mann RS, et al. Quantitative modeling of transcription factor binding specificities using DNA shape. Proc Natl Acad Sci USA. 2015; 112(15):4654–659. doi:10.1073/pnas.1422023112.
- Liu L, Jin G, Zhou X. Modeling the relationship of epigenetic modifications to transcription factor binding. Nucleic Acids Res. 2015; 43(8):3873–85. doi:10.1093/nar/gkv255.
- Novichkov PS, Rodionov DA, Stavrovskaya ED, Novichkova ES, Kazakov AE, Gelfand MS, et al. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach. Nucleic Acids Res. 2010; 38(suppl 2):299–307. doi:10.1093/nar/gkq531.
- Cho B, Knight E, Barrett C, Palsson B. Genome-wide analysis of Fis binding in Escherichia coli indicates a causative role for A-/AT-tracts. Genome Res. 2008; 18:900–10.
- Faith J, Hayete B, Thaden J, Mogno I, Wierzbowski J, Cottarel G, et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 2007; 5:e8.
- Nichols RJ, Sen S, Choo YJ, Beltrao P, Zietek M, Chaba R, et al. Phenotypic landscape of a bacterial cell. Cell. 2011; 144(1):143–56. doi:10.1016/j.cell.2010.11.052.
- Pachkov M, Erb I, Molina N, van Nimwegen E. SwissRegulon: a database of genome-wide annotations of regulatory sites. Nucleic Acids Res. 2007; 35(suppl 1):127–31. doi:10.1093/nar/gkl857.
- Price M, Huang K, Alm E, Arkin A. A novel method for accurate operon predictions in all sequenced prokaryotes. Nucleic Acids Res. 2005; 33:880–92.
- Dennis G, Sherman B, Hosack D, Yang J, Gao W, Lane H, et al. DAVID: database for annotation, visualization, and integrated discovery. Genome Biol. 2003; 4:3.
- Huang D, Sherman B, Lempicki R. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009; 37:1013.
- Salgado H, Peralta-Gil M, Gama-Castro S, Santos-Zavaleta A, Muñiz-Rascado L, García-Sotelo JS, et al. RegulonDB v8.0: omics data sets, evolutionary conservation, regulatory phrases, cross-validated gold standards and more. Nucleic Acids Res. 2013; 41(Database issue):D203-13. doi:10.1093/nar/gks1201.
- Jozefczuk S, Klie S, Catchpole G, Szymanski J, Cuadros I, Steinhauser D, et al. Metabolomic and transcriptomic stress response of Escherichia coli. Mol Syst Biol. 2010; 6:5.
- Masse E, Vanderpool C, Gottesman S. Effect of RyhB small RNA on global iron use in Escherichia coli. J Bacteriol. 2005; 187:6962–71.
- Pettis G, Brickman T, McIntosh M. Transcriptional mapping and nucleotide sequence of the Escherichia coli fepA-fes enterobactin region. Identification of a unique iron-regulated bidirectional promoter. J Biol Chem. 1988; 263:857–63.
- Sauer M, Hantke K, Braun V. Sequence of the fhuE outer membrane receptor gene of Escherichia coli K12 and properties of mutants. Mol Microbiol. 1990; 4:427–37.
- Martinez A, Collado V. Identifying global regulators in transcriptional regulatory networks in bacteria. Curr Opin Microbiol. 2003; 6:482–9.
- Hauser LJ, Land ML, Brown SD, Larimer F, Keller KL, Rapp-Giles BJ, et al. Complete genome sequence and updated annotation of Desulfovibrio alaskensis G20. J Bacteriol. 2011; 193:4268–9.
- Dehal PS, Joachimiak MP, Price MN, Bates JT, Baumohl JK, Chivian D, et al. MicrobesOnline: an integrated portal for comparative and functional genomics. Nucleic Acids Res. 2009. doi:10.1093/nar/gkp919.
- Koo H, Wu H, Crothers D. DNA bending at Adenine·Thymine tracts. Nature. 1986; 320:501–6.
- Bradley M, Beach M, Koning A, Pratt T, Osuna R. Effects of Fis on Escherichia coli gene expression during different growth stages. Microbiology. 2007; 153:2922–40.
- Basett D, Eisen M, Boguski M. Gene expression informatics– its all in your mine. Nat Genet. 1999; 21:3–4.
- Gaasterland, Bekiranov S. Making the most of microarray data. Nat Genet. 2000; 24:204–6.
- Gan X, Liew A, Yan H. Discovering biclusters in gene expression data based on high-dimensional linear geometries. BMC Bioinf. 2008; 9:209.
- Oh J, Fung E, Price M, Dehal P, Ronald W, Giaever G, et al. A universal TagModule collection for parallel genetic analysis of microorganisms. Nucleic Acids Res. 2010; 38:146.
- Jajamovich G, Wang X, Arkin A, Samoilov M. Bayesian multiple-instance motif discovery with BAMBI: inference of recombinase and transcription factor binding sites. Nucleic Acids Res. 2011; 39:146.
- Dong B, Wang X, Doucet A. A new class of soft MIMO demodulation algorithms. Signal Process IEEE Trans. 2003; 51:2752–63.
- Stormo G. DNA binding sites: representation and discovery. Bioinformatics. 2000; 16:16–23.
- Liang K, Wang X, Anastassiou D. A sequential Monte Carlo method for motif discovery. Signal Process IEEE Trans. 2008; 56:4496–507.
- Gupta S, Stamatoyannopoulos JA, Bailey TL, Noble WS. Quantifying similarity between motifs. Genome Biol. 2007; 8(2):24. doi:10.1186/gb-2007-8-2-r24.