Analysis tools for the interplay between genome layout and regulation
BMC Bioinformatics volume 17, Article number: 191 (2016)
Genome layout and gene regulation appear to be interdependent. Understanding this interdependence is key to exploring the dynamic nature of chromosome conformation and to engineering functional genomes. Evidence for non-random genome layout, defined as the relative positioning of either co-functional or co-regulated genes, stems from two main approaches. Firstly, the analysis of contiguous genome segments across species has highlighted the conservation of gene arrangement (synteny) along chromosomal regions. Secondly, the study of long-range interactions along a chromosome has emphasised regularities in the positioning of microbial genes that are co-regulated, co-expressed or evolutionarily correlated. While one-dimensional pattern analysis is a mature field, it is often powerless on biological datasets, which tend to be incomplete and partly incorrect. Moreover, there is a lack of comprehensive, user-friendly tools to systematically analyse, visualise, integrate and exploit regularities along genomes.
Here we present the Genome REgulatory and Architecture Tools SCAN (GREAT:SCAN) software for the systematic study of the interplay between genome layout and gene expression regulation. GREAT:SCAN is a collection of related and interconnected applications currently able to perform systematic analyses of genome regularities as well as to improve transcription factor binding sites (TFBS) and gene regulatory network predictions based on gene positional information.
We demonstrate the capabilities of these tools in two ways. First, we study the regular patterns of genome layout in the major regulons of the bacterium Escherichia coli. Second, we show how the tools improve TFBS prediction in microbes. Finally, we use multivariate techniques to visualise the interplay between position and sequence information for effective transcription regulation.
Advances in genomics, transcriptomics and genome structural biology have revealed significant insights into the interdependence between genome expression, genome layout and the three-dimensional (3D) chromosome conformation. Evidence for non-random genome layout, defined as the relative positioning of co-regulated or co-functional genes, stems from two main insights. First, the analysis of contiguous genome segments across species has highlighted synteny, that is, the conservation of gene order along chromosome regions. Second, studies of long-range regularities within chromosomes in eubacteria, archaea and yeast have emphasised periodic positioning of genes that are co-regulated, co-expressed, or evolutionarily correlated [3–8], respectively. These studies have all proposed a non-random, periodic arrangement of genomic features (such as genes, operons and gene expression) as a common feature of compact genomes in all phyla of life. This periodic arrangement of genomic features confers certain 3D conformational advantages, which provide a potential mechanism for genome regulatory efficiency, and it has been favoured by evolution in genomes that are under selective pressure to remain small. Furthermore, in organisms with more complex genomes, the formation of loops, inter-chromosomal associations and transcription factories affects (and is affected by) the expression of genes [9–11], suggesting that active transcription might be a shaping force of genomes. A set of tools able to investigate genomic positional regularities in the context of genome expression regulation could, in combination with the high availability of multi-omics data, provide bioscience researchers with novel and informative insights into genome organisation, regulation and function.
We developed GREAT:SCAN (Genome REgulatory Architecture Tools:SCAN), a collection of on-line software tools designed to perform systematic detection of regular patterns along genomes, to integrate and interconnect results between the available methods and to provide informative visualisations. GREAT:SCAN extends two algorithms previously developed by our team for the detection of periodically arranged genes and the prediction of transcription factor binding sites (TFBS). It provides a web user interface which streamlines the usage of these algorithms, performs a fully automated analysis of regularities among genomic features, extends the analytical capabilities of the previous software with novel functionalities and reports results in both human-readable (plots and graphs) and machine-readable (tables) formats. GREAT:SCAN is available in two versions: a) as an online application integrated in the computational framework of the GREAT portal on the servers of the abSYNTH platform (absynth.issb.genopole.fr/Bioinformatics/tools/GREAT); b) as a downloadable stand-alone command line Docker image of each individual tool, to facilitate incorporation into pipelines.
Here, we introduce this new collection of tools called GREAT:SCAN, we describe their novel features and we demonstrate their use and analytical capabilities by a) calculating regularities on the regulons of the seven major transcription factors (TFs) in Escherichia coli; and b) predicting new target genes in the corresponding regulons by using data from two different sources: local TFBS sequence and global gene position along the genome.
Genome organisation influences fundamental biological processes such as transcription and replication and, reciprocally, through evolutionary pressure, those fundamental biological processes shape genome organisation [14, 15]. In prokaryotes, transcription and genome organisation are tightly coupled, with all major TFs playing a dual role as chromosome structural proteins and as transcriptional regulators. Furthermore, transcriptional activity (and therefore expression regulation) is spatially organised both in bacterial nucleoids and in eukaryotic nuclei [17, 18], showing regular spatial patterns. Ascertaining the interplay between genome organisation and transcription regulation will provide key insights into whole genome expression, nucleus/nucleoid organisation and genome architecture. Understanding and exploiting this interplay is an essential step towards rational automated whole-genome design and engineering.
The collection currently includes two tools: GREAT:SCAN:PATTERNS, a package for the systematic analysis of regular patterns on genomes, and GREAT:SCAN:PRECISION, a multi-view machine learning tool to predict novel TFBSs.
GREAT:SCAN:PATTERNS performs a complete analysis of periodic patterns along genomes. The analysis comprises three steps: 1) The systematic detection and visualisation of all possible periods from the genome positions of features of interest (such as co-regulated genes); 2) The clustering and visualisation of genomic features which are “in-phase” in the phase coordinates; 3) The mapping of any sub-region of the genome where a periodic pattern can be detected.
The first step commences by exhaustively evaluating all the possible periods in the dataset. A pre-processing step removes features located very close to each other (the proximity threshold is a user-specified parameter). This is necessary because proximal genes can distort the p-values of the periodicity score and thus yield many false-positive periods. The periods are evaluated according to their p-values. The un-normalised p-value of a given period is the probability of obtaining a higher periodicity score when the sites are drawn at random according to a uniform law. The p-values are then normalised by a correction that accounts for multiple testing. Indeed, for relatively short periods, many periods get tested, which increases the chances that a significant pattern will be detected by chance. The correction is therefore period-dependent. The periods reported by this first analysis step, and considered for downstream analysis, are those with a normalised p-value below a user-specified threshold. The first step ends by illustrating all the selected periods and their p-values in a plot called the “periodobar”, inspired by the periodograms of spectral analysis. A schematic representation of the processes involved in the calculation of periods in this first step of PATTERNS is given in the flowchart of Fig. 1.
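The Monte Carlo reading of the un-normalised p-value described above can be sketched in a few lines of Python. This is a minimal illustration and not the GREAT:SCAN implementation: the score used here (the mean resultant length of the phases) is only a stand-in for the periodicity score of PATTERNS, the chromosome is treated as a simple interval, and the names `resultant_length`, `empirical_p_value`, `n_draws` and `seed` are assumptions introduced for the example. No multiple-testing correction is applied.

```python
import math
import random


def resultant_length(positions, period):
    """Stand-in periodicity score (NOT the PATTERNS score).

    Maps each position to an angle on a circle of circumference `period`
    and returns the mean resultant length: close to 1 when all features
    share the same phase, close to 0 when phases are evenly spread.
    """
    angles = [2.0 * math.pi * (p % period) / period for p in positions]
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    return math.hypot(c, s) / len(positions)


def empirical_p_value(positions, period, genome_length, n_draws=2000, seed=0):
    """Estimate P(score >= observed) when sites are drawn uniformly."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = resultant_length(positions, period)
    n = len(positions)
    hits = 0
    for _ in range(n_draws):
        drawn = [rng.uniform(0, genome_length) for _ in range(n)]
        if resultant_length(drawn, period) >= observed:
            hits += 1
    # Add-one estimator so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 20 features spaced exactly 1000 bp apart.
    toy = [i * 1000 + 50 for i in range(20)]
    print(empirical_p_value(toy, period=1000, genome_length=20000))
```

In a full analysis this estimate would be computed for every candidate period and then corrected for the number of periods tested, as the paragraph above explains.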
In the second step, DBSCAN, an established density-based clustering algorithm, is employed to detect clusters of genomic features that are “in-phase” on the phase coordinates. Here all the genomic coordinates of the features of interest are transformed into phases (the remainder of dividing the absolute coordinate by the period length), so that an individual set of phase coordinates is computed for each period reported as significant in the previous step. DBSCAN then clusters the phase coordinates, taking as the minimum distance between two members of a cluster a weighted ratio between the period and the user-specified proximity threshold. The weight of this ratio is controlled by the “clustering exponent”, a parameter which allows the user to tune the sensitivity of the clustering algorithm. The result for each significant period is visualised in an intuitive plot called the “clustergram”, where the phase coordinates are transformed from angular coordinates to linear coordinates on the horizontal axis of the plot. An additional feature of this second step is the calculation of the positional score, which corresponds to the individual contribution that each genomic feature makes to the significance (i.e. the periodicity score) of a particular period. Intuitively, genomic features which belong to clusters will exhibit a higher positional score than those that appear isolated (Fig. 3, right-hand vertical axis). The “clustergram” reports the clusters detected by DBSCAN and provides users with visual evidence of potential local spatial proximity of the genomic features of interest (genes, operons etc.).
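The phase transformation and the density-based clustering described above can be illustrated with a small, dependency-free sketch. It is not the PATTERNS code: the neighbourhood radius `eps` is passed in directly because the exact weighting by the clustering exponent is not reproduced here, `min_pts` is an assumed parameter, and the DBSCAN below is a minimal textbook variant adapted to circular one-dimensional data.

```python
def to_phases(positions, period):
    """Phase of each feature: remainder of coordinate divided by period."""
    return [p % period for p in positions]


def circular_distance(a, b, period):
    """Distance between two phases on a circle of circumference `period`."""
    d = abs(a - b) % period
    return min(d, period - d)


def dbscan_phases(phases, period, eps, min_pts=3):
    """Minimal DBSCAN on circular 1-D data.

    Returns one label per phase: 0, 1, ... for clusters, -1 for noise.
    """
    n = len(phases)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        return [j for j in range(n)
                if circular_distance(phases[i], phases[j], period) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb_j)
    return labels


if __name__ == "__main__":
    # Hypothetical coordinates (bp) and a hypothetical 1000 bp period.
    positions = [100, 1110, 2095, 3105, 500, 1505, 2498, 3700]
    phases = to_phases(positions, 1000)
    print(dbscan_phases(phases, period=1000, eps=20, min_pts=3))
```

The circular distance matters because phases just below the period and just above zero are neighbours, which a plain linear distance would miss.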
The third step introduces a novel capability of the periodicity detection algorithm: a variable size sliding window approach. The algorithm performs a similar fine-tuned search for regular patterns as described above, but within a specific genomic region delimited by a sliding window. It starts with a 10-kbp size window which runs along the whole genome and looks for periodicities of the features of interest. The window is then enlarged incrementally until it covers 95 % of the length of the whole genome. By reporting the boundaries of the regions where periodicities are detected, this approach is able to map the observed periods on their respective genomic regions.
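The variable-size sliding window can be pictured with the sketch below. Only the 10-kbp starting size and the 95 % upper bound come from the description above; the growth factor, the step between consecutive windows and the function names are assumptions made for illustration, and the chromosome is treated as linear for simplicity.

```python
def window_sizes(genome_length, first_size=10_000, growth=2, max_percent=95):
    """Window sizes from `first_size` up to `max_percent` % of the genome.

    `growth` (multiplicative enlargement) is an assumed parameter; the
    source only states that the window is enlarged incrementally.
    """
    largest = genome_length * max_percent // 100
    sizes = []
    size = first_size
    while size < largest:
        sizes.append(size)
        size *= growth
    sizes.append(largest)
    return sizes


def sliding_windows(genome_length, size, step):
    """Yield (start, end) pairs of windows that fit inside the genome."""
    start = 0
    while start + size <= genome_length:
        yield start, start + size
        start += step


def features_in(positions, start, end):
    """Features whose coordinate falls inside the half-open window."""
    return [p for p in positions if start <= p < end]


if __name__ == "__main__":
    genome_length = 100_000  # hypothetical toy genome
    positions = [5_000, 12_000, 41_000, 77_000]
    for size in window_sizes(genome_length):
        for start, end in sliding_windows(genome_length, size, step=size // 2):
            inside = features_in(positions, start, end)
            # A periodicity test would be run on `inside` here.
            print(size, start, end, len(inside))
```

Recording the window boundaries for which a period is significant is what allows the periods to be mapped back onto genomic regions.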
GREAT:SCAN:PRECISION (“PRECISION” stands for “PREdiction of CIS-regulatory elements improved by gene positiON”) is a novel implementation in the R language of PRECISION, a multi-view learning algorithm for TFBS prediction which incorporates two views: a) the DNA sequence motif readout calculated by a TFBS position weight matrix (local sequence classifier) and b) the individual gene contribution to the overall genome periodic pattern, calculated as the positional score by GREAT:SCAN:PATTERNS (global position classifier). This ensemble classifier, which is a weighted combination of a set of base classifiers trained on different views, is implemented using a modified version of the AdaBoost algorithm. The underlying rationale is to combine TFBS sequence motif information with gene positioning information to obtain an accurate and robust TFBS prediction model. Computational approaches for TFBS prediction have so far relied, in one way or another, on local sequence information only. With PRECISION, we show that for bacteria the relative positioning of genes along the chromosome carries significant information for TFBS prediction. The design and implementation of the GREAT:SCAN:PRECISION boosting algorithm allow any suitable algorithm to be incorporated as an additional “view”, as long as it provides a scoring function for each genomic feature of interest.
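The idea of boosting over several views can be sketched with plain discrete AdaBoost and threshold stumps. This is a generic illustration under stated assumptions, not the modified AdaBoost used by PRECISION: each view is reduced to one score per gene, the weak learner is a single threshold on one view, and the names `best_stump`, `boost` and `predict` as well as the toy scores are hypothetical.

```python
import math


def best_stump(scores, labels, weights):
    """Best threshold stump on one view: (weighted error, threshold, sign)."""
    best = (float("inf"), None, 1)
    for thr in sorted(set(scores)):
        for sign in (1, -1):
            err = sum(w for s, y, w in zip(scores, labels, weights)
                      if (sign if s >= thr else -sign) != y)
            if err < best[0]:
                best = (err, thr, sign)
    return best


def boost(views, labels, rounds=5):
    """views: dict view name -> list of scores; labels: list of +1 / -1."""
    n = len(labels)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        # Each round selects the single view whose stump has the lowest error.
        name, (err, thr, sign) = min(
            ((v, best_stump(s, labels, weights)) for v, s in views.items()),
            key=lambda item: item[1][0])
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((name, thr, sign, alpha))
        preds = [sign if s >= thr else -sign for s in views[name]]
        # Misclassified examples gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, labels, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, example):
    """example: dict view name -> score for one gene; returns +1 or -1."""
    total = sum(alpha * (sign if example[name] >= thr else -sign)
                for name, thr, sign, alpha in model)
    return 1 if total >= 0 else -1


if __name__ == "__main__":
    # Toy data: each target gene is supported by only one of the two views.
    views = {"sequence": [0.9, 0.1, 0.1, 0.1],
             "position": [0.1, 0.9, 0.1, 0.1]}
    labels = [1, 1, -1, -1]
    model = boost(views, labels, rounds=3)
    print([name for name, _, _, _ in model])
```

In the toy data neither view separates the classes on its own, yet the weighted combination does, which mirrors the rationale given above for combining sequence and position.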
GREAT:SCAN tools focus on detecting periodicities in compact genomes of single-cell organisms (as periodicities have so far been searched for only in this kind of organism) and they operate on one chromosome at a time. However, periodicities might appear as prominent genome organisation features at different organisation scales in more complex genomes. We envisage the application of GREAT:SCAN tools to the study of intra-chromosomal interactions and arrangements such as the complex regulatory regions of higher eukaryotes (plants or mammals).
In this work, we demonstrate the analytical capabilities of GREAT:SCAN:PATTERNS by conducting a complete analysis of the seven major E. coli regulons, we report regions of periodic arrangement which are associated with large-scale genomic structures such as the organisation in macro-domains, and we discuss preliminary results on the use of GREAT:SCAN:PRECISION to formulate and test biological hypotheses.
The features we analyse here include the transcriptionally co-regulated genes (and operons) of the seven TFs of E. coli with the highest number of targets. For the periodicity analysis, all the regulatory network interactions of E. coli were retrieved from RegulonDB (version 8.6). The target genes and operons of the seven major TFs of E. coli (namely CRP, Lrp, H-NS, Fis, Fnr, ArcA and IHF) were selected. Each predicted interaction from RegulonDB was automatically filtered by an in-house script to keep only those identified by at least two “strong” validation experiments or at least three “weak” ones (see figure 4 of  for the classification of each prediction method in RegulonDB as “strong” or “weak”). The start codon coordinate of each gene was taken as the gene’s start site. This information was retrieved from the E. coli EcoCyc “SmartTables” resource. For the novel TFBS prediction, the regulatory sequence of each gene was retrieved from RSAT and the genomic coordinates from the UCSC microbial genome browser.
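The evidence filter described above amounts to a simple counting rule. The sketch below is not the in-house script: the record layout, the evidence labels and the function names are hypothetical placeholders, and each listed evidence entry is assumed to count as one experiment.

```python
def count_evidence(evidence_codes, strong_codes):
    """Return (number of strong entries, number of weak entries)."""
    strong = sum(1 for code in evidence_codes if code in strong_codes)
    return strong, len(evidence_codes) - strong


def keep_interaction(evidence_codes, strong_codes):
    """Keep if supported by >= 2 strong or >= 3 weak experiments."""
    strong, weak = count_evidence(evidence_codes, strong_codes)
    return strong >= 2 or weak >= 3


if __name__ == "__main__":
    # Placeholder labels, not real RegulonDB evidence codes.
    STRONG = {"strong_A", "strong_B"}
    interactions = {
        ("TF1", "geneX"): ["strong_A", "strong_B"],
        ("TF1", "geneY"): ["weak_A", "weak_B"],
        ("TF1", "geneZ"): ["weak_A", "weak_B", "weak_C"],
    }
    kept = [pair for pair, ev in interactions.items()
            if keep_interaction(ev, STRONG)]
    print(kept)
```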
Results and discussion
Periodic patterns among E. coli co-regulated genes
For each set of genes co-regulated by the seven most important E. coli TFs, a complete GREAT:SCAN:PATTERNS analysis was performed. Here, we present the results of each step for a selected set of genes, for demonstrative purposes. The most significant periods of the targets of CRP (the major regulator of E. coli transcription) are illustrated in Fig. 2. The following step allows the visualisation of the clustered genes; according to a thermodynamic chromosome folding model, “in-phase” genes may be co-localised and potentially form transcription factories [17, 18, 29]. As the “in-phase” genes appear aligned along the vertical axis in different clusters depicted with different colours (Fig. 3), the clustergram may be interpreted as reflecting 3D co-localisation of genes, which can be tested by bench experiments. Figure 3 provides the clustergram of a significant period of Lrp-regulated genes. In the final step, the system maps all the possible significant periods onto different regions of the chromosome. An example chromosome mapping plot is shown in Fig. 4 for the periodic mapping of CRP operons. In Fig. 4, the extremities of the E. coli macrodomains have been overlaid by the software user, and it appears that the boundaries of periodic regions and those of some macrodomains overlap.
The analysis of all the significant periods in the regulons of the seven major E. coli TFs is summarised in Table 1. The target genes of all regulons appeared to be arranged regularly, as the GREAT:SCAN:PATTERNS analysis found significant periods for each regulon in the whole genome (corrected p-values lower than the 0.05 threshold). A comparison of the significant periods among all regulons revealed similar periods for four of the seven regulons: periods in the narrow range of 87–93 kbp were found to be significant for the CRP, H-NS, Fnr and ArcA target genes. This range of period lengths agrees with past observations made with much less complete data (∼90 kbp period reported) and is close to that of an independent study reporting periodicities in the range of 100 kbp.
Interplay between sequence and position with PRECISION
This section builds upon our previous work applying PRECISION to the prediction of E. coli TFBS. Those results had indicated both the importance of genome position for the prediction of the TFBS of several E. coli TFs and the inter-dependence of position and sequence information for effective boosting learning of TFBS predictions for some other E. coli TFs. Indeed, even when both views are only weakly informative, their optimised combination may be effective (see the extended discussion of Fig. 2 and its legend in that work). Using two different readouts, the boosting approach developed in PRECISION was able to take advantage of the balance as well as the inter-dependence of these data in order to improve TFBS prediction in E. coli. This multi-view classifier is strong because a) its components (a set of consensus sequences and periods) each fit well to a particular region of the landscape and b) it contains classifiers that are trained to focus on different views of the data. These qualities make the PRECISION boosting algorithm suitable for incorporating a diverse set of classifiers with input data from multi-omics studies.
To explore further the interplay between the two views currently used by PRECISION (i.e. sequence and position), two sets of variables were extracted. One set contains the classifier prediction scores, for each gene, calculated during the iterations in which the position classifier was selected; the second set contains the classifier prediction scores calculated during the iterations in which the sequence classifier was selected. At the end of boosting, PRECISION constructs a linear combination of the weak classifiers selected at each iteration to form a strong classifier. A multivariate statistical method called canonical correlation analysis (CCA) was then applied to this mixed dataset of positional and sequence scores. CCA finds, for two multidimensional sets of variables, linear combinations of the variables in each set (called canonical variates) such that the correlation between the paired variates (the canonical correlation) is maximal. We used the R package mixOmics, an implementation of multivariate analysis and visualisation tools, to produce numerical and graphical outputs. The results indicate a case of negative correlation between the position and sequence classifiers. The correlation circle plot in Fig. 5 visualises this negative association between the four selected position classifiers and the six sequence ones. These results suggest a balance between the quality of the local binding sequence (TFBS sequence score) and that of the global position (periodicity positional score).
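For reference, the quantity that CCA maximises can be written compactly. The notation below is introduced here for illustration and is not taken from the original text: X denotes the column-centred matrix of position classifier scores (one row per gene), Y the column-centred matrix of sequence classifier scores, and Σ the corresponding sample covariance matrices.

```latex
\rho_1 \;=\; \max_{a,\,b}\ \operatorname{corr}\!\left(Xa,\; Yb\right)
       \;=\; \max_{a,\,b}\ \frac{a^{\top}\Sigma_{XY}\,b}
                                {\sqrt{a^{\top}\Sigma_{XX}\,a}\;\sqrt{b^{\top}\Sigma_{YY}\,b}}
```

Here Xa and Yb are the first pair of canonical variates and ρ₁ is the first canonical correlation; subsequent pairs are obtained in the same way under the constraint of being uncorrelated with the earlier ones. In a correlation circle plot, original variables that project on opposite sides of the circle along the canonical variates are negatively associated.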
We present a unified computational framework with tools for systematically analysing regular patterns in genomes and for studying their interplay with the regulation of gene expression. We described the first two tools of GREAT:SCAN: a periodicity analysis tool named PATTERNS and a TFBS prediction tool named PRECISION. We also demonstrated and discussed an example application of the GREAT:SCAN tools to the major E. coli regulons, revealing a complex but coherent genome periodic pattern. Some features of this pattern had been reported in previous studies using cruder methods and less complete data [3, 6–8]. Using PRECISION, we demonstrated that insights from the mechanics of a multi-view learning algorithm able to improve TFBS predictions can be exploited to formalise and test further biological hypotheses. Moreover, we applied CCA to explore and quantify the interplay of sequence specificity with genome position for the effective binding of TFs. Using this method, we uncovered, for some regulons in E. coli, negative correlations between these two quantities, indicating a potential interplay between sequence quality and the 3D location of the site. Overall, GREAT:SCAN analyses provide novel views on long-range genome organisation in bacteria, explore its association with genome expression and provide methods to evaluate meaningful biological hypotheses.
Availability and requirements
The software is available to the community as free online tools (Additional file 1) which can be found on the abSYNTH platform at the institute of Systems and Synthetic Biology (iSSB). The web application is free for any non-commercial use (e.g. academic, teaching). No installation is required, as all computations are performed by the abSYNTH servers (access at: absynth.issb.genopole.fr/Bioinformatics/tools/GREAT). Once the computations have finished, every user can download a compressed file with all the plots and tables that the program has generated. All input data and results are kept for one week and can be downloaded by the user through the job-specific URL that the portal provides (Additional file 2).
Cook PR. A model for all genomes: the role of transcription factories. J Mol Biol. 2010; 395(1):1–10. doi:http://dx.doi.org/10.1016/j.jmb.2009.10.031.
Huynen MA, Snel B. Gene and context: integrative approaches to genome analysis. Adv Protein Chem. 2000; 54:345–79. doi:http://dx.doi.org/10.1016/S0065-3233(00)54010-8.
Képès F. Periodic transcriptional organization of the E.coli genome. J Mol Biol. 2004; 340(5):957–64. doi:http://dx.doi.org/10.1016/j.jmb.2004.05.039.
Képès F. Periodic epi-organization of the yeast genome revealed by the distribution of promoter sites. J Mol Biol. 2003; 329(5):859–65. doi:http://dx.doi.org/10.1016/S0022-2836(03)00535-7.
Bouyioukos C, Elati M, Képès F. Hydrocarbon and Lipid Microbiology Protocols Springer Protocols Handbooks In: McGenity TJ, Timmis KN, Nogales Fernández B, editors. Heidelberg: Humana Press: 2015. p. 1–16, doi:http://dx.doi.org/10.1007/8623_2015_92. http://link.springer.com/protocol/10.1007%8623_2015_92.
Jeong KS, Ahn J, Khodursky AB. Spatial patterns of transcriptional activity in the chromosome of Escherichia coli. Genome Biol. 2004; 5(11):86. doi:http://dx.doi.org/10.1186/gb-2004-5-11-r86.
Junier I, Hérisson J, Képès F. Genomic organization of evolutionarily correlated genes in bacteria: limits and strategies. J Mol Biol. 2012; 419(5):369–86. doi:http://dx.doi.org/10.1016/j.jmb.2012.03.009.
Wright MA, Kharchenko P, Church GM, Segré D. Chromosomal periodicity of evolutionarily conserved gene pairs. Proc Natl Acad Sci U S A. 2007; 104(25):10559–10564. doi:http://dx.doi.org/10.1073/pnas.0610776104.
Dekker J. Gene regulation in the third dimension. Science. 2008; 319(5871):1793–1794. doi:http://dx.doi.org/10.1126/science.1152850.
Spilianakis CG, Lalioti MD, Town T, Lee GR, Flavell RA. Interchromosomal associations between alternatively expressed loci. Nature. 2005; 435(7042):637–45. doi:http://dx.doi.org/10.1038/nature03574.
Papantonis A, Cook PR. Transcription factories: genome organization and gene regulation. Chem Rev. 2013; 113(11):8683–705. doi:http://dx.doi.org/10.1021/cr300513p.
Junier I, Hérisson J, Képès F. Periodic pattern detection in sparse boolean sequences. Algorithm Mol Biol. 2010; 5:31. doi:http://dx.doi.org/10.1186/1748-7188-5-31.
Elati M, Fekih R, Nicolle R, Junier I, Herisson J, Kepes F. Boosting binding sites prediction using gene positions. Lect Notes Comput Sci. 2011:92–103. doi:http://dx.doi.org/10.1007/978-3-642-23038-7_9.
Képès F, Vaillant C. Transcription-based solenoidal model of chromosomes. ComPlexUs. 2003; 1(4):171–80. doi:http://dx.doi.org/10.1159/000082184.
Dorman CJ. Genome architecture and global gene regulation in bacteria: making progress towards a unified model? Nat Rev Microbiol. 2013; 11(5):349–55. doi:http://dx.doi.org/10.1038/nrmicro3007.
Dillon SC, Dorman CJ. Bacterial nucleoid-associated proteins, nucleoid structure and gene expression. Nat Rev Microbiol. 2010; 8(3):185–95. doi:http://dx.doi.org/10.1038/nrmicro2261.
Weng X, Xiao J. Spatial organization of transcription in bacterial cells. Trends Genet. 2014. doi:http://dx.doi.org/10.1016/j.tig.2014.04.008.
Sutherland H, Bickmore WA. Transcription factories: gene expression in unions? Nat Rev Genet. 2009; 10(7):457–66. doi:http://dx.doi.org/10.1038/nrg2592.
Képès F, Jester BC, Lepage T, Rafiei N, Rosu B, Junier I. The layout of a bacterial genome. FEBS Lett. 2012; 586(15):2043–048. doi:http://dx.doi.org/10.1016/j.febslet.2012.03.051.
Ester M, Kriegel H-p, Jörg S, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). Palo Alto: AAAI Press: 1996. p. 226–31.
R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2015. http://www.R-project.org.
Schapire RE, Singer Y. Improved boosting algorithms using confidence-rated predictions. Mach Learn. 1999; 37(3):297–336. doi:http://dx.doi.org/10.1023/a:1007614523901.
Valens M, Penaud S, Rossignol M, Cornet F, Boccard F. Macrodomain organization of the Escherichia coli chromosome. EMBO J. 2004; 23(21):4330–341. doi:http://dx.doi.org/10.1038/sj.emboj.7600434.
Salgado H, Peralta-Gil M, Gama-Castro S, Santos-Zavaleta A, Muñiz-Rascado L, García-Sotelo JS, Weiss V, Solano-Lira H, Martínez-Flores I, Medina-Rivera A, Salgado-Osorio G, Alquicira-Hernández S, Alquicira-Hernández K, López-Fuentes A, Porrón-Sotelo L, Huerta AM, Bonavides-Martínez C, Balderas-Martínez YI, Pannier L, Olvera M, Labastida A, Jiménez-Jacinto V, Vega-Alvarado L, Del Moral-Chávez V, Hernández-Alvarez A, Morett E, Collado-Vides J. Regulondb v8.0: omics data sets, evolutionary conservation, regulatory phrases, cross-validated gold standards and more. Nucleic Acids Res. 2013; 41(Database issue):203–13. doi:http://dx.doi.org/10.1093/nar/gks1201.
Karp PD, Weaver D, Paley S, Fulcher C, Kubo A, Kothari A, Krummenacker M, Subhraveti P, Weerasinghe D, Gama-Castro S, Huerta AM, Muñiz-Rascado L, Bonavides-Martinez C, Weiss V, Peralta-Gil M, Santos-Zavaleta A, Schröder I, Mackie A, Gunsalus R, Collado-Vides J, Keseler IM, Paulsen I. The ecocyc database. EcoSal Plus. 2014; 2014. doi:http://dx.doi.org/10.1128/ecosalplus.ESP-0009-2013.
Thomas-Chollier M, Defrance M, Medina-Rivera A, Sand O, Herrmann C, Thieffry D, van Helden J. RSAT 2011: Regulatory sequence analysis tools. Nucleic Acids Res. 2011; 39(Web Server issue):86–91. doi:http://dx.doi.org/10.1093/nar/gkr377.
Riley M, Abe T, Arnaud MB, Berlyn MKB, Blattner FR, Chaudhuri RR, Glasner JD, Horiuchi T, Keseler IM, Kosuge T, Mori H, Perna NT, Plunkett 3rd G, Rudd KE, Serres MH, Thomas GH, Thomson NR, Wishart D, Wanner BL. Escherichia coli k-12: a cooperatively developed annotation snapshot–2005. Nucleic Acids Res. 2006; 34(1):1–9. doi:http://dx.doi.org/10.1093/nar/gkj405.
Junier I, Martin O, Képès F. Spatial and topological organization of dna chains induced by gene co-localization. PLoS Comput Biol. 2010; 6(2):1000678. doi:http://dx.doi.org/10.1371/journal.pcbi.1000678.
Cook PR. Predicting three-dimensional genome structure from transcriptional activity. Nat Genet. 2002; 32(3):347–52. doi:http://dx.doi.org/10.1038/ng1102-347.
Elati M, Nicolle R, Junier I, Fernández D, Fekih R, Font J, Képès F. PreCisIon: PREdiction of CIS-regulatory elements improved by gene’s positION. Nucleic Acids Res. 2013; 41(3):1406–1415. doi:http://dx.doi.org/10.1093/nar/gks1286.
Hotelling H. Relations between two sets of variates. Biometrika. 1936; 28(3-4):321–77. doi:http://dx.doi.org/10.1093/biomet/28.3-4.321.
Lê Cao K-A, González I, Déjean S. integrOmics: an R package to unravel relationships between two omics datasets. Bioinformatics. 2009; 25(21):2855–856. doi:http://dx.doi.org/10.1093/bioinformatics/btp515.
We thank François Bucchini for his help with the web application, Ivan Junier for sharing his preliminary observations on the coincidence of macrodomain and periodic region boundaries, Genopole and the abSYNTH platform for hosting the applications and all the members of MEGA team at iSSB for being avid beta-testers of the tools. This study was supported by the EU FP7 project ST-FLOW.
The authors declare that they have no competing interests.
CB, ME and FK conceived the ideas and tools presented, CB and ME developed the tools, analysed the data and generated results and plots. CB, ME and FK wrote the paper. All authors read and approved the final manuscript.
The publication charges for this article were funded by the Agence Nationale de la Recherche (ANR) grant SYNPATHIC.
This article has been published as part of BMC Bioinformatics Volume 17 Supplement 5, 2016: Selected articles from Statistical Methods for Omics Data Integration and Analysis 2014. The full contents of the supplement are available online at http://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-17-supplement-5.
SMODIA2014-S-Bouyioukos-S1.pdf. The full command line help message of GREAT:SCAN:PATTERNS. All the available command line options are specified and are mirrored in the online version of the tool. The document provides an extended description of each of the command line parameters. (PDF 56.7 kb)
SMODIA2014-S-Bouyioukos-S2.png. A screen capture of the main window of GREAT:SCAN:PATTERNS on the iSSB abSYNTH server with all the available command line parameters as options in the web form and the results of the example data (loaded by clicking the link “Try with example data”). (PNG 280 kb)