- Open Access
MiCoViTo: a tool for gene-centric comparison and visualization of yeast transcriptome states
BMC Bioinformatics volume 5, Article number: 20 (2004)
Information obtained by DNA microarray technology gives a rough snapshot of the transcriptome state, i.e., the expression level of all the genes expressed in a cell population at any given time. One of the challenging questions raised by the tremendous amount of microarray data is to identify groups of co-regulated genes and to understand their role in cell functions.
MiCoViTo (Microarray Comparison Visualization Tool) is a set of biologists' tools for exploring, comparing and visualizing changes in the yeast transcriptome by a gene-centric approach. A relational database includes data linked to genome expression and graphical output makes it easy to visualize clusters of co-expressed genes in the context of available biological information. To this aim, upload of personal data is possible and microarray data from fifty publications dedicated to S. cerevisiae are provided on-line. A web interface guides the biologist during the usage of this tool and is freely accessible at http://www.transcriptome.ens.fr/micovito/.
MiCoViTo offers an easy-to-read picture of local transcriptional changes connected to current biological knowledge. This should help biologists to mine yeast microarray data and better understand the underlying biology. We plan to add functional annotations from other organisms. That would allow inter-species comparison of transcriptomes via orthology tables.
Although the genome is mostly invariant in each cell of an organism, genes can have different expression patterns related to environmental conditions or developmental programs. For a given genome, different transcriptome states can be observed, depending on complex networks regulating adaptation and homeostasis. Fundamental questions about the topology of networks as protein interactome , metabolome  or transcriptional regulation networks [3, 4] have been previously addressed. Interfaces like KEGG , Biocyc  or GenMapp  help biologists to put these results into a cellular context by mapping expression states onto metabolic network representations. Moreover, to visualize and edit networks, different tools like Cytoscape  or Osprey  are available. To understand the biology of the studied systems better, the trend is clearly towards the aggregation of multiple sources of biological information.
This article focuses on the analysis of transcriptome states. Numerous microarray gene expression datasets are now available, giving the opportunity to get a picture of the transcriptome state in various cellular conditions. Various methods to mine compendia of transcriptome states and to try to understand their biological meaning already exist. One of the most successful is based on the assumption that genes having similar expression profiles across a set of conditions are likely to be involved in the same biological process . Many approaches, including multivariate analysis, hierarchical clustering and SOM have been used to find patterns of expression and to create biologically relevant clusters (see  for review). By analysing whole microarray datasets, these "global" clustering approaches have already led to the formulation of interesting testable predictions . However, it has been previously emphasized that classical clustering methods could be inefficient on large number of biologically unrelated datasets . Indeed, in response to environmental changes, gene expression is modified only in a fraction of the transcriptome and the signal is hence diluted over the whole dataset. Few methods have been proposed to find these regulatory sub-signatures [13, 14], and in fine, the most reliable way to go deeper into the data to capture interesting trends is to be an expert in the field. That is why tools allowing the biologist to mine microarray results, in order to find such expression modifications in a sub-set of genes related to his area of expertise, are highly desirable. Such "gene-centric" clustering analysis that distinguishes differentially-expressed genes in specific parts of the transcriptome (centred around a "seed gene") should overcome some drawbacks of global analysis approaches.
The ideal tool would be gene-centric, and would provide understandable outputs mapping biological knowledge onto results. We present here MiCoViTo, a user-friendly tool for identifying and visualizing groups of genes having similar expression in two sets of microarray experiments representing two distinct transcriptome states.
A given transcriptome state can be represented as a network where genes are joined pairwise by a weighted link proportional to a similarity measure between their corresponding expression profiles. The basic idea is therefore to compare the immediate transcriptome neighbourhood of a given gene (referred to hereafter as the seed gene) in two sets of microarray experiments describing two distinct transcriptome states. By neighbourhood, we mean genes whose expression profiles are closely related to that of the seed according to a chosen metric. By relaxing step by step the distance criterion, several neighbourhood levels, corresponding to larger and larger parts of the transcriptome state, can be defined.
Thus, for each set of microarray experiments to be compared, each gene is assigned to the neighbourhood level reflecting the distance between its expression profile and the expression profile of the seed gene. In a second step, all neighbourhood level intersections are computed and arranged in a table (Figure 1, A). In this table, the upper left groups are genes having similar expression in both experiments compared to the seed profile while the upper right and the lower left groups include genes that have very close expression profiles in only one set of experiments. Only the upper left part of the table is of interest, as we do not want to consider genes that are co-expressed with the seed gene in none of the two transcriptome states. Therefore, the lower right part of the table is not analysed and is coloured in grey (Figure 1, A).
Note that MiCoViTo has been designed so that different types of comparisons can be performed, according to the given biological question. Neighbourhood comparison of one seed in two transcriptome states is described above (Figure 1). But, using the same principle, more elaborate comparisons can be performed like neighbourhood comparison of two distinct seeds in the same transcriptome state (Figure 2, A).
Visualization and mining of isolated clusters
In order to capture the biological meaning of gene clusters generated by MiCoViTo, it is necessary as a first step to visualize them in the context of current biological knowledge. Latter options for studying promising clusters are to map list of genes onto the metabolic pathways, to look for co-regulation in another expression dataset or to look for common cis-regulatory elements in promoters. Using a relational database system storing yeast MIPS functional information and external links, MiCoViTo provides access to all this information.
For each neighbourhood intersection constructed as detailed above, a pie chart representing the gene distribution in one of the MIPS catalogues  (functional classification, protein class, EC number, PROSITE motifs, mutant phenotype and complexes catalogue) is displayed (Figure 1, B). When gene listing for one cluster is requested, additional information is available including direct links to the individual gene description pages of the SGD  and MIPS . Furthermore, full gene lists can be posted directly to other online tools like KEGG for metabolism mapping , RSA tools for the discovery cis-regulatory motifs , SGD for Gene Ontology term mapping  or yMGV database for expression mining .
Microarray data standards are now available  and public repositories have been constructed [20, 21]. The community effort will ensure that more and more clean and formatted datasets become available for this kind of study. Here we use data from yMGV , the largest available yeast expression database.
More than fifty previously-published microarray datasets are provided, giving a fair coverage of possible yeast transcriptome states (a listing is available in the web site). But users can also upload files onto the site to confront their results with published data or use MiCoViTo to compare two personal datasets.
MiCoViTo is composed of three parts: a set of programs for microarray data preprocessing and for comparing expression profiles, a web-interface and a relational database. All the softwares used to power MiCoViTo are freely distributed under an open source licence. Data pre-processing options such as pre-scaling or pre-centering expression profiles have been written in PERL while distance computation routines (Pearson distance, Squared Pearson distance and Euclidean distance) have been written in C in order to minimize computation time. The interface has been written in PHP. MIPS and SGD information is stored in a PostgreSQL relational database. Microarray data are stored in flat files. All graphical outputs are dynamically generated using the JpGraph PHP library .
As an illustration, we used MiCoViTo to compare genes co-expressed during the cell cycle (Cho el al. dataset ) for two pairs of cyclin: CLN1/CLN2 and CLN1/CLB2 (Figure 2, B1). Results depend mainly on the proximity of seed expressions. Indeed, two genes close one to another like CLN1 and CLN2 (both implicated in the G1 phase), lead to the identification of numerous correlated genes also implicated in the G1 phase (Figure 2, B2) like PCL1 , MCD1  or RNR1 . But if CLN1 is compared to a gene implicated in another phase of the cycle like CLB2 (G2/M transition), no correlation at all is observed (Figure 2, B3). Moreover, the upper right and the lower left groups show genes implicated in the G2/M transition and the G1 phase like CLB2 and CLN1 respectively.
MiCoViTo aims to help biologists achieve an intuitive understanding of their own data using an approach based on the comparison of sets of microarray experiments. The user-friendly web interface is designed to be accessible to those with no particular technical skill.
The major drawback of visualization tools based on biologists' knowledge and intuition is that it does not give a computer readable output and cannot be applied systematically to find interesting expression patterns. Particularly, seed choice is a crucial point of our approach because it precisely defines which part of the transcriptome to focus on. The gene-centric approach presented in this paper empowers biologists to focus on a particular sub-part of the transcriptome. In order to assist seed selection process, online lists of candidates for each set of microarray experiments are proposed. Those candidates are ranked according to their neighbourhood density (density is the number of genes co-expressed with seed according to a given expression distance threshold). Genes with a large neighbourhood density are those whose nearest neighbourhoods contain a lot of genes, which means that their expression profiles are similar to the expression of many other genes. Such information may be used as a starting point when no prior information about the transcriptome state topology is available.
As initiated in a recent paper , one possible future direction is to use this approach to compare transcriptomes from different organisms captured in the same state. This will be possible if we can find functional annotation in the same format for two organisms and if we are able to define pairs of seeds having inter-organism correspondence. An annotation project like GO  addresses the first question. To define relevant inter-organism seeds, two ways appear realistic. The first one is to assume that sequence-based conservation implies function conservation. This is not always the case, but one can expect MiCoViTo results to be incoherent if seeds are not functionally related. Alternatively, one can start from pairs of genes previously described as orthologous. S. pombe could provide a good benchmark system to test this approach, since GO annotations are available and microarray datasets describing states previously studied in S. cerevisiae are publicly available. Moreover well-characterised S. cerevisiae/S. Pombe orthologous genes have been listed by the Sanger Center (Valerie Wood, personnal communication).
MiCoViTo allows users to compare and visualize different transcriptome states in a gene-centric way. The generated clusters of genes are mapped onto existing biological knowledge to gain a higher level view of the ongoing transcriptional changes. Upload of personal data onto the site is possible but not mandatory since a compendium of more than fifty yeast microarray datasets is available.
At present, this tool is restricted to S. cerevisiae, but a natural future direction will be to incorporate data originating from different species as well as orthology tables to allow comparison of seeds from different organisms.
Availability and requirements
MiCoViTo is available at http://www.transcriptome.ens.fr/micovito/. The source code and database scheme are freely distributed to academic users upon request to the authors.
A more detailed description of MiCoViTo including a step by step tutorial can be found online.
Microarray Comparison Visualization Tool
Munich Information center for Protein Sequences
- RSA tools:
Regulatory Sequence Analysis Tools
Saccharomyces Genome Database
yeast Microarray Global Viewer.
Bu D, Zhao Y, Cai L, Xue H, Zhu X, Lu H, Zhang J, Sun S, Ling L, Zhang N, Li G, Chen R: Topological structure analysis of the protein-protein interaction network in budding yeast. Nucleic Acids Res 2003, 31: 2443–2450. 10.1093/nar/gkg340
Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL: The large-scale organization of metabolic networks. Nature 2000, 407: 651–654. 10.1038/35036627
Guelzim N, Bottani S, Bourgine P, Kepes F: Topological and causal structure of the yeast transcriptional regulatory network. Nat Genet 2002, 31: 60–63. 10.1038/ng873
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 2002, 298: 799–804. 10.1126/science.1075090
Kanehisa M, Goto S, Kawashima S, Nakaya A: The KEGG databases at GenomeNet. Nucleic Acids Res 2002, 30: 42–46. 10.1093/nar/30.1.42
Karp PD, Riley M, Paley SM, Pellegrini-Toole A: The MetaCyc Database. Nucleic Acids Res 2002, 30: 59–61. 10.1093/nar/30.1.59
Dahlquist KD, Salomonis N, Vranizan K, Lawlor SC, Conklin BR: GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nat Genet 2002, 31: 19–20. 10.1038/ng0502-19
Ideker T, Ozier O, Schwikowski B, Siegel AF: Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 2002, 18 Suppl 1: S233–40.
Breitkreutz BJ, Stark C, Tyers M: Osprey: a network visualization system. Genome Biol 2003, 4: R22. 10.1186/gb-2003-4-3-r22
Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998, 95: 14863–14868. 10.1073/pnas.95.25.14863
Quackenbush J: Computational analysis of microarray data. Nat Rev Genet 2001, 2: 418–427. 10.1038/35076576
Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nat Genet 1999, 22: 281–285. 10.1038/10343
Gasch AP, Eisen MB: Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol 2002, 3: RESEARCH0059. 10.1186/gb-2002-3-11-research0059
Cheng Y, Church GM: Biclustering of expression data. Proc Int Conf Intell Syst Mol Biol 2000, 8: 93–103.
Mewes HW, Frishman D, Guldener U, Mannhaupt G, Mayer K, Mokrejs M, Morgenstern B, Munsterkotter M, Rudd S, Weil B: MIPS: a database for genomes and protein sequences. Nucleic Acids Res 2002, 30: 31–34. 10.1093/nar/30.1.31
Dwight SS, Harris MA, Dolinski K, Ball CA, Binkley G, Christie KR, Fisk DG, Issel-Tarver L, Schroeder M, Sherlock G, Sethuraman A, Weng S, Botstein D, Cherry JM: Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO). Nucleic Acids Res 2002, 30: 69–72. 10.1093/nar/30.1.69
van Helden J: Regulatory sequence analysis tools. Nucleic Acids Res 2003, 31: 3593–3596. 10.1093/nar/gkg567
Le Crom S, Devaux F, Jacq C, Marc P: yMGV: helping biologists with yeast microarray data mining. Nucleic Acids Res 2002, 30: 76–79. 10.1093/nar/30.1.76
Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M: Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 2001, 29: 365–371. 10.1038/ng1201-365
Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002, 30: 207–210. 10.1093/nar/30.1.207
Brazma A, Parkinson H, Sarkans U, Shojatalab M, Vilo J, Abeygunawardena N, Holloway E, Kapushesky M, Kemmeren P, Lara GG, Oezcimen A, Rocca-Serra P, Sansone SA: ArrayExpress--a public repository for microarray gene expression data at the EBI. Nucleic Acids Res 2003, 31: 68–71. 10.1093/nar/gkg091
JpGraph - An 00 Graph library for PHP4[http://www.aditus.nu/jpgraph]
Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, Davis RW: A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 1998, 2: 65–73.
Ogas J, Andrews BJ, Herskowitz I: Transcriptional activation of CLN1, CLN2, and a putative new G1 cyclin (HCS26) by SWI4, a positive regulator of G1-specific transcription. Cell 1991, 66: 1015–1026.
Guacci V, Koshland D, Strunnikov A: A direct link between sister chromatid cohesion and chromosome condensation revealed through the analysis of MCD1 in S. cerevisiae. Cell 1997, 91: 47–57. 10.1016/S0092-8674(01)80008-8
Elledge SJ, Davis RW: Two genes differentially regulated in the cell cycle and by DNA-damaging agents encode alternative regulatory subunits of ribonucleotide reductase. Genes Dev 1990, 4: 740–751.
Stuart JM, Segal E, Koller D, Kim SK: A gene-coexpression network for global discovery of conserved genetic modules. Science 2003, 302: 249–255. 10.1126/science.1087447
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25: 25–29. 10.1038/75556
Gasch AP, Werner-Washburne M: The genomics of yeast responses to environmental stress and starvation. Funct Integr Genomics 2002, 2: 181–192. 10.1007/s10142-002-0058-2
The authors want to thanks Stéphane Le Crom, Frédéric Devaux and Serge Hazout for helpful discussions and Boris Barbour for English correction. The MiCoViTo project was funded by the Programme Bioinformatique Inter-EPST-CNRS 2003, GL is supported by a MENRT, PM is supported by the French Therapeutical Research Association (AFRT) and the PhRMA foundation Center of Excellence in Integration of Genomics and Informatics (CEIGI), SV is supported by Hoechst Marion Roussel – Aventis grant number FRHMR2/9908.
GL conceived, implemented MiCoViTo and drafted the manuscript, PMsuggested the first version of the visualization interface, contributed todiscussions and drafted the manuscript, PV and CJ are GL's Ph.D. advisors, they both contributed to discussions, SV provided help with algorithms and coordinated the project. All authors read and approved the final manuscript.