A general strategy to determine the congruence between a hierarchical and a non-hierarchical classification
© Marco and Marín; licensee BioMed Central Ltd. 2007
Received: 27 April 2007
Accepted: 15 November 2007
Published: 15 November 2007
Classification procedures are widely used in phylogenetic inference, the analysis of expression profiles, the study of biological networks, etc. Many algorithms have been proposed to establish the similarity between two different classifications of the same elements. However, methods to determine significant coincidences between hierarchical and non-hierarchical partitions are still poorly developed, in spite of the fact that the search for such coincidences is implicit in many analyses of massive data.
We describe a novel strategy to compare a hierarchical and a dichotomic non-hierarchical classification of elements, in order to find clusters in a hierarchical tree in which elements of a given "flat" partition are overrepresented. The key improvement of our strategy with respect to previous methods is the use of permutation analyses of ranked clusters to determine whether regions of the dendrograms present a significant enrichment. We show that this method is more sensitive than previously developed strategies and how it can be applied to several real cases, including microarray and interactome data. In particular, we use it to compare a hierarchical representation of the yeast mitochondrial interactome and a catalogue of known mitochondrial protein complexes, demonstrating a high level of congruence between those two classifications. We also discuss extensions of this method to other, conceptually related cases.
Our method is highly sensitive and outperforms previously described strategies. A PERL script that implements it is available at http://www.uv.es/~genomica/treetracker.
To our knowledge, in the whole bioinformatics literature only four recent studies have attempted to establish general methods to compare hierarchical and non-hierarchical classifications, all of them in the context of microarray data analysis. In two of these studies [15, 16], the method is similar, and very much related to those used for the two simpler cases discussed above and exemplified in Figures 1A and 1B. Starting with a hierarchical classification of expression data, which may be obtained with any conventional method, such as UPGMA, the degree of enrichment of each cluster for a particular class (i.e. a group derived from a flat classification, such as whether or not a gene product is annotated with a particular GO term) is estimated by calculating the probability p of finding such enrichment by chance, using either a cumulative hypergeometric distribution, the equivalent Fisher's exact test or a cumulative binomial distribution. Then, the most significant cluster, the one with the smallest p value, is determined and all clusters that contain any element in common with it ("parent clusters" and "child clusters", according to whether they contain or are contained in the most significant cluster) are eliminated. The process is repeated until all non-overlapping clusters with small p values are determined. Finally, Bonferroni's correction is used to take into account the effect of multiple tests, considering either the number of classes tested or the number of clusters tested. A third study followed the same strategy, but only up to the calculation of the p values, without further refinement of the results. Finally, a fourth study followed a totally different strategy, based on a heuristic search that minimizes edge crossings in the bigraph generated by the two classifications.
We became interested in this topic after generating a strategy, implemented in the program UVCLUSTER, that allows the efficient conversion of complex graphs into dendrograms. We recently used this strategy both on graphs derived from protein-protein interaction data and on those based on protein domain data. Although we determined that the results obtained in those two works were biologically meaningful, an obvious open question was how to establish a standard procedure to determine whether the hierarchical classification obtained was congruent with other classifications (such as GO, division into protein complexes, etc.). In this work, we describe a method that follows in the steps of previous studies [15, 16], but improves the characterization of the significant classes by using permutation tests that take into account the topology of the hierarchical classification. The method is applied to several cases and, most especially, to explore a hierarchical representation of the mitochondrial interactome, characterizing the clusters that correspond to known protein complexes.
Our goal is to detect the clusters of a hierarchical tree that contain an overrepresentation of elements belonging to a particular class. A class is defined by a dichotomic flat partition of the elements in such a way that each element in the tree either belongs or not to it. The likelihood of finding a particular level of enrichment by chance must be evaluated and, in this evaluation, we want to consider the topology of the tree. As we describe in detail below, evaluation is based on a quantitative comparison of the observed enrichment value with the enrichment values of a set of simulated results, generated by random permutation of class labels while keeping constant the topology of the tree.
In this formula, m is, as we just said, the total number of elements in the tree; n is the number of elements among those m that are included in a class, according to a defined flat partition; r is the number of common items between the class and the cluster analyzed; finally, k is the cluster size.
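The formula itself is not reproduced in this section. As a minimal sketch, assuming it is the standard cumulative hypergeometric tail (the probability of finding r or more class members in a cluster of size k), the p value can be computed as follows. The function name `enrichment_p` is hypothetical and is not taken from the authors' PERL script.

```python
from math import comb


def enrichment_p(m: int, n: int, k: int, r: int) -> float:
    """Upper-tail hypergeometric probability (assumed form of the formula).

    m: total number of elements in the tree
    n: number of those elements that belong to the class
    k: size of the cluster being analyzed
    r: number of class members found in the cluster
    """
    # Sum the probabilities of observing r, r+1, ..., min(n, k) class members.
    favourable = sum(comb(n, i) * comb(m - n, k - i)
                     for i in range(r, min(n, k) + 1))
    return favourable / comb(m, k)


# A cluster of 4 elements that contains all 4 class members of an 8-element tree.
p_full = enrichment_p(m=8, n=4, k=4, r=4)
# Requiring zero or more class members is always satisfied.
p_any = enrichment_p(m=8, n=4, k=4, r=0)
```

Exact integer binomials are used here so that small trees give exact results; for large trees a log-space or library implementation would be preferable.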
The process described for the first cluster is then repeated for all the rest, going from the one with the second lowest p value to the one with the maximum p value. That is, in each case, the p value of the cluster that is analyzed is compared with the p values of those clusters found in the same relative position in a new set of simulations (Figure 2) and p' values are determined. A significant point is that, given that each ranked value is compared with the values of the same rank in sets of independent simulations, we avoid the need for a further correction for multiple comparisons (the same idea was applied in a different context in ).
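The rank-based comparison can be sketched as follows. This is an illustrative simplification and not the authors' implementation: it reuses a single set of permutations for every rank, whereas the text describes a new set of simulations for each cluster, and it assumes the cumulative hypergeometric tail as the enrichment statistic. All function and variable names are hypothetical.

```python
import random
from math import comb


def hyper_p(m, n, k, r):
    """Upper-tail hypergeometric probability of seeing >= r class members
    in a cluster of size k (m elements in the tree, n in the class)."""
    return sum(comb(n, i) * comb(m - n, k - i)
               for i in range(r, min(n, k) + 1)) / comb(m, k)


def ranked_permutation_test(clusters, class_members, elements,
                            n_perm=1000, seed=0):
    """Return (cluster_index, p, p_prime) tuples sorted by increasing p."""
    rng = random.Random(seed)
    pool = sorted(elements)          # fixed order so runs are reproducible
    m, n = len(pool), len(class_members)

    def p_values(members):
        return [hyper_p(m, n, len(c), len(c & members)) for c in clusters]

    # Observed p values, ranked from most to least significant.
    observed = sorted((p, idx) for idx, p in enumerate(p_values(class_members)))

    hits = [0] * len(clusters)
    for _ in range(n_perm):
        # Shuffle the class labels; the clusters (tree topology) stay untouched.
        shuffled = set(rng.sample(pool, n))
        simulated = sorted(p_values(shuffled))
        # Compare each observed p with the simulated p of the same rank.
        for rank, (p_obs, _) in enumerate(observed):
            if simulated[rank] <= p_obs:
                hits[rank] += 1

    return [(idx, p_obs, hits[rank] / n_perm)
            for rank, (p_obs, idx) in enumerate(observed)]


# Toy dendrogram ((a,b),(c,d)),((e,f),(g,h)) written as its set of clusters.
clusters = [set("ab"), set("cd"), set("abcd"),
            set("ef"), set("gh"), set("efgh"), set("abcdefgh")]
results = ranked_permutation_test(clusters, set("abcd"), "abcdefgh",
                                  n_perm=2000, seed=1)
best_index, best_p, best_p_prime = results[0]
```

In this toy tree the top-ranked cluster is the one holding exactly the four class members; its p' is the fraction of permutations whose best cluster is at least as enriched.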
Once all the results are obtained, it is necessary to filter the results to avoid multiple significant correlated clusters. The rule used is that a significant cluster eliminates all the clusters in the tree that contain any element in common with it ("parent" and "child" clusters) with less significant p values. Finally, all significant clusters in which the number of elements belonging to the class that is being considered is lower than 2 are also discarded.
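The filtering rule can be sketched as follows, under the assumption that each result is a `(cluster_index, p, p_prime)` tuple and that significance means p' below a chosen threshold. The names `filter_clusters` and `alpha`, and the example values, are hypothetical and purely illustrative.

```python
def filter_clusters(results, clusters, class_members, alpha=0.01):
    """Keep non-overlapping significant clusters, most significant first."""
    significant = sorted((r for r in results if r[2] < alpha),
                         key=lambda r: r[1])
    kept = []
    for idx, p, p_prime in significant:
        # In a tree, two clusters share elements only if one contains the
        # other, so any overlap marks a "parent" or "child" of a kept cluster.
        if any(clusters[idx] & clusters[k] for k, _, _ in kept):
            continue
        kept.append((idx, p, p_prime))
    # Finally, discard clusters holding fewer than 2 elements of the class.
    return [r for r in kept if len(clusters[r[0]] & class_members) >= 2]


# Illustrative (made-up) input: cluster 1 is the parent of cluster 0.
clusters = [set("ab"), set("abcd"), set("ef"), set("g")]
class_members = set("abef")
results = [(0, 0.001, 0.001), (1, 0.010, 0.005),
           (2, 0.020, 0.004), (3, 0.030, 0.200)]
kept = filter_clusters(results, clusters, class_members)
kept_indices = [idx for idx, _, _ in kept]
```

The overlap elimination runs before the size check, following the order in which the two rules are stated above.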
This method has been implemented in a PERL script which is freely available, together with instructions for using it, on our web page. This script may be used for analyses such as those shown in the next section, i.e. with up to 500–1000 elements. As an example, the mitochondrial interactome analyses shown in the last section of this article, focused on detecting classes in a tree of 308 elements, required from 29 to 188 minutes (average 112 minutes) on a standard personal computer with 10000 permutations. This range of times is related to how soon significant clusters are found in the particular case examined. We are currently developing a C program called TreeTracker (Arnau, Marco and Marín, in preparation) to be used in cases in which more than 1000 units must be analyzed. Although the program is still not fully optimized, its current version already allows large trees to be studied in relatively short times. We have performed analyses with a tree of 4860 elements generated from microarray-derived transcriptional data for Saccharomyces cerevisiae genes. Its exploration, again with 10000 permutations, required an average of 199 minutes (range 44–456 minutes) on a standard PC. Permutations can be easily divided among multiple computers or processors and therefore the analyses can be sped up using more sophisticated hardware.
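As a rough illustration of how permutations might be divided among processors, the sketch below splits a permutation budget into near-equal chunks and pools the per-chunk counts afterwards. It is not taken from the PERL script or from TreeTracker; the function names and the example counts are hypothetical.

```python
def split_permutations(n_perm, n_workers):
    """Divide n_perm permutations into n_workers near-equal chunks."""
    base, extra = divmod(n_perm, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def combine(hit_counts, chunk_sizes):
    """Pool per-worker counts of simulated p values <= the observed p."""
    return sum(hit_counts) / sum(chunk_sizes)


chunks = split_permutations(10000, 3)
# Each worker should use its own random seed so that chunks are independent.
seeds = list(range(len(chunks)))
# Made-up per-worker hit counts, only to show how p' is pooled.
pooled_p_prime = combine([10, 20, 30], chunks)
```

Because each permutation is independent, the pooled p' equals the value a single run over all permutations would estimate, provided the workers use different random seeds.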
Examples of application of this novel strategy
To check for the ability of our new strategy to detect significant similarities between a hierarchical and a flat classification, we have analyzed several real cases. We also tested how our method compares to the already published procedures to determine whether it significantly improves on them.
Comparison of a hierarchical classification based on coexpression data and a flat classification based on GO
Comparison of coexpression data versus protein complexes
It is widely accepted that, at least in yeasts, proteins appearing together in a protein complex are likely to be encoded by genes with a degree of coexpression higher than expected by chance [24, 25]. Thus, as a second example of our strategy, we decided to search in a hierarchical tree based on coexpression data for clusters in which genes encoding proteins of particular complexes were overrepresented. To do so, 34 protein complexes, containing a total of 207 proteins, were arbitrarily selected from SGD (see Methods; for a complete list, see Additional File 1 Table 1). Then, we grouped the 207 corresponding genes using hierarchical clustering based on coexpression data (see again Methods for the details). Applying our novel strategy to compare both datasets, we detected significant associations for 15 out of the 34 protein complexes. The coverage here was 24.6 % (51/207; with the coverage of particular complexes ranging from 0 to 75 %; see Additional File 1 Table 1) and the purity was on average 37.0 % (again see Additional File 1 Table 1). These results show that, in this particular dataset, the classification based on coexpression is only partially congruent with the classification based on protein complexes. This relatively low level of congruence and the fact that the complexes were in general of small size (average 6.1 proteins/complex) make the characterization of significant clusters difficult. However, our results (15/34 = 44% of complexes detected as significant) are again qualitatively better than those provided by the other related strategies, because in this case we failed to detect a single significant cluster following the procedures of Toronen [15] or Buehler et al. [16], even if cluster sizes < 5 are considered.
Comparison of two unsupervised clustering processes
Another type of situation in which it is relevant to compare a hierarchical and a non-hierarchical classification is found when two different methods of clustering, one of them hierarchical and the second not, are used with the same data. For example, it is often important to establish whether a hierarchical clustering result is compatible with a k-means-based flat partition. To see how our strategy performed in this case, we randomly selected microarray data for a set of 200 yeast genes (see Methods) and obtained a hierarchical UPGMA tree and two k-means classifications, in which the results were fitted into 10 or 20 clusters, respectively. In this example, we obtained significant hierarchical clusters for all the classes defined by the k-means partitions, whether 10 or 20. However, when 10 classes were used, only 2 of them had corresponding single significant clusters, while the other 8 produced multiple separated significant clusters. On the other hand, when 20 classes were established with k-means, we found that 12 of them corresponded to single significant clusters. The weakest point in the k-means strategy of partition is the need for an a priori definition of the number of classes, and our strategy can be useful to establish the optimal value for that number. In our example, it was clear that the division into 20 classes was more similar to the hierarchical classification of the same elements than a division into 10 classes. This type of comparison may thus be used both to roughly establish the best number of clusters into which to divide a group of elements by k-means analyses and to determine which of those clusters are supported by both hierarchical and k-means classifications.
The structure of the yeast mitochondrial interactome
In this case, the coverage was 75.5% and the purity of the clusters was 88.3%. Therefore, the correlation between the hierarchical and the flat partition was very high, demonstrating that significant portions of the yeast interactome can be meaningfully characterized using unsupervised, fully automated UVCLUSTER hierarchical clustering.
The comparison of the global hierarchical structure of the protein-protein interaction graph with the known protein complexes provides two different levels of novel knowledge. First, we can visualize the relationships between elements inside a protein complex. Second, we can study the global hierarchical relationships among protein complexes. Several well-known relationships among complexes are observed in Figure 4, the most obvious being the close proximity between the large and small ribosomal subunits. Similarly, we can see the close relationships among different complexes involved in protein translocation from the cytosol to the mitochondria, such as the TOM complex, the Tim10 and Tim22 complexes and the presequence translocase-associated import motor (see ). The analysis of the congruence of the two types of data may also provide novel information about proteins of unknown function. For instance, the cluster that contains known members of the TOM complex (Tom20, Tom22 and Tom40; ) contains four additional proteins: Pet8, Mdm10, Tim17 and the unknown protein YHR003c. Our analysis suggests that a functional relationship among these proteins and the TOM complex is likely and in fact data exist confirming this relationship for two of the proteins: Mdm10 is involved in the assembly of the TOM complex  and Tim17 has been recently described as a mediator between the TOM complex and the presequence translocase import motor  (both closely related in our tree, see Figure 4). Pet8 is known to be involved in the transport of S-adenosylmethionine (SAM) into the mitochondria, although the molecular mechanism is still unknown . Our results suggest a possible involvement of the TOM machinery in SAM transport, with Pet8 linking both processes. Finally, YHR003c has been located to the mitochondrial outer membrane in a massive screen , which is congruent with an interaction with the TOM complex.
Interestingly, YHR003c has a high similarity with the ThiF family of proteins, which in eukaryotes comprises a large family of ubiquitin-activating enzymes , suggesting a possible control of these aspects of mitochondrial protein transport by the ubiquitin-proteasome system.
Of course, congruence is not perfect. For example, other known members of the TOM complex, such as Tom6, Tom7 and Tom70, are outside the corresponding significant cluster. However, inspection of our datasets demonstrated that this is because protein-protein interaction data linking these proteins to the rest of the TOM complex are lacking in the DIP database, and therefore we can attribute this problem to incompleteness of the data used to generate the hierarchical tree. In any case, all these results demonstrate that our strategy of analysis is useful to detect relevant correlations between a hierarchical and a non-hierarchical partition of the same data and that UVCLUSTER can be used to extract significant portions of a complex graph in order to determine its hierarchical structure, confirming our previous findings [19, 20].
In this paper, we describe a new strategy that allows establishing the clusters of a hierarchical tree that are congruent with a non-hierarchical classification of elements. The main novelty of our strategy is the use of permutation analyses, which generate a distribution of ranked probability values, to check for significantly enriched clusters. The method outlined here has the main advantage of being more sensitive than similar, previously published methods [15, 16] due to taking into consideration the topology of the tree to evaluate the likelihood of obtaining by chance each level of overrepresentation. The advantage of our method is especially evident in cases in which many classes are analyzed, due to the fact that Bonferroni's correction, used by those authors, is then too conservative.
There are many papers already published using permutation methods to establish the significance levels of enrichment for the particular cases shown in Figure 1A (e.g. [3, 4] and many others) and Figure 1B (e.g. [10, 11]). It is relevant to point out here that our method reduces to those other methods provided that the hierarchy has the particular structure shown in those figures. This is obvious for the case depicted in Figure 1A, in which only one test is performed. However, it is also true for the more complex case in Figure 1B. It can be analyzed with our strategy and will provide exactly the same results as those already established by other methods, in spite of the fact that here we use a ranking for the significant clusters. The reason is that, if, and only if, the hierarchical structure is as shown in Figure 1B, the maximally significant cluster (i.e. the one with the smallest p) eliminates all the other possible clusters, because all the rest are "parents" or "children" of it. Our study may thus be considered the logical conclusion of a line of research that has been developed in recent years without actually being studied in the right, general context. It is the first one in which it is shown that those two are just particular cases of the general situation in which a hierarchical and a dichotomic flat partition are compared, and a general solution for such comparison is offered.
As we have indicated in the introduction of this work, all published methods that we know of in which hierarchical and non-hierarchical classifications are compared are restricted to the field of microarray data analysis. We have shown above an example of the use of our method in a different context, namely protein-protein interaction data, and of course this method may be used in many other contexts to address biological or non-biological problems. In particular, the combination of UVCLUSTER, which establishes the hierarchical structure that most faithfully reflects a graph, and this strategy, which may be used for establishing the meaning of that hierarchical structure, allows for the analysis of complex graphs in ways that had not hitherto been possible.
Here we present a new strategy for the comparison of a hierarchical and a non-hierarchical classification of elements. This strategy can be applied to very different situations, among them the particular cases of comparing flat classifications (e.g. functional information) with two lists of genes (e.g. Experimental vs. Control), or with an ordered list of genes. The main improvement is that our strategy considers the topology of the tree during the calculation of significance levels. The use of permutation analysis and the rank-based comparisons of p values allow the method to be highly sensitive.
Expression data for the comparison of coexpression analysis vs. GO were extracted from . The ten GO classes examined were "Response to heat", "Response to cold", "Response to DNA damage stimulus", "Response to oxidative stress", "Response to water deprivation", "Response to osmotic stress", "Response to unfolded protein", "Response to starvation", "Response to hydrostatic pressure" and "Regulation of transcription in response to stress". All of them are included in the general class "Response to stress", which was used as the criterion to select the 454 genes that were clustered. GO information for this and the rest of the analyses shown in this work was obtained from SGD. Hierarchical clustering of expression data was performed using Euclidean distances and the UPGMA method, with MeV 4.0 using its default parameters. For the comparison of coexpression analysis and protein complexes the data were extracted from , a dataset that includes transcriptional information for experiments involving 300 different mutations or treatments in Saccharomyces cerevisiae. Expression data used in the comparison of k-means vs. UPGMA classifications were all extracted from the webpage of Michael Eisen's laboratory. Both hierarchical and k-means clustering of microarray data were again performed with MeV 4.0 with default parameters.
Protein complex information
Protein complex information was also extracted from SGD. In the comparison involving coexpression data, we randomly selected 34 protein complexes that included a total of 207 proteins. In the analysis of the yeast mitochondrial interactome, we considered a total of 16 protein complexes annotated in SGD as mitochondrial.
Generation of the hierarchical dendrogram for the yeast mitochondrial interactome
A protein interaction network of the Saccharomyces cerevisiae mitochondrial interactome was built by extracting the 438 proteins annotated as mitochondrial in SGD and obtaining all the protein-protein interactions among them reported in the DIP database . Protein interaction information was available for 308 proteins. The resulting interaction network was analyzed with UVCLUSTER  (parameter AC = 100; 10000 iterations) to obtain a hierarchical representation of the graph.
Applications of our strategy
For all analyses described in this work, we performed 10000 simulations in order to establish the expected distributions and the critical values for p' < 0.01.
Our group is supported by Grant SAF2006-08977 (Ministerio de Educación y Ciencia [MEC], Spain). A.M. was the recipient of a predoctoral fellowship from the MEC.
- Grabmeier J, Rudolph A: Techniques of cluster algorithms in data mining. Data Min Knowl Disc 2002, 6: 303–360. 10.1023/A:1016308404627
- Draghici S, Khatri P, Martins RP, Ostermeier GC, Krawetz SA: Global functional profiling of gene expression. Genomics 2003, 81: 98–104. 10.1016/S0888-7543(02)00021-6
- Hosack DA, Dennis G Jr, Sherman BT, Lane HC, Lempicki RA: Identifying biological themes within lists of genes with EASE. Genome Biol 2003, 4: R70. 10.1186/gb-2003-4-10-r70
- Al-Shahrour F, Díaz-Uriarte R, Dopazo J: FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 2004, 20(4):578–580. 10.1093/bioinformatics/btg455
- Boyle EI, Weng S, Gollub J, Jin H, Botstein D, Cherry JM, Sherlock G: GO::TermFinder – open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 2004, 20: 3710–3715. 10.1093/bioinformatics/bth456
- Zeeberg BR, Qin H, Narasimhan S, Sunshine M, Cao H, Kane DW, Reimers M, Stephens RM, Bryant D, Burt SK, Elnekave E, Hari DM, Wynn TA, Cunningham-Rundles C, Stewart DM, Nelson D, Weinstein JN: High-Throughput GoMiner, an 'industrial-strength' integrative gene ontology tool for interpretation of multiple-microarray experiments, with application to studies of Common Variable Immune Deficiency (CVID). BMC Bioinformatics 2005, 6: 168. 10.1186/1471-2105-6-168
- Berriz GF, King OD, Bryant B, Sander C, Roth FP: Characterizing gene sets with FuncAssociate. Bioinformatics 2003, 19: 2502–2504. 10.1093/bioinformatics/btg363
- Breslin T, Edén P, Krogh M: Comparing functional annotation analysis with Catmap. BMC Bioinformatics 2004, 5: 193. 10.1186/1471-2105-5-193
- Breitling R, Amtmann A, Herzyk P: Iterative Group Analysis (iGA): A simple tool to enhance sensitivity and facilitate interpretation of microarray experiments. BMC Bioinformatics 2004, 5: 34. 10.1186/1471-2105-5-34
- Al-Shahrour F, Díaz-Uriarte R, Dopazo J: Discovering molecular functions significantly related to phenotypes by combining gene expression data and biological information. Bioinformatics 2005, 21: 2988–2993. 10.1093/bioinformatics/bti457
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005, 102: 15545–15550. 10.1073/pnas.0506580102
- Khatri P, Draghici S: Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 2005, 21: 3587–3595. 10.1093/bioinformatics/bti565
- Dopazo J: Functional interpretation of microarray experiments. OMICS 2006, 10: 398–410. 10.1089/omi.2006.10.398
- Rivals I, Personnaz L, Taing L, Potier MC: Enrichment or depletion of a GO category within a class of genes: which test? Bioinformatics 2007, 23: 401–407. 10.1093/bioinformatics/btl633
- Toronen P: Selection of informative clusters from hierarchical cluster tree with gene classes. BMC Bioinformatics 2004, 5: 32. 10.1186/1471-2105-5-32
- Buehler EC, Sachs JR, Shao K, Bagchi A, Ungar LH: The CRASSS plug-in for integrating annotation data with hierarchical clustering results. Bioinformatics 2004, 20: 3266–3269. 10.1093/bioinformatics/bth362
- Pasquier C, Girardot F, Jevardat de Fombelle K, Christen R: THEA: ontology-driven analysis of microarray data. Bioinformatics 2004, 20: 2636–2643. 10.1093/bioinformatics/bth295
- Torrente A, Kapushesky M, Brazma A: A new algorithm for comparing and visualizing relationships between hierarchical and flat expression data clusterings. Bioinformatics 2005, 21: 3993–3999. 10.1093/bioinformatics/bti644
- Arnau V, Mars S, Marín I: Iterative cluster analysis of protein interaction data. Bioinformatics 2005, 21: 364–378. 10.1093/bioinformatics/bti021
- Lucas JI, Arnau V, Marín I: Comparative genomics and protein domain graph analyses link ubiquitination and RNA metabolism. J Mol Biol 2006, 357: 9–17. 10.1016/j.jmb.2005.12.068
- Arnau V, Gallach M, Lucas JI, Marín I: UVPAR: fast detection of functional shifts in duplicate genes. BMC Bioinformatics 2006, 7: 174. 10.1186/1471-2105-7-174
- Web page containing a PERL script implementing the strategy developed in this article [http://www.uv.es/~genomica/treetracker]
- Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO: Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell 2000, 11: 4241–4257.
- Jansen R, Greenbaum D, Gerstein M: Relating whole-genome expression data with protein-protein interactions. Genome Res 2002, 12: 37–46. 10.1101/gr.205602
- Kemmeren P, van Berkum NL, Vilo J, Bijma T, Donders R, Brazma A, Holstege FC: Protein interaction verification and functional annotation by integrated analysis of genome-scale data. Mol Cell 2002, 9: 1133–1143. 10.1016/S1097-2765(02)00531-2
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998, 95: 14863–14868. 10.1073/pnas.95.25.14863
- MacQueen J: Some methods for classification and analysis of multivariate observation. Proc 5th Berk Symp 1967, 1: 281–297.
- Yook SH, Oltvai ZN, Barabasi AL: Functional and topological characterization of protein interaction networks. Proteomics 2004, 4: 928–942. 10.1002/pmic.200300636
- Pfanner N, Geissler A: Versatility of the mitochondrial protein import machinery. Nat Rev Mol Cell Biol 2001, 2: 339–349. 10.1038/35073006
- Meisinger C, Rissler M, Chacinska A, Szklarz LK, Milenkovic D, Kozjak V, Schonfisch B, Lohaus C, Meyer HE, Yaffe MP, Guiard B, Wiedemann N, Pfanner N: The mitochondrial morphology protein Mdm10 functions in assembly of the preprotein translocase of the outer membrane. Dev Cell 2004, 7: 61–71. 10.1016/j.devcel.2004.06.003
- Chacinska A, Lind M, Frazier AE, Dudek J, Meisinger C, Geissler A, Sickmann A, Meyer HE, Truscott KN, Guiard B, Pfanner N, Rehling P: Mitochondrial presequence translocase: switching between TOM tethering and motor recruitment involves Tim21 and Tim17. Cell 2005, 120: 817–829. 10.1016/j.cell.2005.01.011
- Marobbio CM, Agrimi G, Lasorsa FM, Palmieri F: Identification and functional reconstitution of yeast mitochondrial carrier for S-adenosylmethionine. EMBO J 2003, 22: 5975–5982. 10.1093/emboj/cdg574
- Zahedi RP, Sickmann A, Boehm AM, Winkler C, Zufall N, Schonfisch B, Guiard B, Pfanner N, Meisinger C: Proteomic analysis of the yeast mitochondrial outer membrane reveals accumulation of a subclass of preproteins. Mol Biol Cell 2006, 17: 1436–1450. 10.1091/mbc.E05-08-0740
- Xi J, Ge Y, Kinsland C, McLafferty FW, Begley TP: Biosynthesis of the thiazole moiety of thiamin in Escherichia coli: identification of an acyldisulfide-linked protein – protein conjugate that is functionally analogous to the ubiquitin/E1 complex. Proc Natl Acad Sci USA 2001, 98: 8513–8518. 10.1073/pnas.141226698
- Saccharomyces Genome Database [http://www.yeastgenome.org]
- MeV 4.0 [http://www.tm4.org]
- Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, Sturn A, Snuffin M, Rezantsev A, Popov D, Ryltsov A, Kostukovich E, Borisovsky I, Liu Z, Vinsavich A, Trush V, Quackenbush J: TM4: a free, open-source system for microarray data management and analysis. Biotechniques 2003, 34: 374–378.
- Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M, Friend SH: Functional discovery via a compendium of expression profiles. Cell 2000, 102: 109–126. 10.1016/S0092-8674(00)00015-5
- Public microarray expression data at Michael Eisen laboratory [http://rana.lbl.gov/data/yeast/yeastall_public.txt.gz]
- Database of Interacting Proteins [http://dip.doe-mbi.ucla.edu]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.