CLOE: Identification of putative functional relationships among genes by comparison of expression profiles between two species
© Pellegrino et al; licensee BioMed Central Ltd. 2004
Received: 11 August 2004
Accepted: 19 November 2004
Published: 19 November 2004
Public repositories of microarray data contain an incredible amount of information that is potentially relevant to explore functional relationships among genes by meta-analysis of expression profiles. However, the widespread use of this resource by the scientific community is at the moment limited by the limited availability of effective tools of analysis. We here describe CLOE, a simple cDNA microarray data mining strategy based on meta-analysis of datasets from pairs of species. The method consists in ranking EST probes in the datasets of the two species according to the similarity of their expression profiles with that of two EST probes from orthologous genes, and extracting orthologous EST pairs from a given top interval of the ranked lists. The Gene Ontology annotation of the obtained candidate partners is then analyzed for keywords overrepresentation.
We demonstrate the capabilities of the approach by testing its predictive power on three proteomically-defined mammalian protein complexes, in comparison with single and multiple species meta-analysis approaches. Our results show that CLOE can find candidate partners for a greater number of genes, if compared to multiple species co-expression analysis, but retains a comparable specificity even when applied to species as close as mouse and human. On the other hand, it is much more specific than single organisms co-expression analysis, strongly reducing the number of potential candidate partners for a given gene of interest.
CLOE represents a simple and effective data mining approach that can be easily used for meta-analysis of cDNA microarray experiments characterized by very heterogeneous coverage. Importantly, it produces for genes of interest an average number of high confidence putative partners that is in the range of standard experimental validation techniques.
The availability of genome sequences from several model organisms, including humans, and of high-throughput technologies to study gene function is dramatically changing the approach to biological problems. In particular, the consolidated reductionist gene-by-gene strategy is being replaced by a 'modular approach', in which several genes are studied simultaneously to gather a more comprehensive picture of the many different cellular processes : in living organisms, the majority of gene products are part of intricate molecular circuits, composed of physical, functional and regulatory interactions. In higher eukaryotes, the study of gene function is further complicated by the alternative use of transcriptional units, frequently resulting in the production of proteins with different or even antagonistic activities from the same genes [2, 3].
It is well recognized that one of the most important and widespread mechanisms used by cells to regulate functional modules is the coordinate transcriptional and/or post-transcriptional modulation of mRNA levels of the interacting genes. Therefore, DNA microarrays represent a fundamental tool to unravel biological complexity on a genome-wide scale. Information concerning the expression of thousands of genes, and also of different transcripts from the same gene, can be obtained in a single experiment, and the relationships among gene expression patterns can be studied systematically . The extensive use of this technology by hundreds laboratories has resulted in the production of an enormous amount of data, many of which have been deposited in public databases [5, 6].
Besides being useful to other researchers to confirm the published results, the deposited datasets can be used as a substrate for new analysis, aimed at discovering functional modules by searching for related expression profiles. Recent studies have shown that, if the expression of two or more genes is consistently related throughout many independent microarray datasets, the genes display a significant degree of functional similarity [7, 8]. However, if this approach were applied to predict physical and functional relationships, a very high number of false positives would still be expected. A first method that can be used to reduce the number of false positives is to consider only co-expression links that are consistent among many different experimental datasets . Nevertheless, even when the co-expression of two genes is reproducibly observed under a certain number of experimental conditions, this does not imply necessarily that they are functionally related. For instance, extensive meta analysis of microarray data across different species has revealed that neighboring genes are likely to be co-expressed, even though they are not functionally related in any obvious manner [9, 10].
Phylogenetic conservation has been recently proposed as a very strong criterion to identify functionally relevant co-expression links among genes . Significant co-expression of two or more orthologous genes across many species is very likely due to selective advantages, strongly suggesting a functional relation. In fact, the comparison of data across species as distant as Homo sapiens, Saccharomyces cerevisiae, Drosophila melanogaster and Caenorhabditis elegans was very effective in identifying new genes involved in core biological functions . Although extremely specific, this multi-species approach would be unable to identify the relationships among genes involved in more specialized biological processes.
Since regulatory regions diverge much more rapidly than coding sequences [12, 13], a similar approach would be predicted to succeed even when comparing expression patterns in more closely related species, such as mice and humans. In this case, the possible loss of specificity would be strongly compensated by the increased sensitivity in the identification of functional links related to mammalian-specific gene modules. This possibility has not been so far explored.
Additionally, when using microarray data to establish significant correlations among gene expression profiles, almost invariably the information obtained with probes covering different gene portions is averaged . Though useful in many cases to reduce the experimental noise, this procedure could result in a significant loss of information in the case of genes expressing different isoforms with distinct expression patterns : on one hand the isoform-specific expression profiles would not be detected; on the other hand, the average expression profile would be artificial and non-informative.
In this study we describe CLOE (C oexpression-based L inking of O rthologous E STs) a new data mining method for the identification of transcripts showing evolutionary conserved co-expression in cDNA microarray datasets. This approach is based on the pairwise comparison of data from two species. The predictive capability of the method was proved by comparing human with mouse data. Our results show that CLOE is a valuable tool for biologists that can be used to identify putative partners for genes of interest and/or to predict some of their functional properties.
Results and discussion
The top percentiles of expression similarity ranked lists obtained with human-mouse orthologous ESTs pairs are strongly enriched of orthologous ESTs
The aim of our method is to use the available microarray expression data to identify high- confidence putative partners for genes of interest.
The basic assumption is that, if two or more genes are part of a functional module, conserved between two species, they will be likely co-expressed in both species. In contrast, if the co-expression of two genes in one species has no functional meaning, it should not be conserved in the other.
Although in principle our method could be applied to every pair of organisms for which cDNA microarray data are available, we decided to compare the human and mouse datasets contained in the Stanford Microarray Database (SMD)  (2803 experiments for 74588 EST probes and 145 experiments for 37521 ESTs, respectively, data downloaded in Jan. 2004). The first reason for doing so is that this comparison is particularly relevant in the perspective of identifying mammalian-specific gene modules. The second is that, considering the widely different number of experiments and the relatively short phylogenetic distance between the two species, this represents a particularly severe test.
As a first proof of the method's effectiveness, and in order to empirically determine a reasonable default cutoff for obtaining the final list of candidates, we analyzed whether genes with the highest ranks in the single organism lists are actually enriched of orthologous sequences. To this aim, we randomly chose 100 orthologous gene pairs represented in both the human and mouse datasets and selected, for each one, the most representative EST (i.e. the probes with the highest number of experiments in each dataset). We then generated the respective ranked lists, subdivided them in 1% rank intervals and analyzed the number of orthologous pairs in corresponding rank intervals. As a control, we performed the same analysis on an equal number of randomly chosen (and hence non-orthologous) human-mouse EST pairs. The analysis was repeated three times with essentially identical results.
Predictive value of CLOE compared to single organisms and multiple species co-expression analysis
We next investigated the effectiveness of our approach, by comparing it to single and multiple organisms co-expression analysis. To address this point, we analyzed the ability of the three methods to predict known physical and functional interactions among mammalian proteins. Protein-protein complexes have begun to be determined on a genome-wide scale only for Saccharomyces cerevisiae , Drosophila melanogaster  and Caenorhabditis elegans , but no comparable datasets have been so far published for mammalians, making it impossible to perform a systematic comparison. Therefore, we focused on three supramolecular structures, which have been analyzed by different proteomic strategies at a high level of detail: the centrosome  (110 proteins), the post-synaptic density  (105 proteins) and the TNF-alpha/NFkB signalosome  (128 proteins). For the single organism and CLOE approaches, the analysis was restricted to proteins represented in the SMD by at least one human and one murine EST probe. These corresponded to 62, 67 and 97 ESTs pairs, respectively, covering on average 66% of the proteins found in these complexes. In contrast, only 37% of these genes were represented in the multiple species network, thus confirming that the previous two methods can be applied to a number of genes much higher than the latter.
Percentage of known protein-protein interactions in the lists of candidate partners generated by the different co-expression-based approaches. For every protein found in the three analyzed complexes, and represented by at least one EST probe in both datasets, we selected the most representative human and mouse probes. A CLOE analysis with a top 1% cutoff was performed on these sequences. In parallel, the human dataset was ranked for each human EST, and lists of candidates corresponding to the top 1% ranks where obtained (Single organism). The prevalence of ESTs corresponding to other proteins of the same complexes was then determined for both approaches. Finally, to determine the prevalence of correct predictions by the Multiple Species approach, we determined the ratio between the number of links with other proteins of the same complex and the total number of links for all the complex components (data from ).
Prevalence of functionally compatible predictions obtained with the three different methods. The percentage of compatible predictions was determined as in the previous table using the functional index described in the text.
Interestingly, even though also in this case our approach and the multiple species comparison gave, on average, a higher percentage of compatible predictions, this was not dramatically different from the single-organism method. These results strongly suggest that, compared to the single organism approach, the highly reduced number of candidate partners produced by multiple organism co-expression analysis and CLOE is strongly enriched of genes characterized by more stringent functional relationships.
We have shown that CLOE represents a very flexible and effective data mining approach to infer a list of putative partners and the potential functions for genes of interest. It can be easily used for meta-analysis of cDNA microarray experiments characterized by very heterogeneous coverage, producing significant results even when data from two species as close as mouse and human are analyzed. Compared to single organisms co-expression analysis, it strongly reduces the number of potential partners for genes of interest, producing a list of targets that is highly enriched in physically interacting proteins. On the other hand, compared to multiple species co-expression analysis, it retains a comparable specificity, but can find candidate partners for a greater number of genes. Since the number of candidate partners obtained by this analysis is, on average, in the range of standard experimental validation techniques, we believe CLOE represents a useful tool for the exploration of gene function.
Definition of orthologous ESTs
The first step of our procedure is the identification of orthologous ESTs in the two datasets. Although many different methods could be used to this purpose, we relied on the InParanoid algorithm , which is ideally suited for the identification of orthologous sequences between two species. The results shown were obtained using the pre-computed release 2.6 of the InParanoid database . ESTs were linked to InParanoid clusters through their UniProt codes [25, 26], and their association to UniGene identifiers [27, 28].
Choice of representative probes for the gene of interest
The procedure has been implemented for the analysis of cDNA microarray datasets, such as those contained in the SMD [6, 29]. However, it could be adapted, with minor modifications, to the analysis of Affymetrix datasets.
Ratiometric data for the different organisms are not subjected to any further normalization, and downloaded as log-transformed (base 2) ratios.
Within these datasets, the number of ESTs representing a given transcription unit, as well as the number of valid experiments for each EST, are highly variable. While this feature would pose serious problems, if one should attempt to average the data of all the probes belonging to the same gene, it may offer extremely valuable information when every EST is considered independently, since each clone explores the expression properties of a particular group of exons, in a particular combination of experimental situations.
Non-correlated or anti-correlated expression between two well-represented EST probes belonging to the same gene would strongly suggest that they correspond to alternatively expressed transcripts. For this reason, we decided not to merge the data of probes belonging to the same gene, but to treat separately every EST probe in the datasets.
The choice of the most representative EST for the genes of interest represents a particularly critical aspect of our procedure. If interested to a single gene, the more exhaustive solution to this problem would be to generate and evaluate a list of candidate partners for every possible pairwise combination of ESTs probes. However, this would greatly complicate the analysis should one be interested in analyzing the potential partners of many genes. An alternative possibility is to focus, for every Unigene cluster, on the most representative ESTs, i.e. the probes with the highest number of experiments. For these reasons, the decision about what ESTs to analyze is left to the end user.
Our implementation of the method can accept as input both a UniGene cluster ID or the results of a BLAST search performed with the sequence of interest against the EST database. In both cases, it retrieves a list of all probes found in the two datasets for the orthologous UniGene clusters.
To help the user decide which ESTs to analyze, the program provides basic information about all the EST probes representing the gene of interest in the two datasets. Moreover, to help the user identify the most representative EST probe in each dataset, i.e. the probe with the highest number of experiments, it also provides the number of valid data points for every probe. Finally, to assess the redundancy of the information provided by the different probes, i.e. whether they represent different experiments and display similar/different expression profiles, the program calculates, for every pair of probes belonging to the same UniGene cluster, the number of common experiments and the Pearson correlation coefficient between their expression profiles.
Identification of orthologous sequences coexpressed in both species
List of candidate partners generated by CLOE analysis on the most representative ESTs corresponding to the protein of unknown function FAD104.
Human EST with highest rank
Human UniGene name
Mouse EST with highest rank
Mouse UniGene name
Functional characterization of the co-expressed orthologous clusters
After obtaining a list of putative partners for the genes under study, we analyze their functional characterization according to the GO vocabulary [16, 30]. This is very useful to obtain new insight about the putative functional properties of the gene of interest. GO terms are associated to ESTs through the corresponding UniProt identifiers. For each list of candidates, we compute the prevalence of all GO terms among the annotated ESTs, and the probability that such prevalence would occur in a randomly chosen set of ESTs of the same size. We always consider a gene annotated to a GO term if it is directly annotated to it or to any of its descendants in the GO graph. For a given GO term t let K(t) be the total number of ESTs annotated to it in the first organism dataset that have an orthologous sequence in the second organism dataset, and k(m, t) the number of ESTs annotated to it in the final list S(m). If J and j(m) denote the number of orthologous ESTs in the dataset and in S(m) respectively, such probability is given by the right tail of the appropriate hypergeometric distribution:
Gene Ontology keywords overrepresented in the list shown in the previous supplementary table. The results strongly suggest that this protein could be involved in some aspects of the functional interaction between the cytoskeleton and the extracellular matrix.
Peptidyl-prolyl cis-trans isomerase
Structural constituent of muscle
The programs used for this work are publicly available at the URL:
http://www.personalweb.unito.it/ferdinando.dicunto/CLOE/CLOE.html. This page contains the following files:
programs.zip (the program files and the corresponding General Public License);
readme.txt (detailed instructions for using our program).
We are grateful to Michele Caselle, Davide Corà and Mara Brancaccio for helpful discussion. This work was supported by the '60% local research fund' of the Italian Ministry for Education and Scientific Research.
- Ge H, Walhout AJ, Vidal M: Integrating 'omic' information: a bridge between genomics and systems biology. Trends Genet 2003, 19: 551–60. 10.1016/j.tig.2003.08.009View ArticlePubMedGoogle Scholar
- Gupta S, Zink D, Korn B, Vingron M, Haas SA: Genome wide identification and classification of alternative splicing based on EST data. Bioinformatics 2004, 20: 2579–85. 10.1093/bioinformatics/bth288View ArticlePubMedGoogle Scholar
- Kriventseva EV, Koch I, Apweiler R, Vingron M, Bork P, Gelfand MS, Sunyaev S: Increase of functional diversity by alternative splicing. Trends Genet 2003, 19: 124–8. 10.1016/S0168-9525(03)00023-4View ArticlePubMedGoogle Scholar
- Brazhnik P, de la Fuente A, Mendes P: Gene networks: how to put the function in genomics. Trends Biotechnol 2002, 20: 467–72. 10.1016/S0167-7799(02)02053-XView ArticlePubMedGoogle Scholar
- Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002, 30: 207–10. 10.1093/nar/30.1.207PubMed CentralView ArticlePubMedGoogle Scholar
- Gollub J, Ball CA, Binkley G, Demeter J, Finkelstein DB, Hebert JM, Hernandez-Boussard T, Jin H, Kaloper M, Matese JC, Schroeder M, Brown PO, Botstein D, Sherlock G: The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res 2003, 31: 94–6. 10.1093/nar/gkg078PubMed CentralView ArticlePubMedGoogle Scholar
- Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P: Coexpression analysis of human genes across many microarray data sets. Genome Res 2004, 14: 1085–94. 10.1101/gr.1910904PubMed CentralView ArticlePubMedGoogle Scholar
- Price MN, Rieffel E: Finding coexpressed genes in counts-based data: an improved measure with validation experiments. Bioinformatics 2004, 20: 945–52. 10.1093/bioinformatics/bth011View ArticlePubMedGoogle Scholar
- Fukuoka Y, Inaoka H, Kohane IS: Inter-species differences of co-expression of neighboring genes in eukaryotic genomes. BMC Genomics 2004, 5: 4. 10.1186/1471-2164-5-4PubMed CentralView ArticlePubMedGoogle Scholar
- Spellman P, Rubin G: Evidence for large domains of similarly expressed genes in the Drosophila genome. Journal of Biology 2002, 1: 5. 10.1186/1475-4924-1-5PubMed CentralView ArticlePubMedGoogle Scholar
- Stuart JM, Segal E, Koller D, Kim SK: A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules. Science 2003, 302: 249–255. 10.1126/science.1087447View ArticlePubMedGoogle Scholar
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blocker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, Szustakowki J, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ: Initial sequencing and analysis of the human genome. Nature 2001, 409: 860–921. 10.1038/35057062View ArticlePubMedGoogle Scholar
- Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigo R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O'Connor MJ, Okazaki Y, Oliver K, Overton-Larty E, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC, Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith DR, Spencer B, Stabenau A, Stange-Thomann N, Sugnet C, Suyama M, Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Ureta-Vidal A, Vinson JP, Von Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K, Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC, Lander ES: Initial sequencing and comparative analysis of the mouse genome. Nature 2002, 420: 520–62. 10.1038/nature01262View ArticlePubMedGoogle Scholar
- Moreau Y, Aerts S, De Moor B, De Strooper B, Dabrowski M: Comparison and meta-analysis of microarray data: from the bench to the computer desk. Trends Genet 2003, 19: 570–7. 10.1016/j.tig.2003.08.006View ArticlePubMedGoogle Scholar
- Kochiwa H, Suzuki R, Washio T, Saito R, Bono H, Carninci P, Okazaki Y, Miki R, Hayashizaki Y, Tomita M: Inferring alternative splicing patterns in mouse from a full-length cDNA library and microarray data. Genome Res 2002, 12: 1286–93. 10.1101/gr.220302PubMed CentralView ArticlePubMedGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25: 25–9. 10.1038/75556PubMed CentralView ArticlePubMedGoogle Scholar
- Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, Remor M, Hofert C, Schelder M, Brajenovic M, Ruffner H, Merino A, Klein K, Hudak M, Dickson D, Rudi T, Gnau V, Bauch A, Bastuck S, Huhse B, Leutwein C, Heurtier MA, Copley RR, Edelmann A, Querfurth E, Rybin V, Drewes G, Raida M, Bouwmeester T, Bork P, Seraphin B, Kuster B, Neubauer G, Superti-Furga G: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415: 141–7. 10.1038/415141aView ArticlePubMedGoogle Scholar
- Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, Vijayadamodar G, Pochart P, Machineni H, Welsh M, Kong Y, Zerhusen B, Malcolm R, Varrone Z, Collis A, Minto M, Burgess S, McDaniel L, Stimpson E, Spriggs F, Williams J, Neurath K, Ioime N, Agee M, Voss E, Furtak K, Renzulli R, Aanensen N, Carrolla S, Bickelhaupt E, Lazovatsky Y, DaSilva A, Zhong J, Stanyon CA, Finley RL Jr, White KP, Braverman M, Jarvie T, Gold S, Leach M, Knight J, Shimkets RA, McKenna MP, Chant J, Rothberg JM: A protein interaction map of Drosophila melanogaster. Science 2003, 302: 1727–36. 10.1126/science.1090289View ArticlePubMedGoogle Scholar
- Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JD, Chesneau A, Hao T, Goldberg DS, Li N, Martinez M, Rual JF, Lamesch P, Xu L, Tewari M, Wong SL, Zhang LV, Berriz GF, Jacotot L, Vaglio P, Reboul J, Hirozane-Kishikawa T, Li Q, Gabel HW, Elewa A, Baumgartner B, Rose DJ, Yu H, Bosak S, Sequerra R, Fraser A, Mango SE, Saxton WM, Strome S, Van Den Heuvel S, Piano F, Vandenhaute J, Sardet C, Gerstein M, Doucette-Stamm L, Gunsalus KC, Harper JW, Cusick ME, Roth FP, Hill DE, Vidal M: A map of the interactome network of the metazoan C. elegans. Science 2004, 303: 540–3. 10.1126/science.1091403PubMed CentralView ArticlePubMedGoogle Scholar
- Andersen JS, Wilkinson CJ, Mayor T, Mortensen P, Nigg EA, Mann M: Proteomic characterization of the human centrosome by protein correlation profiling. Nature 2003, 426: 570–4. 10.1038/nature02166View ArticlePubMedGoogle Scholar
- Stevens SM Jr, Zharikova AD, Prokai L: Proteomic analysis of the synaptic plasma membrane fraction isolated from rat forebrain. Brain Res Mol Brain Res 2003, 117: 116–28. 10.1016/S0169-328X(03)00282-1View ArticlePubMedGoogle Scholar
- Bouwmeester T, Bauch A, Ruffner H, Angrand PO, Bergamini G, Croughton K, Cruciat C, Eberhard D, Gagneur J, Ghidelli S, Hopf C, Huhse B, Mangano R, Michon AM, Schirle M, Schlegl J, Schwab M, Stein MA, Bauer A, Casari G, Drewes G, Gavin AC, Jackson DB, Joberty G, Neubauer G, Rick J, Kuster B, Superti-Furga G: A physical and functional map of the human TNF-alpha/NF-kappa B signal transduction pathway. Nat Cell Biol 2004, 6: 97–105. 10.1038/ncb1086View ArticlePubMedGoogle Scholar
- Remm M, Storm CE, Sonnhammer EL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol 2001, 314: 1041–52. 10.1006/jmbi.2000.5197View ArticlePubMedGoogle Scholar
- Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS: UniProt: the Universal Protein knowledgebase. Nucleic Acids Res 2004, 32(Database):D115–9. 10.1093/nar/gkh131PubMed CentralView ArticlePubMedGoogle Scholar
- Wheeler DL, Church DM, Federhen S, Lash AE, Madden TL, Pontius JU, Schuler GD, Schriml LM, Sequeira E, Tatusova TA, Wagner L: Database resources of the National Center for Biotechnology. Nucl Acids Res 2003, 31: 28–33. 10.1093/nar/gkg033PubMed CentralView ArticlePubMedGoogle Scholar
- Stanford Microarray Database[http://genome-www.stanford.edu/microarray]
- Gene Ontology[http://www.geneontology.org]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.