Extracting consistent knowledge from highly inconsistent cancer gene data sources
© Gong et al; licensee BioMed Central Ltd. 2010
Received: 15 September 2009
Accepted: 5 February 2010
Published: 5 February 2010
Hundreds of genes that are causally implicated in oncogenesis have been found and collected in various databases. For efficient application of these abundant but diverse data sources, it is of fundamental importance to evaluate their consistency.
First, we showed that the lists of cancer genes from some major data sources were highly inconsistent in terms of overlapping genes. In particular, most cancer genes accumulated in previous small-scale studies could not be rediscovered in current high-throughput genome screening studies. Then, based on a metric proposed in this study, we showed that most cancer gene lists from different data sources were highly functionally consistent. Finally, we extracted functionally consistent cancer genes from various data sources and collected them in our database F-Census.
Although they share very few genes, most cancer gene data sources are highly consistent at the functional level, which indicates that they separately capture subsets of the genes in a few key pathways associated with cancer. Our results suggest that the sample sizes currently used for cancer studies might be inadequate for consistently capturing individual cancer genes, but could be sufficient for finding a set of cancer genes that functionally represents most cancer genes. The F-Census database provides biologists with a useful tool for browsing and extracting functionally consistent cancer genes from various data sources.
Cancer is an extremely heterogeneous disease that is induced by mutations or other alterations in many genes [1, 2]. Identification of genes that are causally implicated in oncogenesis is of basic importance for predicting novel cancer genes [3–6], and for studying their evolutionary conservation [6, 7], biological network features [4, 8] and functions [9–11]. It can also provide valuable biomarkers for cancer diagnosis and drug development [2, 11]. To date, hundreds of cancer genes found in small-scale experiments have been collected in various databases such as Cancer Gene Census (CGC) [12], Online Mendelian Inheritance in Man (OMIM) [13], and many others [14–17]. Recently, high-throughput somatic mutational screening of cancer genomes [18–24] has been rapidly identifying hundreds of new cancer genes that carry driver mutations. These increasingly abundant data provide us with an excellent opportunity to understand the underlying complex mechanisms of oncogenesis.
Nevertheless, we face new challenges in interpreting and applying these abundant yet diverse data sources efficiently. In particular, it is important to evaluate the consistency and reliability of the information from different data sources. In this work, we analyzed six lists of cancer genes, one from each of six major databases [12–17], and two lists of candidate cancer genes identified by two types of high-throughput techniques [19, 20, 22, 23, 25, 26]. First, we showed that these gene lists were highly inconsistent in terms of overlapping genes, which partly reflected the different cancer types and mutation types that they cover. In particular, most cancer genes accumulated in small-scale experiments could not be reproduced in current high-throughput mutational screening of cancer genomes, even when comparing cancer type-specific genes. This suggests that the sample sizes used in the small-scale studies or high-throughput genome screening might have been too small to consistently capture genes that are causally related to cancers with extremely heterogeneous genetic mechanisms.
On the other hand, various gene lists might capture separately different genes in a few functional pathways that are related to human cancer [1, 18, 20, 21, 27–29]. Based on protein-protein interaction (PPI) data, we introduced the POGF (Percentage of Overlapping Genes Functionally related) metric to evaluate the functional consistency of gene lists, and found that most of them were actually highly functionally consistent. Specifically, most cancer genes accumulated in previous small-scale studies could be functionally reproduced in current high-throughput studies.
The CGC database is the most widely utilized cancer gene data source [3–6, 8, 11, 22, 23]; therefore, we used it as a benchmark for evaluating and selecting functionally consistent cancer genes from other data sources. We found that the selected genes were more significantly enriched in cancer pathways than the rest of the genes. Finally, we developed the database F-Census for collecting functionally consistent cancer genes from various data sources (http://bioinfo.hrbmu.edu.cn/fcensus/).
Cancer gene lists
Table 1. The eight cancer gene lists analyzed in this paper (columns: full names and URLs; No. of genes)
- Cancer Gene Census
- Online Mendelian Inheritance in Man
- Atlas of Genetics and Cytogenetics in Oncology and Haematology
- The Tumor Gene Family of Databases
- Tumor suppressor gene database
- Candidate cancer genes provided by genome mutation scans
- Candidate cancer genes identified by retroviral insertional mutagenesis screens
PPI and Gene Ontology (GO) data
The PPI data were derived from the Human Protein Reference Database (HPRD, release 7) [30], which contains 34 998 interactions that involve 9303 proteins after removing self-interactions, including 13 080 interactions between 6311 proteins derived from high-throughput yeast two-hybrid experiments. The GO annotation data [31] were downloaded on September 1, 2008.
Evaluating the consistency of gene lists by POG scores
where E(POG12) and E(POG21) are the POG scores expected by random chance, which are estimated separately as the average of the scores for 10 000 pairs of gene lists (with length l1 and l2) extracted randomly from the human genome.
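The POG calculation described above can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes that the POG score from list 1 to list 2 is the shared fraction O/l1 (and O/l2 in the other direction), and that the normalized score rescales the observed score against its random expectation as nPOG12 = (POG12 - E(POG12)) / (1 - E(POG12)). The function names and the `background` argument (standing in for the human genome) are hypothetical.

```python
import random


def pog(list1, list2):
    """Return (POG12, POG21): the shared fraction of each list (assumed definition)."""
    set1, set2 = set(list1), set(list2)
    shared = len(set1 & set2)
    return shared / len(set1), shared / len(set2)


def expected_pog(background, l1, l2, n_iter=10_000, seed=0):
    """Estimate E(POG12) and E(POG21) from random list pairs of lengths l1 and l2."""
    rng = random.Random(seed)
    background = list(background)
    sum12 = sum21 = 0.0
    for _ in range(n_iter):
        p12, p21 = pog(rng.sample(background, l1), rng.sample(background, l2))
        sum12 += p12
        sum21 += p21
    return sum12 / n_iter, sum21 / n_iter


def normalize(score, expected):
    """Assumed normalization: 0 at the random expectation, 1 at full overlap."""
    return (score - expected) / (1.0 - expected)
```

Under this normalization, a score equal to its random expectation maps to 0 and complete overlap maps to 1, so lists of different lengths can be compared on the same scale.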
Evaluating the functional consistency of gene lists at the network level
where N is the number of all the possible links between this gene and other genes in the PPI network, and n is the observed number.
where O is the number of genes shared by the two lists, and Of12 (or Of21) is the number of genes in list 1 (or list 2) not shared by but functionally similar to genes in list 2 (or list 1).
where E(POGF12) and E(POGF21) are the scores expected by random chance for two gene lists (with length l1 and l2), which are estimated separately as the average of the scores of 10 000 pairs of gene lists (with length l1 and l2) extracted randomly from all the genes in the PPI network.
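The POGF score can be illustrated with a short Python sketch. It assumes, from the definitions of O, Of12 and Of21 above, that POGF12 = (O + Of12) / l1 and POGF21 = (O + Of21) / l2. The functional-similarity test is passed in as a predicate; the `make_direct_link_predicate` helper is a hypothetical, simplified stand-in that treats a gene as functionally similar when it has a direct PPI link to a gene in the other list, which is not necessarily the criterion used in this study.

```python
def pogf(list1, list2, is_similar):
    """Return (POGF12, POGF21) given a functional-similarity predicate.

    is_similar(gene, other_genes) should return True when `gene` is
    functionally similar to at least one gene in `other_genes`.
    """
    set1, set2 = set(list1), set(list2)
    shared = set1 & set2
    # Genes not shared but functionally similar to genes in the other list.
    of12 = sum(1 for g in set1 - shared if is_similar(g, set2))
    of21 = sum(1 for g in set2 - shared if is_similar(g, set1))
    return (len(shared) + of12) / len(set1), (len(shared) + of21) / len(set2)


def make_direct_link_predicate(ppi_edges):
    """Simplified stand-in: similar means directly linked in the PPI network."""
    neighbours = {}
    for a, b in ppi_edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def is_similar(gene, other_genes):
        return bool(neighbours.get(gene, set()) & set(other_genes))

    return is_similar
```

Because shared genes are counted in both directions, each POGF score is never smaller than the POG score of the same list pair.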
Statistical significance of a consistency score
To evaluate the significance of an observed POG (or POGF) score between two lists (with length l1 and l2), we randomly selected a pair of gene lists (with length l1 and l2) and calculated the score by the same method. This process was repeated 10 000 times. The significance (P value) of the score was calculated as the percentage of the random scores that were larger than the observed score. The P value of an nPOG or nPOGF score is the same as that of the corresponding POG or POGF score because the E(POG) or E(POGF) that is used to normalize the POG or POGF score is a constant.
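The randomization test can be sketched as follows. This is an illustration under stated assumptions: `score_fn` is any function that takes two gene lists and returns a single directional score, and `background` is the gene pool to sample from (the human genome for POG, the genes in the PPI network for POGF). Both names are hypothetical.

```python
import random


def permutation_p_value(observed, score_fn, background, l1, l2,
                        n_iter=10_000, seed=0):
    """Fraction of random list pairs whose score is larger than the observed score."""
    rng = random.Random(seed)
    background = list(background)
    larger = 0
    for _ in range(n_iter):
        random_score = score_fn(rng.sample(background, l1),
                                rng.sample(background, l2))
        if random_score > observed:
            larger += 1
    return larger / n_iter
```

With 10 000 random pairs, the smallest non-zero P value that can be resolved is 1.0E-04, so an observed score that no random pair exceeds is reported as P < 1.0E-04. Subtracting a constant expectation and dividing by a positive constant preserves the ordering of scores, which is why the normalized and raw scores share the same P value.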
Selecting functionally consistent cancer genes
The CGC database collects cancer genes according to relatively stringent criteria. Therefore, we filtered the other gene lists according to their functional similarity to the genes included in the CGC database. A gene was selected if it had significantly more functional links to genes from CGC than expected by random chance, with the P value calculated by formula (3) and corrected by FDR control [38]. Then, for the selected genes and the remaining ones, respectively, we calculated the probabilities of their enrichment in each of the 10 cancer pathways described in the Cancer Cell Map database [39], using the hypergeometric distribution model.
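The enrichment and multiple-testing steps can be sketched in Python using only the standard library. This is a hedged illustration: it computes the upper-tail hypergeometric probability of seeing at least k pathway genes in a gene set, and applies the Benjamini-Hochberg adjustment. The choice of background population (for example, all genes in the PPI network) is an assumption left to the caller, and the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Here `population` is the background size, `successes` the number of background genes in the pathway, `draws` the size of the gene set being tested, and `k` the number of its genes found in the pathway. A pathway would then be called enriched when its adjusted P value falls below the chosen FDR level.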
Consistency between gene lists in terms of gene overlapping
The above results showed that these lists of cancer genes were highly inconsistent in terms of gene overlapping. However, all the observed POG and nPOG scores were significantly larger than the scores expected by random chance (P < 1.0 E-04).
Functional consistency between gene lists
All the observed POGF and nPOGF scores were statistically significant (P < 1.0 E-04).
Consistency of gene lists discovered in low- and high-throughput studies
Table 2. The scores between sub-lists of cancer genes for each cancer type (column groups: from L to H; from H to L)
On the other hand, the POGF (nPOGF) score from the L-list to the H-list was 0.69 (0.67), and 0.74 (0.70) in the other direction. Thus, functionally, cancer genes found in small-scale experiments were consistent with those found in the high-throughput studies. As shown in Table 2, from the sub-lists of cancer genes discovered by the genome screening to the sub-lists of cancer genes discovered in small-scale experiments for breast, colon and pancreatic cancers, and glioblastoma, the POGF (nPOGF) scores were as high as 0.62 (0.60), 0.82 (0.81), 0.62 (0.60) and 0.83 (0.83), respectively. In the other direction, the POGF (nPOGF) scores were lower, at 0.58 (0.56), 0.64 (0.63), 0.34 (0.32) and 0.64 (0.63) for the four cancer types, respectively. Thus, for each cancer type, the cancer genes discovered by the genome screening might cover more functions of cancer genomes than the cancer genes accumulated from small-scale experiments.
We denote by the R-list the 645 cancer genes identified by high-throughput retroviral insertional mutagenesis screening. As shown in Figure 2B, the POG (nPOG) score from the R-list to the L-list was 0.12 (0.10), and 0.22 (0.18) in the other direction. However, the POGF (nPOGF) scores were as high as 0.70 (0.68) and 0.78 (0.75) in the two directions, respectively. These results were similar to those for the H-list. The POG (nPOG) score from the R-list to the H-list was only 0.05 (0.03), and 0.04 (0.02) in the other direction. The POGF (nPOGF) scores in the two directions were 0.57 (0.53) and 0.62 (0.60), respectively, which suggested that these two lists of cancer genes overlapped less at the functional level.
Cancer genes selected by functional consistency and the F-Census database
Table 3. The enrichment of the selected genes in cancer pathways (FDR < 0.01); rows: signal pathway names
Based on the above results, we have developed a database named F-Census for extracting functionally consistent cancer genes from different data sources. This database is available at http://bioinfo.hrbmu.edu.cn/fcensus/. Using this database, users can extract cancer genes from several databases to obtain their union and intersection gene sets, thus providing information about cancer genes, such as their type (oncogenes and tumor suppressor genes), their occurrence in different cancers, and their mutation frequencies estimated from the high-throughput studies. Also, the users can obtain the cancer gene list pre-selected by our criteria based on their functional similarity to genes in CGC. The users can upload a list of candidate genes and prioritize the genes in the list according to their functional similarity to cancer genes in CGC. Finally, the users can look up the functional categories enriched with cancer genes from various cancer gene lists (please see the Help page on our website for details).
In this study, we showed that current cancer gene data sources were highly inconsistent in terms of gene overlapping. This suggested that the sample sizes used in either the small-scale studies or high-throughput genome screening might be too small to provide enough power for consistently capturing genes causally related to a disease as heterogeneous as cancer [1, 12, 40, 41]. Nevertheless, most cancer gene lists were functionally consistent, which indicated that they might all come from some key pathways associated with cancer. Based on this assumption, for a list of cancer genes, there should be subsets of non-redundant genes that could functionally represent the full list of genes. Indeed, by the algorithm described in additional file 1, we could select 75 genes from CGC that represent all 377 cancer genes from CGC, in the sense that all 377 cancer genes are frequently connected to the 75 cancer genes in the PPI network (POGF score = 1). A future study is warranted to establish whether such a non-redundant subset of genes hints at the organization of cancer-related functions.
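To make the idea of a representative subset concrete, the sketch below uses a plain greedy covering heuristic. It is not the algorithm of additional file 1, which is not reproduced here; it is a swapped-in illustration that assumes a gene is "represented" when it is either chosen itself or directly linked to a chosen gene in the PPI network. The function name and the `neighbours` mapping are hypothetical.

```python
def greedy_representatives(genes, neighbours):
    """Greedily pick genes until every gene is picked or linked to a picked gene.

    genes: iterable of gene identifiers.
    neighbours: dict mapping a gene to the set of genes it is linked to.
    """
    candidates = sorted(genes)  # fixed order makes tie-breaking reproducible
    uncovered = set(candidates)
    chosen = []
    while uncovered:
        # Pick the gene covering the most still-uncovered genes (itself included).
        best = max(
            candidates,
            key=lambda g: len(({g} | neighbours.get(g, set())) & uncovered),
        )
        chosen.append(best)
        uncovered -= {best} | neighbours.get(best, set())
    return chosen
```

Every uncovered gene covers at least itself, so each round shrinks the uncovered set and the loop always terminates. A greedy cover is not guaranteed to be the smallest possible subset.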
The biological function of a gene can be defined at several levels, ranging from the basic biological attributes of a protein product, to the nature of physical and regulatory interactions, membership in a given biological pathway, and membership of a specific biological network (such as a PPI sub-network) [10, 11]. We could consider that the functional consistency of gene lists evaluated by the POGF score based on PPI links is at the PPI network level. We could also evaluate the consistency of gene lists at other functional levels. For example, using GO terms at separate levels of the GO hierarchy, we could evaluate the consistency of gene lists at various levels of pathway specificity, and find the most specific level at which the consistency changes from high to low. To design such GO-based consistency scores, we need to consider the limitations that GO levels are artificially defined, and a large fraction of genes are only annotated to general high-level terms.
It would be interesting to identify a functional level at which cancer genes of the same cancer type overlap strongly and cancer genes of different cancer types can be distinguished. However, it might be difficult, if not impossible, to achieve this goal because most genes responsible for tumorigenesis of different cancer types might disrupt the same or similar pathways. In the KEGG database, all 14 pathways labelled with cancer types, which are defined by generally accepted cancer-type-specific genes such as APC for colorectal cancer, actually consist of similar biological pathways, such as the mitogen-activated protein kinase, p53, transforming growth factor-β and Jak-Stat pathways. Statistically, because of the small samples studied for some cancers, the lists of cancer genes accumulated so far for different cancers might be inconsistent and insufficient for functional discrimination of cancer types. As demonstrated in our previous work, even for the same cancer, the true disease markers identified in different studies with insufficient samples (and thus low statistical power) are highly likely to be inconsistent. We believe that it might be necessary to use more samples and combine functional data with tissue expression data to study cancer-type-specific mechanisms.
The literature-based interaction data in the HPRD database might be biased towards well-studied cancer genes. However, Ciccarelli et al. [6] have argued that such a bias might be negligible because, in the high-throughput PPI data, cancer genes also tend to have higher degrees in the PPI network than other genes. Similarly, using cancer genes with both literature-based interaction data and high-throughput interaction data in the HPRD database, we found that the literature-based degrees of these cancer genes were significantly correlated with their high-throughput data-based degrees (r = 0.4, P < 0.01, Spearman's rank correlation), indicating that our functional assessment would not be severely affected by the research bias. This problem should be further addressed when more high-throughput PPI data become available. Another concern is that current PPI data are incomplete. However, as in the present study, a functional similarity measure based on indirect PPI links might lessen the effect of the incompleteness of the direct PPI links.
In our study, CGC was employed as a benchmark for the comparison because it is the most widely applied data source. However, this benchmark might be biased because genes collected in CGC tend to originate from lymphoma/leukaemia, and most of them carry translocation mutations. Thus, in our future work, we will exploit other criteria to define more reliable and unbiased benchmark cancer gene sets. One approach might be to find genes non-randomly co-mutated with other genes in cancer samples. As implied by our work [43] and Yeang et al. [44], this statistically sound approach could bypass the unsolved difficulty of estimating the background mutation rate in the prediction methods used so far.
Finally, we note that the F-Census database is still under development, and is aimed at including more comprehensive information on cancer genes. For example, we have included in the database genes non-randomly co-mutated with other genes in cancer samples, which can provide strong statistical evidence on their involvement and functional coordination in cancer [9, 44]. Additionally, we have collected miRNAs that could play important roles in oncogenesis by regulating cancer genes [45–47]. We will also try to consider the full spectrum of genetic and epigenetic changes in cancer in our future studies [48, 49].
Because cancer is an extremely heterogeneous disease, low consistency in the discovery of cancer genes could have been expected in studies that have used insufficient samples. Although most data sources share few genes, they are highly consistent at the functional level, which indicates that they might separately capture different genes in a few key pathways associated with cancer. Our database provides biologists with a useful tool for browsing and extracting functionally consistent cancer genes from various data sources.
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 30670539, 30770558).
1. Vogelstein B, Kinzler KW: Cancer genes and the pathways they control. Nat Med 2004, 10: 789–799. doi:10.1038/nm1087
2. Sjoblom T: Systematic analyses of the cancer genome: lessons learned from sequencing most of the annotated human protein-coding genes. Curr Opin Oncol 2008, 20: 66–71. doi:10.1097/CCO.0b013e3282f31108
3. Furney SJ, Madden SF, Kisiel TA, Higgins DG, Lopez-Bigas N: Distinct patterns in the regulation and evolution of human cancer genes. In Silico Biol 2008, 8: 33–46.
4. Furney SJ, Higgins DG, Ouzounis CA, Lopez-Bigas N: Structural and functional properties of genes involved in human cancer. BMC Genomics 2006, 7: 3. doi:10.1186/1471-2164-7-3
5. Furney SJ, Calvo B, Larranaga P, Lozano JA, Lopez-Bigas N: Prioritization of candidate cancer genes--an aid to oncogenomic studies. Nucleic Acids Res 2008, 36: e115. doi:10.1093/nar/gkn482
6. Rambaldi D, Giorgi FM, Capuani F, Ciliberto A, Ciccarelli FD: Low duplicability and network fragility of cancer genes. Trends Genet 2008, 24: 427–430. doi:10.1016/j.tig.2008.06.003
7. Huang H, Winter EE, Wang H, Weinstock KG, Xing H, Goodstadt L, Stenson PD, Cooper DN, Smith D, Alba MM, Ponting CP, Fechtel K: Evolutionary conservation and selection of human disease gene orthologs in the rat and mouse genomes. Genome Biol 2004, 5: R47. doi:10.1186/gb-2004-5-7-r47
8. Jonsson PF, Bates PA: Global topological features of cancer proteins in the human interactome. Bioinformatics 2006, 22: 2291–2297. doi:10.1093/bioinformatics/btl390
9. Ma W, Yang D, Gu Y, Guo X, Zhao W, Guo Z: Finding disease-specific coordinated functions by multi-function genes: insight into the coordination mechanisms in diseases. Genomics 2009, 94: 94–100. doi:10.1016/j.ygeno.2009.05.001
10. Guo Z, Wang L, Li Y, Gong X, Yao C, Ma W, Wang D, Li Y, Zhu J, Zhang M, Yang D, Rao S, Wang J: Edge-based scoring and searching method for identifying condition-responsive protein-protein interaction sub-network. Bioinformatics 2007, 23: 2121–2128. doi:10.1093/bioinformatics/btm294
11. Hu P, Bader G, Wigle DA, Emili A: Computational prediction of cancer-gene function. Nat Rev Cancer 2007, 7: 23–34. doi:10.1038/nrc2036
12. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton MR: A census of human cancer genes. Nat Rev Cancer 2004, 4: 177–183. doi:10.1038/nrc1299
13. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA: Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 2005, 33: D514–517. doi:10.1093/nar/gki033
14. Higgins ME, Claremont M, Major JE, Sander C, Lash AE: CancerGenes: a gene selection resource for cancer genome projects. Nucleic Acids Res 2007, 35: D721–726. doi:10.1093/nar/gkl811
15. Yang Y, Fu LM: TSGDB: a database system for tumor suppressor genes. Bioinformatics 2003, 19: 2311–2312. doi:10.1093/bioinformatics/btg300
16. Levine AE, Steffen DL: OrCGDB: a database of genes involved in oral cancer. Nucleic Acids Res 2001, 29: 300–302. doi:10.1093/nar/29.1.300
17. Huret JL, Dessen P, Bernheim A: Atlas of Genetics and Cytogenetics in Oncology and Haematology, year 2003. Nucleic Acids Res 2003, 31: 272–274. doi:10.1093/nar/gkg126
18. TCGA: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 2008, 455: 1061–1068. doi:10.1038/nature07385
19. Parsons DW, Jones S, Zhang X, Lin JC, Leary RJ, Angenendt P, Mankoo P, Carter H, Siu IM, Gallia GL, Olivi A, McLendon R, Rasheed BA, Keir S, Nikolskaya T, Nikolsky Y, Busam DA, Tekleab H, Diaz LA Jr, Hartigan J, Smith DR, Strausberg RL, Marie SK, Shinjo SM, Yan H, Riggins GJ, Bigner DD, Karchin R, Papadopoulos N, Parmigiani G, et al.: An integrated genomic analysis of human glioblastoma multiforme. Science 2008, 321: 1807–1812. doi:10.1126/science.1164382
20. Jones S, Zhang X, Parsons DW, Lin JC, Leary RJ, Angenendt P, Mankoo P, Carter H, Kamiyama H, Jimeno A, Hong SM, Fu B, Lin MT, Calhoun ES, Kamiyama M, Walter K, Nikolskaya T, Nikolsky Y, Hartigan J, Smith DR, Hidalgo M, Leach SD, Klein AP, Jaffee EM, Goggins M, Maitra A, Iacobuzio-Donahue C, Eshleman JR, Kern SE, Hruban RH, et al.: Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 2008, 321: 1801–1806. doi:10.1126/science.1164368
21. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C, Greulich H, Muzny DM, Morgan MB, Fulton L, Fulton RS, Zhang Q, Wendl MC, Lawrence MS, Larson DE, Chen K, Dooling DJ, Sabo A, Hawes AC, Shen H, Jhangiani SN, Lewis LR, Hall O, Zhu Y, Mathew T, Ren Y, Yao J, Scherer SE, Clerc K, et al.: Somatic mutations affect key pathways in lung adenocarcinoma. Nature 2008, 455: 1069–1075. doi:10.1038/nature07423
22. Wood LD, Parsons DW, Jones S, Lin J, Sjoblom T, Leary RJ, Shen D, Boca SM, Barber T, Ptak J, Silliman N, Szabo S, Dezso Z, Ustyanksky V, Nikolskaya T, Nikolsky Y, Karchin R, Wilson PA, Kaminker JS, Zhang Z, Croshaw R, Willis J, Dawson D, Shipitsin M, Willson JK, Sukumar S, Polyak K, Park BH, Pethiyagoda CL, Pant PV, et al.: The genomic landscapes of human breast and colorectal cancers. Science 2007, 318: 1108–1113. doi:10.1126/science.1145720
23. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, Teague J, Butler A, Stevens C, Edkins S, O'Meara S, Vastrik I, Schmidt EE, Avis T, Barthorpe S, Bhamra G, Buck G, Choudhury B, Clements J, Cole J, Dicks E, Forbes S, Gray K, Halliday K, Harrison R, Hills K, Hinton J, Jenkinson A, Jones D, et al.: Patterns of somatic mutation in human cancer genomes. Nature 2007, 446: 153–158. doi:10.1038/nature05610
24. Sjoblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Mandelker D, Leary RJ, Ptak J, Silliman N, Szabo S, Buckhaults P, Farrell C, Meeh P, Markowitz SD, Willis J, Dawson D, Willson JK, Gazdar AF, Hartigan J, Wu L, Liu C, Parmigiani G, Park BH, Bachman KE, Papadopoulos N, Vogelstein B, Kinzler KW, Velculescu VE: The consensus coding sequences of human breast and colorectal cancers. Science 2006, 314: 268–274. doi:10.1126/science.1133427
25. Akagi K, Suzuki T, Stephens RM, Jenkins NA, Copeland NG: RTCGD: retroviral tagged cancer gene database. Nucleic Acids Res 2004, 32: D523–527. doi:10.1093/nar/gkh013
26. Uren AG, Kool J, Matentzoglu K, de Ridder J, Mattison J, van Uitert M, Lagcher W, Sie D, Tanger E, Cox T, Reinders M, Hubbard TJ, Rogers J, Jonkers J, Wessels L, Adams DJ, van Lohuizen M, Berns A: Large-scale mutagenesis in p19(ARF)- and p53-deficient mice identifies cancer genes and their collaborative networks. Cell 2008, 133: 727–741. doi:10.1016/j.cell.2008.03.021
27. Hahn WC, Weinberg RA: Rules for making human tumor cells. N Engl J Med 2002, 347: 1593–1603. doi:10.1056/NEJMra021902
28. Hahn WC, Weinberg RA: Modelling the molecular circuitry of cancer. Nat Rev Cancer 2002, 2: 331–341. doi:10.1038/nrc795
29. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell 2000, 100: 57–70. doi:10.1016/S0092-8674(00)81683-9
30. Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, Balakrishnan L, Marimuthu A, Banerjee S, Somanathan DS, Sebastian A, Rani S, Ray S, Kishore CJ, Kanth S, Ahmed M, Kashyap MK, Mohmood R, Ramachandra YL, Krishna V, Rahiman BA, Mohan S, Ranganathan P, Ramabadran S, Chaerkady R, Pandey A: Human Protein Reference Database--2009 update. Nucleic Acids Res 2009, 37: D767–72. doi:10.1093/nar/gkn892
31. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25: 25–29. doi:10.1038/75556
32. Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S, Frank BC, Gabrielson E, Garcia JG, Geoghegan J, Germino G, Griffin C, Hilmer SC, Hoffman E, Jedlicka AE, Kawasaki E, Martinez-Murillo F, Morsberger L, Lee H, Petersen D, Quackenbush J, Scott A, Wilson M, Yang Y, Ye SQ, Yu W: Multiple-laboratory comparison of microarray platforms. Nat Methods 2005, 2: 345–350. doi:10.1038/nmeth756
33. Zhang M, Zhang L, Zou J, Yao C, Xiao H, Liu Q, Wang J, Wang D, Wang C, Guo Z: Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes. Bioinformatics 2009, 25: 1662–1668. doi:10.1093/bioinformatics/btp295
34. Zhang M, Yao C, Guo Z, Zou J, Zhang L, Xiao H, Wang D, Yang D, Gong X, Zhu J, Li Y, Li X: Apparently low reproducibility of true differential expression discoveries in microarray studies. Bioinformatics 2008, 24: 2057–2063. doi:10.1093/bioinformatics/btn365
35. Chua HN, Sung WK, Wong L: Using indirect protein interactions for the prediction of Gene Ontology functions. BMC Bioinformatics 2007, 8(Suppl 4): S8. doi:10.1186/1471-2105-8-S4-S8
36. Chua HN, Sung WK, Wong L: Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics 2006, 22: 1623–1630. doi:10.1093/bioinformatics/btl145
37. Chua HN, Sung WK, Wong L: An efficient strategy for extensive integration of diverse biological data for protein function prediction. Bioinformatics 2007, 23: 3364–3373. doi:10.1093/bioinformatics/btm520
38. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological) 1995, 57: 289–300.
39. The Cancer Cell Map [http://cancer.cellmap.org/cellmap/]
40. Loeb LA, Loeb KR, Anderson JP: Multiple mutations and cancer. Proc Natl Acad Sci USA 2003, 100: 776–781. doi:10.1073/pnas.0334858100
41. Fox EJ, Salk JJ, Loeb LA: Cancer genome sequencing--an interim analysis. Cancer Res 2009, 69: 4948–4950. doi:10.1158/0008-5472.CAN-09-1231
42. Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, Yamanishi Y: KEGG for linking genomes to life and the environment. Nucleic Acids Res 2008, 36: D480–484. doi:10.1093/nar/gkm882
43. Zhu J, Shen X, Zhang Y, Xiao H, Gu Y, Guo Z: Identifying candidate cancer genes based on their somatic mutations co-occurring with cancer genes in cancer genome profiling. 2nd International Conference On Biomedical Engineering and Informatics 2009, 3: 1448–1451.
44. Yeang CH, McCormick F, Levine A: Combinatorial patterns of somatic gene mutations in cancer. Faseb J 2008, 22: 2605–2622. doi:10.1096/fj.08-108985
45. Zhang B, Pan X, Cobb GP, Anderson TA: microRNAs as oncogenes and tumor suppressors. Dev Biol 2007, 302: 1–12. doi:10.1016/j.ydbio.2006.08.028
46. Spizzo R, Nicoloso MS, Croce CM, Calin GA: SnapShot: MicroRNAs in Cancer. Cell 2009, 137: 586–586 e581. doi:10.1016/j.cell.2009.04.040
47. Negrini M, Nicoloso MS, Calin GA: MicroRNAs and cancer--new paradigms in molecular oncology. Curr Opin Cell Biol 2009, 21: 470–479. doi:10.1016/j.ceb.2009.03.002
48. Chan TA, Glockner S, Yi JM, Chen W, Van Neste L, Cope L, Herman JG, Velculescu V, Schuebel KE, Ahuja N, Baylin SB: Convergence of mutation and epigenetic alterations identifies common genes in cancer that predict for poor prognosis. PLoS Med 2008, 5: e114. doi:10.1371/journal.pmed.0050114
49. Schuebel KE, Chen W, Cope L, Glockner SC, Suzuki H, Yi JM, Chan TA, Van Neste L, Van Criekinge W, Bosch S, van Engeland M, Ting AH, Jair K, Yu W, Toyota M, Imai K, Ahuja N, Herman JG, Baylin SB: Comparing the DNA hypermethylome with gene mutations in human colorectal cancer. PLoS Genet 2007, 3: 1709–1723. doi:10.1371/journal.pgen.0030157
50. Mitelman F: Recurrent chromosome aberrations in cancer. Mutat Res 2000, 462: 247–253. doi:10.1016/S1383-5742(00)00006-5
51. Baasiri RA, Glasser SR, Steffen DL, Wheeler DA: The breast cancer gene database: a collaborative information resource. Oncogene 1999, 18: 7958–7965. doi:10.1038/sj.onc.1203335
52. Steffen DL, Levine AE, Yarus S, Baasiri RA, Wheeler DA: Digital reviews in molecular biology: approaches to structured digital publication. Bioinformatics 2000, 16: 639–649. doi:10.1093/bioinformatics/16.7.639
53. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL: The human disease network. Proc Natl Acad Sci USA 2007, 104: 8685–8690. doi:10.1073/pnas.0701361104
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.