Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes
© Wu et al.; licensee BioMed Central Ltd. 2012
Received: 7 March 2012
Accepted: 17 July 2012
Published: 28 July 2012
To understand the roles they play in complex diseases, genes need to be investigated in the networks they are involved in. Integration of gene expression and network data is a promising approach to prioritize disease-associated genes. Some methods have been developed in this field, but the problem is still far from being solved.
In this paper, we developed a method, Networked Gene Prioritizer (NGP), to prioritize cancer-associated genes. Applications to several breast cancer and lung cancer datasets demonstrated that NGP performs better than the existing methods. It provides stable top-ranking genes between independent datasets. The genes top-ranked by NGP are enriched in cancer-associated pathways. The top-ranked genes PLK1, MCM2, MCM3, MCM7, MCM10 and SKP2 might coordinate to promote cell cycle related processes in cancer cells but not in normal cells.
In this paper, we have developed a method named NGP to prioritize cancer-associated genes. Our results demonstrate that NGP performs better than the existing methods.
To understand the roles that genes play in complex diseases, they need to be investigated within the networks they are involved in. Integration of gene expression and network data is a promising approach to prioritize disease-associated genes. The prioritized genes can help us understand the molecular mechanisms of disease and discover promising candidate drug targets.
Up to now, three main types of methods have been developed to prioritize disease-associated genes with gene expression and network data. The first type assumes that genes surrounded by differentially expressed (DE) genes in networks tend to be disease-associated genes [3–7]. A recently published example of this type is the Heat Kernel Ranking method. The second type is based on a network rewiring (NR) model [8–11]. In the NR model, the interactions of the candidate gene with other genes are assumed to change between normal and disease samples. The method by Taylor et al. is a typical representative of this type. The third type considers both the changes of gene interactions between normal and disease samples and their effects on gene expression [12, 13]. The Regulatory Impact Factor (RIF) method is a recently developed method of this type.
However, the existing methods have some drawbacks. Methods of the first type assume that networks are static, so the network may not reflect the specific condition under study, and the methods may therefore produce many false positives. Methods of the second type consider network variations but not their effects on gene expression. In some cases, a gene top-ranked by such a method may not play an important role in disease because its network variations have little effect on the expression of its interacting genes. Methods of the third type consider network variations and their effects on gene expression. However, in some situations a disease-associated gene may lead to the differential expression of its interacting genes even when there is no network rewiring. We call this situation networked differential expression, or ND for short.
In this paper, we have developed a method named Networked Gene Prioritizer (NGP) to prioritize cancer-associated genes. In our method, we assume that, between compared samples, cancer-associated genes cause the differential expression of their interacting genes by NR and/or ND. We applied NGP and three representative methods to 4 independent breast cancer patient microarray datasets and 3 independent non-small-cell lung cancer (NSCLC) patient microarray datasets. The compared methods are the Heat Kernel Ranking method, the method by Taylor et al. and the RIF method. For convenience, we call them HKR, the Taylor method and RIF, respectively. The results demonstrated that NGP performs better than the compared methods. The genes top-ranked by NGP are stable between independent datasets and enriched in cancer-associated pathways. Our results suggest that the top-ranked genes PLK1, MCM2, MCM3, MCM7, MCM10 and SKP2 might coordinate to promote cell cycle related processes in cancer cells but not in normal cells.
We applied NGP, HKR, the Taylor method and RIF to 4 independent breast cancer patient datasets and 3 independent NSCLC patient datasets (see Materials and Methods for the description of the methods and data). NGP can use two models: the NR model and the ND model (see Materials and Methods for details). We call them NGP-NR and NGP-ND, respectively. RIF also has two models: RIF1 and RIF2. We asked two questions when comparing their performance: 1) whether the top-ranked genes of each method are stable between independent datasets; 2) whether the top-ranked genes are enriched in cancer-associated pathways.
Application on breast cancer patient datasets
We prioritized cancer-associated genes between ER positive and ER negative breast cancer patient samples.
First, we investigated whether the methods produce stable top 10, 25 and 50 genes between independent datasets. Stability was analyzed with the GSEA strategy and measured by p value (see Materials and Methods for details). If the p values of a dataset pair are smaller than 0.005, the top-ranking genes between the two datasets are regarded as stable. As Additional file 1: Table S1, Additional file 2: Table S2 and Additional file 3: Table S3 show, HKR and NGP-ND produce stable top 10, 25 and 50 genes between all datasets, while the other methods fail. However, HKR’s results are not specific. It always ranked certain genes at top positions no matter what type of gene expression data was used (e.g., differential gene expression data of disease datasets, shuffled differential gene expression data of disease datasets, or differential gene expression data from datasets of different diseases) (Additional file 4: Table S4). It has been demonstrated that a systematic bias favoring highly connected genes exists in many networked gene prioritization methods [3, 14], and this bias exists in HKR. There is no obvious bias toward certain genes in the other methods, because their top-ranking genes are unstable in either the breast cancer or the lung cancer datasets (Additional file 1: Table S1, Additional file 2: Table S2, Additional file 3: Table S3, Additional file 5: Table S5, Additional file 6: Table S6 and Additional file 7: Table S7).
[Table: The top 10 genes of different methods in breast cancer patient datasets.]
[Table: Pathways in which the top-ranking genes of different methods are enriched in breast cancer patient datasets. Listed pathways: REACT_152: Cell Cycle, Mitotic; REACT_1538: Cell Cycle Checkpoints.]
Application on NSCLC patient datasets
We prioritized cancer-associated genes between lung cancer and normal samples.
When taking p < 0.005 as the threshold, NGP-NR, HKR and RIF2 produced stable top 10, 25 and 50 genes between independent datasets (Additional file 5: Table S5, Additional file 6: Table S6, Additional file 7: Table S7). However, as mentioned above, the top-ranking genes of HKR are not specific.
[Table: The top 10 genes of different methods in NSCLC patient datasets.]
MCM2-7 are eukaryotic replicative helicases that unwind double-stranded DNA during DNA replication. Ge XQ et al. demonstrated that: 1) inhibiting the expression of MCM5 with siRNA reduces chromatin-bound MCM2, MCM3, MCM5, MCM6 and MCM7 by ~50%, but this does not obstruct DNA replication in normal cells; 2) when cells suffer from replicative stress and replication forks are slowed or stalled, the same reduction prevents cells from surviving. Their results suggest that cells may depend on excess MCM2-7 to replicate DNA efficiently when replication forks are slowed or stalled. PLK1 is a marker of cellular proliferation and plays important roles in cancer development. High levels of PLK1 expression are detected in NSCLC and other tumors. Trenz K et al. showed that: 1) Plx1, the Xenopus orthologue of Plk1, is dispensable in unchallenged chromosomal DNA replication; 2) when cells suffer from DNA replication stress and forks are stalled, Plx1 binds MCM2-7 to promote DNA replication. The genome of tumor cells is highly unstable. Many DNA lesions exist in the tumor genome, and they would normally interfere with replication progression. The coordination of PLK1, the MCM complex and their interacting genes (Figures 1 and 2) might drive DNA replication forward, overcoming the effects of replication stress in breast cancer and lung cancer cells.
SKP2 plays a role in cell cycle transitions and behaves as an oncogene. Lin HK et al. demonstrated that under oncogenic conditions (e.g., aberrant proto-oncogenic signals or inactivation of tumor suppressor genes) inactivation of Skp2 causes cell senescence, whereas under normal conditions inactivation of Skp2 does not influence cell senescence. Together, the work of Lin HK et al. and our results suggest that SKP2 promotes cell cycle transitions in cancer cells but not in normal cells.
Based on the above results, we suspect that PLK1, the MCM complex, SKP2 and some of their interacting genes (Figures 1 and 2) may play important roles in promoting cell cycle related processes in cancer cells but not in normal cells. They could be considered promising candidate drug targets for cancer therapy.
In this paper, we have developed a method named Networked Gene Prioritizer (NGP) to prioritize cancer-associated genes. We compared the performance of NGP with 3 existing methods: HKR, the Taylor method and RIF. The results showed that NGP performs better than the compared methods. Its two models (NR and ND) enable it to produce stable top-ranking genes and to detect genes in cancer-associated pathways in breast cancer and lung cancer patient datasets. RIF2 succeeds in producing stable top-ranking genes and detecting genes in the cancer-associated pathways in breast cancer patient datasets. The other methods fail in breast cancer and lung cancer patient datasets.
NGP outperforms HKR, the Taylor method and RIF for three reasons. 1) In HKR, the network is static and may not reflect the specific condition under study, so the systematic bias favoring highly connected genes makes HKR always rank certain genes at top positions no matter what gene expression data are input. In NGP, the PPIs are weighted by gene expression correlations to reflect the network dynamics under study. 2) The Taylor method considers network variations but not their effects on gene expression. A gene top-ranked by this method may not play an important role in the disease because its network variations may have little effect on the expression of its interacting genes. In NGP-NR, the effects of network variations are measured by the differential expression of interacting genes. 3) Although RIF and NGP-NR both consider network variations and their effects on gene expression, they adopt different models to integrate gene expression and network data. In RIF, the prioritization of a candidate gene depends on its network variations with DE interacting genes and their effects on the expression of those DE interacting genes. In contrast, the GSEA strategy allows NGP-NR to consider the network variations of the candidate gene with all its interacting genes and their effects on the expression of those genes. NGP-NR therefore uses more information about a candidate gene’s interacting genes than RIF does. Moreover, besides the situation that RIF considers (a cancer-associated gene causes the differential expression of its interacting genes by NR), NGP considers one more situation: the dysregulated expression of a disease-associated gene can lead to the differential expression of its interacting genes when there is no network rewiring. NGP-ND was developed to prioritize genes in this situation.
However, our experiments show that the two models of NGP (the NR model and the ND model) do not work equally well in different applications. By definition, the two models aim to detect different types of cancer-associated genes. The difficulty with NGP is how to choose a proper model (NGP-NR, NGP-ND, or both) when we do not know the molecular interaction mechanisms of cancer-associated genes with their interacting partners between compared conditions. Our experience suggests a criterion for finding the proper model: whether the adopted model produces stable top-ranking genes between independent datasets.
Additional effort is needed to improve NGP. Genetic studies identify genetic variation locations (e.g., copy number variations or SNPs) involved in disease and provide candidate genes associated with the disease. By investigating the impacts of candidate genes’ variations on their own and their target genes’ expression, people can discover disease-associated genes. Recently, Akavia UD et al. integrated gene expression and copy number variation data to uncover drivers of cancers . With some modifications NGP can also be used to rank candidate genes inferred from other studies such as genetic studies.
In this paper, we have developed a method named NGP, to prioritize cancer-associated genes. The results showed NGP performs better than the existing methods.
Data and data pre-processing
Seven microarray datasets are used in this paper (Additional file 9: Table S9), including 4 independent breast cancer patient datasets (gse5460, gse7390, gse21653 and gse2034) and 3 independent NSCLC patient datasets (gse19804, gse18842 and gse10072). All datasets were downloaded from the Gene Expression Omnibus (GEO) database. Each microarray dataset was processed as follows: first, raw data were processed by RMA; then, the samples were classified into different classes (e.g., ER positive and ER negative in the breast cancer patient datasets). In the breast cancer patient datasets with histological type information, only samples of ductal type were selected for further analyses.
For each dataset, gene expression profiles were processed as follows: first, ambiguous probe sets (those mapped to more than one gene) were filtered out; then, differential expression analysis of probe sets between compared samples was conducted (two-tailed t test, unequal variance); finally, for each gene, the probe set with the most significant p value was selected and its expression level was assigned as the gene’s expression level. We assign gene expression in this way because we assume that gene expression has changed significantly between compared samples. Thus, the probe set with the most significant p value between compared samples is the best candidate to represent the expression of the gene.
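The probe-set selection step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the probe identifiers and the p values are hypothetical, and the p values are assumed to have been computed beforehand by the two-tailed, unequal-variance t test described above.

```python
def select_probe_per_gene(probe_to_genes, probe_pvalues):
    """Pick, for each gene, the unambiguous probe set with the smallest p value."""
    best = {}  # gene -> (p value, probe id)
    for probe, genes in probe_to_genes.items():
        if len(genes) != 1:
            continue  # ambiguous (or unmapped) probe set: filtered out
        gene = next(iter(genes))
        p = probe_pvalues[probe]
        if gene not in best or p < best[gene][0]:
            best[gene] = (p, probe)
    return {gene: probe for gene, (p, probe) in best.items()}


# Hypothetical toy input: probe IDs, mappings and p values are made up.
probe_to_genes = {
    "p1": {"PLK1"},
    "p2": {"PLK1"},
    "p3": {"MCM2", "MCM3"},  # maps to two genes, so it is dropped
    "p4": {"MCM2"},
}
probe_pvalues = {"p1": 0.04, "p2": 0.001, "p3": 1e-6, "p4": 0.2}
chosen = select_probe_per_gene(probe_to_genes, probe_pvalues)
```

In this toy input, the ambiguous probe set "p3" is ignored even though it has the smallest p value, because filtering happens before selection.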
Protein-protein interactions were downloaded from the HPRD database (Release 9). After excluding self-interactions, 9453 proteins and 36867 PPIs were retained.
Pre-selection of candidate genes
Candidate genes were pre-selected as the hub genes of the affy 133a PPI network. Hub genes are genes with more than 15 interacting genes in the PPI network; the affy 133a network is the PPI network constructed from genes present both on the Affymetrix HGU-133A chip and in the PPIs of HPRD. We pre-selected candidate genes in this way because: 1) Genes with many interacting partners in the network are believed to play important roles in the cell; for example, they tend to be essential genes or disease-associated genes. 2) NGP requires that a candidate gene have more than 15 interacting genes in the network (see the NGP methodology for details). To make sure that the different methods start from the same candidate gene set, hub genes are defined as genes with more than 15 interacting genes in the PPI network. 3) The PPI network is constructed from genes present both on the microarray and in the PPIs of HPRD. Two microarray platforms, Affymetrix HGU-133A and Affymetrix HGU-133Plus2, are used in this paper, and Affymetrix HGU-133Plus2 covers the probe sets of Affymetrix HGU-133A. Experiments on different platforms use different PPI networks. To make sure that experiments on different platforms start from the same candidate gene set, hub genes are always taken from the affy 133a PPI network. In total, 953 genes were selected. In each method, the pre-selected genes were then further processed to obtain that method’s priority list.
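The hub-gene pre-selection can be illustrated with a short sketch. The function name, the argument names and the toy network are assumptions made for illustration; only the rule itself (more than 15 interacting genes, network restricted to genes on the chip, self-interactions excluded) comes from the text.

```python
from collections import defaultdict


def hub_genes(ppis, chip_genes, min_partners=15):
    """Genes with more than `min_partners` distinct partners in the chip-restricted network."""
    neighbors = defaultdict(set)
    for a, b in ppis:
        if a == b:
            continue  # self-interactions are excluded
        if a in chip_genes and b in chip_genes:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {g for g, nb in neighbors.items() if len(nb) > min_partners}


# Toy star network: "HUB" has 16 partners, every partner has exactly one.
partners = [f"G{i}" for i in range(16)]
ppis = [("HUB", g) for g in partners] + [("HUB", "HUB")]
chip = set(partners) | {"HUB"}
hubs = hub_genes(ppis, chip)
```

Note that the threshold is strict: a gene with exactly 15 partners is not a hub under this rule.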
Brief introduction of HKR, the Taylor method and RIF
HKR prioritizes genes based on the differential expression of their neighboring genes. A characteristic of HKR is that it uses a random walk strategy to detect candidate genes’ neighboring genes in the network. HKR requires two inputs: a network and gene differential expression. Its output is a ranked gene list. In our experiments, the network is the HPRD network. Gene differential expression is assigned as -log(p), where p was estimated by t test of gene expression between compared samples (two-tailed, unequal variance). Please refer to PINTA for more information.
The Taylor method prioritizes genes based on the network variations of candidate genes with their interacting genes between compared samples. Gene expression correlation is used to measure the dynamic activity of the PPIs. The difference between the gene expression correlations of a PPI in the compared samples is used to test whether the interaction has changed. In our experiments, the HPRD network was used, and the Pearson coefficient was used to measure the dynamic activity of the PPIs. Candidate genes were ranked by the average absolute difference of the Pearson coefficients between each candidate gene and its interacting genes across the compared samples. Please refer to the Taylor method for more information.
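The ranking score described above (average absolute difference of Pearson coefficients) can be sketched as follows. The function names and the toy expression vectors are hypothetical, and the sketch assumes that no expression vector is constant within a sample group, since the Pearson coefficient is undefined in that case.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def taylor_score(gene, partners, expr1, expr2):
    """Mean |r1 - r2| over the partners of `gene`; r1, r2 are per-group Pearson coefficients."""
    diffs = [
        abs(pearson(expr1[gene], expr1[p]) - pearson(expr2[gene], expr2[p]))
        for p in partners
    ]
    return sum(diffs) / len(diffs)


# Made-up expression values: partner A flips from r = 1 to r = -1, partner B stays at r = -1.
expr1 = {"HUB": [1, 2, 3, 4], "A": [2, 4, 6, 8], "B": [4, 3, 2, 1]}
expr2 = {"HUB": [1, 2, 3, 4], "A": [8, 6, 4, 2], "B": [4, 3, 2, 1]}
score = taylor_score("HUB", ["A", "B"], expr1, expr2)  # (2 + 0) / 2
```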
In the RIF scores, n_DE is the number of DE genes with which candidate gene i interacts; e1_j and e2_j are the average expression of gene j in the two compared sample groups, respectively; r1_ij and r2_ij are the Pearson coefficients between genes i and j in the two compared sample groups, respectively.
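For orientation, the two RIF scores are commonly written as below, following Reverter et al. and using the symbols defined above. This is a sketch of the usual formulation rather than a transcription of the equations in this paper, so the exact form should be checked against that reference.

```latex
\mathrm{RIF1}_i = \frac{1}{n_{DE}} \sum_{j=1}^{n_{DE}}
    \frac{e1_j + e2_j}{2}\,\bigl(e1_j - e2_j\bigr)\,\bigl(r1_{ij} - r2_{ij}\bigr)^2

\mathrm{RIF2}_i = \frac{1}{n_{DE}} \sum_{j=1}^{n_{DE}}
    \Bigl[\bigl(e1_j\, r1_{ij}\bigr)^2 - \bigl(e2_j\, r2_{ij}\bigr)^2\Bigr]
```

In this formulation, RIF1 weights the squared correlation change by the average expression and the differential expression of each DE partner, while RIF2 compares the expression-weighted correlations of the two groups directly.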
In our experiments, the HPRD network was used. Gene differential expression was measured by t test (two-tailed, unequal variance). False discovery rate (FDR) was used to correct for multiple comparisons. DE genes were selected by FDR < 0.01. We ranked candidate genes by their normalized RIF1 scores and RIF2 scores.
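The text does not say which FDR procedure was used. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure; the function name, gene labels and p values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p values for five genes.
genes = ["g1", "g2", "g3", "g4", "g5"]
pvals = [0.001, 0.008, 0.039, 0.041, 0.9]
fdr = benjamini_hochberg(pvals)
de_genes = [g for g, q in zip(genes, fdr) if q < 0.01]  # threshold from the text
```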
Networked Gene Prioritizer
The PPI network was constructed from genes present both on the microarray and in the PPIs of HPRD. It is defined as G = (V, E), where V is the set of genes and E is the set of interactions.
Weighted PPI network
Here the two correlation terms are the Spearman coefficients of PPI E_ij in the two compared sample groups, respectively.
In NGP-ND, however, we first filtered out the PPIs that had changed between compared samples. We calculated the between-group change of each PPI, then permuted the sample labels (e.g., lung cancer or normal) 10000 times to generate random values of this change; finally, the PPIs whose change was larger than 90% of the random values were filtered out.
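A label-permutation filter of this kind can be sketched as follows. The change statistic is an assumption: the sketch uses the absolute difference of the Spearman coefficients in the two groups, which may not be the exact quantity used by NGP-ND. All function names and the toy data are hypothetical, and the sketch assumes expression values without a constant group.

```python
import math
import random


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def rewiring(x, y, labels):
    """|rho_1 - rho_2| between the two sample groups (assumed change statistic)."""
    g1 = [k for k, lab in enumerate(labels) if lab == 0]
    g2 = [k for k, lab in enumerate(labels) if lab == 1]
    r1 = spearman([x[k] for k in g1], [y[k] for k in g1])
    r2 = spearman([x[k] for k in g2], [y[k] for k in g2])
    return abs(r1 - r2)


def is_rewired(x, y, labels, n_perm=10000, quantile=0.9, seed=0):
    """True if the observed change exceeds `quantile` of the label-permuted changes."""
    rng = random.Random(seed)
    observed = rewiring(x, y, labels)
    shuffled = list(labels)
    below = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if rewiring(x, y, shuffled) < observed:
            below += 1
    return below / n_perm > quantile


# Made-up data: the pair is positively correlated in group 0 and negatively in group 1.
labels = [0] * 6 + [1] * 6
x = list(range(1, 13))
y_rewired = [1, 2, 3, 4, 5, 6, 12, 11, 10, 9, 8, 7]
```

With these toy data, `is_rewired(x, y_rewired, labels, n_perm=200)` flags the interaction for removal, while a pair whose correlation is the same in both groups, such as `is_rewired(x, x, labels, n_perm=200)`, is kept.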
Subnets enriched for the PPIs with high weight
where E_j is the jth PPI in the ranked PPIs; r_j is the weight of the jth PPI in the background set; P is a parameter, set to 1; N is the number of PPIs in E; N_H is the number of PPIs in the subnet S.
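Since the method follows the GSEA strategy, the enrichment score can be illustrated with the standard GSEA running-sum statistic of Subramanian et al. The sketch below is that generic statistic, not necessarily the exact variant used in NGP; the function name and the toy PPI labels are hypothetical, and the input is assumed to be already ranked by decreasing weight with at least one item inside and one item outside the subnet.

```python
def enrichment_score(ranked_items, weights, subset, p=1.0):
    """Maximum deviation from zero of the GSEA running sum P_hit - P_miss."""
    n = len(ranked_items)
    n_h = sum(1 for item in ranked_items if item in subset)
    # Normalizer N_R: total weight (to the power p) of the items in the subset.
    n_r = sum(abs(w) ** p for item, w in zip(ranked_items, weights) if item in subset)
    miss_step = 1.0 / (n - n_h)
    running, best = 0.0, 0.0
    for item, w in zip(ranked_items, weights):
        if item in subset:
            running += abs(w) ** p / n_r  # hit: step up by the normalized weight
        else:
            running -= miss_step  # miss: step down by a constant
        if abs(running) > abs(best):
            best = running
    return best


# Made-up ranked PPIs with their weights (ranked by decreasing weight).
items = ["e1", "e2", "e3", "e4"]
weights = [4.0, 3.0, 2.0, 1.0]
es_top = enrichment_score(items, weights, {"e1", "e2"})
es_bottom = enrichment_score(items, weights, {"e3", "e4"})
```

A subnet whose PPIs all sit at the top of the ranking reaches the maximum score, while one concentrated at the bottom gets a negative score.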
where the mean term is the mean of the random ES set and S′ is the standard deviation of the random ES set.
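From these definitions, the Z score presumably takes the usual standardized form shown below. The symbol for the mean of the random ES set is an assumption, since the original symbol is not available.

```latex
Z = \frac{ES - \overline{ES}_{\mathrm{rand}}}{S'}
```

A large positive Z indicates that the observed ES lies well above what random sets produce.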
Finally, we trimmed the subnet by filtering out the PPIs that did not contribute to ES.
Subnets enriched for differentially expressed genes
where g_j is the jth gene in the ranked genes; r_j is the magnitude of differential expression of the jth gene; P is a parameter, set to 1; M is the number of genes in L; M_H is the number of genes in S_trimmed.
The statistical significance of the ES of the trimmed subnet was estimated by Z score, in the same way as in the “Subnets enriched for the PPIs with high weight” section.
Finally, the trimmed subnet (S_trimmed) was further trimmed by filtering out the genes that did not contribute to ES. The resulting subnet is defined as the DE subnet.
Prioritization of candidate genes
where i denotes the ith candidate gene, Z′_S,i is the normalized Z_S of the ith candidate gene, and Z′_trimmed,i is the normalized Z_trimmed of the ith candidate gene.
Stability analysis of top-ranking genes between independent datasets
The aim of the stability analysis is to investigate whether the methods produce stable top-ranking genes between independent datasets. For the priority lists in two datasets A and B, if most of the top-ranking genes in A are also ranked near the top in B, we regard the top-ranking genes of A as stable in B. If the top-ranking genes of A are stable in B and vice versa, we regard the top-ranking genes as stable between A and B. The stability analysis was also conducted with the GSEA strategy. The objective set was the top-ranking genes in one dataset (e.g., A). The background set was the candidate genes in the other dataset (e.g., B). The parameter P of GSEA was set to 0 and the number of randomizations was set to 1000. The significance of the observed ES was measured by the nominal p value, which was estimated by comparing the observed ES with a set of randomized ES values. Please refer to the original GSEA paper for details of the estimation of the nominal p value.
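The stability test can be sketched as below. Two simplifications are assumptions rather than statements of the authors' procedure: the enrichment score is taken as the maximum positive value of the unweighted (P = 0) running sum, and the null distribution is built from random gene sets of the same size. Function names and gene labels are hypothetical, and the objective set is assumed to be a non-empty proper subset of the background.

```python
import random


def es_unweighted(ranked, subset):
    """Maximum positive value of the GSEA running sum with P = 0 (all hits weigh the same)."""
    n = len(ranked)
    n_h = sum(1 for g in ranked if g in subset)
    hit, miss = 1.0 / n_h, 1.0 / (n - n_h)
    running = best = 0.0
    for g in ranked:
        running += hit if g in subset else -miss
        best = max(best, running)
    return best


def stability_p(top_a, ranked_b, n_random=1000, seed=0):
    """Nominal p: share of random same-size gene sets scoring at least the observed ES."""
    rng = random.Random(seed)
    members = set(top_a) & set(ranked_b)
    observed = es_unweighted(ranked_b, members)
    hits = sum(
        es_unweighted(ranked_b, set(rng.sample(ranked_b, len(members)))) >= observed
        for _ in range(n_random)
    )
    return hits / n_random


# Made-up priority list of dataset B and two candidate top-10 lists from dataset A.
ranked_b = [f"g{i}" for i in range(100)]
stable_top = ranked_b[:10]  # also at the top of B
scattered_top = [f"g{i}" for i in range(5, 100, 10)]  # spread evenly over B
p_stable = stability_p(stable_top, ranked_b, n_random=200)
p_scattered = stability_p(scattered_top, ranked_b, n_random=200)
```

Under the threshold used in the text, a top-ranking set would be called stable in B when this p value is below 0.005, and the check would then be repeated with the roles of A and B exchanged.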
We thank Dr. Yunfei Pei for critical reading of our manuscript. This work is partially supported by the National Basic Research Program of China (2012CB316504) and NSFC grants (Nos. 61021063 and 60928007).
- Schadt EE: Molecular networks as sensors and drivers of common human diseases. Nature. 2009, 461 (7261): 218-223. 10.1038/nature08454.
- de la Fuente A: From 'differential expression' to 'differential networking' - identification of dysfunctional regulatory networks in diseases. Trends Genet. 2010, 26 (7): 326-333. 10.1016/j.tig.2010.05.001.
- Nitsch D, Goncalves JP, Ojeda F, de Moor B, Moreau Y: Candidate gene prioritization by network analysis of differential expression using machine learning approaches. BMC Bioinformatics. 2010, 11: 460. 10.1186/1471-2105-11-460.
- Nitsch D, Tranchevent LC, Thienpont B, Thorrez L, Van Esch H, Devriendt K, Moreau Y: Network analysis of differential expression for the identification of disease-causing genes. PLoS One. 2009, 4 (5): e5526. 10.1371/journal.pone.0005526.
- Ma X, Lee H, Wang L, Sun F: CGI: a new approach for prioritizing genes by combining gene expression and protein-protein interaction data. Bioinformatics. 2007, 23 (2): 215-221. 10.1093/bioinformatics/btl569.
- Morrison JL, Breitling R, Higham DJ, Gilbert DR: GeneRank: using search engine technology for the analysis of microarray experiments. BMC Bioinformatics. 2005, 6: 233. 10.1186/1471-2105-6-233.
- Nitsch D, Tranchevent LC, Goncalves JP, Vogt JK, Madeira SC, Moreau Y: PINTA: a web server for network-based gene prioritization from expression data. Nucleic Acids Res. 2011, 39 (Web Server issue): W334-338.
- Hu R, Qiu X, Glazko G, Klebanov L, Yakovlev A: Detecting intergene correlation changes in microarray analysis: a new approach to gene selection. BMC Bioinformatics. 2009, 10: 20. 10.1186/1471-2105-10-20.
- Ahn J, Yoon Y, Park C, Shin E, Park S: Integrative gene network construction for predicting a set of complementary prostate cancer genes. Bioinformatics. 2011, 27 (13): 1846-1853. 10.1093/bioinformatics/btr283.
- Mani KM, Lefebvre C, Wang K, Lim WK, Basso K, Dalla-Favera R, Califano A: A systems biology approach to prediction of oncogenes and molecular perturbation targets in B-cell lymphomas. Mol Syst Biol. 2008, 4: 169.
- Taylor IW, Linding R, Warde-Farley D, Liu Y, Pesquita C, Faria D, Bull S, Pawson T, Morris Q, Wrana JL: Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat Biotechnol. 2009, 27 (2): 199-204. 10.1038/nbt.1522.
- Hudson NJ, Reverter A, Dalrymple BP: A differential wiring analysis of expression data correctly identifies the gene containing the causal mutation. PLoS Comput Biol. 2009, 5 (5): e1000382.
- Reverter A, Hudson NJ, Nagaraj SH, Perez-Enciso M, Dalrymple BP: Regulatory impact factors: unraveling the transcriptional regulation of complex traits from expression data. Bioinformatics. 2010, 26 (7): 896-904. 10.1093/bioinformatics/btq051.
- Erten S, Bebek G, Ewing RM, Koyuturk M: DADA: Degree-Aware Algorithms for Network-Based Disease Gene Prioritization. BioData Min. 2011, 4: 19. 10.1186/1756-0381-4-19.
- da Huang W, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009, 4 (1): 44-57.
- Croft D, O'Kelly G, Wu G, Haw R, Gillespie M, Matthews L, Caudy M, Garapati P, Gopinath G, Jassal B, et al: Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res. 2011, 39 (Database issue): D691-697.
- Reis-Filho JS, Pusztai L: Gene expression profiling in breast cancer: classification, prognostication, and prediction. Lancet. 2011, 378 (9805): 1812-1823. 10.1016/S0140-6736(11)61539-0.
- Hanahan D, Weinberg RA: Hallmarks of cancer: the next generation. Cell. 2011, 144 (5): 646-674. 10.1016/j.cell.2011.02.013.
- Bochman ML, Schwacha A: The Mcm complex: unwinding the mechanism of a replicative helicase. Microbiol Mol Biol Rev. 2009, 73 (4): 652-683.
- Ge XQ, Jackson DA, Blow JJ: Dormant origins licensed by excess Mcm2-7 are required for human cells to survive replicative stress. Genes Dev. 2007, 21 (24): 3331-3341. 10.1101/gad.457807.
- Strebhardt K: Multifaceted polo-like kinases: drug targets and antitargets for cancer therapy. Nat Rev Drug Discov. 2010, 9 (8): 643-660. 10.1038/nrd3184.
- Takai N, Hamanaka R, Yoshimatsu J, Miyakawa I: Polo-like kinases (Plks) and cancer. Oncogene. 2005, 24 (2): 287-291. 10.1038/sj.onc.1208272.
- Trenz K, Errico A, Costanzo V: Plx1 is required for chromosomal DNA replication under stressful conditions. EMBO J. 2008, 27 (6): 876-885. 10.1038/emboj.2008.29.
- Carrano AC, Pagano M: Role of the F-box protein Skp2 in adhesion-dependent cell cycle progression. J Cell Biol. 2001, 153 (7): 1381-1390. 10.1083/jcb.153.7.1381.
- Lin HK, Chen Z, Wang G, Nardella C, Lee SW, Chan CH, Yang WL, Wang J, Egia A, Nakayama KI, et al: Skp2 targeting suppresses tumorigenesis by Arf-p53-independent cellular senescence. Nature. 2010, 464 (7287): 374-379. 10.1038/nature08815.
- Akavia UD, Litvin O, Kim J, Sanchez-Garcia F, Kotliar D, Causton HC, Pochanard P, Mozes E, Garraway LA, Pe'er D: An integrated approach to uncover drivers of cancer. Cell. 2010, 143 (6): 1005-1017. 10.1016/j.cell.2010.11.013.
- Lu X, Wang ZC, Iglehart JD, Zhang X, Richardson AL: Predicting features of breast cancer with gene expression patterns. Breast Cancer Res Treat. 2008, 108 (2): 191-201. 10.1007/s10549-007-9596-6.
- Desmedt C, Piette F, Loi S, Wang Y, Lallemand F, Haibe-Kains B, Viale G, Delorenzi M, Zhang Y, d’Assignies MS, et al: Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res. 2007, 13 (11): 3207-3214. 10.1158/1078-0432.CCR-06-2765.
- Sabatier R, Finetti P, Cervera N, Lambaudie E, Esterni B, Mamessier E, Tallet A, Chabannon C, Extra JM, Jacquemier J, et al: A gene expression signature identifies two prognostic subgroups of basal breast cancer. Breast Cancer Res Treat. 2011, 126 (2): 407-420. 10.1007/s10549-010-0897-9.
- Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, et al: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005, 365: 671-679.
- Lu TP, Tsai MH, Lee JM, Hsu CP, Chen PC, Lin CW, Shih JY, Yang PC, Hsiao CK, Lai LC, et al: Identification of a novel biomarker, SEMA5A, for non-small cell lung carcinoma in nonsmoking women. Cancer Epidemiol Biomarkers Prev. 2010, 19 (10): 2590-2597. 10.1158/1055-9965.EPI-10-0332.
- Sanchez-Palencia A, Gomez-Morales M, Gomez-Capilla JA, Pedraza V, Boyero L, Rosell R, Farez-Vidal ME: Gene expression profiling reveals novel biomarkers in nonsmall cell lung cancer. Int J Cancer. 2011, 129 (2): 355-364. 10.1002/ijc.25704.
- Landi MT, Dracheva T, Rotunno M, Figueroa JD, Liu H, Dasgupta A, Mann FE, Fukuoka J, Hames M, Bergen AW, et al: Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival. PLoS One. 2008, 3 (2): e1651. 10.1371/journal.pone.0001651.
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R: NCBI GEO: mining tens of millions of expression profiles--database and tools update. Nucleic Acids Res. 2007, 35 (Database issue): D760-765.
- Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003, 19 (2): 185-193. 10.1093/bioinformatics/19.2.185.
- Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M, et al: Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res. 2003, 13 (10): 2363-2371. 10.1101/gr.1680803.
- Vidal M, Cusick ME, Barabasi AL: Interactome networks and human disease. Cell. 2011, 144 (6): 986-998. 10.1016/j.cell.2011.02.016.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005, 102 (43): 15545-15550. 10.1073/pnas.0506580102.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.