Identification of diagnostic subnetwork markers for cancer in human protein-protein interaction network
© Yoon et al; licensee BioMed Central Ltd. 2010
Published: 7 October 2010
Finding reliable gene markers for accurate disease classification is very challenging due to a number of reasons, including the small sample size of typical clinical data, high noise in gene expression measurements, and the heterogeneity across patients. In fact, gene markers identified in independent studies often do not coincide with each other, suggesting that many of the predicted markers may have no biological significance and may be simply artifacts of the analyzed dataset. To find more reliable and reproducible diagnostic markers, several studies proposed to analyze the gene expression data at the level of groups of functionally related genes, such as pathways. Studies have shown that pathway markers tend to be more robust and yield more accurate classification results. One practical problem of the pathway-based approach is the limited coverage of genes by currently known pathways. As a result, potentially important genes that play critical roles in cancer development may be excluded. To overcome this problem, we propose a novel method for identifying reliable subnetwork markers in a human protein-protein interaction (PPI) network.
In this method, we overlay the gene expression data with the PPI network and look for the most discriminative linear paths that consist of discriminative genes that are highly correlated to each other. The overlapping linear paths are then optimally combined into subnetworks that can potentially serve as effective diagnostic markers. We tested our method on two independent large-scale breast cancer datasets and compared the effectiveness and reproducibility of the identified subnetwork markers with gene-based and pathway-based markers. We also compared the proposed method with an existing subnetwork-based method.
The proposed method can efficiently find reliable subnetwork markers that outperform the gene-based and pathway-based markers in terms of discriminative power, reproducibility and classification performance. Subnetwork markers found by our method are highly enriched in common GO terms, and they can more accurately classify breast cancer metastasis compared to markers found by a previous method.
Given the high-throughput genomic data from microarray experiments, one challenge is to find effective biomarkers associated with a complex disease, such as cancer. Extensive work has been done to identify differentially expressed genes across different phenotypes [1–5], which can be used as diagnostic markers for classifying different disease states or predicting the clinical outcomes [6–11]. However, finding reliable gene markers is very challenging for a number of reasons. The small sample size of typical clinical data is one important factor that makes this problem difficult. We often have to select a small number of gene markers from thousands of genes based on a limited number of samples, which makes the performance of the traditional feature selection methods very unpredictable . In addition to this, the inherent measurement noise in microarray experiments and heterogeneity across samples aggravate this problem further [13–16]. Moreover, previous methods often select gene markers only based on their expression data. Therefore, it is possible that some of the selected gene markers may be functionally related, hence contain redundant information which may lead to the degradation of the overall classification performance.
To address the aforementioned problems, several studies proposed to interpret the expression data at the level of groups of functionally related genes, such as pathways derived from microarray studies [17–19], GO annotations, and other sources. Methods have been developed to capture the overall expression changes of a given pathway by jointly analyzing the expression levels of its member genes. For example, Guo et al. used the mean or median expression level of the member genes as the pathway activity. Tomfohr et al. analyzed the expression levels of genes in a pathway using singular value decomposition (SVD), and they used the eigenvector with the largest eigenvalue as the pathway activity. Lee et al. estimated the pathway activity by computing the average expression level of the condition responsive genes (CORGs) within a pathway. More recently, another method has been proposed based on a simple probabilistic model, which estimates the pathway activity that contributes to the phenotype of interest based on the log-likelihood ratios (LLR) of the member genes. These pathway-based methods showed that pathway markers are generally more reliable than gene markers and that they lead to better or comparable classification performance [21–24]. The main advantage of the pathway-based methods is that they can reduce the effect of the measurement noise and that of the correlations between genes that belong to the same pathway. Moreover, pathway markers can provide important biological insights into the underlying mechanisms that lead to different disease phenotypes. However, pathway-based methods also have some shortcomings. First, currently known pathways cover only a limited number of genes and may not include key genes with significant expression changes across different phenotypes. Second, many pathways overlap with each other, hence the activity of such pathway markers may be highly correlated.
One possible way to alleviate these problems is to directly identify such markers in a large protein-protein interaction (PPI) network. In a recently published paper, Chuang et al. tried to identify subnetwork markers by overlaying gene expression data on the corresponding proteins in a PPI network. They started from the so-called seed proteins in the PPI network that have high discriminative power and greedily grew subnetworks from them to maximize the mutual information between the subnetwork activity score and the class label. They showed that subnetwork markers yield more accurate classification results and have better reproducibility compared to gene markers.
In this paper, we propose a new method for identifying effective subnetwork markers from a PPI network by performing a global search for differentially expressed linear paths using dynamic programming. After finding the most discriminative linear paths, we combine overlapping paths into subnetworks through a greedy approach and use those subnetworks as diagnostic markers for classifying breast cancer metastasis. To test the effectiveness of our subnetwork markers, we perform cross validation experiments based on two independent breast cancer datasets. We compare the performance of our method with a gene-based method, a pathway-based method, and a previously proposed subnetwork-based method. The results show that the proposed method finds reliable subnetwork markers that can accurately classify breast cancer metastasis. We also perform an enrichment analysis and show that the identified subnetwork markers are highly enriched with proteins that have common GO terms.
Results and discussion
Identification of subnetwork markers
We obtained two independent breast cancer datasets from the large-scale expression studies in Wang et al. (referred to as the USA dataset) and van’t Veer et al. (referred to as the Netherlands dataset). The USA dataset contains 286 samples and the Netherlands dataset contains 295 samples. Metastasis had been detected for 78 patients in the Netherlands dataset and 107 patients in the USA dataset during the five-year follow-up visits after the surgery. The PPI network was obtained from Chuang et al. and contains 57,235 interactions among 11,203 proteins. Since not all proteins have corresponding genes in the microarray platforms used by the two breast cancer studies, we used the induced network, which contains 9,263 proteins and 49,054 interactions for the USA dataset, and 8,380 proteins and 31,201 interactions for the Netherlands dataset.
Our proposed method integrates the gene expression data and the PPI data by overlaying the expression value of each gene on its corresponding protein in the PPI network. The subnetwork identification algorithm consists of the following three major steps:
Step 1: Search for highly discriminative linear paths whose member genes are closely correlated to each other
To find discriminative linear paths in the large PPI network, we define a scoring scheme that incorporates both the t-test statistics scores of the member genes and the correlation coefficient between their expression values. This scoring scheme takes a weighted sum of the t-scores of the member genes within a given path. The weights depend on the correlation between the member genes and the parameter θ, where θ is introduced to control the trade-off between the “discriminative power” of individual genes and the “correlation” between the member genes (see Methods). Based on the above scoring scheme, we developed an algorithm that searches for the top scoring linear paths that have length $l$ and end at node $g_i$.
Step 2: Combine top scoring linear paths into a subnetwork
We initialize the subnetwork using the path with the highest score. As long as there exists a high scoring path that overlaps with the current subnetwork, we combine them and check if the discriminative power of the new subnetwork is larger than that of the previous subnetwork. If the discriminative power improves, we keep the new subnetwork. Otherwise, we keep the previous subnetwork and check the next best path. To evaluate the discriminative power of subnetworks, we applied the probabilistic pathway activity inference method proposed in  to infer the subnetwork activity. The discriminative power of a subnetwork is assessed by computing the t-test statistics score of the subnetwork activity.
Step 3: Update the PPI network
After identifying the discriminative subnetwork, we update the PPI network by removing the proteins in the identified subnetwork from the current PPI network. In order to find additional non-overlapping subnetworks, we repeat the search process from Step 1.
Table: Statistics of the subnetwork markers identified by the proposed method. Columns: number of genes; number of genes in common.
The identified subnetworks are enriched with proteins that have common GO terms
Enrichment analysis results for the sample subnetworks shown in Figure 1.
cell fate commitment
programmed cell death
nucleotide-excision repair, DNA damage removal
DNA catabolic process
structure-specific DNA binding
cell cycle phase
cell cycle process
regulation of cell cycle
proteasome regulatory particle
DNA replication checkpoint
negative regulation of DNA replication initiation
regulation of DNA replication initiation
anaphase-promoting complex-dependent proteasomal ubiquitin-dependent protein catabolic process
negative regulation of ubiquitin-protein ligase activity during mitotic cell cycle
negative regulation of ligase activity
positive regulation of ubiquitin-protein ligase activity during mitotic cell cycle
negative regulation of ubiquitin-protein ligase activity
positive regulation of ubiquitin-protein ligase activity
regulation of ubiquitin-protein ligase activity during mitotic cell cycle
positive regulation of ligase activity
regulation of ubiquitin-protein ligase activity
regulation of ligase activity
proteasomal protein catabolic process
proteasomal ubiquitin-dependent protein catabolic process
cell cycle process
ubiquitin-dependent protein catabolic process
DNA metabolic process
The subnetwork markers identified by the proposed method are more discriminative and reproducible
Next, we compared the identified subnetwork markers with gene markers, pathway markers, and the subnetwork markers identified by Chuang et al. For gene markers, we selected the top 50 genes based on the absolute t-score among all genes covered by the 50 identified subnetworks. For pathway markers, we selected the top 50 pathways among the 639 pathways in the C2 curated gene sets in MSigDB (Molecular Signatures Database). We also obtained the subnetworks identified by Chuang et al. from the Cell Circuits database (149 discriminative subnetworks for the Netherlands dataset and 243 subnetworks for the USA dataset). We chose the top 50 subnetworks out of 149 subnetworks based on the Netherlands dataset and the top 50 subnetworks out of 243 subnetworks based on the USA dataset. The pathways and subnetworks were ranked using the scheme proposed by Tian et al., based on the average absolute t-test statistics score of all the member genes. For subnetwork markers identified by Chuang et al., we computed the t-scores of their member genes using the original expression values. For pathway markers, t-scores of the member genes were computed using their log-likelihood ratios as in  (see Methods). To assess the discriminative power of the subnetwork markers identified using the proposed method, their activity scores were inferred using the probabilistic inference method proposed in . For subnetwork markers identified by Chuang et al., we inferred their activity scores using the mean expression value of the member genes, as reported in their paper.
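The ranking step described above can be illustrated with a short sketch. It assumes a two-sample t statistic with unequal variances (the text only says "t-test statistics score", so the exact variant is an assumption), and the function names, marker names and expression values are hypothetical toy inputs rather than data from the study.

```python
import math
from statistics import mean, variance

def t_score(a, b):
    """Two-sample t statistic with unequal variances (assumed variant)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def marker_score(members, expr, labels):
    """Average absolute t-score over the member genes of a pathway or subnetwork."""
    scores = []
    for gene in members:
        pos = [x for x, y in zip(expr[gene], labels) if y == 1]
        neg = [x for x, y in zip(expr[gene], labels) if y == 0]
        scores.append(abs(t_score(pos, neg)))
    return mean(scores)

def top_markers(markers, expr, labels, k):
    """Return the names of the k markers with the highest average |t|."""
    ranked = sorted(markers, key=lambda m: marker_score(markers[m], expr, labels), reverse=True)
    return ranked[:k]

# Toy data: 1 = metastasis, 0 = metastasis-free (hypothetical values).
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "g1": [3.0, 3.2, 2.9, 1.0, 1.1, 0.8],
    "g2": [1.0, 1.3, 0.9, 1.1, 1.2, 1.0],
    "g3": [0.5, 0.4, 0.6, 2.0, 2.2, 1.9],
}
markers = {"M1": ["g1", "g3"], "M2": ["g2"]}
ranking = top_markers(markers, expr, labels, k=2)
```

In this toy example the marker whose member genes separate the two classes ranks first; in the study, k would be 50.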
Subnetwork markers identified by the proposed method improve classification performance
To evaluate the performance of the classifiers that are constructed using the subnetwork markers identified by the proposed method, we performed the following within-dataset and cross-dataset cross-validation experiments.
In the within-dataset experiments, the top 50 subnetwork markers identified using one of the two breast cancer datasets were used to build the classifier. The dataset was divided into ten folds of equal size; one fold was withheld as the “test set” and the remaining nine were used for training the classifier. In the training set, six folds (referred to as the “marker ranking set”) were used to rank the subnetwork markers according to their discriminative power and to build the classifier using logistic regression. The other three folds (referred to as the “feature selection set”) were used for feature selection. We started with the top ranked subnetwork marker and enlarged the feature set by adding features sequentially. Every time we included a new subnetwork marker in the feature set, a new classifier was built using the marker ranking set and tested on the feature selection set. For all the samples in the feature selection set, the classifier computes the posterior probabilities of the class label (metastasis versus metastasis-free), from which we estimate the AUC (Area Under ROC Curve). The AUC metric provides a useful statistical summary of the classification performance over the entire range of sensitivity and specificity. We retained the new subnetwork marker if the AUC (estimated on the feature selection set) increased; otherwise, we discarded the subnetwork marker and continued to test the remaining ones. The above experiment was repeated 500 times based on 50 random ten-fold splits. The average AUC was reported as the classification performance measure.
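The sequential feature selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `evaluate` is a hypothetical callable standing in for "train a logistic regression classifier on the marker ranking set and return the AUC on the feature selection set", and the toy version below simply sums marker activities instead of fitting a classifier.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def forward_select(ranked_markers, evaluate):
    """Start from the top-ranked marker; keep each later marker only if the AUC increases."""
    selected = [ranked_markers[0]]
    best = evaluate(selected)
    for marker in ranked_markers[1:]:
        score = evaluate(selected + [marker])
        if score > best:
            selected, best = selected + [marker], score
    return selected, best

# Toy feature selection set (hypothetical activities), 1 = metastasis.
labels = [1, 1, 0, 0]
activity = {
    "s1": [0.9, 0.4, 0.5, 0.1],
    "s2": [0.2, 0.8, 0.1, 0.3],
    "s3": [0.0, 0.0, 1.0, 1.0],
}

def evaluate(features):
    # Stand-in for a trained classifier: score each sample by summed activity.
    scores = [sum(activity[m][i] for m in features) for i in range(len(labels))]
    return auc(scores, labels)

selected, best_auc = forward_select(["s1", "s2", "s3"], evaluate)
```

Here the second marker raises the AUC and is kept, while the third lowers it and is discarded, which mirrors the retain-or-discard rule in the text.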
To evaluate the reproducibility of the subnetwork markers, we performed the following cross-dataset experiments. We first identified the top 50 subnetwork markers based on one dataset and performed cross-validation experiments on the other dataset, following a similar procedure that was used in the previously described within-dataset experiments.
In this paper, we proposed a new method for identifying effective subnetwork markers in a protein-protein interaction (PPI) network. As shown throughout this paper, integrating the PPI network with microarray data can overcome some of the shortcomings of the gene-based and pathway-based methods. First of all, using a genome-scale PPI network provides a better coverage of the genes in the microarray studies compared to using known pathways obtained from public databases. Second, the network topology provides prior information about the relationship between proteins, hence the genes that code for these proteins. Subnetworks identified by integrating the network structure and the gene expression data can cluster proteins (or genes) that are functionally related to each other. By aggregating the expression values of the member genes, subnetwork markers can avoid selecting single gene markers with redundant information. Furthermore, the discriminative subnetworks identified by the proposed method can also provide us with important clues about the biological mechanisms that lead to different disease phenotypes. The proposed method finds top scoring linear paths using dynamic programming and combines them into a subnetwork by greedily optimizing the discriminative power of the resulting subnetwork marker. We developed a scoring scheme that is used by the search algorithm to find linear paths that consist of discriminative genes that are highly correlated to each other. The proposed algorithm allows us to control the trade off between maximizing the discriminative power of the member genes within a given linear path and increasing the correlation between the member genes, by choosing the appropriate value for θ. As the subnetwork markers are constructed based on the top scoring linear paths, instead of single genes, the proposed method is expected to yield more robust subnetwork markers. 
Another important advantage of our method is that it can find non-overlapping subnetwork markers. This can reduce the overall redundancy among the identified markers. In this paper, the activity of the identified subnetwork markers was inferred using the probabilistic activity inference scheme proposed in . This allows us to find better subnetwork markers, since it can assess their discriminative power more effectively.
As shown in this paper, the identified subnetwork markers consist of proteins that share common GO terms. The classifiers based on the subnetwork markers identified using the proposed method were shown to achieve higher classification accuracy in both within-dataset and cross-dataset experiments compared to classifiers based on other markers. These results suggest that the method proposed in this paper can find effective subnetwork markers that can more accurately classify breast cancer metastasis and are more reproducible across independent datasets.
Probabilistic inference of subnetwork activity
To estimate the conditional probability density function $f_i^k(x)$, we assume that the expression level of gene $g_i$ under phenotype $k$ follows a Gaussian distribution with mean $\mu_i^k$ and standard deviation $\sigma_i^k$. The parameters are empirically estimated using the samples with phenotype $k$. Given the log-likelihood ratio of each gene, the subnetwork activity is defined as the sum of the log-likelihood ratios of the member genes.
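A minimal sketch of this activity inference is given below, assuming that the log-likelihood ratio of a gene is the log density under phenotype 1 minus the log density under phenotype 2 (the direction of the ratio is an assumption), with per-phenotype Gaussian parameters estimated from training samples. All names and numbers are hypothetical.

```python
import math
from statistics import mean, stdev

def gaussian_logpdf(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def fit_gene(values_by_phenotype):
    """Empirical (mean, standard deviation) of one gene for each phenotype."""
    return {k: (mean(v), stdev(v)) for k, v in values_by_phenotype.items()}

def llr(x, params):
    """Log-likelihood ratio of phenotype 1 versus phenotype 2 (assumed direction)."""
    (mu1, s1), (mu2, s2) = params[1], params[2]
    return gaussian_logpdf(x, mu1, s1) - gaussian_logpdf(x, mu2, s2)

def subnetwork_activity(sample, gene_params):
    """Sum of the per-gene log-likelihood ratios over the member genes."""
    return sum(llr(sample[g], p) for g, p in gene_params.items())

# Toy training data: gene -> phenotype -> expression values (hypothetical).
train = {
    "gA": {1: [2.0, 2.2, 1.8, 2.1], 2: [0.9, 1.1, 1.0, 1.2]},
    "gB": {1: [0.4, 0.6, 0.5, 0.3], 2: [1.5, 1.4, 1.7, 1.6]},
}
gene_params = {g: fit_gene(v) for g, v in train.items()}
activity_like_1 = subnetwork_activity({"gA": 2.0, "gB": 0.5}, gene_params)
activity_like_2 = subnetwork_activity({"gA": 1.0, "gB": 1.6}, gene_params)
```

A sample that resembles phenotype 1 receives a positive activity and one that resembles phenotype 2 a negative activity, so the summed score can serve as a single feature per subnetwork.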
Evaluating the discriminative power of linear paths in the PPI network
where (I is the identity matrix), and J is an all-one column vector. We use a normalization factor of to ensure that the overall score does not depend on the length of the path. We use θ to control the trade-off between maximizing the discriminative power of the genes within the identified path and increasing the correlation between its member genes. When θ = 0, the weight for the t-score of a given gene $g_i$ is determined by the average correlation between the log-likelihood ratios of $g_i$ and $g_j$, where j ≠ i. As θ increases, we give more weight to the discriminative power of individual genes than to the correlation between member genes. In particular, as θ → ∞, we get Σ′(λ) = I. In this case, the pathway score S(λ) is simply the average t-score of the member genes in λ, and the proposed subnetwork marker identification method reduces to its preliminary version proposed in . The above scoring scheme is used for finding the top linear paths in the network G, as we describe in the following section.
Searching for discriminative linear paths
Let $G = (E, V)$ denote the PPI network, where $V$ is the set of nodes (i.e., proteins) and $E$ is the set of edges (i.e., protein interactions). Suppose there are $N$ proteins in $G$. Then we can represent $E$ as an $N \times N$ binary matrix. For any protein pair $(v_a, v_b)$, where $v_a, v_b \in V$, we let $E[v_a, v_b] = 1$ if $v_a$ and $v_b$ are connected, and $E[v_a, v_b] = 0$ otherwise. Based on the scoring scheme defined in the previous section, we search for top discriminative linear paths using dynamic programming. We define $\lambda(v_i, l)$ as the optimal linear path among all linear paths that have length $l$ and end at $v_i$. The score of this optimal path is defined as
$s(v_i, l) = t_\alpha[\lambda(v_i, l)]\,\Sigma'[\lambda(v_i, l)].$
Initialization: $l = 1$, $\forall v_i \in V$,
for $l = 2$ to $L$,
for $\forall v_i \in V$, $1 \le l \le L$,
$S(\lambda(v_i, l)) = s(v_i, l)/l^2. \quad (1)$
Although the above algorithm finds only the top path for every $(v_i, l)$, we can easily modify it to find the top $M$ discriminative paths. Increasing $M$ allows us to find better linear paths with higher discriminative power, but it will also increase the computational complexity of the algorithm.
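The dynamic programming search can be illustrated with a simplified sketch. It makes several assumptions that are not taken from the algorithm above: it covers only the special case in which the path score reduces to the average t-score of the member genes, it uses absolute t-scores, it keeps a single best path for each (node, length) pair (M = 1), and it normalizes by the path length. The graph, t-scores and function names are hypothetical.

```python
def top_paths(adj, t, L):
    """best[l][v] = (sum of |t| along the path, path) for the best stored
    path with l nodes that ends at v, built by extending paths of length l - 1."""
    best = {1: {v: (abs(t[v]), (v,)) for v in adj}}
    for l in range(2, L + 1):
        best[l] = {}
        for v in adj:
            candidates = []
            for u in adj[v]:
                if u in best[l - 1]:
                    total, path = best[l - 1][u]
                    if v not in path:  # keep the path linear (no repeated protein)
                        candidates.append((total + abs(t[v]), path + (v,)))
            if candidates:
                best[l][v] = max(candidates)
    return best

def top_path_of_length(best, l):
    """Best stored path of length l, scored by its average |t|."""
    total, path = max(best[l].values())
    return total / l, path

# Toy chain network a - b - c - d with hypothetical t-scores.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
t = {"a": 0.5, "b": 3.0, "c": -4.0, "d": 1.0}
best = top_paths(adj, t, L=3)
score, path = top_path_of_length(best, 3)
```

Because only one path is stored per (node, length) pair and repeated proteins are disallowed, this version is a heuristic; storing the top M paths per pair, as the text suggests, reduces the chance of missing a better extension at the cost of more computation.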
Combining top overlapping paths into a subnetwork
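(i) $G_s \leftarrow \lambda_i$, $G_{temp} \leftarrow G_s$, $i = 1$.
(ii) $i = i + 1$; if $\lambda_i \cap G_s \neq \emptyset$, $G_{temp} \leftarrow G_{temp} \cup \lambda_i$.
(iii) If $R(G_{temp}) > (1 + \epsilon) R(G_s)$, $G_s \leftarrow G_{temp}$; else $G_{temp} \leftarrow G_s$.
(iv) Go to (ii) if $i < m$; otherwise, terminate.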
Here, $\epsilon$ is set to 0.01 to avoid over-fitting to the expression data. We used the activity inference method in  to compute the actual activity score of $G_s$. Then, $R(G_s)$ is computed as the t-test statistics of the subnetwork activity score.
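The greedy combination procedure can be sketched in a few lines. In this sketch, `R` is a hypothetical callable that returns the discriminative power of a set of proteins; in the paper it is the t-test statistic of the inferred subnetwork activity, whereas the toy `R` below is only a placeholder that rewards large node weights.

```python
import math

def combine_paths(paths, R, eps=0.01):
    """Greedily merge overlapping paths into one subnetwork.
    `paths` must be sorted by decreasing path score."""
    g_s = set(paths[0])
    for path in paths[1:]:
        if g_s & set(path):                     # path overlaps the current subnetwork
            g_temp = g_s | set(path)
            if R(g_temp) > (1 + eps) * R(g_s):  # require a relative gain above eps
                g_s = g_temp
    return g_s

# Toy scoring function (placeholder, not the t-test statistic of the activity).
weights = {"a": 3.0, "b": 3.0, "c": 3.0, "d": 0.1, "e": 5.0, "f": 5.0}

def R(nodes):
    return sum(weights[n] for n in nodes) / math.sqrt(len(nodes))

paths = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
subnetwork = combine_paths(paths, R)
```

In the toy run, the second path is merged because it improves the score by more than 1%, the third is rejected because it lowers the score, and the fourth is skipped because it does not overlap the current subnetwork.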
After obtaining a subnetwork $G_s$, we removed it from the network $G$ by setting $E[v_s, v_i] = E[v_i, v_s] = 0$, $\forall v_s \in G_s$, $v_i \in G$. Then, the whole process was repeated using the updated network to find additional subnetwork markers.
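As a small illustration of this update, the sketch below zeroes the rows and columns of a binary adjacency matrix for every protein in an identified subnetwork. The nested-list representation, the `index` mapping and the protein names are assumptions made for the example.

```python
def remove_subnetwork(E, index, subnetwork):
    """Set E[s][i] = E[i][s] = 0 for every protein s in the subnetwork."""
    for protein in subnetwork:
        s = index[protein]
        for i in range(len(E)):
            E[s][i] = 0
            E[i][s] = 0
    return E

# Toy network of three fully connected proteins (hypothetical names).
index = {"p0": 0, "p1": 1, "p2": 2}
E = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
E = remove_subnetwork(E, index, {"p1"})
```

After the update, the removed protein has no remaining interactions, so later searches cannot route a path through it and the resulting subnetworks do not overlap.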
Conceived and designed the experiments: JS BJY ERD. Performed the experiments: JS. Analyzed the data: JS BJY ERD. Wrote the paper: JS BJY ERD.
We would like to thank the authors of , especially H.Y. Chuang and T. Ideker, for sharing the PPI network and their helpful communication.
This article has been published as part of BMC Bioinformatics Volume 11 Supplement 6, 2010: Proceedings of the Seventh Annual MCBIOS Conference. Bioinformatics: Systems, Biology, Informatics and Computation. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/11?issue=S6.
- Efron B, Tibshirani R: Empirical bayes methods and false discovery rates for microarrays. Genet. Epidemiol. 2002, 23: 70–86. 10.1002/gepi.1124
- Baldi P, Long AD: A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 2001, 17: 509–519. 10.1093/bioinformatics/17.6.509
- Kepler TB, Crosby L, Morgan KT: Normalization and analysis of DNA microarray data by self-consistency and local regression. Genome Biol. 2002, 3: RESEARCH0037. 10.1186/gb-2002-3-7-research0037
- Ideker T, Thorsson V, Siegel AF, Hood LE: Testing for differentially-expressed genes by maximum-likelihood analysis of microarray data. J. Comput. Biol. 2000, 7: 805–817. 10.1089/10665270050514945
- Chen Y, Dougherty ER, Bittner ML: Ratio-based decisions and the quantitative analysis of cDNA microarray images. Journal of Biomedical Optics 1997, 2: 364–374. 10.1117/12.281504
- Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000, 403: 503–511. 10.1038/35000501
- Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286: 531–537. 10.1126/science.286.5439.531
- Ramaswamy S, Ross KN, Lander ES, Golub TR: A molecular signature of metastasis in primary solid tumors. Nat. Genet. 2003, 33: 49–54. 10.1038/ng1060
- van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH: Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002, 415: 530–536. 10.1038/415530a
- Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, Jatkoe T, Berns EM, Atkins D, Foekens JA: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 2005, 365: 671–679.
- West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang R, Zuzan H, Olson JA, Marks JR, Nevins JR: Predicting the clinical status of human breast cancer by using gene expression profiles. Proc. Natl. Acad. Sci. U.S.A. 2001, 98: 11462–11467. 10.1073/pnas.201162998
- Hua DEJ, Tembe WD: Performance of feature-selection methods in the classification of high-dimension data. Pattern Recognition 2008, 42: 409–424. 10.1016/j.patcog.2008.08.001
- Ein-Dor L, Kela I, Getz G, Givol D, Domany E: Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 2005, 21: 171–178. 10.1093/bioinformatics/bth469
- Symmans WF, Liu J, Knowles DM, Inghirami G: Breast cancer heterogeneity: evaluation of clonality in primary and metastatic lesions. Hum. Pathol. 1995, 26: 210–216. 10.1016/0046-8177(95)90039-X
- Tomlins SA, Rhodes DR, Perner S, Dhanasekaran SM, Mehra R, Sun XW, Varambally S, Cao X, Tchinda J, Kuefer R, Lee C, Montie JE, Shah RB, Pienta KJ, Rubin MA, Chinnaiyan AM: Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 2005, 310: 644–648. 10.1126/science.1117679
- Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC: PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 2003, 34: 267–273. 10.1038/ng1180
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 2005, 102: 15545–15550. 10.1073/pnas.0506580102
- Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, Park PJ: Discovering statistically significant pathways in expression profiling studies. Proc. Natl. Acad. Sci. U.S.A. 2005, 102: 13544–13549. 10.1073/pnas.0506577102
- Bild AH, Yao G, Chang JT, Wang Q, Potti A, Chasse D, Joshi MB, Harpole D, Lancaster JM, Berchuck A, Olson JA, Marks JR, Dressman HK, West M, Nevins JR: Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 2006, 439: 353–357. 10.1038/nature04296
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000, 25: 25–29. 10.1038/75556
- Guo Z, Zhang T, Li X, Wang Q, Xu J, Yu H, Zhu J, Wang H, Wang C, Topol EJ, Wang Q, Rao S: Towards precise classification of cancers based on robust gene functional expression profiles. BMC Bioinformatics 2005, 6: 58. 10.1186/1471-2105-6-58
- Tomfohr J, Lu J, Kepler TB: Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics 2005, 6: 225. 10.1186/1471-2105-6-225
- Lee E, Chuang HY, Kim JW, Ideker T, Lee D: Inferring pathway activity toward precise disease classification. PLoS Comput. Biol. 2008, 4: e1000217. 10.1371/journal.pcbi.1000217
- Su J, Yoon BJ, Dougherty ER: Accurate and reliable cancer classification based on probabilistic inference of pathway activity. PLoS ONE 2009, 4: e8161. 10.1371/journal.pone.0008161
- Chuang HY, Lee E, Liu YT, Lee D, Ideker T: Network-based classification of breast cancer metastasis. Mol. Syst. Biol. 2007, 3: 140. 10.1038/msb4100180
- Berriz GF, Beaver JE, Cenik C, Tasan M, Roth FP: Next generation software for functional trend analysis. Bioinformatics 2009, 25: 3043–3044. 10.1093/bioinformatics/btp498
- Mak HC, Daly M, Gruebel B, Ideker T: CellCircuits: a database of protein network models. Nucleic Acids Res. 2007, 35: D538–545. 10.1093/nar/gkl937
- Fawcett T: An introduction to ROC analysis. Patt Recog Letters 2006, 27: 861–874. 10.1016/j.patrec.2005.10.010
- Su J, Yoon BJ: Identifying reliable subnetwork markers in protein-protein interaction network for classification of breast cancer metastasis. Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on 2010, 525–528. [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5495633&isnumber=5494886]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.