- Methodology article
- Open Access
A comparison of the functional modules identified from time course and static PPI network data
© Tang et al; licensee BioMed Central Ltd. 2011
- Received: 8 April 2011
- Accepted: 15 August 2011
- Published: 15 August 2011
Cellular systems are highly dynamic and responsive to cues from the environment. Cellular function and response patterns to external stimuli are regulated by biological networks. A protein-protein interaction (PPI) network with static connectivity is dynamic in the sense that the nodes implement so-called functional activities that evolve in time. The shift from static to dynamic network analysis is essential for further understanding of molecular systems.
In this paper, Time Course Protein Interaction Networks (TC-PINs) are reconstructed by incorporating time series gene expression into PPI networks. Then, a clustering algorithm is used to create functional modules from three kinds of networks: the TC-PINs, a static PPI network and a pseudorandom network. For the functional modules from the TC-PINs, repetitive modules and modules contained within bigger modules are removed. Finally, matching and GO enrichment analyses are performed to compare the functional modules detected from those networks.
The comparative analyses show that the functional modules from the TC-PINs have much more significant biological meaning than those from static PPI networks. They also suggest that many analyses developed for static PPI networks can be carried out on the TC-PINs, with more satisfactory results. The 36 PPI networks corresponding to the 36 time points, identified as part of this study, and other materials are available at http://bioinfo.csu.edu.cn/txw/TC-PINs
- Gene Ontology
- Functional Module
- Periodic Gene
- Real Complex
- Gene Express Profile
Over the past decade, most research on biological networks has focused on static topological properties, describing networks as collections of nodes and edges. Computational analysis of these networks has great potential to aid our understanding of gene function, biological pathways and cellular organization. In reality, however, cellular systems are highly dynamic and responsive to cues from the environment. Cellular function and response patterns to external stimuli are regulated by biological networks, such as PPI, metabolic, signaling and transcription regulatory networks and neural synapses. Such networks are representations of large-scale dynamic systems. While significant progress has been made in the computational analysis of proteome-scale cellular networks, the dynamics inherent in these networks are often overlooked. Because little direct information is typically available on the temporal dynamics of network interactions, most molecular interaction network modeling and analysis has focused solely on static properties. However, proper cellular functioning requires the precise coordination of a large number of events, and identifying the temporal and contextual signals underlying proposed interactions is a crucial part of understanding cellular function. Network maps are graphical representations of dynamic systems in life. A network with static connectivity is dynamic in the sense that the nodes implement so-called functional activities that evolve in time. In a biological context, these activities may represent the concentration of a molecule, the phosphorylation state of an enzyme, the expression level of a gene, the depolarization of a neuron or a circadian rhythm.
The time has come for a shift from static to dynamic network analysis, which is essential for further understanding of molecular systems. A first step is to determine what we mean by interaction or network 'dynamics'. In simple terms, whether an interaction occurs or not depends upon spatial, temporal and/or contextual variation. Interactions may be constitutive or obligate, or may instead occur only in specific situations. Among these dynamically varying interactions (sometimes referred to as transient interactions), the variation may be either reactive (i.e., caused by exogenous factors, such as a response to some environmental stimulus) or programmed (i.e., due to endogenous signals, such as cell-cycle dynamics or developmental processes). Contextual variation overlaps heavily with temporal variation, but focuses more specifically on characterizing reactive variation and the conditions that cause it. Studying context may also encompass examining sequence or genetic variation within a population of contemporaries and exploring how that variation affects network interactions, topology and function. When development, disease progression and cyclical biological processes (e.g., the cell cycle, the metabolic cycle and even entire life cycles) are studied, time course analysis becomes an important tool. Recent research efforts have considered using static measurements to 'fill in the gaps' in time series data (the gaps being accurate temporal parameters that are not yet available for many protein-protein interactions), quantifying timing differences in gene expression and reconstructing regulatory relationships. By integrating yeast PPI networks with gene expression data, Han et al. suggested that some modules are active at specific times and locations.
In a study that described dynamic protein complex formation during the cell cycle, it was found that constitutively expressed and cell cycle-regulated proteins form protein complexes together at particular time points during the cell cycle. Qi et al. further noted that the integration of a variety of datasets, including binary interactions, protein complexes and expression profiles, enables the identification of subnetworks that are active under certain conditions. Here we focus on the temporal aspects of networks, which allow us to study the dynamics of protein module assembly during the S. cerevisiae cell cycle. Although accurate temporal parameters are not yet available for PPI systems, this problem can be solved, at least partially, by integrating additional biological resources that contain such information (e.g., gene expression data). In this paper, Time Course Protein Interaction Networks (TC-PINs) are reconstructed by incorporating time series gene expression data into a PPI network (http://dip.doe-mbi.ucla.edu/dip/Download.cgi?SM=7).
Because we have unfolded a static PPI network in time, it is necessary to distinguish clearly between two biological concepts, namely protein complexes and functional modules. A protein complex is a physical aggregation of several proteins (and possibly other molecules) that bind to each other at the same location and time. A functional module also consists of a number of proteins (and other molecules) that interact with each other to control or perform a particular cellular function. However, unlike the members of a protein complex, these proteins do not necessarily interact at the same time and location. Song et al. used an external measure, the Gene Ontology (GO), to define functional modules. That is, for a GO biological process or cellular component term, the corresponding module contains all the proteins that are annotated with that term.
After the TC-PINs are constructed, a representative clustering algorithm is selected and used to create functional modules from the TC-PINs. Then duplicate modules and modules that are contained in bigger modules are removed. The method used by Bader et al. is applied to determine how well the remaining modules match the known modules. To gain a further understanding of the biological significance of the modules, Gene Ontology enrichment analysis is performed. Finally, as a point of reference, the same clustering algorithm is applied to the static PPI network and the pseudorandom network, and the resulting modules are analysed in the same way.
The DIP (Database of Interacting Proteins) database lists protein pairs that are known to interact with each other. An interaction indicates that two amino acid chains have been experimentally shown to bind to each other. The database lists such pairs to aid those studying a particular PPI, but it also aids those investigating entire regulatory and signaling pathways, as well as those studying the organization and complexity of the PIN at the cellular level. The PPI data of S. cerevisiae used in this work are from DIP (http://dip.doe-mbi.ucla.edu/dip/Download.cgi?SM=7), updated on Oct. 10, 2010. The static yeast PPI network includes 4,950 distinct proteins and 21,788 interactions in total. As is customary, self-interactions representing autoregulation or protein homodimerization are not included in the analysis. Furthermore, duplicated interactions are ignored.
Time course gene expression data and periodic transcripts data of S. cerevisiae are from , updated on Apr 14, 2011. Raw microarray data are also available from Gene Expression Omnibus, accession number GSE3431. The dataset, in the form of a 9,335 × 36 matrix, includes expression profiles of 9,335 probes under 36 different time points. We map probe sets to gene symbols according to the annotation file provided by Affymetrix and thus obtain 6,777 budding yeast S. cerevisiae gene products. The periodic transcripts file contains data for 3552 unique expressed genes that are periodic with at least 95% confidence, which corresponds to 3656 probes .
Gene ontologies and annotations used in GO enrichment analysis are downloaded from http://geneontology.org (http://www.geneontology.org/gene-associations/submission/), updated on July 24, 2010.
Reconstruction of the TC-PINs
Before the TC-PINs are reconstructed, the first issue is to check the consistency of the two selected datasets. Comparing the 4,950 proteins extracted from the static PPI network (http://dip.doe-mbi.ucla.edu/dip/Download.cgi?SM=7) with the 6,777 gene products from the gene expression profiles, we find that they share 4,858 proteins. Thus, the gene expression profiles cover more than 98% (4,858/4,950) of the proteins in the static PPI network, which shows that it is reasonable to combine the two datasets.
A bigger challenge is how to choose an appropriate cutoff threshold for filtering the gene expression profiles so that only the most biologically significant gene products are retained. This thresholding step is a major juncture at which errors can be introduced, in the form of both false negatives and false positives. If the threshold is set too high, important gene products can be lost; if it is set too low, gene products with no apparent biological significance are retained. Methods that have been applied to the threshold selection problem in various types of networks include using an arbitrary threshold, retaining only the top x percent of the strongest relationships, permutation testing, and filtering based upon control spot correlations or the statistical significance of the relationships [17, 18].
Tu et al. used a continuous culture system to reveal a robust metabolic cycle in budding yeast. Each cycle was characterized by a reductive, nonrespiratory phase followed by an oxidative, respiratory phase in which the synchronized culture rapidly consumed molecular oxygen. After performing microarray analysis of gene expression, they found that over half of yeast genes (~3552) exhibited periodic expression patterns at a confidence level of 95% over three consecutive yeast metabolic cycles (12 time intervals per cycle). 1023 periodic genes encoding ribosomal proteins, translation initiation factors, amino acid biosynthetic enzymes, small nuclear RNAs, RNA processing enzymes and proteins required for the uptake and metabolism of sulfur exhibit a similar expression pattern, peaking in the Ox (oxidative) phase. 977 periodic genes peak in the R/B (reductive/building) phase, when cells begin to cease oxygen consumption. This set consists primarily of nuclear-encoded mitochondrial genes as well as genes encoding histones, spindle pole components and proteins required for DNA replication and cell division. 1510 genes expressed maximally in the R/C (reductive/charging) phase encode proteins involved in nonrespiratory modes of metabolism and protein degradation. These periodic genes play an important role in the yeast metabolic cycle and are therefore biologically significant. Moreover, Tu et al. also indicate that the average expression level of periodic transcripts is 1.7-fold higher than that of non-periodic transcripts. After examining the peak expression value of every periodic gene during one cycle (12 time intervals), we find that about 82% of periodic genes have a peak expression value greater than 1.6.
Therefore, to select a large number of periodic genes, we adopt a tactic similar to that used by Ala et al. to determine a potential threshold value. That is, for every time point, we set a fixed threshold to filter the transcripts. Only the transcripts whose expression levels are higher than the threshold value are retained.
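The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `filter_expression`, the gene names and expression values, and the threshold of 1.0 are all hypothetical placeholders.

```python
def filter_expression(expression, threshold):
    """Return one set of retained genes per time point.

    expression: dict mapping gene name -> list of expression values,
                one value per time point (all lists have equal length).
    threshold:  fixed cutoff applied at every time point (assumed value).
    """
    if not expression:
        return []
    n_points = len(next(iter(expression.values())))
    active = [set() for _ in range(n_points)]
    for gene, profile in expression.items():
        for t, value in enumerate(profile):
            # Keep a transcript only where it is strictly above the cutoff.
            if value > threshold:
                active[t].add(gene)
    return active


# Toy data with three time points (hypothetical genes and values).
expression = {
    "YFG1": [2.1, 0.4, 1.7],
    "YFG2": [0.3, 1.9, 1.8],
    "YFG3": [0.2, 0.1, 0.5],
}
active = filter_expression(expression, threshold=1.0)
```

With the real data the input would hold the 36-point profiles, and `active` would contain 36 gene sets, one per time point.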
Filtering gene expressing profiles
Reconstruction of the TC-PINs
If two proteins that interact in the static PPI network are both present in the gene product set at a certain time point, the two proteins and their interaction form part of the TC-PIN at that time point. The process is repeated over all interactions until the TC-PIN for that time point is complete. In the same way, all 36 TC-PINs can be reconstructed.
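The reconstruction rule above amounts to taking, for each time point, the subgraph of the static network induced by the gene products retained at that time point. A minimal sketch, assuming the static network is given as a list of protein pairs and the retained gene products as one set per time point (the function name `build_tc_pins` and the toy identifiers are hypothetical):

```python
def build_tc_pins(static_edges, active_sets):
    """Build one time-course network per time point.

    static_edges: iterable of (protein_a, protein_b) pairs.
    active_sets:  list of sets; active_sets[t] holds the gene products
                  retained at time point t.
    Returns a list of edge sets; each edge is a frozenset of two proteins,
    so (a, b) and (b, a) count as the same interaction.
    """
    pins = []
    for active in active_sets:
        edges = set()
        for a, b in static_edges:
            # Self-interactions are skipped, as in the static network.
            if a != b and a in active and b in active:
                edges.add(frozenset((a, b)))
        pins.append(edges)
    return pins


# Toy example: three time points, duplicated and self interactions included.
static_edges = [("A", "B"), ("B", "C"), ("A", "A"), ("B", "A")]
active_sets = [{"A", "B"}, {"A", "B", "C"}, {"C"}]
pins = build_tc_pins(static_edges, active_sets)
```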
Identifying functional modules from the TC-PINs
The next task is to identify meaningful functional modules from the TC-PINs. So far, the Markov Cluster (MCL) algorithm appears to be one of the most successful clustering procedures for partitioning a PPI network into densely connected modules. In 2002, Enright et al. used MCL to assign proteins to families based on precomputed sequence similarity information. Their results show that the method is ideally suited to the rapid and accurate detection of protein families on a large scale. Brohée and van Helden applied four algorithms, Molecular Complex Detection (MCODE), Super Paramagnetic Clustering (SPC), Restricted Neighborhood Search Clustering (RNSC) and Markov Clustering (MCL), to six protein interaction networks obtained from high-throughput experiments and compared the resulting clusters with the annotated complexes. They found that the analysis of high-throughput data supported the superiority of MCL for the extraction of complexes from interaction networks. Vlasblom and Wodak found that, for unweighted protein interaction graphs, MCL had a dramatic advantage over a number of other procedures that were specifically designed for partitioning protein interaction graphs. Their experimental results show that the MCL procedure is significantly more tolerant to noise and behaves more robustly than the other algorithms. The inflation parameter of the MCL algorithm can be set to different values. Wu et al. concluded that 1.9 was the best inflation parameter for the DIP data. Our experimental results show that the optimal inflation parameter is 2.0 when the MCL algorithm is applied to the yeast PPI network. MCL is therefore our method of choice for identifying protein functional modules from the TC-PINs. The following paragraph briefly outlines the principles of MCL. The MCL process consists of two operators called expansion and inflation.
The process drives the values of a transition matrix toward either 0 or 1 at each step of a random walk, until the matrix no longer changes. The algorithm first adds self-loops to the input graph (by default, the loop weight for each node is the maximum weight of all edges connected to the node) and then translates this graph into a stochastic 'Markov' matrix. This matrix represents the transition probabilities between all pairs of nodes, and the probability of a random walk of length n between any two nodes can be calculated by raising this matrix to the power n, a process referred to as expansion. The inflation step introduces a non-linearity into the process in order to strengthen intra-cluster flow and weaken inter-cluster flow. Since longer paths are more common within clusters than between different clusters, the probabilities between nodes in the same module will typically be higher in expanded matrices. MCL further exaggerates this effect by raising each entry of the expanded matrix to a power (the inflation parameter) and then rescaling each column so that it remains stochastic. Iterating expansion and inflation subdivides the PPI network into many segments, which are taken as protein functional modules or complexes.
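The expansion and inflation steps described above can be sketched in a few lines of plain Python. This is a didactic sketch, not the MCL program used in the study: the expansion power is fixed at 2, the convergence tolerance `tol` and the cutoff `eps` used to read clusters off the final matrix are assumed values, and the dense-matrix implementation is only suitable for small toy graphs.

```python
def _normalize_columns(m):
    """Rescale each column in place so that it sums to 1."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def _expand(m):
    """Expansion: square the matrix (random walks of length 2)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def _inflate(m, r):
    """Inflation: entrywise power r, then column renormalization."""
    return _normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9, eps=1e-6):
    """Cluster a graph given as a symmetric adjacency matrix (list of lists)."""
    n = len(adjacency)
    if n == 0:
        return []
    m = [[float(x) for x in row] for row in adjacency]
    for i in range(n):
        # Self-loop weight: maximum incident edge weight (1.0 if isolated).
        m[i][i] = max(m[i]) or 1.0
    _normalize_columns(m)
    for _ in range(max_iter):
        new = _inflate(_expand(m), inflation)
        change = max(abs(new[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = new
        if change < tol:
            break
    # Rows with a non-zero diagonal are attractors; the non-zero entries
    # of such a row are the nodes assigned to that attractor's cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > eps:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > eps))
    return sorted(sorted(c) for c in clusters)


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1
modules = mcl(adjacency, inflation=2.0)
```

A larger inflation value generally yields finer clusters, which is why the choice between values such as 1.9 and 2.0 matters in practice.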
The MCL procedure is applied to create candidate functional modules from each TC-PIN. Then, a Perl script removes modules that contain only one gene product or that are entirely contained in another module. Duplicate modules are also removed.
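The pruning rules above can be illustrated with a short Python sketch (the original script was written in Perl and is not reproduced here; the function name `prune_modules` and the toy modules are hypothetical):

```python
def prune_modules(modules):
    """Remove singleton, duplicate and fully contained modules.

    modules: iterable of collections of protein identifiers, pooled
             over all time points.
    Returns a list of frozensets, largest modules first.
    """
    # Drop single-protein modules; the set comprehension removes duplicates.
    unique = {frozenset(m) for m in modules if len(m) > 1}
    # Examine larger modules first so that every superset is seen
    # before any of its subsets.
    ordered = sorted(unique, key=lambda m: (-len(m), sorted(m)))
    kept = []
    for module in ordered:
        # "<" on sets tests for a proper subset.
        if not any(module < other for other in kept):
            kept.append(module)
    return kept


candidates = [
    {"A", "B", "C"},
    {"A", "B"},         # contained in the first module
    {"C", "B", "A"},    # duplicate of the first module
    {"D"},              # singleton
    {"C", "D"},
]
kept = prune_modules(candidates)
```

Two modules that share most but not all of their proteins are both kept by this rule.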
One problem in evaluation is that a certain proportion of interacting proteins can be assigned to the same modules by chance. To estimate the random expectation of correct grouping, NetworkAnalyzer (http://www.mpi-inf.mpg.de/) is used to build a pseudorandom network in which the connectivity of each node is preserved while edges are reallocated at random. The pseudorandom network has the same size (the same number of nodes and edges) as the static yeast PPI network.
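One common way to randomize a network while preserving every node's degree is repeated edge swapping. The sketch below shows that generic technique; it is not NetworkAnalyzer's implementation, and the function name, the number of swaps and the seed are illustrative assumptions.

```python
import random


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomize a simple undirected graph by double edge swaps.

    Two edges a-b and c-d are replaced by a-d and c-b whenever this
    creates neither a self-loop nor a duplicate edge, so every node
    keeps its original degree.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # give up if few valid swaps exist
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 == new2 or new1 in edge_set or new2 in edge_set:
            continue  # would create a duplicate edge
        edge_set.discard(frozenset((a, b)))
        edge_set.discard(frozenset((c, d)))
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


# Toy example: an 8-node ring, randomized with up to 20 swaps.
ring = [(k, (k + 1) % 8) for k in range(8)]
randomized = degree_preserving_rewire(ring, n_swaps=20, seed=1)
```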
The overlap score (OS) between a predicted module and a benchmark module is computed, following Bader and Hogue, as

$$OS = \frac{i^2}{g \times h}$$

where i is the number of proteins shared by the predicted module and the benchmark module, g is the number of proteins in the predicted module and h is the number of proteins in the benchmark module. If OS is 1, the predicted module contains exactly the same proteins as the benchmark module. Conversely, if OS equals 0, the predicted module and the benchmark module share no proteins.
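As a worked illustration, the overlap score can be computed as below. The sketch assumes the form OS = i² / (g × h) used by Bader and Hogue, which is consistent with the limiting cases described above (OS = 1 for identical modules, OS = 0 for disjoint ones); the function name and toy modules are hypothetical.

```python
def overlap_score(predicted, benchmark):
    """Overlap score between two non-empty sets of proteins.

    Assumed form: OS = i**2 / (g * h), where i is the number of shared
    proteins and g, h are the sizes of the two modules.
    """
    i = len(predicted & benchmark)
    return i * i / (len(predicted) * len(benchmark))


same = overlap_score({"A", "B", "C"}, {"A", "B", "C"})
disjoint = overlap_score({"A", "B"}, {"C", "D"})
partial = overlap_score({"A", "B", "C", "D"}, {"C", "D", "E"})  # i=2, g=4, h=3
```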
The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains: Cellular Component (CC), the parts of a cell or its extracellular environment; Molecular Function (MF ), the elemental activities of a gene product at the molecular level, such as binding or catalysis; and Biological Process (BP), operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs and organisms. The GO ontology is structured as a directed acyclic graph and each term has defined relationships to one or more other terms in the same domain and sometimes to other domains. The GO vocabulary is designed to be species-neutral and includes terms applicable to prokaryotes and eukaryotes, single and multicellular organisms.
A module is associated with a known function by determining whether it is enriched for proteins annotated with that function, as judged by the hypergeometric distribution. The P-value gives the probability that a given set of proteins would be enriched for a given functional group by chance alone. It has been used as a criterion to assign each cluster to a known function. The smaller the P-value, the stronger the evidence that the clustering is not random. In terms of GO annotations, a group of genes with a smaller P-value is more significant than one with a higher P-value.
Based on the above formulation, a P-value is calculated for each of the three ontologies. In the case of multiple annotations from the same ontology, the one with the smallest P-value is assigned to the cluster as its functional annotation. An unrestricted P-value alone, however, is not enough to label clusters as significant. Hence we use the recommended cutoff value of 0.01 to select significant modules within each ontology.
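The enrichment test can be sketched as the upper tail of the hypergeometric distribution. The code below is a minimal illustration under stated assumptions: the symbols N (population size), M (proteins annotated with the term), n (module size) and m (annotated proteins in the module) are chosen here for illustration, no multiple-testing correction is applied, and `best_term` is a hypothetical helper rather than part of GO::TermFinder.

```python
from math import comb


def hypergeom_pvalue(N, M, n, m):
    """P(X >= m) when drawing n proteins from N, of which M are annotated."""
    upper = min(n, M)
    tail = sum(comb(M, k) * comb(N - M, n - k) for k in range(m, upper + 1))
    return tail / comb(N, n)


def best_term(module, annotations, population, cutoff=0.01):
    """Return (term, p_value) for the most enriched term, or None.

    module:      set of proteins in the module.
    annotations: dict mapping GO term -> set of annotated proteins.
    population:  set of all proteins considered (must contain the module).
    """
    best = None
    for term, annotated in annotations.items():
        annotated = annotated & population
        m = len(module & annotated)
        if m == 0:
            continue
        p = hypergeom_pvalue(len(population), len(annotated), len(module), m)
        if best is None or p < best[1]:
            best = (term, p)
    return best if best is not None and best[1] < cutoff else None


# Toy example: 20 proteins, two hypothetical GO terms.
population = {f"P{k}" for k in range(20)}
annotations = {
    "term_1": {"P0", "P1", "P2", "P3", "P4"},
    "term_2": {"P0", "P10", "P11", "P12", "P13", "P14", "P15", "P16"},
}
module = {"P0", "P1", "P2", "P3"}
result = best_term(module, annotations, population)
```

In this toy case all four module proteins carry `term_1`, so its P-value is C(5,4) / C(20,4) = 5/4845, which is below the 0.01 cutoff.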
A popular software package for evaluating the statistical significance of GO terms represented in a set of genes extracted from a population is GO::TermFinder, which calculates P-values using formula (3). GO::TermFinder accepts a list of genes of interest and returns a list of GO terms with which the genes are associated, together with the corresponding P-values and, if desired, FDR values for the enrichment of these terms in the gene list. In this research, direct use of GO::TermFinder is not convenient for analyzing the GO enrichment of the more than 2000 modules uncovered from the TC-PINs, because the package can only handle one module at a time. Therefore, building on the latest version of this toolkit, we have developed a Perl procedure that automatically processes a large number of functional modules in turn.
Functional modules in various networks
The properties of functional modules predicted from various networks. (Table values not recovered; the only surviving row label is the static PPI network.)
Comparison with the known modules
Table 2. The results of various networks. (Rows: OS ≥ 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and OS = 1.0. Table values not recovered.)
The number of predicted modules is shown against the number of matched known modules over a range of OS thresholds from 0 to 1.0 (in 0.1 increments). A threshold of 0 means that a predicted module need not share any proteins with a known module to be considered a match. That is, with OS = 0, all modules in the three networks trivially match all of the 408 benchmark modules.
Table 2 indicates that the matched results from the TC-PINs are considerably better than those from the static PPI network, not to mention the pseudorandom network. Bader et al. studied the effect of the Overlap Score threshold on the number of predicted and matched known complexes and found that the average and maximum number of matched known complexes drops more quickly from zero to an OS threshold of 0.2 than from 0.2 to 0.9, indicating that many predicted complexes have only one or a few proteins that overlap with known complexes. An OS threshold between 0.2 and 0.3 thus seems to filter out most predicted complexes that have insignificant overlap with known complexes. In Table 2, Mp is the number of predicted modules that match at least one real complex and Mk is the number of real complexes that match at least one predicted functional module. As shown in Table 2, when OS = 0.2, out of 2063 functional modules predicted from the TC-PINs, 443 match 232 real complexes, whereas out of 932 complexes identified from the static PPI network, 175 match 197 real complexes. With OS = 0.3, out of 2063 functional modules predicted from the TC-PINs, 290 match 159 real complexes, whereas out of 932 complexes identified from the static PPI network, 131 match 142 real complexes. Next, the three evaluation metrics described earlier are used to evaluate the quality of the predicted modules. For the reason stated above, the typical value of 0.2 is chosen as the threshold for the specificity, sensitivity and f-measure analysis.
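The counts Mp and Mk defined above can be obtained by comparing every predicted module with every benchmark module. The sketch below also derives an f-measure, but note the assumptions: the overlap score is taken as i² / (g × h), and sensitivity and specificity are taken as Mk divided by the number of benchmark modules and Mp divided by the number of predicted modules, respectively. These definitions are illustrative and may differ from the formulas used in the study.

```python
def overlap_score(predicted, benchmark):
    """Assumed form: OS = i**2 / (g * h) for two non-empty sets."""
    i = len(predicted & benchmark)
    return i * i / (len(predicted) * len(benchmark))


def match_counts(predicted, known, threshold=0.2):
    """Return (Mp, Mk) for lists of predicted and known modules."""
    mp = sum(1 for p in predicted
             if any(overlap_score(p, k) >= threshold for k in known))
    mk = sum(1 for k in known
             if any(overlap_score(p, k) >= threshold for p in predicted))
    return mp, mk


def f_measure(mp, mk, n_predicted, n_known):
    """Harmonic mean of the assumed Sp = Mp/n_predicted and Sn = Mk/n_known."""
    sp = mp / n_predicted
    sn = mk / n_known
    return 2 * sn * sp / (sn + sp) if sn + sp > 0 else 0.0


# Toy example with three predicted and two known modules.
predicted = [{"A", "B", "C"}, {"D", "E"}, {"X", "Y"}]
known = [{"A", "B", "C", "D"}, {"D", "E"}]
mp, mk = match_counts(predicted, known, threshold=0.2)
score = f_measure(mp, mk, len(predicted), len(known))
```

With a threshold of 0, every predicted module matches every known module, which is the trivial case noted for OS = 0.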
Comparison of the Sn (sensitivity), Sp (specificity) and f-measure of various networks. (Table values not recovered.)
The comparative analysis results in this subsection confirm that dynamic networks that use temporal information (gene expression profiles) improve our ability to discover biologically meaningful modules.
GO enrichment analyses
In many studies, the GO has been used as the 'gold standard' to validate the functional relevance of the obtained network modules. In this subsection, we use the GO biological process, molecular function and cellular component annotations to perform GO enrichment analysis with the analytical tool we developed on top of the GO::TermFinder software package.
BP analysis of modules identified from three kinds of networks. (Column: proportion of significant modules. Table values not recovered.)
MF analysis of modules identified from three kinds of networks. (Column: proportion of significant modules. Table values not recovered.)
CC analysis of modules identified from three kinds of networks. (Column: proportion of significant modules. Table values not recovered.)
BP functional enrichment of the identified modules of size ≥3. (P-value bins: E-15 to E-10, E-10 to E-5, E-5 to 0.01. Table values not recovered.)
MF functional enrichment of the identified modules of size ≥3. (P-value bins: E-15 to E-10, E-10 to E-5, E-5 to 0.01. Table values not recovered.)
CC functional enrichment of the identified modules of size ≥3. (P-value bins: E-15 to E-10, E-10 to E-5, E-5 to 0.01. Table values not recovered.)
Selected functional modules predicted from the TC-PINs and their P-values.
Predicted functional modules
YNL151C YJL011C YOR116C YNL113W YNR003C YOR224C YPR187W YPR110C YOR207C YDL150W YDR045C YPR190C YKR025W YBR154C YKL144C YOR341W YBR150C YDL164C YDR200C YBL015W YKL103C YKL218C YFR040W YPR067W YOR210W YLL019C YNL248C YPL150W
DNA-directed RNA polymerase III complex
YOR179C YJR093C YLR115W YKL018W YNL222W YAL043C YDR228C YNL317W YKL059C YDR195W YER133W YDR301W YPR107C YGR156W YLR277C YDR412W YMR260C YHR100C YJL033W YOR250C YKR002W YML030W YGL256W YOR227W
mRNA cleavage and polyadenylation specificity factor complex
YGL004C YFR004W YLR421C YDR363W-A YDL147W YKL145W YPR108W YFR052W YGR232W YOR261C YIL075C YGL048C YER021W YDR427W YDL097C YDL007W YFR010W YHL030W YBR272C YBL084C
19/22S regulator complex
YFR036W YDL008W YKL022C YLR102C YDR118W YNL172W YOR249C
anaphase-promoting complex
YLR166C YER008C YGL233W YIL068C YDR166C YBR102C YIL068C YPR055W YMR002W
YLR192C YOR361C YMR309C YDR429C YBR079C YDR091C YMR146C YPR041W YNL029C
YOR115C YDR246W YGR166W YBR254C YDR407C YKR068C YDR108W YML077W YGR143W
YNR035C YKL013C YIL062C YLR370C YBR234C YJR065C YPR019W YNL040W YDL029W YNL012W
Arp2/3 protein complex
YHL025W YMR033W YPR034W YJL176C YBR289W YOR290C YPL016W YNR023W YFL049W YOR038C
YGL226C-A YDL232W YJL002C YOR103C YGL022W YOR085W YEL002C YBL105C YGL247W YLR220W
oligosaccharyl transferase complex
Periodic genes in predicted functional modules
Effect of the threshold selection
The results of the TC-PINs corresponding to various threshold values and the static PPI network. (Columns: Mp, Mk, f-measure. Rows: TC-PINs with threshold = 2.0, 1.6, 1.4, 1.2, 1.0, 0.9, 0.7, 0.5, 0.3 and 0.02, and the static PPI network. Table values not recovered.)
In this paper, the TC-PINs are reconstructed by incorporating gene expression profiles into a static PPI network in order to discover new biologically significant functional modules. We then employ the MCL procedure to predict functional modules from the TC-PINs. Moreover, a series of comparative analyses of matching and GO functional enrichment are carried out. The results show that many more biologically significant functional modules are identified from the TC-PINs than from the static PPI network.
In addition, how to process slightly different candidate functional modules is another problem worthy of research. Many functional modules are produced when the MCL algorithm runs on the TC-PINs, and some of them probably have identical biological significance. It is therefore necessary to merge functional modules. It is easy to eliminate redundancy by discarding modules whose proteins all belong to another module. Merging, however, raises a particular challenge: how to handle two modules that share most but not all of their proteins. Should the decision to merge two slightly different modules depend on the small number of proteins that they do not share? We believe that an in-depth study of this problem will be valuable in the future.
In spite of the issues that still need to be resolved in the study of the TC-PINs, our research represents a successful, fundamental shift in the study of PPI from the static network to dynamic networks. Previous research on static PPI networks, such as the identification of functional modules, protein function prediction and the identification of essential proteins, can be performed on the TC-PINs, and the resulting experimental results are much more satisfactory.
This work is supported in part by the National Natural Science Foundation of China under Grant Nos.61073036, 61003124, the Ph.D. Programs Foundation of Ministry of Education of China No.20090162120073, the Freedom Explore Program of Central South University No.201012200124, High-tech Program of China Hunan Provincial Science and Technology Department No.2010GK3049, Aid program for Science and Technology Innovative Research Team in Higher Educational Institutions of Hunan Province No.2010212, the U.S. National Science Foundation under Grants CCF-0514750, CCF-0646102 and CNS-0831634.
- Jin R, Mccallen S, Liu C, Xiang Y, Almaas E, Zhou XH: Identify Dynamic Network Modules with Temporal and Spatial Constraints. Proc Pacific Symp Biocomputing (PSB) 14: 203–214.Google Scholar
- Przytycka TM, Singh M, Slonim DK: Toward the dynamic interactome: it's about time. Brief Bioinform 2010, 11: 15–29. 10.1093/bib/bbp057PubMed CentralView ArticlePubMedGoogle Scholar
- Tu BP, Kudlicki A, Rowicka M, McKnight SL: Logic of the yeast metabolic cycle: temporal compartmentalization of cellular processes. Science 310: 1152–1158.Google Scholar
- Simon I, Siegfried Z, Ernst J, Bar-Joseph Z: Combined static and dynamic analysis for determining the quality of time-Series expression profiles. Nature Biotechnology 2005, 23(12):1503–1508. 10.1038/nbt1164View ArticlePubMedGoogle Scholar
- Han JDJ, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJM, Cusick ME, Roth FP, Vidal M: Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 2004, 430(6995):88–93. 10.1038/nature02555View ArticlePubMedGoogle Scholar
- de Lichtenberg U, Jensen LJ, Brunak S, Bork P: Dynamic complex formation during the yeast cell cycle. Science 2005, 307: 724–727. 10.1126/science.1105103View ArticlePubMedGoogle Scholar
- Qi Y, Ge H: Modularity and dynamics of cellular networks. PLoS Computational Biology 2006, 2: 1502–1510.View ArticleGoogle Scholar
- Li XL, Wu M, Kwoh CK, Ng SK: Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC Genomics 2010, 11(suppl+1):S3.PubMed CentralView ArticlePubMedGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25(1):25–29. 10.1038/75556PubMed CentralView ArticlePubMedGoogle Scholar
- Song J, Singh M: How and when should interactome-derived clusters be used to predict functional modules and protein function? BMC Bioinformatics 2009, 25(23):3143–3150.View ArticleGoogle Scholar
- Enright AJ, Van Dongen S, Ouzounis CA: An e cient algorithm for large-scale detection of protein families. Nucleic Acids Research 2002, 30(7):1575–1584. 10.1093/nar/30.7.1575PubMed CentralView ArticlePubMedGoogle Scholar
- Bader GD, Hogue CW: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 2003., 4(2):Google Scholar
- Freeman TC, Goldovsky L, Brosch M, van Dongen S, Mazi'ere P, Grocock RJ, Freilich S, Thornton J, Enright AJ: Construction, visualization, and clustering of transcription networks from microarray expression data. PLoS Computational Biology 2007, 3(10):e206. 10.1371/journal.pcbi.0030206PubMed CentralView ArticleGoogle Scholar
- Ala U, Piro RM, Grassi E, Damasco C, Silengo L, Oti M, Provero P, Cunto FD: Prediction of human disease genes by humanmouse conserved coexpression analysis. PLoS Computational Biology 2008, 4(3):e1000043. 10.1371/journal.pcbi.1000043PubMed CentralView ArticlePubMedGoogle Scholar
- Butte AJ, Tamayo P, Slonim D, Golub TR, Kohane IS: Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proceedings of the National Academy of Sciences of the United States of America 2000, 97(22):12182–12186. 10.1073/pnas.220392197PubMed CentralView ArticlePubMedGoogle Scholar
- Voy BH, Scharff JA, Perkins AD, Saxton AM, Borate B, Chesler EJ, Branstetter LK, Langston MA: Extracting gene networks for low-dose radiation using graph theoretical algorithms. PLoS Computational Biology 2006, 2(7):e89. 10.1371/journal.pcbi.0020089
- Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P: Coexpression analysis of human genes across many microarray data sets. Genome Research 2004, 14: 1085–1094. 10.1101/gr.1910904
- Moriyama M, Hoshida Y, Otsuka M, Nishimura S, Kato N, Goto T, Taniguchi H, Shiratori Y, Seki N, Omata M: Relevance network between chemosensitivity and transcriptome in human hepatoma cells. Molecular Cancer Therapeutics 2003, 2: 199–205.
- Brohée S, van Helden J: Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 2006, 7: 488. 10.1186/1471-2105-7-488
- Blatt M, Wiseman S, Domany E: Superparamagnetic clustering of data. Physical Review 1998, 57(4):3767–3783.
- King AD, Przulj N, Jurisica I: Protein complex prediction via cost-based clustering. Bioinformatics 2004, 20(17):3013–3020. 10.1093/bioinformatics/bth351
- Vlasblom J, Wodak S: Markov clustering versus affinity propagation for the partitioning of protein interaction graphs. BMC Bioinformatics 2009, 10: 99. 10.1186/1471-2105-10-99
- Wu M, Li XL, Kwoh K: Algorithms for Detecting Protein Complexes in PPI Networks: An Evaluation Study. (Supplementary paper presented at) International Conference on Pattern Recognition in Bioinformatics (PRIB); 2008 Oct 15–17; Melbourne, Australia 2008: 135–146.
- Pu S, Wong J, Turner B, Cho E, Wodak SJ: Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res 2009, 37(3):825–831. 10.1093/nar/gkn1005
- Altaf-Ul-Amin M, Shinbo Y, Mihara K, Kurokawa K, Kanaya S: Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinformatics 2006, 7: 207–219. 10.1186/1471-2105-7-207
- Hu H, Yan X, Huang Y, Han J, Zhou X: Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics 2005, 21(Suppl 1):213–221. 10.1093/bioinformatics/bti1049
- Boyle EI, Weng S, Gollub J, Jin H, Botstein D, Cherry JM, Sherlock G: GO::TermFinder-open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics 2004, 20(18):3710–3715. 10.1093/bioinformatics/bth456
- Daraselia N, Yuryev A, Egorov S, Mazo I, Ispolatov I: Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks. BMC Bioinformatics 2007, 8: 243. 10.1186/1471-2105-8-243
- Maraziotis IA, Dimitrakopoulou K, Bezerianos A: Growing functional modules from a seed protein via integration of protein interaction and gene expression data. BMC Bioinformatics 2007, 8: 408. 10.1186/1471-2105-8-408
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.