# Deducing topology of protein-protein interaction networks from experimentally measured sub-networks

- Ling Yang
^{1}, - Thomas M Vondriska
^{1, 2, 3}, - Zhangang Han
^{1}, - W Robb MacLellan
^{1, 2}, - James N Weiss
^{1, 2}and - Zhilin Qu
^{1}Email author

**9**:301

**DOI: **10.1186/1471-2105-9-301

© Yang et al; licensee BioMed Central Ltd. 2008

**Received: **22 February 2008

**Accepted: **03 July 2008

**Published: **03 July 2008

## Abstract

### Background

Protein-protein interaction networks are commonly sampled using yeast two hybrid approaches. However, whether topological information reaped from these experimentally-measured sub-networks can be extrapolated to complete protein-protein interaction networks is unclear.

### Results

By analyzing various experimental protein-protein interaction datasets, we found that they are not random samples of the parent networks. Based on the experimental bait-prey behaviors, our computer simulations show that these non-random sampling features may affect the topological information. We tested the hypothesis that a core sub-network exists within the experimentally sampled network that better maintains the topological characteristics of the parent protein-protein interaction network. We developed a method to filter the experimentally sampled network to result in a core sub-network that more accurately reflects the topology of the parent network. These findings have fundamental implications for large-scale protein interaction studies and for our understanding of the behavior of cellular networks.

### Conclusion

The topological information from experimental measured networks network *as is* may not be the correct source for topological information about the parent protein-protein interaction network. We define a core sub-network that more accurately reflects the topology of the parent network.

## Background

Biological systems are characterized by extremely complex interacting networks of nucleotides, proteins, metabolites and other molecules. It has become increasingly clear that to understand the function of a cell, one must understand the function of these networks. Because the topological characteristics of a network are believed to determine basic properties of its function [1–4], a primary goal in analyzing biological networksis to determine how the interacting elements (nodes) are connected toeach other (edges or links). The commonly used large-scaleexperimental approaches (yeast two hybrid and affinity pull-down combined with mass spectrometry) for mapping protein-protein interaction networks are extremely useful to sample portions of the entire network, however, they have well recognized limitations: (i) some interactions are missed (false negatives); (ii) spurious interactions are detected (false positives); (iii) interactions are assumed to be direct (binary analyses lose hierarchical information); and (iv) some proteins function better than others in a protein interaction assay [5, 6]. "Sticky" proteins may be less likely to have false negatives, but it remains an empirical argument as to whether these proteins are also more likely to have false positives. Other factors contributing to these limitations include effects of affinity tag interactions, effects of antibody binding, influence of subcellular localization and protein activity, and post-translational modifications.

A general theoretical question is whether there is a way to sample a network so that the topological information of a sub-network can reflect well that of the original network. This issue was addressed by recent theoretical studies of Stumpf and colleagues [7, 8] who showed that a randomly-sampled sub-network from an Erdös-Rényi random network is still an Erdös-Rényi random network; the same is true for an exponential network. When the original network is scale-free, however, the randomly sampled sub-network is not truly scale-free, but the degree distribution is still very close to a power-law. These findings suggest that a randomly-sampled sub-network may still largely maintain the topological information of the original scale-free network. Besides the maintenance of degree distribution, we also numerically analyzed the network motifs and found that the motif structures were also maintained after random sampling (Additional file 1 Fig.S1). Therefore, a practical question that arises is whether the sub-networks measured by the large-scale experimental approaches can be used to deduce topological information of the original networks. The answer to this question remains largely unclear. In a recent computational analysis [9], it was found that the power-law degree distributions of sampled networks reported in previous studies [3, 4, 10–13] may be a consequence of the manner in which the data are acquired and the low coverage of the complete (i.e., the "actual") protein-protein interaction networks. Besides the degree distribution and network motifs, other topological properties of the randomly sampled network, such as degree exponent, average path length and clustering coefficient, can be quite different from the original network when the size of sampled network is smaller than that of the original one [14, 15]. Nevertheless, based on these previous studies [7–9] and our simulations (Additional file 1 Fig.S1), a sample that reflects the degree distribution and percentage of network motifs of the original network should: be randomly acquired and contain a high degree of coverage of the parent network. By analyzing several experimentally measured protein-protein interaction networks in the present study, we demonstrate that these experimental samples do not constitute random samples, likely due to the aforementioned experimental considerations. This observation highlights that the experimentally-measured sub-networks may not be the correct source for topological information about the parent protein-protein interaction network, raising the distinct possibility that previous analyses of biological networks [3, 4, 10–13, 16–22] make inappropriate conclusions about topology. Although we conclude in this study that the current experiment datasets cannot be used directly for deducing topological information of the original network, we hypothesized that there is a *core sub-network* (CSN) within the experimentally sampled network that can better retain the topological information of the original protein-protein interaction network.

## Results

### Properties of experimentally-measured protein-protein interaction networks

Here we first defined the sub-network composed of the proteins which have both bait and prey functions, and the links among these proteins (red dot and links in Fig. 1c), as a "core sub-network" (CSN). Although the proteins can act as both bait and prey, some of them are still very biased towards one behavior or the other, resulting in very asymmetrical bait and prey behaviors of the proteins. The pure baits and pure preys are the extreme cases of this asymmetrical bait and prey behavior. We first exclude these extreme proteins and develop later a quantitative method to further refine the CSN.

The results in Fig. 1d and Fig. 2 show that the bait and prey behaviors in experimental datasets differ substantially from a true random sampling; in other words, experimental sampling is not random. This supports the idea that bait/prey preference is an artifact of the experimental limitations and/or sampling methods, as previously suggested by Aloy and colleagues [24], and Maslov and Sneppen [25, 26]. Therefore, based on the available theory on random sampling [7, 8], one cannot extrapolate the topological information from the experimentally measured sub-networks to the entire network.

### Effects of experimental sampling on network topology

To show how the experimental sampling affects the topological information, we first studied effects of the ratio of the three types of nodes in the sampled network on the degree distribution and motif structure. We generated three theoretical networks (15,000 nodes each) with different topologies (Erdös-Rényi random distribution with an average connectivity equals 40, exponential distribution *p*(*k*) ∝ *e*^{-0.025k}, and scale-free distribution *p*(*k*) ∝ *k*^{-1.4}) and used the Drosophila protein-protein interaction (DPPI) network by Giot et al [20] as if it were a theoretical network without the original bait and prey information.

*q*(In Fig 3, we focused on the effects of the three types of proteins but not the random sampling of the interactions, and thus we chose

*q*=

*1*); and (iii) that a link between protein A and protein B exists in the measured networks when at least one of interactions A → B and B → A is detected. For comparison, we also performed a true random sampling of the original networks using the same number of nodes as the simulated experimental networks. Note that in the resultant network, one observes new ratios between the pure preys, the pure baits, and the BPs, which are different from the prior assigned rations. This is because of incomplete sampling, i.e., some of the prior assigned BPs become either pure baits, pure preys, or isolated nodes (which are not detected) in the resultant network. In this study, when we refer to a protein as a pure bait, a pure prey, or a BP, we refer to the observed behavior of the protein, not the prior assigned property.

Figure 3a shows the degree distributions of the four types of networks. For Erdös-Rényi random and exponential networks, the degree distribution of the simulated experimental network (symbols) becomes increasingly different from the corresponding random sample network (lines) as the proportion of pure baits or preys increases. For the power-law network, the degree distribution is unchanged. The DPPI network exhibits a truncated power-law distribution, and therefore minor effects are observed for small connectivity since it is dominated by a power-law component, but larger effects of the differences in sampling manifest for larger connectivity due to the exponential tail of the degree distribution. The sub-network within the measured network that contains only BPs–which is a random sample of the library and therefore a random sample of the full network–may maintain the distribution characteristics of the full network. However, all links between two pure baits and between two pure preys are missing in the measurement. As such, the contribution of the pure baits and pure preys are biased and may change the characteristics of the degree distribution. An extreme example of this phenomenon can be observed with a random degree distribution with protein ratio of 2:1:7 (BP: pure bait: pure prey) in which the observed degree distribution of the sub-network displays two peaks with the smaller one contributed by the pure preys alone.

We also counted the sub-graphs of the networks as performed in previous studies [27–29]. Theoretically, a randomly sampled sub-network retaining all links (*q* = *1*) should maintain the ratios between different types of motifs, based on the following argument: a given four-node motif (for example) in the parent network remains intact in the sampled sub-network if and only if all 4 nodes are in the sub-network. If the sub-network is sampled by selecting nodes with a probability *p*, then a four-node motif survives with probability *p*^{4}. Since all motifs have the same survival probability, the percentage of different motif types will not change in the randomly sampled sub-network. On the other hand, in the simulated experimental network, the three types (BP, pure bait, pure prey) may change the survival probability, i.e. the probability that the link is maintained in the sample. For example, for the two motifs: Motif 1 (A–B, A–C, A–D) and Motif 2 (A–B, A–C, B–D) (see Additional file 1 Fig. 2b), if nodes A and D are pure baits, B and C are BPs, it is impossible for Motif 1 to survive to the sampled network as the link A–D will invariably be missed. In contrast, Motif 2 has the survival probability of *p*^{4}. Thus, the ratio of the three types of nodes we define in this study can determine (arbitrarily) the percentage of interaction motifs observed in the sampled network. Changes in this ratio, over which the experimenter does not have control, can alter the perceived topology and motif make-up of the network.

Figure 3b shows the percentage of six different four-node motifs for each of the four types of original networks (black bar), for the simulated experimental networks (cyan bar), and for the sub-network composed of BPs (red bar). The percentage of the motifs detected in the sub-network composed of BPs is almost unchanged from the original network (although larger variations occur in the DPPI dataset). However, the percentage of motif 1 increases and motif 2 decreases in the simulated experimental network (witness this same trend for all four types of networks). Note that although the experimental procedure has almost no effect on degree distribution if the network is scale-free, the network motifs change in the similar manner as in other types of networks.

Figure 3c shows the degree distributions of the sub-network composed of BPs (CSN) within the simulated experimental networks with protein ratio of 2:1:7 (BP: pure bait: pure prey). For all four types of networks, the distribution of CSN (red symbols) closely matches the degree distribution of the corresponding random sample network (red line). For Erdös-Rényi random and exponential networks, the degree distribution of the simulated experimental network (magenta symbols) is different from the corresponding random sample network (magenta lines). Fig 3b and Figure 3c imply that the sub-network composed of BPs (CSN) is the random sample of the full network and therefore retains the topological information of the full network. However, the simulated experimental network will change the topological information: in Erdös-Rényi random and exponential networks, the change includes both degree distribution and motif distribution; for power law and DPPI networks, the change involves the motif distribution.

### Filtering core sub-network within an experimental dataset

Based on our analysis above, it is not surprising that the bait/prey preference affects the network topology so that it cannot be used to predict the topology of the parent network. But it is also not non-intuitive that the core sub-network (CSN) which is composed of only BPs (the red dots and lines in Fig. 1c) may better reflect the topological information of the parent network since the proteins in that network are somehow less biased or better represented. It is obvious that in our computer simulated networks (Fig. 3), the CSN is a true random sample of the full network; therefore, the degree distribution and motif structure of this random sample agree very well with the original network. However, in the experimental datasets, even in the CSN as defined above (the red dots and lines in Fig. 1c), most of the proteins are not equally effective as baits and as preys, but rather, exhibit a bias behavior as either bait or prey. This feature exists in all protein-protein interaction networks we analyzed. For example, protein SRB4 in the yeast dataset (Fig. 1a) is very effective when used as a bait, but much less so as a prey. Specifically, it linked to 95 (we denote this number as m) preys when it was used as a bait. Among the 95 preys, 23 (we denote this number as m_{1}) proteins were also labeled as baits in the dataset. This indicates that if SRB4 is also effective as a prey, it should (theoretically) be linked to at least these 23 proteins when it was a prey. However, it was only linked to 3 (we denote this number as n) baits (TAF17, YNR024W, and RIF2), 2 of which (we denote this number as n_{1}) themselves behave as preys. Unfortunately, none of the 3 proteins that SRB4 linked to when it was a prey belonged to the list of 23 proteins that should have been able to link with SRB4. If SRB4 was equally effective as both bait and prey, it would link to the same 23 baits when it is used a prey, resulting in 23 bidirectional interactions; however, none of these bidirectional links were detected in the experiment. In fact, in all the available experimentally-measured datasets [11, 12, 20, 23], the incidence of bidirectional links is very low. For example, in the yeast network by Ito *et al* [23], there are only 74 bidirectional interactions out of 4,549 total interactions among 3,278 proteins. In the human network by Stelzl *et al* [12], 8 out of 3,269 interactions are bidirectional. In the DPPI network by Giot *et al* [20], the value is 266 out of 20,405. Most of these bidirectional interactions (260 out of 266) were retained in their high-confidence dataset though the total interactions were reduced to 4,780, suggesting that most of the detected bidirectional interactions are true links. The reason for the prevalence of this incongruent behavior of proteins in one scenario versus another (i.e. preferential actions as bait or prey) is unclear, but may result from altered protein folding, differences in post-translational modification, necessity of tertiary interactions, or other factors.

According to our analysis above, exclusion of pure baits and pure preys does not eliminate the biased behavior of proteins from the CSN. To further refine this network, we first define two quantities–the bait score and prey score–to quantitatively characterize the experimental behavior of individual proteins. These two quantities are empirically defined as: *bait score = m/n*_{1}, *prey score = n/m*_{1} (truncated to 1 if greater than 1). The rationale for these definitions is as follows. For the hypothetical Protein X, m is the number of preys to which Protein X links when it is a bait protein, among which m_{1} proteins are themselves also baits in the experiment. The number of baits to which Protein X links when it is a prey protein, is denoted by the term n. In the perfect experiment, when Protein X functions as a prey it should therefore link to at least m_{1} proteins (i.e. m_{1} should be equal to n). This of course is not the case in a real experiment, however, and therefore a protein's behavior as a prey is quantified by n/m_{1}, i.e., the prey score. In the experimental setting, n can be larger than m_{1}, and m_{1} = 0 for the pure preys; therefore, once n>m_{1}, we set the prey score to be the maximum 1. Similar nomenclature is used to label proteins from the prey perspective. For a given Protein X, n is the number of baits to which it links when it is a prey, among which n_{1} proteins are themselves also preys in the same experiment. As with the bait score above, the experimental data does not show the idealized relationship in which all interactions are detected from both directions, and therefore the bait score is calculated as m/n_{1}. Relating these two scores together in the idealized scenario for a BP protein the bait score = prey score = 1, pure baits have bait score = 1 and prey score = 0, and pure preys have bait score = 0 and prey score = 1. For the proteins in red nodes in Fig. 1c, both scores range from 0 to 1, reflecting the aforementioned point that amongst the proteins functioning as both bait and prey, there is a range over which the relative abilities of individual proteins in each of these roles is distributed.

When the original DPPI dataset was filtered into the high-confidence one [20], the protein number collapsed from 7048 to 4679 (66% of initial value) and the link number from 20405 to 4780 (23% of initial value). For the CSN generated with bait and prey scores ≥ 0.5 before filtering with confidence score, there were 1149 proteins with 1834 links, of which 130 links were bidirectional, and the average confidence score was 0.438. After the filtering, 702 (61%) proteins, 854 (47%) links, and 126 (97%) bidirectional links remained, and the average confidence score was 0.747. This exercise demonstrates that the links in the CSN have a much higher retention rate (47% vs. 23%) when filtered with confidence, in further agreement with the higher average confidence score of interactions in the CSN. This conclusion is further substantiated if we regenerate the CSN (with the same bait and prey scores) after filtering the DPPI network to the high confidence DPPI network on the basis of the experimental data: this new CSN has 937 (602 are identical to those in the unfiltered CSN) proteins, 902 (450 identical) links, 223 bidirectional links, and an average confidence score of 0.753, which is substantially increased in comparison to when the filtering is done *after* the CSN is defined from the DPPI network. Interestingly, 84% (223/266) of the bidirectional links were retained when the CSN was defined after filtering the DPPI network to the high confidence DPPI network, versus 47% (126/266) retention of bidirectional links when defined from the DPPI network prior to confidence score filtering. Thus, this CSN approach is an independent (and complementary) method to identify high confidence links more likely to harbor accurate topological information.

We also compared the motif distributions of the DDPI dataset and their CSNs (Fig. 5c). The percentage of the Motif 1 is higher, while that of Motif 2 is lower, in the DPPI network as compared to those observed in the CSN, which agrees with the theoretical analysis in Fig. 3. This is also true for the other experimental datasets (Additional file 1 Fig.S4).

Based on the analyses above, we hypothesize that the CSN within the experimentally sampled sub-network is a closer approximation of a random sample and thus retains the topological information of the original network better than the entire experimental sample. Theoretically, filtering the experimental datasets using our method with higher bait score and prey score thresholds, one can obtain a better CSN. However, due to the limited number of proteins in the network, higher bait and prey scores result in fewer proteins in the CSN, which may cause the CSN to be too small to faithfully retain the topological information of the parent network.

### What are the degree distributions of protein-protein interaction networks?

*p*(

*k*) ∝

*k*

^{-δ}

*e*

^{-εk}(see Additional file 1 Fig.S4), and for the DPPI dataset (Fig. 6a), it is well fit by

*p*(

*k*) ∝

*k*

^{-1.2}

*e*

^{-0.038k}as shown by Giot et al [20]. A CSN with both bait and prey scores greater than or equal to 0.5 has a degree distribution close to

*p*(

*k*) ∝

*k*

^{-0.6}

*e*

^{-0.22k}, which has a larger exponential component but smaller power-law component than the DPPI network. For the high-confidence dataset of the DPPI network (Fig. 6b), it can be well fit by

*p*(

*k*) ∝

*k*

^{-1.26}

*e*

^{-0.27k}, while the CSN defined by both bait and prey scores greater than or equal to 0.5 has a degree distribution

*p*(

*k*) ∝

*k*

^{-0.01}

*e*

^{-0.75k}which is almost completely exponential. To show that this effect is not due solely to the reduction in network size, we also show the degree distributions of two random subsets of the experimentally sampled network: one where the protein number is the same as that of the CSN (called random sample 1) and the other in which the link number is the same as that of the CSN (called random sample 2), both of which have degree distributions that are very different from the CSN. In other datasets we analyzed, the degree distributions of CSNs all have a smaller power-law component and a larger exponential component as compared to the original datasets (Additional file 1 Fig.S4). However, we are not able to completely rule out that the reduction in network size contributes to the enhancement of the exponential component. The two randomly sampled networks in Fig. 6a are not very different from the CSN in both the power-law component and the exponential component. While the networks in Fig. 6b have much stronger power-law components than the CSN, there are relatively few data points making up the degree distribution for the randomly sampled networks.

## Discussion

The present study provides an improved method for extracting accurate topological information about real protein-protein interaction networks from experimentally-obtained sub-networks. The fundamental conclusions of this study can be summarized as follows: (i) random sampling of networks preserves topological information, regardless of the type of network analyzed; and (ii) experimental protein-protein interaction studies have well-established limitations that make their method of sampling non-random; however, (iii) definition of a CSN that contains proteins that behave experimentally as both baits and preys better approximates a random sample and therefore increases the accuracy of topological assessment of protein-protein interaction networks. We show that sampling of theoretical protein interaction networks with exponential, random or scale-free topology in a manner that takes into account experimental limitations, can (and indeed, usually does) produce a sample with scale-free topology; it is given that samples of protein interaction networks appear scale-free; from this, however, it cannot be concluded (as has been previously attempted) that protein interaction networks are scale-free.

Based on our method of defining CSN from the experimental datasets, we show that the degree distribution of the original network may not be scale-free, but may in fact exhibit an exponential distribution. Protein interaction analyses have unavoidable limitations including false positive and negative identifications [30–33] and assumed binary interactions, as mentioned above. We suspect that these false positives may contribute to the observed power-law component of the protein-protein interaction networks based on the following rationale: (i) the high-confidence Drosophila network (purportedly containing fewer false positives [32]) has a stronger exponential component (also verified by Przulj and colleagues [21]) and the CSN has an even higher confidence score and stronger exponential component (Fig. 5 and Figs. S4); (ii) many proteins preferentially behave as either baits or preys but not both, suggesting an experimentally-introduced preferential attachment phenomenon (introduction of hubs by experimental bias) which, as shown by Barabasi and Albert [34], is a key factor for occurrence of power-law distributions; and (iii) the degree distribution of a mammalian protein-protein interaction network obtained by Ma'ayan *et al* [29] from the literature, which should have a much lower rate of false positives, exhibits an almost purely exponential distribution (Additional file 1 Fig. S5). Additionally, the failed detection of links between certain proteins (the green ones or blue nodes in Fig. 1c) due to the aforementioned experimental considerations may contribute to the high rate of false negatives, which may thereby also contribute to the power-law component of the distribution. Although we show evidence that the degree distribution of protein-protein interaction networks might exhibit stronger exponential component, further detailed analyses are needed to substantiate this conclusion.

Determining with high confidence topological information about protein-protein interaction networks from the properties of a smaller, experimentally measured, sub-networks has been challenging [35–37]. However, the topologies of the networks are extremely important for their function and robustness [1–4, 38, 39].

## Conclusion

In this study, we have developed an improved method for extracting topological information for cellular protein-protein interaction networks from experimentally-obtained datasets. As structure, or network anatomy, is a necessary precursor to understanding function, or network physiology, these findings enhance our ability to use existing experimental methods for protein-protein interaction analysis to investigate the behavior of these networks *in vivo*.

## Methods

### Experimental datasets

The experimental datasets analyzed in this study were either downloaded from the related websites or kindly provided by the authors of the following references [2, 9, 11, 12, 20, 23, 29, 31, 40–42].

### Theoretical networks

Theoretical networks were generated following the method by Bender and Canfield [43], that is, we assigned a desired number of edges for each node following the theoretical distribution, then randomly linked a pair of nodes to make an edge, and decreased the link number for both nodes by one until all edges were assigned to nodes without repetition. Random networks were generated according to the Erdös-Rényi model binomial degree distribution represented by: $p(k)={C}_{N-1}^{k}{\gamma}^{k}{(1-\gamma )}^{N-1-k}$.

### Simulated experimental networks

*N*nodes by the method mentioned above. Then we randomly selected

*M*(

*M*<

*N*) nodes from the

*N*-node parent network, and randomly assigned the nodes in the M-node network to be pure baits, pure preys, or both baits and preys with different probabilities independent of the number of links of the nodes. We then applied the following rules to the links of the selected nodes:

- 1)
Any interaction starts from a pure prey or ends at a pure bait is forbidden;

- 2)
For the allowed interactions, each has a probability

*q*(in the simulations in Fig 3, we used*q*=*1*) to be detected; - 3)
A link A–B exists when at least one of interactions A → B and B → A is detected.

### Motif detection

We detected the motifs using the software mfinder1.2 developed by U. Alon's lab [44].

## Declarations

### Acknowledgements

This study was supported by grants from the NIH/NHLBI and by the Laubisch and Kawata Endowments at UCLA.

## Authors’ Affiliations

## References

- Albert R, Jeong H, Barabasi AL: Error and attack tolerance of complex networks.
*Nature*2000, 406(6794):378–382. 10.1038/35019019View ArticlePubMedGoogle Scholar - Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJ, Cusick ME, Roth FP, Vidal M: Evidence for dynamically organized modularity in the yeast protein-protein interaction network.
*Nature*2004, 430(6995):88–93. 10.1038/nature02555View ArticlePubMedGoogle Scholar - Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization.
*Nat Rev Genet*2004, 5(2):101–113. 10.1038/nrg1272View ArticlePubMedGoogle Scholar - Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks.
*Nature*2001, 411(6833):41–42. 10.1038/35075138View ArticlePubMedGoogle Scholar - Deeds EJ, Ashenberg O, Shakhnovich EI: A simple physical model for scaling in protein-protein interaction networks.
*Proc Natl Acad Sci U S A*2006, 103(2):311–316. 10.1073/pnas.0509715102PubMed CentralView ArticlePubMedGoogle Scholar - Shi YY, Miller GA, Qian H, Bomsztyk K: Free-energy distribution of binary protein-protein binding suggests cross-species interactome differences.
*Proc Natl Acad Sci U S A*2006, 103(31):11527–11532. 10.1073/pnas.0604316103PubMed CentralView ArticlePubMedGoogle Scholar - Stumpf MP, Wiuf C, May RM: Subnets of scale-free networks are not scale-free: sampling properties of networks.
*Proc Natl Acad Sci U S A*2005, 102(12):4221–4224. 10.1073/pnas.0501179102PubMed CentralView ArticlePubMedGoogle Scholar - Stumpf MP, Wiuf C: Sampling properties of random graphs: the degree distribution.
*Phys Rev E Stat Nonlin Soft Matter Phys*2005, 72(3 Pt 2):36118.View ArticleGoogle Scholar - Han JD, Dupuy D, Bertin N, Cusick ME, Vidal M: Effect of sampling on topology predictions of protein-protein interaction networks.
*Nat Biotechnol*2005, 23(7):839–844. 10.1038/nbt1116View ArticlePubMedGoogle Scholar - Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL: The large-scale organization of metabolic networks.
*Nature*2000, 407(6804):651–654. 10.1038/35036627View ArticlePubMedGoogle Scholar - Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JD, Chesneau A, Hao T, Goldberg DS, Li N, Martinez M, Rual JF, Lamesch P, Xu L, Tewari M, Wong SL, Zhang LV, Berriz GF, Jacotot L, Vaglio P, Reboul J, Hirozane-Kishikawa T, Li Q, Gabel HW, Elewa A, Baumgartner B, Rose DJ, Yu H, Bosak S, Sequerra R, Fraser A, Mango SE, Saxton WM, Strome S, Van Den Heuvel S, Piano F, Vandenhaute J, Sardet C, Gerstein M, Doucette-Stamm L, Gunsalus KC, Harper JW, Cusick ME, Roth FP, Hill DE, Vidal M: A map of the interactome network of the metazoan C. elegans.
*Science*2004, 303(5657):540–543. 10.1126/science.1091403PubMed CentralView ArticlePubMedGoogle Scholar - Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, Timm J, Mintzlaff S, Abraham C, Bock N, Kietzmann S, Goedde A, Toksoz E, Droege A, Krobitsch S, Korn B, Birchmeier W, Lehrach H, Wanker EE: A human protein-protein interaction network: a resource for annotating the proteome.
*Cell*2005, 122(6):957–968. 10.1016/j.cell.2005.08.029View ArticlePubMedGoogle Scholar - Song C, Havlin S, Makse HA: Self-similarity of complex networks.
*Nature*2005, 433(7024):392–395. 10.1038/nature03248View ArticlePubMedGoogle Scholar - Lee SH, Kim PJ, Jeong H: Statistical properties of sampled networks.
*Phys Rev E Stat Nonlin Soft Matter Phys*2006, 73(1 Pt 2):016102.View ArticlePubMedGoogle Scholar - Yoon S, Lee S, Yook SH, Kim Y: Statistical properties of sampled networks by random walks.
*Phys Rev E Stat Nonlin Soft Matter Phys*2007, 75(4 Pt 2):046114.View ArticlePubMedGoogle Scholar - Vazquez A, Dobrin R, Sergi D, Eckmann JP, Oltvai ZN, Barabasi AL: The topological relationship between the large-scale attributes and local interaction patterns of complex networks.
*Proc Natl Acad Sci U S A*2004, 101(52):17940–17945. 10.1073/pnas.0406024101PubMed CentralView ArticlePubMedGoogle Scholar - Balazsi G, Barabasi AL, Oltvai ZN: Topological units of environmental signal processing in the transcriptional regulatory network of Escherichia coli.
*Proc Natl Acad Sci U S A*2005, 102(22):7841–7846. 10.1073/pnas.0500365102PubMed CentralView ArticlePubMedGoogle Scholar - Uetz P, Dong YA, Zeretzke C, Atzler C, Baiker A, Berger B, Rajagopala SV, Roupelieva M, Rose D, Fossum E, Haas J: Herpesviral protein networks and their interaction with the human proteome.
*Science*2006, 311(5758):239–242. 10.1126/science.1116804View ArticlePubMedGoogle Scholar - Tanaka R, Yi TM, Doyle J: Some protein interaction data do not exhibit power law statistics.
*FEBS Lett*2005, 579(23):5140–5144. 10.1016/j.febslet.2005.08.024View ArticlePubMedGoogle Scholar - Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, Vijayadamodar G, Pochart P, Machineni H, Welsh M, Kong Y, Zerhusen B, Malcolm R, Varrone Z, Collis A, Minto M, Burgess S, McDaniel L, Stimpson E, Spriggs F, Williams J, Neurath K, Ioime N, Agee M, Voss E, Furtak K, Renzulli R, Aanensen N, Carrolla S, Bickelhaupt E, Lazovatsky Y, DaSilva A, Zhong J, Stanyon CA, Finley RL Jr., White KP, Braverman M, Jarvie T, Gold S, Leach M, Knight J, Shimkets RA, McKenna MP, Chant J, Rothberg JM: A protein interaction map of Drosophila melanogaster.
*Science*2003, 302(5651):1727–1736. 10.1126/science.1090289View ArticlePubMedGoogle Scholar - Przulj N, Corneil DG, Jurisica I: Modeling interactome: scale-free or geometric?
*Bioinformatics*2004, 20(18):3508–3515. 10.1093/bioinformatics/bth436View ArticlePubMedGoogle Scholar - Khanin R, Wit E: How scale-free are biological networks.
*J Comput Biol*2006, 13(3):810–818. 10.1089/cmb.2006.13.810View ArticlePubMedGoogle Scholar - Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome.
*Proc Natl Acad Sci U S A*2001, 98(8):4569–4574. 10.1073/pnas.061034498PubMed CentralView ArticlePubMedGoogle Scholar - Aloy P, Russell RB: Potential artefacts in protein-interaction networks.
*FEBS Lett*2002, 530(1–3):253–254. 10.1016/S0014-5793(02)03427-0View ArticlePubMedGoogle Scholar - Maslov S, Sneppen K: Specificity and stability in topology of protein networks.
*Science*2002, 296(5569):910–913. 10.1126/science.1065103View ArticlePubMedGoogle Scholar - Maslov S, Sneppen K: Protein interaction networks beyond artifacts.
*FEBS Lett*2002, 530(1–3):255–256. 10.1016/S0014-5793(02)03428-2View ArticlePubMedGoogle Scholar - Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U: Network motifs: simple building blocks of complex networks.
*Science*2002, 298(5594):824–827. 10.1126/science.298.5594.824View ArticlePubMedGoogle Scholar - Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, Sheffer M, Alon U: Superfamilies of evolved and designed networks.
*Science*2004, 303(5663):1538–1542. 10.1126/science.1089167View ArticlePubMedGoogle Scholar - Ma'ayan A, Jenkins SL, Neves S, Hasseldine A, Grace E, Dubin-Thaler B, Eungdamrong NJ, Weng G, Ram PT, Rice JJ, Kershenbaum A, Stolovitzky GA, Blitzer RD, Iyengar R: Formation of regulatory patterns during signal propagation in a Mammalian cellular network.
*Science*2005, 309(5737):1078–1083. 10.1126/science.1108876PubMed CentralView ArticlePubMedGoogle Scholar - Deane CM, Salwinski L, Xenarios I, Eisenberg D: Protein interactions: two methods for assessment of the reliability of high throughput observations.
*Mol Cell Proteomics*2002, 1(5):349–356. 10.1074/mcp.M100037-MCP200View ArticlePubMedGoogle Scholar - von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions.
*Nature*2002, 417(6887):399–403. 10.1038/nature750View ArticlePubMedGoogle Scholar - Bader JS, Chaudhuri A, Rothberg JM, Chant J: Gaining confidence in high-throughput protein interaction networks.
*Nat Biotechnol*2004, 22(1):78–85. 10.1038/nbt924View ArticlePubMedGoogle Scholar - Gandhi TK, Zhong J, Mathivanan S, Karthick L, Chandrika KN, Mohan SS, Sharma S, Pinkert S, Nagaraju S, Periaswamy B, Mishra G, Nandakumar K, Shen B, Deshpande N, Nayak R, Sarker M, Boeke JD, Parmigiani G, Schultz J, Bader JS, Pandey A: Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets.
*Nat Genet*2006, 38(3):285–293. 10.1038/ng1747View ArticlePubMedGoogle Scholar - Barabasi AL, Albert R: Emergence of scaling in random networks.
*Science*1999, 286(5439):509–512. 10.1126/science.286.5439.509View ArticlePubMedGoogle Scholar - Gardner TS, di Bernardo D, Lorenz D, Collins JJ: Inferring genetic networks and identifying compound mode of action via expression profiling.
*Science*2003, 301(5629):102–105. 10.1126/science.1081900View ArticlePubMedGoogle Scholar - Guido NJ, Wang X, Adalsteinsson D, McMillen D, Hasty J, Cantor CR, Elston TC, Collins JJ: A bottom-up approach to gene regulation.
*Nature*2006, 439(7078):856–860. 10.1038/nature04473View ArticlePubMedGoogle Scholar - Han Z, Yang L, Maclellan WR, Weiss JN, Qu Z: Hysteresis and cell cycle transitions: how crucial is it?
*Biophys J*2005, 88(3):1626–1634. 10.1529/biophysj.104.053066PubMed CentralView ArticlePubMedGoogle Scholar - Doyle JC, Alderson DL, Li L, Low S, Roughan M, Shalunov S, Tanaka R, Willinger W: The "robust yet fragile" nature of the Internet.
*Proc Natl Acad Sci U S A*2005, 102(41):14497–14502. 10.1073/pnas.0501426102PubMed CentralView ArticlePubMedGoogle Scholar - Willeboordse FH: Dynamical advantages of scale-free networks.
*Phys Rev Lett*2006, 96(1):018702. 10.1103/PhysRevLett.96.018702View ArticlePubMedGoogle Scholar - Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update.
*Nucleic Acids Res*2004, 32(Database issue):D449–51. 10.1093/nar/gkh086PubMed CentralView ArticlePubMedGoogle Scholar - Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Conover D, Kalbfleisch T, Vijayadamodar G, Yang M, Johnston M, Fields S, Rothberg JM: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae.
*Nature*2000, 403(6770):623–627. 10.1038/35001009View ArticlePubMedGoogle Scholar - Lehner B, Fraser AG: A first-draft human protein-interaction map.
*Genome Biol*2004, 5(9):R63. 10.1186/gb-2004-5-9-r63PubMed CentralView ArticlePubMedGoogle Scholar - Bender EA, Canfield ER: The asymptotic number of labeled graphs with given degree sequences.
*Journal of Combinatorial Theory A*1978, 24: 296–307. 10.1016/0097-3165(78)90059-6View ArticleGoogle Scholar - Alon U[http://www.weizmann.ac.il/mcb/UriAlon/]

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.