ProtNet: a tool for stochastic simulations of protein interaction networks dynamics
© Bernaschi et al; licensee BioMed Central Ltd. 2007
Published: 8 March 2007
Protein interactions support cell organization and mediate its response to any specific stimulus. Recent technological advances have produced large data-sets that aim at describing the cell interactome. These data are usually presented as graphs where proteins (nodes) are linked by edges to their experimentally determined partners. This representation reveals that protein-protein interaction (PPI) networks, like other kinds of complex networks, are not randomly organized and display properties that are typical of "hierarchical" networks, combining modularity and local clustering to scale free topology. However informative, this representation is static and provides no clue about the dynamic nature of protein interactions inside the cell.
To fill this methodological gap, we designed and implemented a computer model that captures the discrete and stochastic nature of protein interactions. In ProtNet, our simplified model, the intracellular space is mapped onto either a two-dimensional or a three-dimensional lattice with each lattice site having a linear size (5 nm) comparable to the diameter of an average globular protein. The protein filled lattice has an occupancy (e.g. 20%) compatible with the estimated crowding of proteins in the cell cytoplasm. Proteins or protein complexes are free to translate and rotate on the lattice that represents a sort of naïve unstructured cell (devoid of compartments). At each time step, molecular entities (proteins or complexes) that happen to be in neighboring cells may interact and form larger complexes or dissociate depending on the interaction rules defined in an experimental protein interaction network. This whole procedure can be seen as a sort of "discrete molecular dynamics" applied to interacting proteins in a cell.
We have tested our model by performing different simulations using as interaction rules those derived from an experimental interactome of Saccharomyces cerevisiae (1378 nodes, 2491 edges) and we have compared the dynamics of complex formation in a two and a three dimensional lattice model.
ProtNet is a cellular automaton model, where each protein molecule or complex is explicitly represented and where simple interaction rules are applied to populations of discrete particles. This tool can be used to simulate the dynamics of protein interactions in the cell.
Cell structure and physiology is governed by the precise wiring of the protein interaction network in the cell. In principle, if we had quantitative information about the kinetic constants describing the association and dissociation of all pairs of proteins in a proteome, we could attempt to combine it with information about interactions with small molecules, membranes and nucleic acids to model an electronic cell. This in turn could be used to ask very specific questions in order to test our understanding of the mechanisms underlying cell organization and its dynamic response to external stimuli.
This strategy is computationally very demanding and, in its general formulation, hindered by the limitations of current technologies, at least in a foreseeable future. Currently, the main limitation is the lack of appropriate experimental technologies to produce accurate data at a global genomic level. If precise and complete data are required for carrying out simulations that have biological relevance, then it is unlikely that we will achieve useful cell models in the near future. It is possible, however, that even partial, noisy and/or qualitative data would be sufficient to develop models that can answer low-resolution questions.
Recent technological advances have offered the opportunity to accumulate extensive information about the protein interaction networks in several organisms. In particular the yeast interactome has been looked at from several perspectives and using complementary approaches [1–4]. Although these data sets are noisy and include a sizeable fraction of false positives and false negatives, we have, at this time, the best picture ever of the possible interactions occurring in a living organism.
These types of data are usually represented as a graph where nodes (proteins) are connected by edges if the proteins in question have been shown to interact. Similarly to many social and technological networks, protein-protein interaction (PPI) networks, are not random in many aspects, displaying properties that are characteristic of "hierarchical" networks combining modularity and local clustering to scale free topology .
The graph representation is informative since it allows us to visualize all the possible connections between the proteins in a cell. However, it is not possible to extract some biologically relevant information such as the complexes that are preferentially formed in the cell under different conditions or their compositional distribution. Finally, the graph representation does not have spatial and temporal resolution since it delineates an average over several conditions and time steps. In other words graphs render the static dimension of the protein interaction network and do not capture the dynamic nature and the non-homogeneous spatial distribution of the interactions occurring in a cell.
We present here the initial characterization of a cellular automata model that uses as input experimentally derived protein interaction graphs and, by treating explicitly each protein or complex, produces a full dynamic picture of protein complex assembly and disassembly within a stylized cell.
Although some recent reports, appearing in the literature, attempted to describe the dynamics of protein interactions [6–9], none of these represents a full dynamic model permitting the simulation of the complete set of interactions occurring in vivo in the four spatio-temporal dimensions.
ProtNet is a dynamic stochastic model developed to simulate protein interactions with full spatial and temporal resolution. Proteins diffuse freely in a lattice space and, whenever they reside in neighboring sites, they may form a complex depending on the interaction rules contained in an input interaction graph. The simulations presented in this report are based on the protein interaction network of the yeast Saccharomyces cerevisiae. Protein interaction databases store more than 40,000 interactions between yeast proteins . Most of them, however, are derived from high throughput experiments, producing an inherent high level of false interaction information . To minimize false positives, we utilized a high-quality yeast interaction network derived by considering only interactions described by more than one method (Additional File 1). The resulting interactome data set contains 2,491 high-confidence interactions between 1378 proteins. This corresponds to an average of 3.6 interactions per protein.
In this first implementation of the model we make the following simplifications (see also methods for more details):
The simulation space is a domain of the cell that does not contain compartments and is not limited by a membrane.
All the proteins are seeded onto the lattice at the same concentration.
Protein concentrations are constant. We do not model protein synthesis and protein degradation.
Interacting proteins have the same probability of forming a complex, and each bond in a complex can break at each time step with the same rate constant.
Proteins diffuse at the same rate irrespective of size.
All these simplifications are not implicit in the formulation of the model and could be removed upon the availability of new experimental data, allowing new biological questions to be asked. Although this may seem major simplifications, the basic version of the model presented here is appropriate to ask simple questions about cell organization and protein association dynamics.
Comparison of association kinetics and complex composition at equilibrium in the two and three-dimensional models
We first compared a 2D and a 3D model. The 2D cell model is represented by a square (L = 352) containing 123,904 sites of hexagonal geometry. Considering the linear dimensions of the lattice site as 5 nm, the diameter of an average globular protein, the size of our bi-dimensional simulation space is approximately 1.8 μm, the approximate size of a two-dimensional slice of a bacterium. The 3D model is a cube containing 125000 cubic sites (50 sites for each side). Thus both simulations use a comparable number of lattice sites. If the size of each small cubic site is 5 nm, our 3D model has linear dimensions of 0.25 μm.
Both lattices were seeded with 2 × 104 monomers at random positions. This corresponds to a lattice occupancy of about 20% and a protein concentration of 5 mM, which is comparable with the estimated crowding of a eukaryotic cell. Since the network contains 1378 unique protein species, each protein is, on average, represented 18 times and its final concentration is approximately 3.6 μM. In preliminary experiments we adjusted the probability of associations and dissociations to pon = 0.7 and poff = 0.002 respectively in order to have approximately one half of the proteins involved in the formation of complexes at equilibrium.
Perhaps not surprisingly, the monomers were three times as fast to find their partners in the 3D than in the 2D model. Also dimers and trimers were accumulated with a faster kinetic in the 3D model. However, after 100,000 steps, the complex size distribution at equilibrium is not substantially different. The minor but significant differences observed in the number of complexes of size ~5 is difficult to rationalize. This seems to indicate that multi-protein complexes of medium size are more likely to be formed in two-dimension while dimers, trimers and very large complexes are equally probable in 2 and 3D. The experiments described in the following sessions were all carried out using a 3D lattice corresponding to a cube of linear dimensions of 0.25 μm containing 1.25 x 105 lattice sites.
Association kinetics and complex size distribution as a function of the probability of association and dissociation
Association kinetics at varying molecular crowding
Protein density is another factor likely to influence the kinetics of complex formation. Higher density should increase the likelihood of protein encounter and, as a consequence, should favor complex formation. On the other hand molecular crowding should slow down diffusion and, in the long range, when large complexes are formed, prevent exploration of the cellular space. In order to evaluate the effect of molecular crowding, we performed three simulations differing in the number of monomers initially seeded in the lattice.
Input and output graph
In the input graph the different proteins are joined by a varying number of edges. We were interested in investigating how node (protein) degree and other "static" graph properties are reflected in the actual number of links formed among the different members of any protein species in the cell model. To this end we examined all the interactions at quasi equilibrium after a large number of simulation steps. We then constructed a quantitative graph where the edges between any two proteins are weighted by the fraction of times the two partners take part in the specific interaction. As an example, if protein A is linked to protein B and C in the input graph we count the number of times we find a protein member A associated to B and C respectively and in the resulting graph we associate the two edges (A-B and A-C) with a figure proportional to the fraction of A proteins participating in the two interactions.
Furthermore, the graph in Figure 4A is quantitative and can be filtered based on the weight of each edge. In panel B for instance, we only show the edges with a weight higher than 0.1: at least 10% of the partner proteins participate in the formation of that specific binary complex. Although, as outlined below, a detailed discussion of the biological implications of this output graph would require a more reliable and quantitative input "interactome", already this simple example sketches the formation of well characterized biological complexes and delineate the interactions that are more likely to form within the complex (Figure 4A and 4B).
The electronic cell is considered the "grail" of the field of Systems Biology, The success of this quest could reveal new perspectives for using computer simulations to accelerate biological research. Several attempts have been reported to develop simulation tools that can account not only for the stochastic nature of molecular interactions, but also for the spatial heterogeneity of cellular compartments (for reviews see [15, 16]).
However, set aside the computational hurdles, these types of approaches are limited by our relative ignorance of the parameters affecting the functional interaction between proteins. A realistic simulation of cell physiology would require (at least) genome wide information about protein concentrations combined with integration of a quantitative protein interaction network with the metabolic and gene regulatory networks. These data are currently not available and are not likely to be available in a foreseeable future. Nevertheless, simple computational models may help to reveal general organizing principles from the available noisy and qualitative information.
We aim at investigating, by means of a combination of experimental and informatic approaches, how high order structures form within a cell and we want to identify the characteristics of the interaction graph that promote the formation of a structured cell.
In this first report we presented an initial basic version of a cell automaton that is capable of modeling the dynamics of protein interactions in three dimensions. Our model is presently a naïve one. The cell has no membrane boundary and no compartments. Proteins are fed at a density that is compatible with the estimated physiological protein density, move freely in the cytoplasm and upon meeting have a certain probability of forming a complex according to the rules dictated by a high confidence protein interaction network. It is clear that the more the complexes formed in the ProtNet model reflect the distribution of protein complexes in the cell the more reliable and informative will the biological conclusions that one can draw from such a model. This, however, is difficult to assess because we do not have a full picture of a cell.
In the simulations described in this report the interaction rules are extracted from a yeast high confidence network proposed by Han and colleagues . Albeit "high confidence", this network is incomplete and qualitative. Furthermore, it is biased by interactions that derive from the matrix representations of complexes and, as a consequence, contain a number of interactions that are not direct. The recent publication of two new genomic-scale complex purification experiments [1, 3] and a large literature survey to collect information about low-throughput experiments  gives us confidence that we should soon be able to compile a much more reliable yeast protein interaction graph enriched for high confidence direct binary interactions. Another important limitation of the experiments reported in this manuscript is the assumption that all the proteins present in the network are equally represented in the cell and that they are indefinitely stable. Protein half lives in a living cell range from a few seconds to many days , and the protein abundance in yeast can range from fewer than 50 to more than 1 million molecules per cell . However, since genome wide information about protein concentration in the yeast cell is available this can be readily used to improve the biological reliability of the model .
It is somewhat reassuring that, despite the simple nature of the ProtNet model, complexes resembling physiologically important structures like ribosomes and proteasomes are consistently observed (Figure 4). Even in its basic format, ProtNet can readily be used to explore the stochastic nature of protein interactions and how it affects the compositions of the complexes that are formed in the cell. The possibility to vary protein concentrations also allows to evaluate the effects of molecular overcrowding  on protein diffusion and complex formation. Finally the dynamic characteristics of the model are particularly suited to assess, for instance, the time required for a cell to reach a new equilibrium stage when either the interaction rules are changed because of a protein modification (i.e. phosphorylation) or when some protein concentrations change because of accelerated degradation or increased gene expression.
As paradigm for the simulation of the dynamics of a PPI network we adopted the stochastic lattice gas . More specifically, we implemented two cell models based on a two and a three-dimensional lattice respectively. In the two-dimensional model the elementary lattice cell has hexagonal geometry while in the three-dimensional one the elementary cell is a cube.
Each site of the lattice can be occupied by a single protein. The number of proteins for each molecular species is fixed at the beginning of the simulation. Protein monomers are distributed randomly in the lattice cells. Although, in principle, the concentration of the different protein species can be chosen at will, for instance to mimic the real concentration in the cell, in all the simulations reported in this manuscript each kind of the K protein species is represented by n instances n ~ N/K where N is the total number of proteins in the grid. If we consider a grid occupancy ratio p (0 < p < 1) in a lattice of linear dimensions L we have N = pL d where d is the lattice dimension, (that is d = 2 or 3 for the two and three-dimensional lattice respectively).
At this time we do not model protein synthesis and protein degradation.
Diffusion and rotation
The simulation iterates for a number of time steps by applying a set of rules that defines the dynamics of the monomers, initially distributed in the lattice, and of the complexes that form when proteins bind.
Empty sites of the lattice are considered inactive. However, due to the diffusion and rotation of proteins and complexes, the active sites configuration changes at each time step. More specifically, in the 2D model, at each time step, single proteins and multi-protein structures (i.e., complexes) can rotate by 60 degrees around their center of mass in a randomly chosen direction. In the 3D model rotations are by 90 degrees around each of the three axes. Rotations are rigid, that is, in multi-protein structures, the whole complex undergoes a rotation and there is no torsion. For multi-protein complexes, the chance of rotation is inversely proportional to their diameter. As a consequence, "long" protein complexes rotate less frequently. This choice aims at reducing the impact of the dramatic changes in the configuration of active sites caused by the rotation of long multi-protein complexes.
The diffusion coefficient of a molecule in a dilute solution is a complex function of its shape: D = (1/f)kT, where f is the frictional coefficient, k the Boltzman constant and T the absolute temperature. For a spherical object the frictional coefficient increases linearly with the inverse of the radius. In the current implementation of our model we do not input information about protein masses and shapes. As a consequence all proteins are assumed to have identical diffusion coefficients. In our simulations, monomers have the highest probability of moving to one of the neighboring elementary cells at each time step. Multimers diffuse more slowly (see below).
Having set the linear dimension of the lattice cell to 5 nm, the diameter of an average globular protein, we can estimate the time step of the simulation.
In the limit of large t, the mean-square displacement <x2> of a lattice gas is <x2> = 2dD(p)t
where d is the dimension of the lattice (2 or 3 in our case) and D is the diffusion coefficient that is a function of the concentration p. If l is the size of the random walk step (5 nm in our model), M is the number of steps and t0 is the elementary time step (M = t/t 0 ) we have
2dD(p)t = l 2 t/t 0
that is, for a 2D lattice:
t o = l2/4D(p)
Assuming an average diffusion coefficient of 10-7 cm2sec-1, this results in a time step of 6 × 10-7 s.
Multimeric complexes have larger frictional coefficients and therefore move more slowly. However, in the calculation of frictional coefficients, we do not take into account the shape of the complexes but we assume an inverse linear relation between the diffusion coefficients and the complex sizes, that is the number of monomers forming a complex.
When a monomer diffuses and when a multi-protein complex diffuses or rotates, another single or multi-protein complex can be "hit" if it occupies a "target" site. In this case, the moving protein (or complex) pushes the still protein (or complex) with a probability that is a function of the ratio between the masses of the hitting and hit protein/complex. In particular, the probability that the hitting complex with mass m1 pushes the still complex with mass m2 is
p = e[-(m2/m1)*log2]
as a consequence, if m1 >> m2, on average the hitting complex pushes m2 whereas, if m1 << m2 and m1 hits m2, then m1 stops. In the case m1 ~= m2, the probability the push succeeds is 1/2. Note that this mechanism is applied recursively in situations where a colliding protein or complex m1 pushes m2 which in turn pushes m3 and so on.
Association and dissociation
At each time step new complexes can form and existing complexes can break down into smaller complexes according to experimental interaction rules provided as input to the model. Two proteins residing in neighboring lattice sites may interact and form a new complex depending on whether they are linked by an edge in the input interaction graph.
Monomers have six binding sites corresponding to the six sides of the hexagon in the 2D model or to the six cube faces in the 3D version of the model. Therefore, during a simulation, a single protein can bind to, at most, six other partners. Proteins do not compete for binding to the same partner and association is only limited by the number of edges, 6 in both 2D and 3D models. However, to prevent the formation of very large complexes containing multiple copies of the same protein, we implemented the ad hoc rule that a complex may contain only one copy of any protein species. Binding is stochastic and the probability of forming a complex is a function of an association probability that is related to the association rate constant (kon) of the interacting protein pair. After association, a complex moves as a single entity. Since for most protein pairs the value of the association/dissociation constants is unknown, we set it equal to an average value that is the same for all proteins.
This work was supported by AIRC, by the European integrated project "Interaction proteome" and by the Network of Excellence "ENFIN". We would like to thank Arnaldo Florio for helpful discussion during the development of this project and Lars Kiemer for critically reading the manuscript.
This article has been published as part of BMC Bioinformatics Volume 8, Supplement 1, 2007: Italian Society of Bioinformatics (BITS): Annual Meeting 2006. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/8?issue=S1.
- Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B, et al.: Proteome survey reveals modularity of the yeast cell machinery. Nature 2006, 440(7084):631–636. 10.1038/nature04532View ArticlePubMedGoogle Scholar
- Ito T, Tashiro K, Muta S, Ozawa R, Chiba T, Nishizawa M, Yamamoto K, Kuhara S, Sakaki Y: Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc Natl Acad Sci USA 2000, 97(3):1143–1147. 10.1073/pnas.97.3.1143PubMed CentralView ArticlePubMedGoogle Scholar
- Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, et al.: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 2006, 440(7084):637–643. 10.1038/nature04670View ArticlePubMedGoogle Scholar
- Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, et al.: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 2000, 403(6770):623–627. 10.1038/35001009View ArticlePubMedGoogle Scholar
- Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL: The large-scale organization of metabolic networks. Nature 2000, 407(6804):651–654. 10.1038/35036627View ArticlePubMedGoogle Scholar
- Beyer A, Wilhelm T: Dynamic simulation of protein complex formation on a genomic scale. Bioinformatics 2005, 21(8):1610–1616. 10.1093/bioinformatics/bti223View ArticlePubMedGoogle Scholar
- Soula H, Robardet C, Perrin F, Gripon S, Beslon G, Gandrillon O: Modeling the emergence of multi-protein dynamic structures by principles of self-organization through the use of 3DSpi, a multi-agent-based software. BMC Bioinformatics 2005, 6: 228. 10.1186/1471-2105-6-228PubMed CentralView ArticlePubMedGoogle Scholar
- Broderick G, Ru'aini M, Chan E, Ellison MJ: A life-like virtual cell membrane using discrete automata. In Silico Biol 2005, 5(2):163–178.PubMedGoogle Scholar
- Le Sceller L, Ripoll C, Demarty M, Cabin-Flamand A, Nystrom T, Saier M, Norris V: Modelling bacterial hyperstructures with cellular automata. InterJournal 2000., 366: [http://www.interjournal.org/.]Google Scholar
- Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G: MINT: a Molecular INTeraction database. FEBS Lett 2002, 513(1):135–140. 10.1016/S0014-5793(01)03293-8View ArticlePubMedGoogle Scholar
- von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature 2002, 417(6887):399–403. 10.1038/nature750View ArticlePubMedGoogle Scholar
- Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJ, Cusick ME, Roth FP, et al.: Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 2004, 430(6995):88–93. 10.1038/nature02555View ArticlePubMedGoogle Scholar
- Ellis RJ: Macromolecular crowding: an important but neglected aspect of the intracellular environment. Curr Opin Struct Biol 2001, 11(1):114–119. 10.1016/S0959-440X(00)00172-XView ArticlePubMedGoogle Scholar
- Zachariae W, Nasmyth K: Whose end is destruction: cell division and the anaphase-promoting complex. Genes Dev 1999, 13(16):2039–2058.View ArticlePubMedGoogle Scholar
- Lemerle C, Di Ventura B, Serrano L: Space as the final frontier in stochastic simulations of biological systems. FEBS Lett 2005, 579(8):1789–1794. 10.1016/j.febslet.2005.02.009View ArticlePubMedGoogle Scholar
- Arkin AP: Synthetic cell biology. Curr Opin Biotechnol 2001, 12(6):638–644. 10.1016/S0958-1669(01)00273-7View ArticlePubMedGoogle Scholar
- Reguly T, Breitkreutz A, Boucher L, Breitkreutz BJ, Hon GC, Myers CL, Parsons A, Friesen H, Oughtred R, Tong A, Stark C, Ho Y, Botstein D, Andrews B, Boone C, Troyanskya OG, Ideker T, Dolinski K, Batada NN, Tyers M: Comprehensive Curation and Analysis of Global Interaction Networks in Saccharomyces cerevisiae. J Biol 2006, 5: 11. 10.1186/jbiol36PubMed CentralView ArticlePubMedGoogle Scholar
- Varshavsky A: The N-end rule: functions, mysteries, uses. Proc Natl Acad Sci USA 1996, 93(22):12142–12149. 10.1073/pnas.93.22.12142PubMed CentralView ArticlePubMedGoogle Scholar
- Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A, Dephoure N, O'Shea EK, Weissman JS: Global analysis of protein expression in yeast. Nature 2003, 425(6959):737–741. 10.1038/nature02046View ArticlePubMedGoogle Scholar
- Hardy J, Pomeau Y, de Pazzis O: Time evolution of two-dimensional model system. I. Invariant states and time correlation functions. J Math Phys 1973, 14: 1746–1759. 10.1063/1.1666248View ArticleGoogle Scholar
- Hu Z, Mellor J, Wu J, Yamada T, Holloway D, Delisi C: VisANT: data-integrating visual framework for biological networks and modules. Nucleic Acids Res 2005, 33(Web Server):W352–357. 10.1093/nar/gki431PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.