Characterization of the emergent properties of a synthetic quasi-cellular system

Background: The entrapment of solutes during liposome formation is relevant to the relationship between the formation of compartments and the distribution of molecules inside them, an important issue in studies of the origin of life. Theoretically, when no interactions are assumed among the chemical species to be entrapped, entrapment is described by a standard Poisson process. Very recent experimental findings, however, show that for small liposomes (100 nm diameter) the distribution of entrapped molecules is best described by a power-law function. This is of great importance, as the two random processes give rise to two completely different scenarios. Here we present an in silico stochastic simulation of the encapsulation of a cell-free molecular translation system (the PURE system), obtained following two different entrapment models: a pure Poisson process and a power law. Protein synthesis inside the liposomes has been studied in both cases, with the aim of highlighting experimental observables that could be measured to assess which model better represents the real process.

Results: First, a minimal model for in vitro protein synthesis, based on the molecular composition of the PURE system, has been formalized. Then, we have designed a simulated experiment in which stochastic factors affect the course of the reaction inside the compartment. To this end, 24 solutes, which represent the PURE system components, have been stochastically distributed among vesicles following either a Poisson or a power-law distribution. The course of protein synthesis within each vesicle has then been calculated as a function of vesicle size. Our study can predict translation yield in a population of small liposomes down to the attoliter ($10^{-18}$ L) range. Our results show that the efficiency of protein synthesis peaks at approximately $3 \cdot 10^{-16}$ L (840 nm diam.) with a Poisson distribution of solutes, while a relative optimum is found at around $10^{-17}$ L (275 nm diam.) for the power-law statistics.

Conclusions: Our simulation shows that the wet-lab measurement of effective protein synthesis at volumes smaller than $10^{-17}$ L would rule out, according to our models, a Poisson distribution of solutes.


Background
The origin of life is a scientific problem that lies within the experimental realm.
The scientific consensus holds that life on this planet emerged through a process of gradual, incremental, chemical evolution from non-living matter, a process known as abiogenesis [1,2].
In the last decades, a great effort has been made to reconstruct, at least partially, the physical-chemical steps leading to the origin of life on our planet. A definitive proof of concept would be provided by the ex novo assembly of a living system, that is, a membrane-bound structure characterized by self-replication and self-maintenance, capable of Darwinian evolution [3,4].
The eventual synthetic pathway would also help answer questions concerning the relative importance of chance and necessity in the historical origin of life on earth [5,6].
Ideally, an experimental approach should start from the simplest allegedly prebiotic chemical precursors. Recently, however, an alternative approach, called "semisynthetic", has been proposed. This approach seems not only well suited to providing the proof of principle that living forms might indeed emerge from the self-organization of their molecular components, but also appears experimentally feasible [4,7,8].
Liposomes, as closed spherical bilayers, are the most obvious candidates as cell compartments. Their key properties, subject to ongoing research, range from autopoiesis, or self-assembly from amphiphilic precursors in water [9][10][11], to their ability to incorporate various molecules, to their growth and division at the expense of free precursors [12][13][14][15][16][17].
To build liposome-based minimal cells, it is necessary to encapsulate the minimal number of molecules, such as proteins and nucleic acids, needed to accomplish the essential processes typical of cellular life. Current research focuses on the synthesis of functional proteins inside liposomes. The translation "module", in fact, is one of the most complex and important for cellular metabolism. Therefore, its implementation in liposome-based systems not only allows the production of proteins (which, in turn, act as catalysts or structural components of the synthetic cell, extending the repertoire of available functions), but also models well the complexity of living cells.
Several protein expression systems have been successfully encapsulated inside liposomes [18][19][20][21][22] (for a review, see [8]). So far, these systems have been used to express the Green Fluorescent Protein (GFP), an easily detectable protein, and a few other enzymes. The final goal of this approach is the intraliposomal expression of enzymes that ultimately catalyze the regeneration of all internalized and membrane-forming components, so that a coordinated "core-and-shell" self-reproduction can be achieved [4,23].
Our focus was on those approaches where a particular translation apparatus called the PURE system is encapsulated inside lipid vesicles (see Figure 1).
The PURE system (Protein synthesis Using Recombinant Elements) was developed by Ueda and colleagues [24] and is now commercially available. It comprises about 80 different macromolecular species (plus amino acids, nucleotides, etc.), representing the minimal set of the E. coli translational machinery (see Table 1). Thirty-six of these macromolecular species have been purified individually by Ni/His6 affinity chromatography after overexpression in E. coli cells. Ribosomes are specifically isolated by sucrose density-gradient centrifugation. By mixing the isolated components in proper ratios, the E. coli translation activity is effectively reconstituted [24] and can be used to synthesize a protein starting from an encoding DNA (or RNA) sequence. As evident in Figure 1, the PURE system is a modular tool that recapitulates in vitro the following metabolic processes: (i) aminoacylation of tRNAs; (ii) translation (initiation, elongation, termination); (iii) regeneration of the energetic species (ATP, GTP); and, optionally, (iv) transcription.
Experiments are usually done by preparing liposomes in an aqueous solution containing the PURE system. The dispersion of lipids in water can be carried out by different methods, for example by swelling preformed thin lipid films, or by rehydrating freeze-dried liposome membranes, or also by injecting a small aliquot of a lipid-containing alcoholic solution. Consequent to lipid dispersion in the aqueous phase, liposomes form spontaneously. It is during this self-organization step that the PURE system components, floating in solution, are entrapped in the liposome's inner space. Non-entrapped PURE system components are generally inhibited from reacting by the addition of EDTA, RNAses, or proteases, so that protein synthesis occurs only inside liposomes.
In order to produce a realistic in silico model of intraliposomal translation, both solute entrapment and protein synthesis must be simulated stochastically. Stochastic effects will clearly affect the solute encapsulation efficiency, especially in the case of small liposomes and low solute concentrations. It is generally accepted that the encapsulation of water-soluble solutes follows Poisson statistics, where the mean number $\mu_i$ of molecules of the i-th solute (initially present in solution at a concentration $C_i$) entrapped in a vesicle of volume $V$ is given by $\mu_i = N_A C_i V$ ($N_A$ being the Avogadro number). Fluctuations around this mean value are described by the Poisson distribution, and are responsible for the fact that the true outcome of the encapsulation step is a population of vesicles with different solute content (in terms of the number of chemical species and their amounts). Furthermore, even independently of this effect, protein synthesis can display stochastic behaviour when occurring in small volumes with a low number of solutes, as is the case in attoliter ($10^{-18}$ L) containers. This behaviour depends on compartment size, because translation consists of higher-than-first-order reactions whose likelihood depends, inter alia, on reaction volume.
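As a rough illustration of the relation $\mu_i = N_A C_i V$, and of why small vesicles are likely to lack at least one component, the following Python sketch computes the mean copy number of a solute and the Poisson probability that every species is present at least once. The function names and the example concentrations are hypothetical placeholders (they are not the PURE system values), and independent entrapment of each species is assumed.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def mean_copies(conc_molar, volume_litres):
    """Mean number of entrapped molecules, mu_i = N_A * C_i * V."""
    return AVOGADRO * conc_molar * volume_litres


def prob_at_least_one(mu):
    """Poisson probability of finding at least one copy: 1 - exp(-mu)."""
    return -math.expm1(-mu)


def prob_all_present(concs_molar, volume_litres):
    """Probability that every species occurs at least once in a vesicle,
    assuming each species is entrapped independently (an assumption)."""
    p = 1.0
    for conc in concs_molar:
        p *= prob_at_least_one(mean_copies(conc, volume_litres))
    return p


# Hypothetical concentrations in mol/L, NOT taken from the PURE system.
example_concs = [1e-6, 1e-7, 5e-5]
for volume in (1e-15, 1e-17, 1e-18):
    print(volume, prob_all_present(example_concs, volume))
```

With these placeholder values the probability of a complete set of species drops sharply as the volume shrinks, which is the qualitative effect the Poisson entrapment model predicts for small liposomes.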
This scenario is further complicated by recent, and very intriguing, findings which suggest that encapsulation might not follow Poisson statistics, but could instead be governed by a power law. In particular, based on the observed occurrence of GFP production in 200 nm (diameter) vesicles, it was supposed that macromolecules could be captured by closing lipid membranes with a much higher probability than the one calculated according to the Poisson distribution. To test this hypothesis, a thorough investigation of the physics of solute encapsulation was conducted by cryo-transmission electron microscopy. By directly counting the actual number of molecules per vesicle, it has been shown that when the protein ferritin [31] and ribosomes [32] are encapsulated inside lipid vesicles, most of the formed vesicles are actually empty, while a small minority (0.1-1%) contains a high, as yet unexplained, number of solute molecules. This observation is not compatible with a Poisson entrapment model, but follows a power-law distribution, i.e., a distribution whose "long tail" predicts the occurrence of events that are very distant from the average behaviour (Figure 2). While these observations are a matter of fact, an explanation is still missing. No experimental studies are available as yet to ascertain whether a power law governs the encapsulation of multiple solutes, as in the PURE system. Further, there are no hints about the existence of solute- or membrane-specific interactions that might cause a deviation from the Poisson distribution.
Here we investigate, by an in silico approach, the possible expectations, in terms of measurable protein synthesis yield, that follow from a set of hypotheses which are currently under experimental and theoretical scrutiny.
Following an approach philosophically similar to that of a previous work [33], where a stochastic simulation was used to test the viability of candidate minimal genome organisms, we test the efficiency of different entrapment models in generating liposomes able to perform a complete protein synthesis. To do this, we used a dedicated simulator called QDC [34], which is based on Gillespie's stochastic simulation algorithm (SSA), can control experimental parameters in a metabolic experiment, and can deal with very large and very small numbers of molecules and volumes.
We think such an approach can help advance the understanding of the implications of different hypotheses on the solute entrapment process. A simulation approach, in fact, can conveniently overcome the technical limits which hinder the "wet" study of these micro-compartmentalized systems, and predict clear-cut experimental observables, thus providing a means to discriminate between alternative models.

Figure 1
The PURE system: main metabolic blocks. A schematic drawing of the metabolic pathways included in the PURE system and hosted in a functionalized liposome (reproduced from [42] with permission from Springer, and from [35] with permission from Elsevier).

Table 1 (footnote)
*The components are solubilized in 50 mM HEPES-KOH pH 7.6; 100 mM potassium glutamate, 13 mM magnesium acetate, 2 mM spermidine, 1 mM DTT. One unit of activity was defined as the amount of enzyme that catalyzes the formation of 1 pmol of aminoacyl-tRNA in 1 min (reproduced from [35] with permission from Elsevier). The conversion from U/mL to μM has been taken from [27].

Figure 2
Over-crowding effect. Distribution of entrapped ferritin experimentally measured by means of electron microscopy, compared with the theoretically predicted Poisson distribution: the main divergence is given by the long tail of non-zero probability (redrawn from [31]).

Formal specification of the PURE system
Under the simplifications stated in the Methods section, we derived a formal specification of the PURE system composed of 106 reactions that describe the behaviour of the system. The complete list of simulated chemical species, reactions and kinetic coefficients is reported in Additional File 1 (written in the QDC input syntax); Table 2 reports a selection of the most important kinetic constants, along with a concise description of their derivation. The model was used to perform stochastic simulations of the PURE system in a set volume, representing the internal volume of a liposome. Varying the system volume between simulations allowed us to study the impact of this variable on the system dynamics. Even though the PURE system is well known and commercially available, to the best of our knowledge this is the first work where its dynamic behaviour is studied by means of a stochastic simulation.

Simulating the protein synthesis in liposomes with the two entrapment models
Our primary target of study was translation efficiency. As a measure of it, we took the time τ that the system requires to yield a fixed, arbitrary number of protein molecules per unit volume. We computed this quantity for both entrapment models, Poisson and power-law.

τ (volume) for Poisson-distributed system species
For each simulated volume V, the mean number μ i of each of the 24 system species present in the reaction compartment at time zero was calculated from the initial concentrations of PURE system species as reported by [35]. μ i was set as the parameter defining the Poisson distribution for each i-th species, which was then used to generate 360 instances of the encapsulated system with stochastically distributed species.
The time course of the number of protein molecules was averaged instant-by-instant across all simulations, in order to cancel out stochastic effects not dependent on volume. The procedure was followed for each simulated volume, in a range from $10^{-13}$ to $10^{-18}$ L. Figure 3 reports the result for a $10^{-16}$ L compartment (vesicle outer diameter ca. 580 nm). The near-linear time course reflects good translational efficiency: the molecular machinery works regularly and the chemical resources are still available in relatively large quantities.
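The instant-by-instant averaging and the extraction of τ can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes each simulation is stored as a piecewise-constant trajectory (event times and protein counts, starting at time zero), and the names `value_at`, `average_time_course`, `time_to_reach` and the target count are hypothetical.

```python
from bisect import bisect_right


def value_at(times, counts, t):
    """Protein count at time t for a piecewise-constant trajectory."""
    idx = bisect_right(times, t) - 1
    return counts[max(idx, 0)]


def average_time_course(trajectories, grid):
    """Average several (times, counts) trajectories at each grid time."""
    n = len(trajectories)
    return [sum(value_at(ts, cs, t) for ts, cs in trajectories) / n
            for t in grid]


def time_to_reach(grid, mean_counts, target_count):
    """First grid time at which the averaged count reaches the target.

    Returns None if the target is never reached on the grid.
    """
    for t, n in zip(grid, mean_counts):
        if n >= target_count:
            return t
    return None


# Two toy trajectories (times in seconds, protein copy numbers).
toy = [([0.0, 1.0, 3.0], [0, 1, 2]), ([0.0, 2.0, 4.0], [0, 1, 2])]
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
mean = average_time_course(toy, grid)
print(mean, time_to_reach(grid, mean, 1.5))
```

Because τ is defined per unit volume, the target count passed to `time_to_reach` would scale with the simulated volume (a fixed protein density multiplied by V).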
The data were used to derive the translation efficiency, as shown in Table 3.
A peak of translation efficiency is revealed at around $3 \cdot 10^{-16}$ L, corresponding to liposomes of approximately 840 nm in diameter (Figure 4).
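The correspondence between internal volume and vesicle diameter can be checked with a short sketch. It assumes a spherical vesicle and, as an assumption not stated in the text, a bilayer thickness of about 4 nm added on each side to obtain the outer diameter; the function names are hypothetical.

```python
import math


def inner_diameter_nm(volume_litres):
    """Diameter (nm) of a sphere enclosing the given volume (litres)."""
    volume_m3 = volume_litres * 1e-3  # 1 L = 1e-3 m^3
    diameter_m = (6.0 * volume_m3 / math.pi) ** (1.0 / 3.0)
    return diameter_m * 1e9


def outer_diameter_nm(volume_litres, bilayer_nm=4.0):
    """Outer diameter, adding an ASSUMED bilayer thickness on each side."""
    return inner_diameter_nm(volume_litres) + 2.0 * bilayer_nm


for volume in (3e-16, 1e-16, 1e-17):
    print(volume, round(outer_diameter_nm(volume)))
```

Under these assumptions the computed outer diameters fall within a few nanometres of the rounded values quoted in the text for these volumes.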
This maximum is reliable: we repeated five independent runs of simulations, and its existence and position were confirmed. The existence of a maximum is expected on general grounds, since translational efficiency results from the balance between two opposing contributions. The first is a purely kinetic effect: the smaller the vesicle volume, the faster the apparent kinetics of chemical reactions of order higher than first. As almost all the reactions involved in the simulated PURE system are second order, their speed increases as the liposome size becomes smaller. The second contribution derives from the Poisson entrapment process, which makes it highly probable that some necessary chemical species are missing when the liposome size becomes smaller. It is therefore reasonable to expect an optimal vesicle size that contains all the necessary chemical species in the minimum possible volume. According to our results, the translational efficiency decreases abruptly for vesicles smaller than the optimal size.

Table 2 (fragment; columns: reaction, kinetic constant, source)
AK + ATP > Y; $10^{6}$ M$^{-1}$ s$^{-1}$; from adenylate kinase Km for ATP, as reported in [40]
Y > AK + ATP; 40 s$^{-1}$; from adenylate kinase Km for ATP, as reported in [40]
Y > AKP + ADP; 300 s$^{-1}$; [41]
See Additional file 2 for legend.

τ (volume) for power-law-distributed system species
For each simulated volume V, the corresponding "overcrowding concentration" $c_s$ was determined from the experimental data reported in [31] and graphically displayed in Figure 2. The number $\mu_s$ of molecules of each solute in an "overcrowded" vesicle was calculated as $\mu_s = c_s N_A V$. Solutes whose initial concentration according to [24] is higher than $c_s$ were not adjusted for the super-concentration effect, but left with their regular molecule numbers.
The frequency of the "overcrowded" vesicles was inferred from [31] and taken to lie between 0.1% and 1%, depending on the super-concentration level and hence also on volume. The frequency was therefore set to 1%, 0.2%, and 0.1% for vesicles of volume $10^{-16}$, $5 \cdot 10^{-17}$, and $\leq 10^{-17}$ L, respectively (corresponding to vesicle diameters of ca. 580, 465 and $\leq 275$ nm). This choice highlights the properties of the power-law distribution while avoiding a complex multivariate study.
Table fragment (column headers not recoverable from the source): $10^{-17}$: > 1,000, n/a; $5 \cdot 10^{-18}$: n/a, n/a; $10^{-18}$: n/a, n/a.

The "non-overcrowded" vesicles were assigned a mean number $\mu_i$ of species i, calculated as $\mu_i = [c_i N_A V (L_s + L_n) - c_s N_A V L_s] / L_n$, where $c_i$ is the regular concentration of species i as reported in [24] for the PURE system, $L_s$ is the number of overcrowded vesicles, and $L_n$ is the number of non-overcrowded vesicles. The formula introduces a correction for the molecules already entrapped in the overcrowded vesicles, which are thus unavailable for entrapment in the non-overcrowded liposomes.
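The mass-balance correction for the non-overcrowded vesicles can be illustrated with a minimal Python sketch. The function names and the numerical values in the example are hypothetical; the branch that leaves species with $c_i \geq c_s$ at their regular numbers is one reading of the rule stated above for solutes more concentrated than $c_s$.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def overcrowded_mean(c_s, volume):
    """Copies per overcrowded vesicle, mu_s = c_s * N_A * V."""
    return c_s * AVOGADRO * volume


def non_overcrowded_mean(c_i, c_s, volume, n_over, n_normal):
    """Mean copies per non-overcrowded vesicle after subtracting the
    molecules already captured by the overcrowded vesicles."""
    regular = c_i * AVOGADRO * volume
    if c_i >= c_s:
        # Assumed reading: such species are not super-concentrated,
        # so every vesicle keeps the regular mean.
        return regular
    total = regular * (n_over + n_normal)
    captured = overcrowded_mean(c_s, volume) * n_over
    return (total - captured) / n_normal


# Hypothetical numbers: 1% overcrowded vesicles, tenfold super-concentration.
mu_over = overcrowded_mean(1e-5, 1e-16)
mu_norm = non_overcrowded_mean(1e-6, 1e-5, 1e-16, n_over=10, n_normal=990)
print(mu_over, mu_norm)
```

By construction, the total number of molecules summed over both vesicle classes equals the number expected for the same population at the regular concentration.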
For each volume V, in a range from $10^{-16}$ to $10^{-20}$ L (diameters from 580 to 35 nm), 1000 instances of the system were generated according to the above-described procedure.
Contrary to the Poisson model, the system reaches a protein production plateau within the simulation time limit (Figure 5). This is due to the rapid consumption of amino acids and energetic species by the over-concentrated PURE system enzymes. Results are presented in Table 4. The plateau and the generally poor yield of the system made it impractical to use the same measure of efficiency as in the Poisson series. In accordance with our aim of providing a means to discriminate between the Poisson and the power-law statistics in the wet lab, the time required to reach the plateau and the number of proteins at plateau per overcrowded vesicle are given instead.

Conclusions
The in silico analysis of the translational efficiency of liposomes entrapping the PURE system reveals that there is a size threshold of $10^{-16}$ L below which the Poisson entrapment model is predicted to give no detectable synthesis. The main experimental observable is therefore the possible occurrence of protein synthesis in liposomes of about $10^{-17}$ L volume (275 nm diam.) or less. If the PURE system is entrapped in liposomes of this size together with an mRNA encoding GFP, the detection of a fluorescence signal will indicate that the real distribution of solutes in liposomes follows power-law statistics and not a Poisson process. This experimental validation could improve our understanding of the basic mechanism underlying the relationship between the formation of membrane compartments and the distribution of solutes within them.

Deriving a formal specification for the PURE system
In order to specify a simulation-amenable model, the biochemical transformations carried out by the PURE system components were decomposed into elementary reactions (first- or second-order reactions), whose deterministic kinetic coefficients have been taken from the literature or, when unavailable, deduced from global considerations of a thermodynamic and/or chemical nature or by imposing internal coherence on the system. The level of detail chosen to specify the system results from two complementary needs: to keep the computational cost as low as possible (it is very high for SSA-based simulators) and to represent explicitly any significant entity present in the system. The main resulting simplifications are the following: amino acids, tRNAs, aminoacyl-tRNA synthetases, and translation Release Factors are each modelled as a single species representing its entire class (e.g., one "general tRNA" in lieu of the 56 actual tRNAs of the PURE system). These simplifying assumptions reduce the number of chemical species involved in the transcription/translation reactions to 24. Furthermore, every round of translation is designed to yield a complete, 300-amino-acid-long protein (which models GFP). In other words, there is no abortive translation. This is in good accord with wet-lab empirical observations: autoradiography of synthesized protein after electrophoretic separation typically reveals only one major protein band [28].
Performing a stochastic simulation of the biochemical species entrapped in the vesicles
Stochastic simulations have been performed with QDC (Quick Direct-method controlled), a simulator based on Gillespie's SSA (Direct-method version) and described in detail in [34]. The main reason for using this simulator is that it implements a series of checking tools that allowed us to verify the reliability of the simulated data. In particular, it allows the use of very large or very small quantities and it checks for possible overflow or underflow of variables, which can occur given the volumes and concentrations used in our virtual experiments. Moreover, QDC outputs three files: one contains the time course of the number of molecules of the various chemical species, one contains the time course of the propensity of each chemical reaction, and the last contains the effective firing rate of each reaction. By analysing these three files, it is possible to check for simulation artefacts or for system properties that can alter the simulation (e.g., the possible stiffness of a system). We have made extensive use of these features in order to rule out numerical errors or unsuitable descriptions of the biochemical system to be simulated.
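For readers unfamiliar with the Direct method, the following Python sketch shows a generic, minimal version of Gillespie's algorithm together with the usual conversion of a deterministic second-order constant into a volume-dependent stochastic constant. It is not QDC and does not reproduce its checks or input syntax; the function names and the copy numbers are hypothetical, while the two rate constants echo the adenylate kinase entries reported in Table 2.

```python
import math
import random

AVOGADRO = 6.02214076e23  # molecules per mole


def bimolecular_constant(k_molar, volume_litres):
    """Deterministic second-order constant (M^-1 s^-1) converted into a
    per-pair stochastic constant (s^-1) for two distinct species."""
    return k_molar / (AVOGADRO * volume_litres)


def gillespie_direct(state, reactions, t_end, rng):
    """Direct-method SSA. `reactions` is a list of
    (stochastic_constant, reactants, products); the reactants of one
    reaction must be distinct species. Returns event times and states."""
    state = dict(state)
    t = 0.0
    times, states = [t], [dict(state)]
    while True:
        props = []
        for c, reactants, _ in reactions:
            a = c
            for species in reactants:
                a *= state[species]
            props.append(a)
        total = sum(props)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Exponentially distributed waiting time with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Choose a reaction with probability proportional to its propensity.
        threshold = rng.random() * total
        chosen = max(i for i, a in enumerate(props) if a > 0.0)
        running = 0.0
        for i, a in enumerate(props):
            running += a
            if threshold < running:
                chosen = i
                break
        _, reactants, products = reactions[chosen]
        for species in reactants:
            state[species] -= 1
        for species in products:
            state[species] += 1
        times.append(t)
        states.append(dict(state))
    return times, states


# Toy run in a 1e-16 L compartment; copy numbers are hypothetical.
toy_reactions = [
    (bimolecular_constant(1e6, 1e-16), ("AK", "ATP"), ("Y",)),
    (40.0, ("Y",), ("AK", "ATP")),
]
times, states = gillespie_direct(
    {"AK": 50, "ATP": 200, "Y": 0}, toy_reactions, 0.5, random.Random(0))
print(len(times) - 1, "events; final state:", states[-1])
```

The conversion in `bimolecular_constant` makes the volume dependence explicit: for the same copy numbers, a tenfold smaller compartment gives a tenfold larger propensity for a second-order step, whereas first-order steps are unaffected.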

Generating the entrapment models
The Poisson distribution model has been generated by means of an in-house Python script that computes the average number of molecules of each species in a vesicle, given the vesicle volume and the concentration of the species in the original solution. The script then calls a Poisson routine from a Python statistical library, which outputs a Poisson-distributed variable. The power-law entrapment model has been generated by direct inference from the experimental data, as recalled in the Results and discussion section.
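A script of this kind could look like the sketch below. It is not the original script: to stay self-contained it replaces the library Poisson routine with a small standard-library sampler (Knuth's multiplication method for small means and a rounded normal approximation for large ones, both swapped in here as assumptions), and the species names and concentrations are hypothetical.

```python
import math
import random

AVOGADRO = 6.02214076e23  # molecules per mole


def sample_poisson(mu, rng):
    """Draw one Poisson-distributed integer with mean mu."""
    if mu <= 0.0:
        return 0
    if mu < 30.0:
        # Knuth's method: multiply uniforms until the product drops
        # below exp(-mu); the number of extra factors is the sample.
        limit = math.exp(-mu)
        k, product = 0, rng.random()
        while product > limit:
            k += 1
            product *= rng.random()
        return k
    # Large mean: rounded normal approximation (an approximation only).
    return max(0, round(rng.gauss(mu, math.sqrt(mu))))


def generate_instances(concs_molar, volume_litres, n_instances, seed=0):
    """Generate vesicle contents; each species is drawn independently
    with mean mu_i = N_A * C_i * V."""
    rng = random.Random(seed)
    means = {name: AVOGADRO * conc * volume_litres
             for name, conc in concs_molar.items()}
    return [{name: sample_poisson(mu, rng) for name, mu in means.items()}
            for _ in range(n_instances)]


# Hypothetical species and concentrations (mol/L), 360 vesicle instances.
vesicles = generate_instances({"X": 1e-6, "Z": 2e-5}, 1e-17, 360, seed=1)
print(vesicles[0])
```

Each generated dictionary can then serve as the initial state of one stochastic simulation for the chosen volume.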