Guided evolution of in silico microbial populations in complex environments accelerates evolutionary rates through a step-wise adaptation

Background During their lifetime, microbes are exposed to environmental variations, each with its distinct spatio-temporal dynamics. Microbial communities display a remarkable degree of phenotypic plasticity, and highly-fit individuals emerge quite rapidly during microbial adaptation to novel environments. However, there exists a high variability when it comes to adaptation potential, and while adaptation occurs rapidly in certain environmental transitions, in others organisms struggle to adapt. Here, we investigate the hypothesis that the rate of evolution can both increase or decrease, depending on the similarity and complexity of the intermediate and final environments. Elucidating such dependencies paves the way towards controlling the rate and direction of evolution, which is of interest to industrial and medical applications. Results Our results show that the rate of evolution can be accelerated by evolving cell populations in sequential combinations of environments that are increasingly more complex. To quantify environmental complexity, we evaluate various information-theoretic metrics, and we provide evidence that multivariate mutual information between environmental signals in a given environment correlates well with the rate of evolution in that environment, as measured in our simulations. We find that strong positive and negative correlations between the intermediate and final environments lead to the increase of evolutionary rates, when the environmental complexity increases. Horizontal Gene Transfer is shown to further augment this acceleration, under certain conditions. Interestingly, our simulations show that weak environmental correlations lead to deceleration of evolution, regardless of environmental complexity. Further analysis of network evolution provides a mechanistic explanation of this phenomenon, as exposing cells to intermediate environments can trap the population to local neighborhoods of sub-optimal fitness.


Background
From microbes to vertebrates, organisms are constantly subjected to evolutionary processes that lead to adaptation and phenotypic variation. Whether evolutionary forces lead to new and rapidly evolving species, as it is in the case of adaptive radiation, or are responsible for phenotypic divergence within a species, the main underlying mechanism by which complex behavior arises remains the same: gradual accumulation of selected genetic mutations and epigenetic changes gives rise to a myriad of anatomical, physiological and behavioral expressions. Although the notion that evolution, niche adaptation, and phenotypic variation leads to "endless forms most beautiful" can be traced back to Darwin [1], it was only in the last decade that with the advent of high-throughput sequencing and profiling techniques, we were able to understand the mechanisms by which mutations give rise to novel traits. Remarkably, it has been shown that even single mutations, such as nucleotide polymorphisms, can yield phenotypes that are significantly dissimilar [2]. The same holds for the rewiring of the gene regulatory and biochemical networks, as they were found to exhibit a highly degree of evolvability [3,4], yet preserve phenotypic robustness when under stabilizing selection and in the presence of disrupting mutations [5,6].
A challenging task is to identify the environmental and organism-specific characteristics that allow the rapid adaptation from past to new environments. Recently, the theory of "Facilitated Variation" provided a unifying framework according to which, organisms are under selection to develop conserved core components that can be reused to quickly adapt to novel environments. Computer simulations with logic gates and RNA secondary structures have demonstrated that facilitated variation spontaneously emerges during evolution [7], while theoretical models provide support that varying environments can affect evolution [8] and can give rise to modular structures [9].
Here, we posit that evolution can be both accelerated and decelerated through step-wise adaptation to novel environments. In complex environments, organisms have to explore a large parameter space before settling in stable fitness points ( Figure 1A). Local minima and discontinuities may lead to sub-optimal fitness peaks, from where it may be difficult, or even infeasible, to escape. In addition, it has been shown that phenotypes that occupy flatter regions of the fitness surface are more robust to mutations, a phenomenon that was coined as "survival of the flattest" [10]. Intermediate environments can lead to fitness neighborhoods from where it is easier for an organism to adapt to novel environments ( Figure 1B). However, exposure to an intermediate environment may also have the opposite effect, as it can result to neighborhoods of poor fitness potential upon transition from one environment to another ( Figure 1C). To further investigate this hypothesis, we first define metrics to quantify environmental complexity, and then proceed to measure the rate of evolution in multi-scale simulations of evolving microbial populations under five dynamic environments.

Methods
We used a multi-scale microbial simulator to perform simulations of microbial populations in five environments with distinct temporal dynamics. The multi-scale simulator employs abstract, multi-scale models of basic sub-cellular phenomena related to expression (transcription, translation, protein modification, degradation, etc.), evolution (mutation, gene duplication, gene deletion, etc.), network regulation and other evolutionary processes (natural selection). It has been used successfully in the past to generate hypotheses related to regulatory network evolution in nutrient-limited microbial communities [11], and we have recently extended to include Horizontal Gene Transfer events [12].
In a simulation run, a population is composed of a fixed number of organisms. Each cell comprises of a number of "triplets" (three nodes): Gene/mRNA, Protein, and Modified Protein ( Figure 2A). The Promoter/ Gene/RNA node captures gene regulation and transcription, while the Protein and Modified Protein nodes capture translation and post-translational modification, respectively. Each organism has its own distinct gene regulatory and biochemical network (i.e. a collection of various triplets and weighted regulatory edges) that can be depicted as a directed weighted graph (see Figure  2B). The probability of molecule creation at each node and at each time step is a function of the regulatory effect of other nodes (activation or inhibitions) on that specific node, and the availability of substrate molecules. We model the molecule production probability as a two-level sigmoid function that captures a threshold and saturation effects for any given regulator and for the expression of any given node [11]. In addition to its regulatory network, each organism has a unique metabolic pathway which, when expressed, can metabolize available resources in the environment.
Mutational events (e.g. transcription rate changes, node duplications, node deletions, etc.) occur stochastically at any time point and on any node, thus changing its internal network and potentially its phenotype, which in this context is synonymous to the regulatory and metabolic pathway expression. The production and destruction of any molecule has an energy cost, as does the maintenance of molecular species (nodes). Organisms cannot directly sense the presence of resources; however they can potentially infer their future presence, if they are able to process information from various environmental signals through biochemical and regulatory interactions. Once an organism reaches a certain energy level, it undergoes division, increasing its genotype representation in the population, while its progeny replaces an existing organism so that the fixed size of the population is preserved (probability of an organism being replaced is inversely proportional to its energy level).
For our simulations here, we used environments where two signals, s 1 and s 2 , carry information regarding the presence of nutrients in the environment ( Figure 3). The I/O characteristic of environments A and B is given by the logic Nutrients Presence [A] = Delayed (s 1 AND NOT(s 2 )) and Nutrients Presence [B] = Delayed (NOT(s 1 ) AND s 2 ), respectively. This logic produces a single peak when s 1 and s 2 have the temporal characteristics of the waveform presented in Figure 3. Environments that encode an AND, OR and XOR gate where also used. The latter is also the environment with the most complex correlation structure, due to the fact that the XOR gate is not linearly separable. In addition, we introduced a delay in the signal/nutrient correlation (through the Delayed() function, which imposes a fixed delay of 500 time steps) to further increase the evolutionary complexity of the environment, as organisms now have to account for it through the topology and dynamics of the respective underlying networks. Similar observations were obtained with the absence of delay, although evolution was faster and resulted in simpler regulatory networks.
To assess the fitness level of each organism, we report the Pearson correlation between nutrient abundance and response protein expression level over a predefined interval of time, which we call an "epoch" (4,500 time units in our simulations). We stress that this similarity measure is used for visualization purposes as a proxy to each organism's fitness, and at no point participates or interferes with the selection or evolutionary trajectory of cells during the simulation. High correlation between nutrients and response protein concentration implies an efficient underlying mechanism to metabolize nutrients, as activation of this costly pathway takes place only when it confers an advantage to the organism.

Environmental complexity and the rate of evolution
It is expected that the time it takes until a fit phenotype emerges is inversely proportional to the complexity of the environment. For instance, the gene regulatory and biochemical network of an organism which has to evolve a simple inverse relationship between signals and nutrients (i.e. the signal should be absent, for an action to take place), will be much less complex than the network of an organism which must capture an oscillatory dependency between inputs and outputs. The same is true for the evolution of networks in other contexts (social, financial, etc.) where inference and decisionmaking are an integral part of the network function. It is an open question, however, how we can quantify environmental complexity in a way that it captures the expected rate of evolution, or the time it will take for a fit phenotype to emerge. To address this question, we evolved initially random populations in the five environments that we mentioned before, and then measured the rate of evolution over 32 simulation runs. Then, we considered several information-theoretic measures that capture the correlation between the observable inputs and latent output, and assessed their potential to predict the evolutionary rate (Table 1). We found that multivariate mutual information between environmental inputs (signals) and outputs (nutrients) correlates well with the rate of evolution for the environments that we studied ( Figure 4A). As expected, the rate of evolution (measured as the time constant in an exponential fit) was slower for the case of XOR than any other environment tested.
Next, we calculated the environmental similarity between any two environments by measuring the Mozhayskiy and Tagkopoulos BMC Bioinformatics 2012, 13(Suppl 10):S10 http://www.biomedcentral.com/1471-2105/13/S10/S10 pair-wise Pearson correlation of nutrient presence. Inclusion of the two signals would not alter the measured correlation, as the temporal dynamics of the two signals are the same in all environments that we consider. Figure 5 depicts the final "environmental network" that reflects the two measured quantities, the environmental quantity and similarity. We continued by testing whether evolution can be accelerated or decelerated by using different paths within the environmental network. A populations is considered to be evolved once its fitness is above w = 0.75; additional phenotype refinement is required to pass the w = 0.85 threshold. Success rate is the ratio of simulations with fitness above threshold after 4,000 epochs over the total number of simulations.  Table 1). (B) Fitness trajectories for AND, OR, A, and XOR environments averaged over 64 experiments; evolutionary rate is defined as the time constant τ of the exponential fit (w=A 0 -α*exp(-t/τ)) on the averaged fitness trajectory of the evolved populations (dashed curves and inset table).
Step-wise evolution through adaptation to environments of increased complexity Interestingly, pre-evolving organisms to intermediate environments that have both lower complexity and strong, positive correlation to the final (target) environment, leads to a higher rate of evolution. We found that this effect was even more profound when positive correlation exists between the intermediate and final environments and in the presence of Horizontal Gene Transfer (HGT) [13]. As shown in Figure 6A, evolution in an XOR environment is usually slow and has low probability of success: 5241 avg. epochs, 26/64 successful simulations. Average adaptation speed or time to a solution is measured as an average number of epochs required for a population to reach a 0.75 fitness threshold in a given environment. However, when cells evolve in an OR environment first, and then transferred in an XOR environment, the average time to a solution decreases by a remarkable 73.5% and the success rate increases considerably (1389 avg. epochs, 67% success rate, Figure 6B). Similarly, if cells evolve in either A or B environment and then in an XOR environment the time is reduced by 63.5% to 1907 epochs ( Figure 6C). This reduction is larger when the initial cell population evolves on parallel in A and B environments, and then a mixed population evolves in an XOR environment in the presence of HGT (1805 avg. epochs, Figure 6D). HGT allows the transfer of genetic material between cells of different lineages, rendering possible the transfer of gene clusters that confer a positive advantage to the target cell. This is particularly noticeable in mixtures of populations which have evolved in environments that are complementary to each other (such as A and B) and similar to the target environment (here XOR). Acceleration is also present in the case of paths with more than one intermediate goal ( Figure  6E), although we expect that the overhead of exploring intermediate goals might eventually offset any acceleration gain present. Figure 7 illustrates the evolutionary trajectories in one-step and multi-step adaptation for the combinations that we examined; detailed statistics is shown in Table 2.
The choice of environments is of paramount importance when it comes to changes in the rate of evolution. Contrary to what we expected, selecting an intermediate environment that strongly anti-correlates to the final environment, does not decelerate the evolutionary rate. The reason is that cell populations are mal-adapted to the second environment and the pre-evolved population is washed out within a few epochs during the transition, for the small population sizes of our simulations. We expect that this will also hold for larger population sizes, as the probability that a reversing mutation in the network arises is higher. Deceleration, however, can be achieved by first evolving the population at an environment of high complexity, which is not surprising. Interestingly, we observed deceleration in the case of weak Figure 5 Relative complexity and similarity of AND, OR, A, B, and XOR environments as a graph. Node values correspond to the environmental complexity (squares, border thickness is proportional to the complexity). Pair-wise similarity is calculated as the Pearson correlation of nutrient presence between environments and corresponds to the edge value (red and blue represent correlation and anticorrelation, respectively). The environment of highest complexity, XOR, is part of a cluster of strongly correlated environments (A, B, OR, and XOR). Environment AND is the simplest one, and it is isolated from other environments (low or negative correlation).
positive or negative correlation between the intermediate and final environments. This is the case of evolution to the OR environment through an initial population that has pre-evolved in the AND environment. Although there is a weak correlation between the two environments (0.09), OR phenotypes do not arise until much later in the simulations (> 4000 epochs). Analysis of the evolutionary trajectory and population variation showed Figure 6 Single-step and multi-step evolution towards an XOR phenotype. The highlighted edges of each network correspond to the environmental transitions made during evolution (evolutionary path). The value of each edge corresponds to the average time to evolve a fit phenotype, in number of epochs. <T> and P are the average total time (traversing through the highlighted path) and the success probability (ratio of successful experiments over total experiments). (A) Direct evolution in an XOR environment is very slow and has a low probability of success (B-C) evolution to environments of lower complexity and subsequent evolution to the final environment accelerate evolution and boosts the number of successful experiments, (D) The presence of HGT in a population that pre-evolved in two simple environments further increase the evolution rate, (E) step-wise acceleration can occur in more than two steps, although at a lesser degree. The evolutionary trajectory during single-step evolution in an XOR environment (averaged over 64 simulations), where it takes more than 5000 epochs to achieve the desired fitness; (B-D) If cells are first evolved in an OR and then in an XOR environment (guided, step-wise evolution), a fit phenotype emerges much faster (1,389 vs. 5,241 in the single-step evolution). Fitness discontinuity during transitions is due to the different correlations between the nutrients and the signals in the various environments. Similarity between environments (e.g. OR and XOR) result in cells that are exposed in the latter environments to already have certain non-zero fitness. that this is a result of cells populating a local optimum in the fitness landscape, in full agreement with our initial hypothesis and Figure 1C. More specifically, within a few epochs after the transition to the OR environment ( Figure 8A), the dominant phenotype in the population expressed the metabolic protein necessary to metabolize nutrients only during the second and third peak of nutrient occurrence ( Figure 8B). However, Table 2 Information metrics of environmental complexity.

Metric
Adjusted  Evolution during an AND-to-OR environmental transition. Snapshots of the minimal network evolution for a single cell during an AND-to-OR transition are shown. T0 is the metabolic triplet, and RP0 is the response protein, whose expression is necessary for nutrient uptake (its expression profile is the third row in the heatmap). (A) Initially the cell is exposed to the AND environment, and exhibits a highly fit profile (Pearson correlation between RP0 and nutrients is 0.89). The minimal network illustrates the logic that has evolved: signal S1 activates the expression of gene and protein product of T1, which in turn catalyzes the translation of T0, while signal S1 should be presence for T0 gene expression, thus operating in an AND-like logic. due to the sub-optimal, but stable, hard-wiring of its regulatory network, the optimal OR phenotype only emerges 1000 epochs later.

Conclusions
In this paper, we investigate the "guided evolution hypothesis", and provide evidence through in silico simulations that the rate of evolution can be both accelerated and decelerated by exposing a cell population to a series of environments. Towards identifying the rules that govern this phenomenon, we explored functions that can serve as a proxy for environmental complexity and similarity between environments. To understand how exposure to intermediate environments impacts evolutionary rates, we analyzed the fossil record of evolution on the networks and the population. Gathering and storing this amount of data would not been possible, without the development of a state-of-the-art, multi-scale, scalable microbial simulator that can run in HPC environments. Our results are insensitive to the particular evolutionary parameters, although simulations should be extended to more distinct environments in the future. The implications of this work span many areas of biological research. First, we extend our current understanding of evolution by deriving a set of rules that explain the directional change of evolutionary rates when populations are exposed to a series of environments. By quantifying the environmental complexity, we made possible a more rigorous assessment of the evolutionary potential of microbial communities in complex environments. In the context of synthetic biology and bioengineering, our work may prove useful if the task is to control evolutionary rates without changing main environmental parameters (medium, mutagenic agents, and other abiotic parameters). The ability to guide evolution can be a powerful tool when it comes to fine-tuning of genetic constructs, designing principles for resilient biological circuits, and controllable biotechnological processes [14].
Our work can be extended to environments of higher dimensionality and complexity, since the scope of this study was limited to temporal signals that encode logical functions. As such, the metrics that we used to assess environmental complexity may not be applicable to environments that are spatially heterogeneous or are characterized by a multitude of parameters that affect bacterial physiology. In such cases, novel metric functions, which measure the difficulty of organisms to change and adapt when exposed to the new environment, should be constructed. In addition, our analysis, despite its novelty on merging models of gene regulatory and biochemical networks with population dynamics, did not exploit the possible link between the biological network organization and modular structure in a given environment, and the ability to generalize to new environments that may, or may not, be similar. Integration of emergent network characteristics and other environmental information has the potential to yield a more accurate depiction of the interplay between evolutionary rates and environmental organization.
Finally, it would be interesting to validate the guided evolution hypothesis in a laboratory setting. An obvious challenge is to characterize environmental similarity and complexity in static environments, as laboratory evolution experiments with shake flasks would be very difficult, if they were to reproduce a temporal correlation between the nutrients and environmental signals. However, chemostats have the capacity to couple nutrient availability to abiotic signals (temperature, pressure), and as such, they can provide a suitable platform to experimentally test this hypothesis.