Techniques for analysing pattern formation in populations of stem cells and their progeny

Background: To investigate how patterns of cell differentiation are related to underlying intra- and inter-cellular signalling pathways, we use a stochastic individual-based model to simulate pattern formation when stem cells and their progeny are cultured as a monolayer. We assume that the fate of an individual cell is regulated by the signals it receives from neighbouring cells via either diffusive or juxtacrine signalling. We analyse simulated patterns using two different spatial statistical measures that are suited to planar multicellular systems: pair correlation functions (PCFs) and quadrat histograms (QHs). Results: With a diffusive signalling mechanism, pattern size (revealed by PCFs) is determined by both morphogen decay rate and a sensitivity parameter that determines the degree to which morphogen biases differentiation; high sensitivity and slow decay give rise to large-scale patterns. In contrast, with juxtacrine signalling, high sensitivity produces well-defined patterns over shorter lengthscales. QHs are simpler to compute than PCFs and allow us to distinguish between random differentiation at low sensitivities and patterned states generated at higher sensitivities. Conclusions: PCFs and QHs together provide an effective means of characterising emergent patterns of differentiation in planar multicellular aggregates.


Background
Embryonic stem cells (ESCs) hold great promise as a source of cells for regenerative medicine, as they are, in principle, capable of being expanded indefinitely in vitro and have the potential to differentiate into any adult cell type. Whilst small molecules (such as dexamethasone, vitamin C and retinoic acid [1]) or growth factors (such as bone morphogenetic proteins (BMPs) and transforming growth factor β (TGF-β) [2]) can be used to increase the proportion of cells of a desired type, the population typically consists of multiple cell types, often organised into distinct patches (as illustrated in Figure 1). Culturing cells for extended periods of time in vitro is expensive, and stem cells are generally in short supply. There is therefore value in using mechanistic theoretical models of the differentiation of cultured cells to investigate the relationship between the processes determining the fate of individual cells and tissue-scale patterns. Such models can be used to develop optimised protocols for the production of specific cell types and to develop relevant analytical techniques. In this paper, we present a computational model of a population of stem cells, forming a relatively dense confluent monolayer, in which juxtacrine or diffusive cell signalling biases differentiation of individual cells into two possible cell types. We demonstrate how statistical tools (pair correlation functions and quadrat histograms) can be used to characterise the emergent patterns of differentiation arising from these distinct signalling mechanisms.
In the context of stem-cell differentiation, theoretical models have successfully described, for instance, the OCT4-SOX2-NANOG system [3], lineage determination between trophectoderm and endoderm [4] and the later differentiation of cells into one of three mesenchymal lineages under the regulation of the master transcription factors RUNX2, SOX9 and PPAR-γ [5]. However, interactions between multiple pathways remain poorly characterised [5], and many of the key processes involved in cell differentiation remain to be identified. More abstract theoretical models for cellular differentiation are based on the identification of cell fates with distinct attractors of an underlying dynamical system [6]. This idea is embodied in the concept of the 'epigenetic landscape' [7], whereby a ball rolling down a slope into a branching network of valleys is analogous to a differentiating cell choosing between distinct fates. Such ideas have been revisited [8,9] in the light of recent observations of differentiating stem cells. Subsequent work has sought to identify explicitly some of the attractors in the dynamical system generated by the cell's internal regulatory networks [10,11].
The development of mechanistic models to describe pattern formation is a cornerstone of mathematical biology. Substantial attention has focused on systems which exhibit Turing instabilities, involving competition between short-range activators and long-range inhibitors [12]. Such models have been used to describe pattern formation in populations of differentiating cells; for example, Garfinkel et al. [13] examined the formation of swirls and ridges in populations of mesenchymal cells. A range of alternative mechanisms has also been investigated in the context of stem-cell differentiation, involving, for example, the combination of haptotaxis and cell-cell adhesion in mesenchymal condensations leading to the formation of patches of cartilage [14], haptotaxis and activator-inhibitor dynamics combined with a discrete model for cell motion [15], and static activator-inhibitor models [16,17]. As these diverse studies suggest, there are a number of mechanisms by which patches of different cell types could be generated. For example, cells with a similar clonal history are likely to be found near each other, and inherited transcription factors and epigenetic changes may predispose their differentiation into similar types. Alternatively, cells could first differentiate and subsequently organise (or 'sort') themselves into patches through spatial rearrangement [18,19]. The distribution of mechanical forces in the culture environment, or the spatial distribution of chemicals, could favour differentiation into particular cell fates in specific regions of the culture system; and cells may influence the differentiation of their neighbours by auto/paracrine signalling through diffusive signalling molecules, or by juxtacrine signalling between adjacent cells (possibly mediated by local mechanical effects) [20]. The above list is certainly not exhaustive, and it is likely that multiple mechanisms act in combination.
In this paper, we focus on two candidate mechanisms that may be responsible for pattern formation in populations of stem cells and their progeny, considering patterns which are formed by the transmission of information between cells through either diffusible morphogens or juxtacrine signalling, biasing differentiation pathways. Candidate diffusible morphogens might include TGF-β and BMP-2, as reviewed in [21] (see also [22-24]). We neglect details relating to diffusive transport [25], such as transcytosis [26] or binding of morphogens to cell surfaces or the extracellular matrix. The juxtacrine case could model lateral induction through Notch signalling, which is known to be involved in regulating differentiation and has been found to stimulate the differentiation of ESCs into neurons [27] and of epithelial stem cells into the functioning cells of the intestinal crypt [28]. Alternatively, this case could represent the effects of signalling mediated by cell-cell adhesion molecules such as cadherins [29,30], some of which are thought to modulate differentiation [31].
While our model is generic, in the sense that we do not identify explicit morphogens or signalling pathways, we can nevertheless use it to investigate the physical mechanisms that underlie experimentally observed patterns. Previous studies illustrate the complexity of this task. While juxtacrine signalling is typically concerned with pattern formation on the lengthscale of a cell [32], it can exert a longer-range effect. For example, in the imaginal disc of Drosophila, sensory organ precursor cells extend filopodia containing Delta, allowing them to signal to cells which are not nearest neighbours [33]. Lateral induction of ligand production [34] can generate large-scale patterns, with the juxtacrine signal being relayed between neighbouring cells [35]. Newman & Bhat [36] suggest a mechanism in which oscillatory behaviour synchronised by juxtacrine signalling generates large-scale patterns by limiting the period of time over which condensations can grow.
The differences between patterns arising from diffusive and juxtacrine signals therefore merit careful investigation. Given the complexity of modelling specific multi-step differentiation pathways, and their interactions with other signalling networks, we propose here a deliberately simple pattern-generating model that captures generic features in qualitative terms using minimal parameter sets. Motivated by the idea of the epigenetic landscape, we consider a model in which the state of an individual cell evolves as a flow on a two-dimensional surface [11]. The surface branches into two valleys, which correspond to the two alternative cell fates. Differentiating cells are assumed to influence other cells through juxtacrine or diffusive signalling, 'tilting' the potential landscape of a target cell and breaking the symmetry of the pitchfork bifurcation. We assume the bifurcation is supercritical, unlike the subcritical case treated by Huang et al. [37]. We incorporate stochasticity in our model in two ways: by introducing noise into the differentiation process [38,39]; and by introducing a random element to the initial spatial distribution of cells within the monolayer. However, to avoid further complexity, we neglect cell motility and division while differentiation takes place.
In order to analyse the patterns that emerge from our simulations, we employ statistical measures for marked or multitype spatial point processes. One common class of spatial statistics comprises 'second-order' characteristics, such as Ripley's K-function [40] and pair correlation functions (PCFs) [41], which consider the distribution of distances between pairs of points. Statistics of this class have associated cross or bivariate versions, which only consider distances between pairs of points of specific types. Both the standard and cross-type versions of these statistics have previously been used to examine the distribution of cells in experimental data. For example, a number of statistics, including PCFs, were used by [42] to examine the spatial locations of dividing and non-dividing cells in histological sections of solid tumours. Ripley's K-function [40] has been used to examine retinal neurons [43], the three-dimensional distributions of osteocyte lacunae [44], nerve cells [45], and villous branches in the placenta [46]. Ripley's L-function (a variant of the K-function [40]) was used to examine immune cells in lymph nodes [47]. Su et al. [48] use "local cell metrics" (LCMs), which are closely related to PCFs (their normalised LCM is precisely the cross-PCF), to analyse cell-cell interactions in populations of proliferating osteoblasts. However, the types of spatial patterns arising in these experiments, and the biological questions under consideration, differ from those considered here. We note that other spatial statistics have been developed, in particular Minkowski functionals [49], which are more complicated to implement than second-order statistics.
In this paper, we examine two statistical measures that are particularly well suited to multicellular systems, and which could equally be applied to experimental observations. These provide a quantitative estimate of pattern length scales in populations of two cell types, distinguish 'noisy patterns' from completely random differentiation and condense image data into a small number of measures which are useful for parameter surveys. We show how PCFs can be used to assign a length-scale to patterns of differentiating cells. We also show how quadrat histograms (QHs) can be used to distinguish noisy patterns from random distributions. QHs are adopted here on account of their conceptual simplicity, ease of implementation and low computational cost. PCFs were chosen in preference to other second-order statistics because of their (arguably) more natural interpretation in the context of exploratory data analysis, as they indicate the properties of pairs of cells separated by a particular distance (rather than all those pairs separated by less than a given distance, which is the case for Ripley's K-function). These tools generate simple metrics that enable us to characterise the patterns that emerge and their dependence on system parameters.
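The distinction between "pairs at distance $r$" and "pairs within distance $r$" can be made explicit. For a stationary, isotropic point process in the plane, the following standard identity (stated for reference, not taken from this paper) relates the two second-order summaries:

$$K(r) = 2\pi \int_0^r u\, g(u)\, \mathrm{d}u, \qquad g(r) = \frac{1}{2\pi r}\,\frac{\mathrm{d}K(r)}{\mathrm{d}r}.$$

Under complete spatial randomness $g \equiv 1$ and $K(r) = \pi r^2$. Because $K$ accumulates contributions from all separations below $r$, a feature at one particular separation shows up in $g$ as a localised bump or dip, but in $K$ only as a change of slope.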

Results
The model is initialised by seeding undifferentiated cells at random on a planar surface and allowing them to push each other apart at short distances (and attract nearby cells at longer distances) to form aggregates with only minimal overlap between cells. Thereafter (for $t > 0$), the cells are assumed to remain stationary while they undergo differentiation into one of two possible terminal states, denoted R (red) or G (green) (Figure 2(a)). The evolution of each cell is modelled by a stochastic differential equation which is analogous to the motion of a particle (in the presence of noise) down a valley (in a surface with coordinates $(s_n, f_n)$) that bifurcates into two sub-valleys via a pitchfork bifurcation (Figure 2(c)). The 'stemness' parameter $s_n$ for cell $n$ falls from 1 to 0 as the cell differentiates; the type of cell $n$ is coded by a variable $f_n$ that approaches the base of the sub-valley in $f_n > 0$ (R) or $f_n < 0$ (G).
Signals from nearby cells tilt the landscape (Figure 2(d)), favouring differentiation of a cell towards the fate shared by its neighbours. Noise in the signalling, generated by randomness in the initial spatial distribution of the cell aggregates and by intrinsic variation in the differentiation of each cell, leads to the formation of local regions containing more cells of type R ($f_n > 0$) or G ($f_n < 0$).
Partitioning of the cells into distinct fates is illustrated by histograms of $f_n$ (Figure 2(a)); the distributions shown there are from one of a large set of possible simulation outcomes.

Characterising patterns
At the end of each simulation, each cell is characterised by the position of its centre and its type (R or G). Two representative patterns are shown in Figure 3, with which we illustrate the use of PCFs and QHs.
PCFs are represented by two functions, $g(r)$ and $g_S(r)$: $g(r)$ describes the distribution of distances $r$ between pairs of cells, normalised by the expected distribution if the cell positions were completely random; the cross-PCF $g_S(r)$ represents the distribution of distances between pairs of cells of the same type (either R or G). If the cells have a completely random spatial distribution, then $g(r) \equiv 1$ (although the requirement for cells not to be overlapping implies $g < 1$ for small $r$). If cells differentiate randomly and independently, then $g_S(r) \equiv g(r)$ (Figure 3(e)). However, if the cells form patches of different types, then the cross-PCF will differ from $g(r)$ (Figure 3(b)). For example, for distances $r$ smaller than the sizes of the patches, $g_S(r) > g(r)$, as two cells separated by a distance $r$ are more likely to be of the same type than two cells which are selected at random. The point at which the PCFs intersect ($r = r_p \approx 38$ in this case) provides a quantitative estimate of the scale of the pattern.

[Figure 2 caption, partially recovered; its opening, which refers to Table 1, is lost. (c,d) show the potential surface $U$ for equation (2b) for $c = 5$, $\nu = 1$ with (c) $B_n = 0$ and (d) $B_n = 0.1$. The cells start in a multipotent state (upper valley), but as they progress down the surface they diverge into two distinct phenotypes (lower valleys). Diffusive morphogens bias differentiation towards one of the two states ("tilting" the surface). Solid lines correspond to stable steady states of the type equation (2b) with $s_n$ viewed as a constant parameter.]
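A minimal sketch of how $g(r)$ and $g_S(r)$ might be estimated from simulated or segmented data, using Python with NumPy. It assumes a periodic $L \times L$ domain (as in the Methods), counts ordered pairs of cells, and normalises $g_S$ by the number of same-type pairs expected for random positions, so that $g_S \approx g$ when types are assigned independently of position. The exact estimator and edge correction behind the figures are not specified in this section, and the function names `pair_correlation` and `crossing_radius` are hypothetical. `crossing_radius` takes $r_p$ to be the first downward crossing of $g_S - g$.

```python
import numpy as np

def pair_correlation(pos, types, L, r_max, n_bins=40):
    """Estimate g(r) and the same-type PCF g_S(r) on a periodic L x L square.

    Requires r_max <= L/2 so that minimum-image distances are valid.
    """
    pos = np.asarray(pos, dtype=float)
    types = np.asarray(types)
    n = len(pos)
    # Minimum-image displacements between all ordered pairs of cell centres.
    d = pos[:, None, :] - pos[None, :, :]
    d -= L * np.round(d / L)
    dist = np.hypot(d[..., 0], d[..., 1])
    off_diag = ~np.eye(n, dtype=bool)                       # drop self-pairs
    same = (types[:, None] == types[None, :]) & off_diag    # same-type pairs

    edges = np.linspace(0.0, r_max, n_bins + 1)
    annulus = np.pi * (edges[1:] ** 2 - edges[:-1] ** 2)
    count_all, _ = np.histogram(dist[off_diag], bins=edges)
    count_same, _ = np.histogram(dist[same], bins=edges)

    # Expected ordered-pair counts per annulus for uniformly random positions.
    _, n_k = np.unique(types, return_counts=True)
    expected_all = n * (n - 1) * annulus / L**2
    expected_same = np.sum(n_k * (n_k - 1)) * annulus / L**2
    r = 0.5 * (edges[1:] + edges[:-1])
    return r, count_all / expected_all, count_same / expected_same

def crossing_radius(r, g, g_s):
    """First radius at which g_S(r) - g(r) changes from positive to non-positive."""
    diff = g_s - g
    idx = np.flatnonzero((diff[:-1] > 0) & (diff[1:] <= 0))
    if idx.size == 0:
        return None
    i = idx[0]
    # Linear interpolation between the two bracketing bin centres.
    return r[i] + (r[i + 1] - r[i]) * diff[i] / (diff[i] - diff[i + 1])

# Usage: two periodic bands of width 50 act as an idealised patterned state.
rng = np.random.default_rng(0)
L = 100.0
pos = rng.uniform(0, L, size=(1000, 2))
stripes = (pos[:, 0] < L / 2).astype(int)
r, g, g_s = pair_correlation(pos, stripes, L, r_max=50.0)
print(crossing_radius(r, g, g_s))
```

For bands of width $w$, the probability that two points a distance $r \le w$ apart share a type is $1 - 2r/(\pi w)$, so the crossing should lie near $\pi w/4 \approx 39$ here. For noisy data, averaging PCFs over realisations (as in Figure 5) helps to avoid spurious crossings.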
QHs indicate the proportion $p_R$ of cells of type R in each quadrat when the domain is divided into $M_q \times M_q$ square quadrats. Suppose the cells differentiate at random (Figure 3(d,e,f)), and the number of quadrats is chosen such that the average number of cells in a quadrat, $N_q = N/M_q^2$, is moderately large ($N_q > 10$). Then $p_R$ has an approximately binomial form, $N_q p_R \sim B(N_q, 1/2)$: there are on average $N_q$ cells in each quadrat, and the type of each cell is determined randomly and independently of the others with probability 1/2 (Figure 3(f)). However, if there are distinct regions (with a length scale larger than the size of the quadrats) in which most cells are of one type, then there will be many quadrats for which $p_R \approx 0$ or $p_R \approx 1$, resulting in a distribution with two large peaks (Figure 3(c)). Thus distributions with distinct patches are identified by PCFs with $g_S(r) > g(r)$ for sufficiently small $r$ and by QHs showing a substantial majority of quadrats containing cells which are almost all of one type. In contrast, spatially random patterns of differentiation (as illustrated in Figure 3(d,e,f)) are characterised by $g_S(r) \approx g(r)$ and a QH of binomial form.
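The quadrat histogram is equally simple to compute. The sketch below uses Python with NumPy and hypothetical function names. It bins cell centres into an $M_q \times M_q$ grid, returns $p_R$ for each occupied quadrat, and builds a 50-bin, unit-area histogram on $[0, 1]$, matching the binning described for Figure 3. With about 24 cells per quadrat and random labels, the sample variance of $p_R$ should be close to the binomial value $M_q^2/(4N)$.

```python
import numpy as np

def quadrat_proportions(pos, is_red, L, m_q):
    """Proportion p_R of type-R cells in each occupied quadrat of an m_q x m_q grid."""
    pos = np.asarray(pos, dtype=float)
    is_red = np.asarray(is_red, dtype=float)
    # Quadrat (column, row) index of each cell centre, flattened to one integer.
    ij = np.clip(np.floor(pos / (L / m_q)).astype(int), 0, m_q - 1)
    flat = ij[:, 0] * m_q + ij[:, 1]
    n_cells = np.bincount(flat, minlength=m_q * m_q)
    n_red = np.bincount(flat, weights=is_red, minlength=m_q * m_q)
    occupied = n_cells > 0                      # skip empty quadrats
    return n_red[occupied] / n_cells[occupied]

def quadrat_histogram(p_r, n_bins=50):
    """Histogram of p_R on [0, 1], normalised so that its total area is one."""
    return np.histogram(p_r, bins=n_bins, range=(0.0, 1.0), density=True)

# Usage: random differentiation with about 24 cells per quadrat.
rng = np.random.default_rng(0)
L, n_cells, m_q = 100.0, 2400, 10
pos = rng.uniform(0, L, size=(n_cells, 2))
random_types = rng.random(n_cells) < 0.5
p_r = quadrat_proportions(pos, random_types, L, m_q)
density, edges = quadrat_histogram(p_r)
print(p_r.mean(), p_r.var(), m_q**2 / (4 * n_cells))
```

Labelling cells by a spatial rule instead (for example, all cells with $x < L/2$ as R) puts every quadrat in an extreme bin, which gives the two-peaked QH characteristic of well-defined patches.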
In summary, QHs provide simple information about whether or not a pattern is present, whereas PCFs provide additional information about the pattern's length-scale.

[Figure 3 (panel columns: Simulation, PCFs, Quadrat histogram). In (a,b,c), the cells are organised into distinct patches, reflected in the behaviour of the PCFs: $g_S(r) > g(r)$ for $r < r_p$ (i.e., nearby pairs of cells are more likely to be of the same type than two cells selected at random), the intersection point $r = r_p$ giving a quantitative estimate of pattern scale. The QH (c) also indicates the formation of patches, as the majority of the quadrats contain cells of one type (the range $0 < p_R < 1$ is divided into 50 bins, and the values are normalised such that the total area is one). In (d,e,f), the cells appear to have differentiated at random, with no discernible structure. This can be seen from the PCFs in (e), with $g_S(r) \approx g(r)$ for all $r$ (i.e., two nearby cells selected at random are no more likely to be of the same type than two well-separated cells). Similarly, the QH (f) shows that most of the quadrats contain a mixture of cells of different types, and the proportion of cells of type R in each quadrat is well described by a truncated normal distribution on $[0,1]$ with mean 1/2 and variance $M_q^2/4N$ (solid line).]

Diffusive signalling
The spatial patterns that are observed under diffusive signalling are particularly sensitive to two dimensionless model parameters: $S_{\text{diff}}$, which measures the response of the bias to morphogen concentrations, and the morphogen decay rate, $\lambda$. Results from individual realisations of the model for 16 pairs of parameter values are shown in Figure 4, illustrating the range of patterns that can be generated. For small $S_{\text{diff}}$ and large $\lambda$, the cells appear to differentiate randomly, as rapid morphogen decay inhibits communication between cells. For large $S_{\text{diff}}$ and small $\lambda$, the patterns often contain many more of one cell type than another, and in some cases all cells adopt the same (differentiated) fate, with stochastic effects dictating whether they are all red (of type R) or all green (of type G). For fixed $\lambda$ and increasing $S_{\text{diff}}$, we observe a transition from random differentiation to distinct patches of cells, with "noisy patches" evident for intermediate values of $S_{\text{diff}}$; patterning is more coherent when cells have greater sensitivity to morphogens. For fixed $S_{\text{diff}}$ and increasing $\lambda$, the spatial scale of the patches appears to decrease, with differentiation becoming random for sufficiently large $\lambda$.
To identify behaviour that is consistent across multiple realisations, simulations were conducted $M_{\text{sim}} = 100$ times for each parameter set in Figure 4. The corresponding PCFs, averaged over all simulations (Figure 5(a)), demonstrate consistently random differentiation for small values of $S_{\text{diff}}$ and large $\lambda$ ($g_S(r) \approx g(r)$). Distinct patches are evident for larger $S_{\text{diff}}$ and small $\lambda$ ($g_S(r) > g(r)$ for $r < r_p$). The quantitative estimates of the scale of the pattern, $r_p$, increase slightly as $\lambda$ decreases (Figure 5(b)), consistent with diffusive signals acting over distances proportional to $\sqrt{D/\lambda}$, but are less sensitive to $S_{\text{diff}}$. We report values of $r_p$ for the mean PCFs in Figure 5(a), noting that patch sizes vary between individual simulation realisations; the width of this distribution is indicated in Figure 5(b). The difference between $g_S(r)$ and $g(r)$ becomes smaller for small $\lambda$, because some realisations contain cells which are all of one type (in which case $g_S(r) \equiv g(r)$).
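The square-root dependence of signalling range on decay rate can be seen from a single source. As a sketch, assume (the morphogen equations are not reproduced in this section) that the concentration $c$ obeys a linear reaction-diffusion equation with diffusivity $D$ and decay rate $\lambda$, and that a cell at the origin secretes at a hypothetical rate $q_0$. The planar steady state then satisfies

$$D\nabla^2 c - \lambda c = -q_0\,\delta(\mathbf{x}) \quad\Longrightarrow\quad c(r) = \frac{q_0}{2\pi D}\,K_0\!\left(\frac{r}{\ell}\right), \qquad \ell = \sqrt{\frac{D}{\lambda}},$$

where $K_0$ is the modified Bessel function of the second kind, which decays like $e^{-r/\ell}/\sqrt{r}$ for $r \gg \ell$. Halving $\lambda$ therefore extends the range of a single cell's signal by a factor of $\sqrt{2}$ rather than 2.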
The corresponding QHs (averaged over $M_{\text{sim}}$ realisations; Figure 6) demonstrate a transition from random differentiation for small $S_{\text{diff}}$ and large $\lambda$, in which the histogram has a binomial form with a peak at $p_R = 1/2$, to well-defined patterns for large $S_{\text{diff}}$ and small $\lambda$, in which the majority of the quadrats contain cells which are entirely of one type ($p_R \approx 0$ or $1$). It is helpful to introduce a (very conservative) threshold that defines the existence of patterns: for example, if more than 10% of the quadrats have $p_R < 0.02$ or $p_R > 0.98$ (so lie in either of the extreme bins of the QH), then we say that well-defined patterns exist. We demarcate patterned and non-patterned distributions defined by this criterion in Figure 6. Note that the presence of any quadrats with extreme values of $p_R$ strongly suggests the presence of patterning: with the parameters of Table 1, the average quadrat contains about 24 cells, and if these all differentiate randomly and independently, the probability of all 24 being of one type is roughly $2 \times (0.5)^{24} \approx 10^{-7}$. The degree of noise in the patterns is characterised by the shape of the histograms for intermediate values of $p_R$; the roughly uniform distribution on $0 < p_R < 1$ falls in magnitude as $S_{\text{diff}}$ increases (Figure 6), even though pattern length-scales remain approximately constant relative to the size of quadrats (Figure 5). This diffusive signalling mechanism is therefore capable of generating a wide range of spatial patterns. Overall, the sensitivity parameter, $S_{\text{diff}}$, appears to control the degree of noise in the patterns, whilst the morphogen decay rate, $\lambda$, controls their length-scale.
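The conservative patterning criterion and the accompanying probability estimate take only a few lines to check. In this Python sketch the function names are hypothetical; the thresholds (0.02, 0.98 and 10% of quadrats) and the quadrat size of 24 cells are those stated above.

```python
import numpy as np

def is_patterned(p_r, lo=0.02, hi=0.98, min_fraction=0.10):
    """Conservative criterion: more than min_fraction of quadrats have a
    red-cell proportion p_R in either extreme bin of the QH."""
    p_r = np.asarray(p_r, dtype=float)
    extreme = (p_r < lo) | (p_r > hi)
    return bool(extreme.mean() > min_fraction)

def prob_uniform_quadrat(n_q, p=0.5):
    """Probability that all n_q independently differentiating cells share one type."""
    return p**n_q + (1 - p) ** n_q

print(prob_uniform_quadrat(24))  # 2**-23, about 1.19e-07
```

Because a single extreme quadrat is already very unlikely under random differentiation, requiring 10% of quadrats to be extreme errs heavily on the side of declaring "no pattern".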

Juxtacrine signalling
For the juxtacrine signalling mechanism, we consider only the effects of varying the sensitivity parameter, $S_{\text{juxt}}$. Simulation results (Figure 7) show a smooth transition from random differentiation for small $S_{\text{juxt}}$ to small, distinct patches of cells for larger $S_{\text{juxt}}$. In contrast to the diffusive signalling mechanism, patch size under juxtacrine signalling is limited to approximately 20 cell radii. The transition from random differentiation is evident in the PCFs ($g_S(r) \approx g(r)$ for small $S_{\text{juxt}}$; $g_S(r) > g(r)$ for $r < r_p$ for larger $S_{\text{juxt}}$), which indicate a patch size of approximately $r_p \simeq 14$ for large $S_{\text{juxt}}$. The QHs also reflect this transition, although, as the scale of the patterns is comparable to that of the quadrats, there are substantially fewer quadrats containing cells entirely of one type ($p_R \approx 0$ or $1$) than in the diffusive case (with large $S_{\text{diff}}$ and small $\lambda$).

[Figure 6 caption, partially recovered; its opening, which refers to Figure 4, is lost. Numbers show normalised frequencies (and corresponding percentages) for the bins if these are greater than 5. In those QHs to the upper-left side of the red line, more than 10% of the quadrats are in either of the extreme bins ($p_R < 0.02$, $p_R > 0.98$), which we use as a conservative criterion for the presence of patterns.]

[Table 1: Dimensionless parameter estimates; unless otherwise stated in the figure captions, these are the parameter values used for simulations.]

Discussion
Heterogeneity in differentiating populations of stem cells hinders the efficient generation of specific types of differentiated cells. Whilst it seems likely that cells will always need to be sorted before being implanted in vivo, not least because undifferentiated cells can cause teratomas (e.g. [50]), improving the yield of particular cell lineages would be of great value. The detailed mechanisms which govern the later stages of cell differentiation into particular phenotypes are not well understood, and there is evidence to suggest that components of both diffusible and juxtacrine signalling pathways play a role [21-24,27,28]. The statistical tools described here provide robust, quantitative measures of noisy spatial patterns. We have shown, using a simple model of diffusive or juxtacrine signalling in a cellular monolayer, how QHs provide a simple measure for distinguishing binary patterns of cellular differentiation from spatially uncorrelated outcomes, and how PCFs may be used to estimate the typical lengthscale of binary patterns. As discussed below, these could be readily applied to experimental data, allowing the objective comparison of patterns associated with different culture conditions. In the future, such measures may prove useful for comparing the outputs of mechanistic, theoretical models with experimental outcomes. Spatial multicellular simulations often contain large numbers of parameters and generate verbose output; PCFs and QHs may prove to be useful tools for the automatic exploration of parameter space and for condensing the information into a smaller number of physically meaningful quantities.

Model extensions
The present model is deliberately simple, but sufficient to capture the fundamental dynamics (a pitchfork bifurcation with symmetry broken by signalling) that we expect to govern cell fate specification. There are many ways in which we could extend the model. For example, we could include more detailed models of the regulatory networks that govern differentiation [5], and details of their interactions with signalling pathways, such as Wnt signalling [51,52], which is thought to play a role in regulating mesenchymal differentiation [53] and the cell fate of intestinal epithelial cells [54].
At present, all cells lose their "stemness" at the same, pre-determined rate. It seems plausible that individual cells could undergo a rapid, asynchronous transition from an undifferentiated stem-like state to a committed or differentiated one; our model could be extended to permit this by changing the form of the potential surface. This would also permit small numbers of partially differentiated cells to be present in the terminal population [55].
In addition, embryonic stem cell populations have been found to be heterogeneous, containing subpopulations which are biased towards particular lineages [56-58]. Such effects could be modelled by considering a subcritical pitchfork bifurcation, as in the model of [37], rather than the supercritical one considered here. While the current model allows limited plasticity in cell fate, with partially differentiated cells being able to change cell type, it could be extended to include de-differentiation in response to specific extracellular signals [59,60] and transdifferentiation of cells [61,62].
More accurate models for diffusive signalling could be developed that account for realistic cell shapes in three dimensions and the details of receptor-ligand binding [63] and signal transduction [64]. The model for juxtacrine signalling could also be greatly refined, incorporating established mechanisms [65-68]. Mechanical forces are also known to affect tissue morphogenesis (reviewed by [69]); changes in cell shape [70] and substrate stiffness [71] have been found to cause mesenchymal stem cells to commit to different lineages. Extracellular matrix (ECM) proteins are thought to regulate differentiation [72-75], and it has recently been observed that the ECM generated by osteogenic precursors promotes the osteogenic differentiation of ESCs [76]. Such effects could be incorporated in a similar manner to diffusible morphogens, but without diffusion. Other extracellular stimuli that are known to influence differentiation, such as O₂ tension [77,78], could also be readily incorporated in the model.
Cell motion could readily be included in the model: for example, equation (1), which is used here only to determine initial cell positions, could be retained for $t > 0$, with noise added to account for random cell motility. It would also be interesting to extend the model to account for cell division. However, we have concentrated on the case of static populations of non-proliferating cells in order to investigate the two patterning mechanisms in a simple context.

Applications to experimental data
The positions of the cell nuclei (possibly obtained through DAPI staining and confocal imaging, followed by image segmentation and identification of the centroids of the nuclei) give a set of points in space, and if a cell type can be assigned to each point (through co-staining), the data will be of the same form as that analysed in this paper. The PCFs (and also the QHs) may be calculated in a straightforward manner using the R package spatstat [79,80].

Conclusions
We have shown how two statistical techniques, QHs and PCFs, can be used to analyse the spatial patterns that emerge in populations of differentiating cells, when there is randomness in the spatial distribution of cells and in the superimposed patterns of differentiation. We have illustrated these techniques using data from a simple stochastic model, in which cell patterning is regulated by either diffusive or juxtacrine signals. We have shown how the size and onset of patterns can be quantified, and illustrated how patterns depend on the mechanisms controlling differentiation and the system parameters.
Our results suggest that when diffusive signalling regulates differentiation, pattern size, as characterised by the QHs and PCFs, is strongly influenced by the morphogen decay rate and the degree to which the morphogen biases cell differentiation, with large-scale patterns observed when the decay rate is low and the cells' sensitivity to the morphogen is high. For juxtacrine signalling, the size of the patterns that emerge is an increasing, saturating function of the cells' sensitivity to signalling; large-scale juxtacrine patterns were not seen in our simulations. Our results also reveal how standard statistical techniques such as PCFs and QHs may be used to analyse and characterise the patterns that emerge from differentiating populations of cells in planar multicellular aggregates.

Methods
We simulate individual cells on a planar substrate. The model operates in two steps, described in detail below. First, undifferentiated cells are seeded at random, and a mechanical model is run to generate aggregates of minimally overlapping cells, which defines the configuration at $t = 0$. Thereafter (for $t > 0$), individual cells stop moving and undergo differentiation, mediated by diffusive or juxtacrine signalling (see Figure 8). We combine an individual-based model for cell differentiation with a model for signalling: for diffusive signalling, we use continuum reaction-diffusion equations for the diffusible species, whilst for juxtacrine signalling, we assume that each cell influences the differentiation of a finite number of nearby cells.
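The juxtacrine rule itself is not given in this section. Purely to illustrate the idea that "each cell influences a finite number of nearby cells", the Python sketch below makes two assumptions. First, two cells are taken to be in contact when their centres lie within a chosen contact radius on the periodic domain. Second, the juxtacrine bias on a cell is a sensitivity $S_{\text{juxt}}$ times the mean fate variable $f$ of its contacts. This rule, the contact radius and the function names are hypothetical and are not the paper's formulation.

```python
import numpy as np

def contact_neighbours(pos, L, contact_radius):
    """Boolean adjacency matrix of cells whose centres lie within contact_radius
    (minimum-image distance on a periodic L x L square)."""
    pos = np.asarray(pos, dtype=float)
    d = pos[:, None, :] - pos[None, :, :]
    d -= L * np.round(d / L)
    adj = np.hypot(d[..., 0], d[..., 1]) < contact_radius
    np.fill_diagonal(adj, False)                 # a cell does not signal to itself
    return adj

def juxtacrine_bias(f, adj, s_juxt):
    """Hypothetical bias: s_juxt times the mean fate variable of contacting cells
    (zero for cells with no contacts)."""
    f = np.asarray(f, dtype=float)
    n_nbrs = adj.sum(axis=1)
    total = adj.astype(float) @ f
    return s_juxt * np.divide(total, n_nbrs, out=np.zeros_like(total), where=n_nbrs > 0)

# Usage: three touching cells in a row and one isolated cell.
adj = contact_neighbours([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0], [50.0, 50.0]], 100.0, 2.5)
b = juxtacrine_bias([1.0, -0.5, 0.2, 3.0], adj, 0.1)
print(b)
```

Because only contacting cells enter the sum, the range of direct influence is one cell diameter. Any longer-range order has to arise from relaying, which is consistent with the short juxtacrine patterns reported above.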
Patterns of aggregation and differentiation are analysed with PCFs and QHs, as explained below.

Modelling initial spatial distribution
$N$ cells are distributed randomly on a square domain $[0, L] \times [0, L]$, considered to be periodic in both directions. Cells move according to a simple, cell-centre based model for a time interval $t_{\text{init}}$, generating a distribution that minimises overlap but allows aggregate formation. Cells move due to forces between neighbouring cells that are repulsive over short distances, to prevent overcrowding, but attractive over longer distances, to mimic adhesion.
The location of the centre of the $n$-th cell, $\mathbf{x}_n$, evolves according to the differential equation Short-range repulsion and long-range attraction are simulated by the velocity $v(r)$, satisfying (We note that other functions having a similar qualitative form would be similarly effective.) We take the cut-off radius to be $R_v = 3r_c$, where $r_c$ is the cell radius. $A$ parametrises the size of cell-cell forces. Equations (1) were simulated using the Euler method for an interval $t_{\text{init}} = 0.002$, taking $A = 5000$.
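Equation (1) and the form of $v(r)$ are not reproduced in this section. The forward-Euler sketch below therefore uses a hypothetical stand-in, $v(r) = A(2r_c - r)(R_v - r)/R_v^2$ for $r < R_v = 3r_c$ and zero otherwise. This is repulsive for $r < 2r_c$ and attractive for $2r_c < r < R_v$. Units are chosen so that $r_c = 1$, and the values of `A`, `dt` and `n_steps` are illustrative assumptions; they are not the paper's $A = 5000$ and $t_{\text{init}} = 0.002$.

```python
import numpy as np

def relax_positions(pos, L, r_c=1.0, A=50.0, dt=1e-3, n_steps=2000):
    """Overdamped cell-centre relaxation on a periodic L x L square (forward Euler).

    Hypothetical pair velocity v(r) = A (2 r_c - r)(R_v - r) / R_v**2 for r < R_v,
    with v > 0 (repulsion) for r < 2 r_c and v < 0 (attraction) for 2 r_c < r < R_v.
    """
    x = np.array(pos, dtype=float)
    R_v = 3.0 * r_c
    for _ in range(n_steps):
        d = x[:, None, :] - x[None, :, :]
        d -= L * np.round(d / L)                   # minimum-image convention
        r = np.hypot(d[..., 0], d[..., 1])
        np.fill_diagonal(r, np.inf)                # no self-interaction
        v = np.where(r < R_v, A * (2 * r_c - r) * (R_v - r) / R_v**2, 0.0)
        # Velocity of cell n: sum of v(r_nm) along unit vectors (x_n - x_m) / r_nm.
        x += dt * np.sum((v / r)[..., None] * d, axis=1)
        x %= L                                     # wrap back into the domain
    return x

def min_pair_distance(x, L):
    """Smallest minimum-image distance between any two cell centres."""
    d = x[:, None, :] - x[None, :, :]
    d -= L * np.round(d / L)
    r = np.hypot(d[..., 0], d[..., 1])
    np.fill_diagonal(r, np.inf)
    return r.min()

# Usage: 200 randomly seeded cells of radius 1 on a 40 x 40 periodic square.
rng = np.random.default_rng(0)
L = 40.0
pos0 = rng.uniform(0, L, size=(200, 2))
pos1 = relax_positions(pos0, L)
print(min_pair_distance(pos0, L), min_pair_distance(pos1, L))
```

Starting from uniformly random positions, heavily overlapping pairs are pushed apart, while cells within the attractive range are drawn into aggregates. The explicit time step has to stay small relative to the stiffness of $v(r)$ near contact for the scheme to remain stable.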

Modelling cell differentiation
We parametrise the state of the $n$-th cell ($1 \le n \le N$) by $(s_n, f_n)$, which serves as a low-dimensional approximation to the levels of numerous transcription factors and the methylation status of many genes. The variable $s_n$, lying in the range $0 \le s_n \le 1$, denotes the "stemness" or degree of plasticity of the cell; from a molecular viewpoint, each value of $s_n$ may represent a set of regulatory network activation patterns, and may depend on the relative abundance and subcellular localisations of proteins and RNAs as well as other types of signalling molecules.
At the start of the simulations, all cells have stemness $s_n = 1$. As the cells differentiate, $s_n$ decreases over time (deterministically, in the present model). The variable $f_n$ (a measure of the relative expression level of specific genes) may take any real value and represents the differentiation fate of the cell. We classify the cells into two types, R and G, for which $f_n > 0$ and $f_n < 0$, respectively. (In images of simulations, cells of types R and G are coloured red and green, respectively.) At the start of the simulation, we set $f_n = 0$ (no preferred lineage) for all cells.
The state of the $n$-th cell evolves according to the system of stochastic ordinary differential equations (2), where $t$ is time, $\kappa > 0$ controls the rate at which cells differentiate, and $\chi > 0$ and $\nu > 0$ are parameters which regulate positive and negative feedback. The equation for $f_n$ is chosen such that (with $s_n$ viewed as a parameter, and $B_n = \delta = 0$) it displays a supercritical pitchfork bifurcation at $s_n = 1/2$, with a single stable steady state for $s_n > 1/2$, but two stable (and one unstable) steady states for $s_n < 1/2$, associated with the two distinct cell fates (Figure 2(c)). $B_n \equiv B^{\mathrm{juxt}}_n + B^{\mathrm{diff}}_n$ denotes the influence of external factors (juxtacrine and diffusive signalling) on the fate of the cell. Non-zero $B_n$ breaks the symmetry of the pitchfork bifurcation (Figure 2(d)). Noise (of amplitude $\delta$) accounts for randomness in the differentiation process, allows plasticity in the fate of partially committed cells, and perturbs the system away from the unstable state in which all cells have $f_n = 0$. Cells are assumed to remain stationary while they differentiate. We do not claim that the present model for differentiation is definitive; however, it captures, in a simple phenomenological way, the phenotypic evolution of individual cells.
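The bifurcation structure described above can be illustrated with a hypothetical drift that has these properties, taking $B_n = \delta = 0$. This is an illustrative normal form, not necessarily the right-hand side of (2b):

```latex
% Hypothetical drift for f (s treated as a parameter, B = delta = 0)
\frac{\mathrm{d}f}{\mathrm{d}t} = F(f; s) = \chi\left(\tfrac{1}{2} - s\right) f - \nu f^{3}
% Steady states, F(f; s) = 0
f^{*} = 0 \ \text{for all } s, \qquad
f^{*}_{\pm} = \pm\sqrt{\chi\left(\tfrac{1}{2} - s\right)/\nu} \ \text{for } s < \tfrac{1}{2}
% Linear stability, using \partial F/\partial f = \chi(\tfrac12 - s) - 3\nu f^{2}
\left.\frac{\partial F}{\partial f}\right|_{f^{*} = 0} = \chi\left(\tfrac{1}{2} - s\right),
\qquad
\left.\frac{\partial F}{\partial f}\right|_{f^{*}_{\pm}} = -2\chi\left(\tfrac{1}{2} - s\right)
```

Thus $f^{*} = 0$ is stable for $s > 1/2$ and loses stability at $s = 1/2$, where the two stable branches $f^{*}_{\pm}$ (the R and G fates) appear. Adding a constant $B \ne 0$ to the right-hand side unfolds the pitchfork and favours the branch with the same sign as $B$.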

Diffusive signalling
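To simulate diffusive signalling, we assume that the cells produce morphogens whose concentrations at a point $\mathbf{x}$ are denoted by $a(\mathbf{x}, t)$ and $b(\mathbf{x}, t)$. Cells of type R ($f_n > 0$) produce $a$, whilst cells of type G ($f_n < 0$) produce $b$, the production rates of the $n$-th cell being $\alpha_a(s_n, f_n)$ and $\alpha_b(s_n, f_n)$, respectively (Figure 8(a)). The morphogens diffuse freely in the extracellular space, with diffusion coefficients $D_a$ and $D_b$, and are degraded at rates $\lambda_a$ and $\lambda_b$. The concentrations $a$ and $b$ satisfy the equations

$$\frac{\partial a}{\partial t} = D_a \nabla^2 a - \lambda_a a + \sum_{n=1}^{N} \alpha_a(s_n, f_n)\,\delta_{\mathrm{D}}(\mathbf{x} - \mathbf{x}_n), \qquad (3a)$$

$$\frac{\partial b}{\partial t} = D_b \nabla^2 b - \lambda_b b + \sum_{n=1}^{N} \alpha_b(s_n, f_n)\,\delta_{\mathrm{D}}(\mathbf{x} - \mathbf{x}_n), \qquad (3b)$$

where the $\mathbf{x}_n$ ($n = 1, \ldots, N$) are the positions of the cell centres and $\delta_{\mathrm{D}}$ denotes the two-dimensional Dirac delta function. Uptake of the morphogens by the cells is neglected. For simplicity, we adopt specific forms for the production functions in which $\alpha > 0$ is a constant; production rates increase as the cells lose their multipotency (i.e. as $s_n$ decreases).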
The influence of morphogens on cell fate in (2b) is modelled by assuming that $B^{\mathrm{diff}}_n$ is proportional to the difference between the concentrations of the two morphogens at the cell, with $S_{\mathrm{diff}}$ a parameter representing the sensitivity of cells to diffusive signalling. Via (2b), differentiation is biased towards type R (G) when $B^{\mathrm{diff}}_n$ is positive (negative).

Juxtacrine signalling
To simulate signalling between cells which are in direct physical contact (represented by cells whose centres are less than a distance $R_{\mathrm{juxt}}$ apart, where we take $R_{\mathrm{juxt}} = 3r_c$), we define the influence function $B^{\mathrm{juxt}}_n$ in (2b) by (4a), summing over all $m \ne n$ with $|\mathbf{x}_m - \mathbf{x}_n| < R_{\mathrm{juxt}}$. The signals produced by differentiating cells (Figure 8(b)) are given by (4b,c). $S_{\mathrm{juxt}}$ parametrises the sensitivity of cells to juxtacrine signalling and the constant $\beta > 0$ represents the typical number of cell-surface ligands. In (4a), the area of contact between cells (and hence the number of receptor-ligand interactions) is assumed to be inversely proportional to the distance between them.

Parameter estimation and nondimensionalization
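The governing equations can be simplified by making the model dimensionless. The parameters $r_c$, $\kappa$, $\alpha$, $\beta$ and $\nu$ can be eliminated by rescaling time on $\kappa^{-1}$, distances on $r_c$, the cell fate variable $f_n$ on $\kappa^{1/2}\nu^{-1/2}$, diffusive morphogen concentrations and production rates on $\alpha/(\kappa r_c^2)$ and $\alpha$, respectively, juxtacrine production rates on $\beta$, and biasing functions $B_n$ on $\kappa^{3/2}\nu^{-1/2}$. In dimensionless variables, we recover equations (2) with $\kappa = \nu = 1$ and parameters $\chi$ and $\delta$ replaced by $\hat{\chi} = \chi/\kappa$ and $\hat{\delta} = \delta\nu/\kappa^2$; equations (3) with $D_a$, $D_b$ replaced by $\hat{D}_a = D_a/(\kappa r_c^2)$, $\hat{D}_b = D_b/(\kappa r_c^2)$ and $\lambda_a$, $\lambda_b$ by $\hat{\lambda}_a = \lambda_a/\kappa$, $\hat{\lambda}_b = \lambda_b/\kappa$; equation (4a) with $r_c = 1$, $S_{\mathrm{juxt}}$ replaced by $\hat{S}_{\mathrm{juxt}} = S_{\mathrm{juxt}}\beta\nu^{1/2}/\kappa^{3/2}$ and $R_{\mathrm{juxt}}$ by $\hat{R}_{\mathrm{juxt}} = R_{\mathrm{juxt}}/r_c$; and equations (4b,c) with $\beta = 1$. The domain becomes $[0, \hat{L}] \times [0, \hat{L}]$ with $\hat{L} = L/r_c$, and simulations are of duration $\hat{t}_{\mathrm{end}} = \kappa t_{\mathrm{end}}$. Henceforth we work only with dimensionless quantities and omit hats.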
Estimates for the dimensionless parameters are listed in Table 1; these are the default values used for simulations in Results. $D_a$ and $D_b$ are based on the diffusion coefficient of the morphogen BMP-2, which was estimated to be $10^{-8}\,\mathrm{cm^2\,s^{-1}}$ in [13] (we do not include the correction proposed in [13] for the slowing of diffusion by the extracellular matrix), and we take $D_a = D_b$. The typical cell radius is taken to be 10 μm. Data to estimate the other parameters are not readily available; in particular, this applies to $\kappa$, which we take to be $\kappa = 1\,\mathrm{day^{-1}}$. However, the parameters $S_{\mathrm{diff}}$, $S_{\mathrm{juxt}}$, $\lambda_a$ and $\lambda_b$ have a significant effect on the generated patterns, and we therefore survey a wide region of parameter space. (We note that the range of $\lambda$ considered, $1 \le \lambda \le 40$, encompasses the degradation rate $2.5 \times 10^{-4}\,\mathrm{s^{-1}}$ measured by [81] for the morphogen Dpp in Drosophila, corresponding to $\lambda = 21$ in dimensionless units.) For simplicity, we assume $\lambda_a = \lambda_b = \lambda$.
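As a check on these values, rescaling time on $\kappa^{-1}$ and distances on $r_c$ gives $\hat{D} = D/(\kappa r_c^2)$ and $\hat{\lambda} = \lambda/\kappa$. The sketch below (hypothetical helper names) evaluates both with the quantities quoted above, using $10^{-8}\,\mathrm{cm^2\,s^{-1}} = 1\,\mathrm{\mu m^2\,s^{-1}}$ and converting $\kappa = 1\,\mathrm{day^{-1}}$ to seconds.

```c
#include <assert.h>

#define SECONDS_PER_DAY 86400.0

/* D_hat = D / (kappa r_c^2), with D in um^2/s, kappa in 1/day, r_c in um. */
static double dimensionless_diffusion(double D_um2_per_s, double kappa_per_day,
                                      double rc_um)
{
    assert(kappa_per_day > 0.0 && rc_um > 0.0);
    double kappa_per_s = kappa_per_day / SECONDS_PER_DAY;
    return D_um2_per_s / (kappa_per_s * rc_um * rc_um);
}

/* lambda_hat = lambda / kappa, with lambda in 1/s and kappa in 1/day. */
static double dimensionless_decay(double lambda_per_s, double kappa_per_day)
{
    assert(kappa_per_day > 0.0);
    return lambda_per_s * SECONDS_PER_DAY / kappa_per_day;
}
```

With $D = 1\,\mathrm{\mu m^2\,s^{-1}}$, $\kappa = 1\,\mathrm{day^{-1}}$ and $r_c = 10$ μm this gives $\hat{D} = 864$, and the Dpp degradation rate gives $\hat{\lambda} \approx 21.6$.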
In order to select parameter values such that the diffusive and juxtacrine mechanisms exert similar effects on differentiating cells, we estimate the maximum sizes of $B^{\mathrm{diff}}_n$ and $B^{\mathrm{juxt}}_n$. Cells are typically separated from their nearest neighbours by a dimensionless distance of 2 ($2r_c$ in dimensional units), so for the juxtacrine mechanism the contribution to $B^{\mathrm{juxt}}_n$ in (4) from a neighbouring cell is of the order of $S_{\mathrm{juxt}}$. As cells typically have 6 or fewer neighbours (close packing for discs), we estimate $|B^{\mathrm{juxt}}_n| \approx 6S_{\mathrm{juxt}}$. For the diffusive signalling mechanism, the steady-state morphogen field generated by a point source of unit strength is given by

$$a(r) = \frac{1}{2\pi D_a} K_0\!\left(r\sqrt{\lambda_a/D_a}\right),$$

where $r$ is the distance from the source and $K_0$ is a modified Bessel function of the second kind. As $K_0(x) \sim e^{-x}\sqrt{\pi/(2x)}$ as $x \to \infty$, diffusive signalling will be significant between cells separated by $r = O(\sqrt{D_a/\lambda_a})$. Provided $\lambda_a \ll D_a$, this range spans many cell spacings, and we estimate $|B^{\mathrm{diff}}_n|$ by summing the contributions of surrounding cells distributed with density $\phi = 1/(2\sqrt{3})$, the density of cell centres for closely packed discs. For $D_a = 1000$ and $\lambda_a = 10$, this estimate is approximately $0.03S_{\mathrm{diff}}$. We therefore expect that the juxtacrine and diffusive signalling mechanisms will have similar effects on differentiation if $S_{\mathrm{juxt}}$ is roughly 1000 times smaller than $S_{\mathrm{diff}}$.
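One heuristic route to an estimate of this size is sketched below. It assumes that each surrounding cell acts as a unit point source with the same sign and replaces the sum over cells by an integral over a uniform density $\phi$ of sources, which is reasonable when $\sqrt{D_a/\lambda_a}$ greatly exceeds the cell spacing (for $D_a = 1000$, $\lambda_a = 10$ the range is 10, about five spacings). It is a consistency check, not necessarily the expression used by the authors.

```latex
% Sum over sources replaced by an integral over density \phi
|B^{\mathrm{diff}}_n| \lesssim S_{\mathrm{diff}}\,\phi \int_0^\infty a(r)\, 2\pi r \,\mathrm{d}r
  = \frac{S_{\mathrm{diff}}\,\phi}{D_a}\int_0^\infty K_0\!\left(r\sqrt{\lambda_a/D_a}\right) r \,\mathrm{d}r
  = \frac{\phi\, S_{\mathrm{diff}}}{\lambda_a},
\qquad \text{using } \int_0^\infty K_0(kr)\, r\,\mathrm{d}r = \frac{1}{k^2}
% With \phi = 1/(2\sqrt{3}) \approx 0.289 and \lambda_a = 10
\frac{\phi\, S_{\mathrm{diff}}}{\lambda_a} \approx 0.029\, S_{\mathrm{diff}}
```

To leading order this estimate depends on $\lambda_a$ but not on $D_a$; the diffusion coefficient enters only through the condition that the signalling range covers many cells.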

Numerical methods
Solutions to the stochastic differential equations (2) are approximated numerically using the Euler-Maruyama method [82]. Denoting by $\Delta t$ the integration timestep and introducing the superscript $\tau$ to represent the state of a cell at time $t = \tau\Delta t$, we obtain a discrete update for $(s^{\tau}_n, f^{\tau}_n)$ in which the $W^{\tau}_n$ are independent random numbers drawn from a normal distribution with mean zero and variance $\Delta t$.
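A single Euler-Maruyama step for the fate variable can be sketched as below. Because the right-hand side of (2b) is not reproduced here, the drift is passed in as a function pointer. `pitchfork_drift` is a hypothetical drift with a supercritical pitchfork at $s = 1/2$ (with $\nu = 1$), included only for illustration and testing. The caller must draw the increment `dW` from $N(0, \Delta t)$, for example with GSL's `gsl_ran_gaussian(r, sqrt(dt))`, whose second argument is the standard deviation.

```c
#include <assert.h>

/* Deterministic part of df/dt given stemness s, fate f and bias B. */
typedef double (*drift_fn)(double s, double f, double B, const void *params);

/* One Euler-Maruyama step: f_new = f + F(s, f, B) dt + delta dW,
   where the caller supplies dW ~ N(0, dt). */
static double em_step(drift_fn F, const void *params, double s, double f,
                      double B, double delta, double dt, double dW)
{
    assert(dt > 0.0 && delta >= 0.0);
    return f + F(s, f, B, params) * dt + delta * dW;
}

/* Hypothetical drift chi (1/2 - s) f - f^3 + B (nu = 1): a supercritical
   pitchfork at s = 1/2 when B = 0. Illustration only, not equation (2b). */
typedef struct { double chi; } pitchfork_params;

static double pitchfork_drift(double s, double f, double B, const void *params)
{
    const pitchfork_params *p = params;
    return p->chi * (0.5 - s) * f - f * f * f + B;
}
```

Keeping the noise draw outside `em_step` makes the update deterministic for a given `dW`, which simplifies testing and lets the caller choose the random number generator; $s_n$, which evolves deterministically, would be advanced separately.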
The morphogen equations (3) are approximated numerically using a cell-centred finite-volume approach to discretise spatial derivatives. On a uniform grid of spacing $h = L/M_s$, we denote by $a_{j,k}(t)$ and $b_{j,k}(t)$ ($j, k = 1, \ldots, M_s$) the average concentrations of $a$ and $b$ in the grid square $\Omega_{j,k} = \{(x, y) : (j-1)h < x \le jh,\ (k-1)h < y \le kh\}$. The semi-discrete form of (3a) is then

$$\frac{\mathrm{d}a_{j,k}}{\mathrm{d}t} = \frac{D_a}{h^2}\left(a_{j+1,k} + a_{j-1,k} + a_{j,k+1} + a_{j,k-1} - 4a_{j,k}\right) - \lambda_a a_{j,k} + \frac{1}{h^2}\sum_{n:\,\mathbf{x}_n \in \Omega_{j,k}} \alpha_a(s_n, f_n)$$

for $1 \le j, k \le M_s$ (with indices taken periodically), and similarly for (3b). Solutions to the continuous equations (3) have logarithmic singularities at the cell centres, as the cells are modelled as point sources. These singularities are regularised by the spatial discretization, which averages all quantities over a grid square; as a result, the strength of autocrine signalling (and of signalling between cells separated by distances of the order of $h$ or less) depends on $h$. The discrete equations are stepped forward in time using the Douglas alternating-direction implicit method [83,84]. The morphogen concentrations $a(\mathbf{x}_n, t)$ and $b(\mathbf{x}_n, t)$ experienced by the $n$-th cell are then taken to be those for the grid square in which its centre, $\mathbf{x}_n$, lies. As the system contains stochastic elements, we perform $M_{\mathrm{sim}}$ simulation realisations for each set of parameter values.
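A minimal sketch of the spatial discretization for one morphogen follows. It assumes the standard five-point finite-volume stencil on a periodic $M_s \times M_s$ grid of spacing $h$, with each cell's production rate spread over the grid square containing its centre. For brevity it swaps in a simple explicit Euler time step in place of the Douglas ADI scheme used in the simulations; the function names are hypothetical.

```c
#include <assert.h>

/* Row-major index on an M x M grid with periodic wrap-around (j, k >= -M). */
static int idx(int j, int k, int M)
{
    return ((j + M) % M) * M + (k + M) % M;
}

/* Grid square containing coordinate u in [0, M h); also used to read off
   the concentration experienced by a cell centre. */
static int grid_index(double u, double h, int M)
{
    int j = (int)(u / h);
    if (j < 0) j = 0;
    if (j >= M) j = M - 1;
    return j;
}

/* Sum each cell's production rate into the grid square containing its centre. */
static void deposit_sources(int N, const double *x, const double *y,
                            const double *rate, int M, double h, double *src)
{
    for (int q = 0; q < M * M; q++) src[q] = 0.0;
    for (int n = 0; n < N; n++)
        src[idx(grid_index(x[n], h, M), grid_index(y[n], h, M), M)] += rate[n];
}

/* One explicit Euler step of
     da_jk/dt = D (a_{j+1,k} + a_{j-1,k} + a_{j,k+1} + a_{j,k-1} - 4 a_jk) / h^2
                - lambda a_jk + src_jk / h^2.
   Explicit stepping is stable only for roughly dt <= h^2 / (4 D); an
   implicit scheme such as Douglas ADI avoids this restriction. */
static void fv_explicit_step(int M, double h, double D, double lambda, double dt,
                             const double *a, const double *src, double *a_new)
{
    assert(M > 0 && h > 0.0 && dt > 0.0 && a != a_new);
    for (int j = 0; j < M; j++) {
        for (int k = 0; k < M; k++) {
            double c = a[idx(j, k, M)];
            double lap = a[idx(j + 1, k, M)] + a[idx(j - 1, k, M)]
                       + a[idx(j, k + 1, M)] + a[idx(j, k - 1, M)] - 4.0 * c;
            a_new[idx(j, k, M)] =
                c + dt * (D * lap / (h * h) - lambda * c + src[idx(j, k, M)] / (h * h));
        }
    }
}
```

The concentration seen by cell $n$ is then `a[idx(grid_index(x[n], h, M), grid_index(y[n], h, M), M)]`. Dividing the source by $h^2$ is what makes the autocrine contribution depend on the grid spacing.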
The simulations were written in ISO C99, using the random number generator of the GSL library [85], and are available as Additional file 1.

Spatial statistics
Pair correlation functions
PCFs are 'second-order' characteristics (involving relationships between pairs of points). We first define them for sets of points which are all of one type, before extending their definitions to the multitype case.
Let $\Pi(\boldsymbol{\xi}, \boldsymbol{\eta})$ be the probability of finding at least one cell centre in each of the infinitesimally small discs with centres $\boldsymbol{\xi}$ and $\boldsymbol{\eta}$ and areas $dS_1$ and $dS_2$, respectively. The product density [41], $\rho^{(2)}(\boldsymbol{\xi}, \boldsymbol{\eta})$, is intuitively defined by $\Pi(\boldsymbol{\xi}, \boldsymbol{\eta}) = \rho^{(2)}(\boldsymbol{\xi}, \boldsymbol{\eta})\, dS_1\, dS_2$ (see [41,86] for a rigorous definition). If the pattern is translation-invariant and isotropic, then $\rho^{(2)}(\boldsymbol{\xi}, \boldsymbol{\eta}) \equiv \rho^{(2)}(r)$, where $r = |\boldsymbol{\xi} - \boldsymbol{\eta}|$. Let $\rho = N/L^2$ be the average density of cell centres. Then the PCF (or radial distribution function [87]) is defined by $g(r) \equiv \rho^{(2)}(r)/\rho^2$, and describes the distribution of distances between pairs of cells.
In the multitype case, for each choice of $X, Y \in \{R, G\}$, we define $\rho^{(2)}_{XY}(\boldsymbol{\xi}, \boldsymbol{\eta})$ as for $\rho^{(2)}(\boldsymbol{\xi}, \boldsymbol{\eta})$, except that we require the points in $dS_1$ and $dS_2$ to be of types $X$ and $Y$, respectively. The corresponding cross pair correlation functions [88] (or mark PCFs [41], or partial radial distribution functions [87]) are defined by $g_{XY}(r) = \rho^{(2)}_{XY}(r)/(\rho_X \rho_Y)$, where $\rho_X$ is the density of cells of type $X$.
We estimate PCFs using the approach illustrated in Figure 9; see [41] (p. 284) for a more detailed discussion. (Functions pcf for calculating $g(r)$ and pcfcross for calculating $g_{XY}(r)$ are included in the R package spatstat [79].) A piecewise constant estimate of $g(r)$ is obtained by dividing the range $0 < r < L$ into $M_g$ intervals of equal length $L/M_g$. Setting $r_j = jL/M_g$, we approximate $g(r)$ on $r_k < r \le r_{k+1}$ by

$$g(r) \approx \frac{L^2}{N^2 \pi \left(r_{k+1}^2 - r_k^2\right)} \sum_{m=1}^{N} \sum_{n \ne m} I_{(r_k, r_{k+1}]}(d_{nm}), \qquad (10)$$

where $d_{nm} \equiv |\mathbf{x}_n - \mathbf{x}_m|$ and $I_{(s,t]}(r)$ is the indicator function on $(s, t]$:

$$I_{(s,t]}(r) = \begin{cases} 1, & s < r \le t, \\ 0, & \text{otherwise}. \end{cases} \qquad (11)$$
For each cell $m \in \{1, 2, \ldots, N\}$ and each interval $k$, we calculate the number of cells in the annular region $r_k < r \le r_{k+1}$ centred at $\mathbf{x}_m$, and normalise this by the expected number of cells in an area of this size were the cells uniformly distributed. We then average this over all $N$ cells. (Smooth estimates of $g(r)$ can be obtained by using a smoothing kernel in place of the indicator function.) Whilst the above estimate is piecewise constant, to show the distribution more clearly we plot the values calculated as above at the centre of each interval, $(r_{k+1} + r_k)/2$, and interpolate linearly between these points to give a continuous line.
The cross PCFs $g_{XY}$ are calculated in a similar manner, but the sums over $m$ and $n$ in (10) run only over cells of types $X$ and $Y$, respectively, and the normalization constant is $L^2/[N_X N_Y \pi (r_{k+1}^2 - r_k^2)]$, where $N_X$ and $N_Y$ are the numbers of cells of types $X$ and $Y$. As the simulations are initially symmetric in the two cell fates, we combine $g_{RR}(r)$ and $g_{GG}(r)$ to give the cross PCF for pairs of cells of the same type, $g_S(r)$, defined by

$$g_S(r) = \frac{\rho_R^2\, g_{RR}(r) + \rho_G^2\, g_{GG}(r)}{\rho_R^2 + \rho_G^2}. \qquad (12)$$
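The estimator (10) and its cross-type variant can be sketched in C as below. This is a direct $O(N^2)$ implementation, not the authors' code. It assumes minimum-image distances on the periodic domain, bins over $(0, r_{\max}]$ (taking $r_{\max} = L$ reproduces the range above), integer type labels, and a hypothetical sentinel `ANY_TYPE` that selects all cells, so that $X = Y =$ `ANY_TYPE` gives $g(r)$. Bin membership is decided on squared distances, which avoids a square root per pair.

```c
#include <assert.h>

#define PCF_PI 3.14159265358979323846
#define ANY_TYPE (-1)

/* Shortest signed separation along one axis of a periodic domain of length L. */
static double min_image(double d, double L)
{
    if (d > 0.5 * L) d -= L;
    else if (d < -0.5 * L) d += L;
    return d;
}

/* Piecewise-constant estimate of g_XY(r) on Mg equal bins of (0, r_max]:
   for each cell m of type X, count cells n != m of type Y with distance in
   (r_k, r_{k+1}], then scale by L^2 / (N_X N_Y pi (r_{k+1}^2 - r_k^2)).
   type may be NULL when X == Y == ANY_TYPE. Returns -1 if N_X or N_Y is 0. */
static int pcf_estimate(int N, const double *x, const double *y, const int *type,
                        int X, int Y, double L, double r_max, int Mg, double *g)
{
    assert(N > 0 && L > 0.0 && r_max > 0.0 && Mg > 0);
    double w = r_max / Mg;
    long NX = 0, NY = 0;
    for (int i = 0; i < N; i++) {
        if (X == ANY_TYPE || type[i] == X) NX++;
        if (Y == ANY_TYPE || type[i] == Y) NY++;
    }
    for (int k = 0; k < Mg; k++) g[k] = 0.0;
    if (NX == 0 || NY == 0) return -1;
    for (int m = 0; m < N; m++) {
        if (X != ANY_TYPE && type[m] != X) continue;
        for (int n = 0; n < N; n++) {
            if (n == m || (Y != ANY_TYPE && type[n] != Y)) continue;
            double dx = min_image(x[n] - x[m], L), dy = min_image(y[n] - y[m], L);
            double d2 = dx * dx + dy * dy;
            if (d2 == 0.0 || d2 > r_max * r_max) continue;
            int k = 0; /* smallest k with d^2 <= r_{k+1}^2 */
            while (k < Mg - 1 && d2 > ((k + 1) * w) * ((k + 1) * w)) k++;
            g[k] += 1.0;
        }
    }
    for (int k = 0; k < Mg; k++) {
        double rk = k * w, rk1 = (k + 1) * w;
        g[k] *= L * L / ((double)NX * NY * PCF_PI * (rk1 * rk1 - rk * rk));
    }
    return 0;
}
```

In practice one would call it with $X = Y = R$ and with $X = Y = G$, then combine the two results as in (12).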
We weight the two cross PCFs in proportion to the number of pairs of cells of each type because $g_S(r)/g(r)$ is then the conditional probability that two randomly selected cells are of the same type, given that they are separated by a distance $r$, divided by the probability that any two randomly selected cells are of the same type, $(\rho_R^2 + \rho_G^2)/\rho^2$. To improve these estimates, we take the arithmetic mean of the PCFs over $M_{\mathrm{sim}}$ realisations with the same parameter values.
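Combining $g_{RR}$ and $g_{GG}$ via (12) and averaging over realisations is straightforward. Since $\rho_X = N_X/L^2$, the weights $\rho_X^2$ can be replaced by $N_X^2$ because the factors of $L^2$ cancel. A minimal sketch with hypothetical function names:

```c
#include <assert.h>

/* Same-type PCF g_S(r) from g_RR and g_GG, weighted by N_X^2 (equivalently
   rho_X^2, since rho_X = N_X / L^2 and the L^2 factors cancel). */
static void pcf_same_type(int Mg, const double *gRR, const double *gGG,
                          long NR, long NG, double *gS)
{
    assert(NR + NG > 0);
    double wR = (double)NR * NR, wG = (double)NG * NG;
    for (int k = 0; k < Mg; k++)
        gS[k] = (wR * gRR[k] + wG * gGG[k]) / (wR + wG);
}

/* Arithmetic mean over Msim realisations; g_all is stored row-wise as
   g_all[s * Mg + k] for realisation s and bin k. */
static void pcf_mean(int Msim, int Mg, const double *g_all, double *g_mean)
{
    assert(Msim > 0);
    for (int k = 0; k < Mg; k++) {
        double sum = 0.0;
        for (int s = 0; s < Msim; s++) sum += g_all[s * Mg + k];
        g_mean[k] = sum / Msim;
    }
}
```

When one type is absent ($N_G = 0$, say), its weight is zero, so an undefined $g_{GG}$ does not contaminate $g_S$.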

Quadrat histograms
To calculate this statistic, we partition the domain $[0, L] \times [0, L]$ into $M_q \times M_q$ squares (or quadrats) with side length $L/M_q$. We calculate the proportion $p_R$ of cells of type R (those for which $f_n > 0$) in each quadrat, ignoring empty quadrats; we combine the results of $M_{\mathrm{sim}}$ simulations with the same parameter values to generate a histogram of the distribution of $p_R$ over all quadrats and all simulations.
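A minimal sketch of the quadrat histogram for one realisation follows. The number of histogram bins `nbins` on $[0, 1]$ is an assumption, since the bin width is not specified above. The function adds into `hist`, so calling it once per realisation pools the $M_{\mathrm{sim}}$ runs as described.

```c
#include <assert.h>

/* Accumulate the histogram of p_R for one realisation. The domain [0,L)^2 is
   split into Mq x Mq quadrats of side L/Mq; for each non-empty quadrat,
   p_R = (cells with f > 0) / (cells in quadrat) is binned into nbins equal
   bins on [0,1], with p_R = 1 placed in the last bin. cnt_all and cnt_R are
   caller-provided scratch arrays of length Mq*Mq. */
static void quadrat_histogram(int N, const double *x, const double *y,
                              const double *f, double L, int Mq,
                              int *cnt_all, int *cnt_R, int nbins, long *hist)
{
    assert(N >= 0 && L > 0.0 && Mq > 0 && nbins > 0);
    double side = L / Mq;
    for (int q = 0; q < Mq * Mq; q++) cnt_all[q] = cnt_R[q] = 0;
    for (int n = 0; n < N; n++) {
        int i = (int)(x[n] / side), j = (int)(y[n] / side);
        if (i >= Mq) i = Mq - 1; /* guard against rounding at x == L */
        if (j >= Mq) j = Mq - 1;
        cnt_all[i * Mq + j]++;
        if (f[n] > 0.0) cnt_R[i * Mq + j]++; /* cell of type R */
    }
    for (int q = 0; q < Mq * Mq; q++) {
        if (cnt_all[q] == 0) continue; /* empty quadrats are ignored */
        double pR = (double)cnt_R[q] / cnt_all[q];
        int b = (int)(pR * nbins);
        if (b == nbins) b = nbins - 1;
        hist[b]++;
    }
}
```

A histogram concentrated near $p_R = 1/2$ indicates well-mixed fates at the quadrat scale, whereas mass near 0 and 1 indicates patches larger than a quadrat.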

Additional material
Additional file 1: Simulation source code. Source code for simulations of pattern generation in populations of stem cells.

Figure 9 Calculating PCFs. Schematic diagram to illustrate the method used to calculate PCFs. For each distance interval $(r_k, r_{k+1}]$ and each cell with centre $\mathbf{x}_m$, we count the number of (other) cells in $r_k < r \le r_{k+1}$, where $r$ is the distance from $\mathbf{x}_m$. The PCF, $g(r)$, on $r_k < r \le r_{k+1}$ is the mean number of cells in these annular regions normalised by $\pi(r_{k+1}^2 - r_k^2)\rho$, which is the number of other cells that would be expected in the annular region were the cells uniformly distributed (see equations (10) and (11)). For the cross PCFs $g_{XY}(r)$, we restrict $\mathbf{x}_m$ to be of type $X$ and only count cells of type $Y$; $g_S(r)$ is calculated from $g_{RR}(r)$ and $g_{GG}(r)$ by (12).