We simulate individual cells on a planar substrate. The model operates in two steps, described in detail below: undifferentiated cells are seeded at random (at *t* = 0), and a mechanical model is used which generates aggregates of non-overlapping cells (at *t* = 0); thereafter (for *t* > 0), individual cells stop moving and undergo differentiation, mediated by diffusive or juxtacrine signalling (see Figure 8). We combine an individual-based model for cell differentiation with a model for signalling; for diffusive signalling, we use continuum reaction-diffusion equations for the diffusible species, whilst for juxtacrine signalling, we assume that each cell influences the differentiation of a finite number of nearby cells.

Patterns of aggregation and differentiation are analysed with PCFs and QHs, as explained below.

### Modelling initial spatial distribution

*N* cells are distributed randomly on a square domain [0, *L*] × [0, *L*], considered to be periodic in both directions. Cells move according to a simple, cell-centre based model for a time interval *t*_{init}, generating a distribution that minimises overlapping but allows aggregate formation. Cells move due to forces between neighbouring cells that are repulsive over short distances to prevent overcrowding but attractive over longer distances to mimic adhesion.

The location of the centre of the *n*-th cell, **x**_{
n
}, evolves according to the differential equation

\frac{\mathsf{\text{d}}{\mathbf{x}}_{n}}{\mathsf{\text{d}}t}=\sum _{m=1,m\ne n}^{N}v\left(|{\mathbf{x}}_{n}-{\mathbf{x}}_{m}|\right)\frac{{\mathbf{x}}_{n}-{\mathbf{x}}_{m}}{|{\mathbf{x}}_{n}-{\mathbf{x}}_{m}|}\cdot

(1a)

Short-range repulsion and long-range attraction are simulated by the velocity *v*(*r*), satisfying

v\left(r\right)=\left\{\begin{array}{cc}\hfill A{r}_{c}^{2}{r}^{-3}\left(2{r}_{c}-r\right)\hfill & \hfill r<{R}_{v},\hfill \\ \hfill 0\hfill & \hfill r>{R}_{v}.\hfill \end{array}\right.

(1b)

(We note that other functions having a similar quantitative form would be similarly effective.) We take the cut-off radius to be *R*_{
v
}= 3*r*_{
c
}, where *r*_{
c
}is the cell radius. *A* parametrises the size of cell-cell forces. Equations (1) were simulated using the Euler method for an interval *t*_{init} = 0.002, taking *A =* 5000.

### Modelling cell differentiation

We parametrise the state of the *n*-th cell (1 ≤ *n* ≤ *N*) by (*s*_{
n
}, *f*_{
n
}), which serves as a low-dimensional approximation to the levels of numerous transcription factors and the methylation status of many genes. The variable *s*_{
n
}, lying in the range 0 ≤ *s*_{
n
}≤ 1, denotes the "stemness" or degree of plasticity of the cell; each value of *s*_{
n
}may represent a set of regulatory network activation patterns from the molecular viewpoint, and may depend on the relative abundance and subcellular localisations of proteins and RNAs as well as other types of signalling molecules.

At the start of the simulations, all cells have stemness parameter *s*_{
n
}= 1. Over time and as the cells differentiate, *s*_{
n
}decreases (in the present model in a deterministic manner). The variable *f*_{
n
}(a measure of the relative expression level of specific genes) may take any real value and represents the differentiation fate of the cells. We classify the cells into two types, *R* and *G*, for which *f*_{
n
}> 0 and *f*_{
n
}< 0, respectively. (In images of simulations, cells of types *R* and *G* are coloured red and green, respectively.) At the start of the simulation, we set *f*_{
n
}= 0 (no preferred lineage) for all cells.

The state of the *n*-th cell evolves according to the system of stochastic ordinary differential equations

\mathsf{\text{d}}{s}_{n}=-\kappa {s}_{n}\mathsf{\text{d}}t

(2a)

\mathsf{\text{d}}{f}_{n}=\left({B}_{n}+\chi \left(\frac{1}{2}-{s}_{n}\right){f}_{n}-v{f}_{n}^{3}\right)\mathsf{\text{d}}t+\sqrt{2\delta}\mathsf{\text{d}}{W}_{n}

(2b)

where *t* is time, *κ* > 0 controls the rate at which cells differentiate, while *χ* > 0 and *ν* > 0 are parameters which regulate positive and negative feedback. The equation for *f*_{
n
}is chosen such that (with *s*_{
n
}viewed as a parameter, and *B*_{
n
}= *δ* = 0) it displays a supercritical pitchfork bifurcation at *s*_{
n
}= 1/2, with a single stable steady state for *s*_{
n
}> 1/2, but two stable (and one unstable) steady states for *s*_{
n
}< 1/2, associated with the two distinct cell fates (Figure 2(c)). {B}_{n}\equiv {B}_{n}^{\mathsf{\text{juxt}}}+{B}_{n}^{\mathsf{\text{diff}}} denotes the influence of external factors (juxtacrine and diffusive signalling) on the fate of the cell. Non-zero *B*_{
n
}breaks the symmetry of the pitchfork bifurcation (Figure 2(d)). Noise (of amplitude *δ*) accounts for randomness in the differentiation process, allows plasticity in the fate of partially committed cells, and perturbs the system from the unstable state in which all cells have *f*_{
n
}= 0. Cells are assumed to remain stationary while they differentiate. We do not claim that the present model for differentiation is definitive; however, it exemplifies in a simple phenomenological way the phenotypic evolution of individual cells.

#### Diffusive signalling

To simulate diffusive signalling, we assume that the cells produce morphogens with concentrations (at a point **x** in space) denoted by *a*(**x**, *t*) and *b*(**x**, *t*). Cells of type *R* (*f*_{
n
}> 0) produce *a*, whilst cells of type *G* (*f*_{
n
} < 0) produce *b*, with the production rates of the *n* th cell being given by *α*_{
a
}(*s*_{
n
}, *f*_{
n
}) and *α*_{
b
} (*s*_{
n
}, *f*_{
n
}), respectively (Figure 8(a)). The morphogens diffuse freely in the extracellular space, with diffusion coefficients *D*_{
a
}and *D*_{
b
}, and are degraded at rates λ_{
a
}and λ_{
b
}. The concentrations *a* and *b* satisfy the equations

\frac{\partial a}{\partial t}={D}_{a}{\nabla}^{2}a+\sum _{n=1}^{N}{\alpha}_{a}\left({s}_{n},{f}_{n}\right)\delta \left(\mathbf{x}-{\mathbf{x}}_{n}\right)-{\lambda}_{a}a

(3a)

\frac{\partial b}{\partial t}={D}_{b}{\nabla}^{2}b+\sum _{n=1}^{N}{\alpha}_{b}\left({s}_{n},{f}_{n}\right)\delta \left(\mathbf{x}-{\mathbf{x}}_{n}\right)-{\lambda}_{b}b

(3b)

where the **x**_{
n
}(*n* = 1,..., *N*) are the positions of the cell centres. Uptake of the morphogens by the cells is neglected. For simplicity we adopt the following forms for the production functions:

{\alpha}_{a}\left({S}_{n},{f}_{n}\right)=\left\{\begin{array}{cc}\hfill \alpha \left(1-{s}_{n}\right)\hfill & \hfill {f}_{n}>0,\hfill \\ \hfill 0\hfill & \hfill {f}_{n}<0,\hfill \end{array}\right.

(3c)

{\alpha}_{b}\left({s}_{n},{f}_{n}\right)=\left\{\begin{array}{cc}\hfill 0\hfill & \hfill {f}_{n}>0,\hfill \\ \hfill \alpha \left(1-{s}_{n}\right)\hfill & \hfill {f}_{n}<0,\hfill \end{array}\right.

(3d)

where *α* > 0 is a constant. Production rates increase as the cells lose their multipotency (i.e. as *s*_{
n
}decreases).

The influence of morphogens on cell fate in (2b) is modelled by assuming that {B}_{n}^{\mathsf{\text{diff}}} is proportional to the difference in concentrations of the two morphogens,

{B}_{n}^{\mathsf{\text{diff}}}={S}^{\mathsf{\text{diff}}}\left(a\left({\mathbf{x}}_{n}\right)-b\left({\mathbf{x}}_{n}\right)\right),

(3e)

*S*^{diff} being a parameter representing the sensitivity of cells to diffusive signalling. Differentiation is biased towards type *R* (*G*) when {B}_{n}^{\mathsf{\text{diff}}} is positive (negative) via (2b).

#### Juxtacrine signalling

To simulate signalling between cells which are in direct physical contact (represented by cells whose centres are less than a distance *R*_{juxt} apart, where we take *R*_{juxt} = 3*r*_{
c
}), we define the influence function {B}_{n}^{\mathsf{\text{juxt}}} in (2b) to be

{B}_{n}^{\mathsf{\text{juxt}}}={S}^{\mathsf{\text{juxt}}}\sum _{m}\frac{2{r}_{c}\left({\beta}_{a}\left({s}_{m},{f}_{m}\right)-{\beta}_{b}\left({s}_{m},{f}_{m}\right)\right)}{|{\mathbf{x}}_{m}-{\mathbf{x}}_{n}|}

(4a)

summing over all *m* ≠ *n*, with | **x**_{
m
}- **x**_{
n
}|< *R*_{juxt}. The signals produced by differentiating cells (Figure 8(b)) are chosen to be

{\beta}_{a}\left({s}_{n},{f}_{n}\right)=\left\{\begin{array}{cc}\hfill \beta \left(1-{s}_{n}\right)\hfill & \hfill {f}_{n}>0,\hfill \\ \hfill 0\hfill & \hfill {f}_{n}<0,\hfill \end{array}\right.

(4b)

{\beta}_{b}\left({s}_{n},{f}_{n}\right)=\left\{\begin{array}{cc}\hfill 0\hfill & \hfill {f}_{n}>0,\hfill \\ \hfill \beta \left(1-sn\right)\hfill & \hfill {f}_{n}<0.\hfill \end{array}\right.

(4c)

*S*^{juxt} parametrises the sensitivity of cells to juxtacrine signalling and the constant *β* > 0 represents the typical number of cell-surface ligands. In (4a), the area of contact between cells (and hence the number of receptor-ligand interactions) is assumed to be inversely proportional to the distance between them.

#### Parameter estimation and nondimensionalization

The governing equations can be simplified by making the model dimensionless. The parameters *r*_{
c
}, *κ*, *α*, *β* and *ν*, can be eliminated by rescaling time on *κ*^{-1}, distances on *r*_{
c
}, the cell fate variable *f*_{
n
}on *κ*^{1/2}*ν*^{-1/2}, diffusive morphogen concentrations and production rates on \alpha \u2215\kappa {r}_{c}^{2} and *α* respectively, juxtacrine production rates on *β* and biasing functions *B*_{
n
}on *κ*^{3/2}*ν*^{-1/2}. In dimensionless variables, we recover equations (2) with *κ* = *ν* = 1 and parameters *χ* and *δ* replaced by \widehat{\chi}=\chi \u2215\kappa and \widehat{\delta}=\delta \nu \u2215{\kappa}^{2}; equations (3) with *D*_{
a
}, *D*_{
b
}replaced by {\widehat{D}}_{a}={D}_{a}\u2215\kappa {r}_{c}^{2}, {\widehat{D}}_{b}={D}_{b}\u2215\kappa {r}_{c}^{2} and λ_{
a
}, λ_{
b
}replaced by {\widehat{\lambda}}_{a}={\lambda}_{a}\u2215\kappa, {\widehat{\lambda}}_{b}={\lambda}_{b}\u2215\kappa; equations (3d) with *α* = 1; equation (3e) with *S*^{diff} replaced by {\u015c}^{\mathsf{\text{diff}}}={S}^{\mathsf{\text{diff}}}\alpha {\nu}^{1\u22152}\u2215{\kappa}^{5\u22152}{r}_{c}^{2}; equation (4a) with *r*_{
c
}= 1 and *S*^{juxt} replaced by {\u015c}^{\mathsf{\text{juxt}}}={S}^{\mathsf{\text{juxt}}}\beta {\nu}^{1\u22152}\u2215{\kappa}^{3\u22152} and *R*_{juxt} by {R}_{\mathsf{\text{juxt}}}={R}_{\mathsf{\text{juxt}}}\u2215{r}_{c}; and equations (4b,c) with *β* = 1. The domain becomes \left[0,\widehat{L}\right]\times \left[0,\widehat{L}\right] with \widehat{L}=L\u2215{r}_{c}, and simulations are of duration {\widehat{t}}_{\mathsf{\text{end}}}=\kappa {t}_{\mathsf{\text{end}}}. Henceforth we work only with dimensionless quantities and omit hats.

Estimates for the dimensionless parameters are listed in Table 1; these are the default values used for simulations in Results. *D*_{
a
}and *D*_{
b
}are based on the diffusion coefficient for the morphogen BMP-2, which was estimated to be 10^{-8} cm^{2}s^{-1} in [13] (we do not include the correction proposed in [13] for the slowing of diffusion by the extracellular matrix), and we take *D*_{
a
}*= D*_{
b
}. The typical cell radius is taken to be 10 *μ* m. Data to estimate the other parameters are not readily available, in particular *κ*, which we take to be *κ* = 1 day^{-1}. However the parameters *S*^{diff}*, S*^{juxt} and λ_{
a
}, λ_{
b
}have a significant effect on the generated patterns, and therefore a wide region of parameter space is surveyed. (We note that the range of λ considered (1 ≤ λ ≤ 40) encompasses the degradation rate 2.5 × 10^{-4} s ^{-1} for the morphogen Dpp in Drosophila measured by [81], corresponding to λ = 21 in dimensionless units.) For simplicity we assume λ_{
a
}= λ_{
b
}= λ, say.

In order to select parameter values such that the diffusive and juxtacrine mechanisms exert similar effects on differentiating cells, we estimate the maximum sizes of {B}_{n}^{\mathsf{\text{diff}}} and {B}_{n}^{\mathsf{\text{just}}}. Cells are typically separated from their nearest neighbours by a dimensionless distance of 2 (2*r*_{
c
}in dimensional units), so for the juxtacrine mechanism the contribution to {B}_{n}^{\mathsf{\text{just}}} in (4) from a neighbouring cell is of the order of *S*^{juxt}. As cells typically have 6 or fewer neighbours (close packing for discs), we estimate \left|{B}_{n}^{\mathsf{\text{juxt}}}\right|\approx 6{S}^{\mathsf{\text{juxt}}}. For the diffusive signalling mechanism, the steady-state morphogen field generated by a point source of strength unity is given by

a=\frac{1}{2\pi {D}_{a}}{K}_{0}\left(\sqrt{\frac{{\lambda}_{a}}{{D}_{a}}r}\right)

(5)

where *r* is the distance from the source and *K*_{0} a modified Bessel function. As {K}_{0}\left(x\right)~{e}^{-x}\sqrt{\pi \u22152x} as *x* → ∞, diffusive signalling will be significant between cells separated by r=O\left(\sqrt{{D}_{a}\u2215{\lambda}_{a}}\right). Provided λ_{
a
}≪ *D*_{
a
}, we estimate

\left|{B}_{n}^{\mathsf{\text{diff}}}\right|\approx \frac{{S}^{\mathsf{\text{diff}}}\varphi}{{D}_{a}}{\int}_{0}^{\infty}{K}_{0}\left(\sqrt{\frac{{\lambda}_{a}}{{D}_{a}}}r\right)r\phantom{\rule{0.3em}{0ex}}dr=\frac{{S}^{\mathsf{\text{diff}}}\varphi}{{\lambda}_{a}}

(6)

where \varphi =1\u2215\left(2\sqrt{3}\right) represents the density of cell centres for closely packed discs. For *D*_{
a
}= 1000, λ_{
a
}= 10, this expression is approximately 0.03*S*^{diff}. We therefore expect that the juxtacrine and diffusive signalling mechanisms will have similar effects on differentiation if *S*^{juxt} is roughly 1000 times smaller than *S*^{diff}.

#### Numerical methods

Solutions to the stochastic differential equations (2) are approximated numerically using the Euler-Maruyama method [82]. Denoting by Δ*t* the integration timestep and introducing the superscript *τ* to represent the state of a cell at time *t* = *τ* Δ*t*, we have

{s}_{n}^{\tau +1}=\left(1-\Delta t\right){s}_{n}^{\tau},

(7)

\begin{array}{c}{f}_{n}^{\tau +1}={f}_{n}^{\tau}+\left({B}_{n}^{\tau}+\chi \left(\frac{1}{2}-{s}_{n}^{\tau}\right){f}_{n}^{\tau}-{\left({f}_{n}^{\tau}\right)}^{3}\right)\Delta t\\ +\sqrt{2\delta}\Delta {W}_{n}^{\tau}\end{array}

(8)

where the \Delta {W}_{n}^{\tau} are independent random numbers drawn from a normal distribution with mean zero and variance Δ*t*.

The morphogen equations (3) are approximated numerically using a cell-centred finite-volume approach to discretise spatial derivatives. We denote by *a*_{
j,k
}(*t*) and *b*_{
j,k
}(*t*) (*j,k* = 1,..., *M*_{
s
}) the average concentration of *a* or *b* in the region *I*_{
j,k
}= [(*j*- 1)*h,jh*] × [(*k* - 1)*h, kh*] at time *t*, where *h* = *L/M*_{
s
}. Equation (3a) becomes

\begin{array}{l}\frac{d}{dt}({a}_{j,k})=\frac{{D}_{a}}{{h}_{2}}({a}_{j-1,k}+{a}_{j+1,k}+{a}_{j,k-1}+{a}_{j,k+1}\\ \phantom{\rule{1em}{0ex}}-4{a}_{j,k})+\frac{1}{{h}^{2}}{\displaystyle \sum _{{\text{x}}_{n}\in {I}_{j,k}}{\alpha}_{a}({s}_{n},{f}_{n})-{\lambda}_{a}{a}_{j,k}}\end{array}

(9)

for 1 ≤ *j, k* ≤ *M*_{
s
}, and similarly for (3b).

Solutions to the continuous equations (3) have logarithmic singularities at the cell centres, as the cells are modelled as point sources. These singularities are regularised via the spatial discretization, which averages all quantities over a grid square, making the strength of autocrine signalling (and that between cells separated by distances which are of the order of *h* or less) dependent on *h*. The discrete equations are stepped forward in time using the Douglas alternating-direction implicit method [83, 84]. The morphogen concentrations *a*(**x**_{
n
},t) and *b*(**x**_{
n
},t) experienced by the *n*-th cell are then taken to be those for the grid square in which its centre, **x**_{
n
}, lies. As the system contains stochastic elements, we perform *M*_{sim} simulation realisations for each set of parameter values.

The simulations were written in ISO C99, using the random number generator of the GSL library [85], and are available as Additional file 1.

### Spatial statistics

#### Pair correlation functions

PCFs are 'second-order' characteristics (involving relationships between pairs of points). We first define them for sets of points which are all of one type, before extending their definitions to the multitype case.

Let Π(*ξ*,*η*) be the probability of finding at least one cell centre in both of the infinitesimally small discs, with centres *ξ* and *η* and areas d*S*_{1} and dS_{2}, respectively. The *product density*[41], *ρ*^{(2)} (*ξ*,*η*), is intuitively defined by Π(*ξ*, *η*) = *ρ*^{(2)} (*ξ*,*η*) d*S*_{1}d*S*_{2} (see [41, 86] for a rigorous definition). If the pattern is translation-independent and isotropic, then *ρ*^{(2)} (*ξ*,*η*) ≡ *ρ*^{(2)} (*r*), where *r* = |*ξ*- *η*|. Let *ρ = N/L*^{2} be the average density of cell centres. Then the PCF (or radial distribution function [87]) is defined by *g*(*r*) ≡ *ρ*^{(2)} (*r*)/*ρ*^{2}, and describes the distribution of distances between pairs of cells.

In the multitype case, for each choice of *X, Y* ∈ {*R*, G}, we define {\rho}_{XY}^{\left(2\right)}\left(\mathbf{\xi},\mathbf{\eta}\right) as for *ρ*^{(2)} (*ξ*,*η*), except that we require the points in *S*_{1} and *S*_{2} to be of types *X* and *Y* respectively. The corresponding *cross pair correlation functions*[88] (or mark PCFs [41], or partial radial distribution functions [87]) are defined by {g}_{XY}\left(r\right)={\rho}_{XY}^{\left(2\right)}\left(r\right)\u2215{\rho}_{X}{\rho}_{Y}, where *ρ*_{
X
}is the density of cells of type *X*.

We estimate PCFs using the approach illustrated in Figure 9; see [41] (p. 284) for more detailed discussion. (Functions pcf for calculating *g*(*r*) and pcfcross for calculating *g*_{
XY
} (*r*) are included in the R package spatstat [79].) A piecewise constant estimate of *g*(*r*) is obtained by dividing the range 0 < *r* < *L* into *M*_{
g
}intervals of equal length *L/M*_{
g
}. Setting *r*_{
j
}*= jL/M*_{
g
}, we approximate *g*(*r*) on *r*_{
k
}< *r* ≤ *r*_{
k
}_{+1} by

g\left(r\right)=\frac{{L}^{2}}{{N}^{2}\pi \left({r}_{k+1}^{2}-{r}_{k}^{2}\right)}\sum _{m=1}^{N}\sum _{n=1,n\ne m}^{N}{I}_{\left({r}_{k},{r}_{k+1}\right]}\left({d}_{nm}\right)

(10)

where *d*_{
nm
}≡ | **x**_{
n
}- **x**_{
m
}|, *I*_{(s,t]}(*r*) is the indicator function on (s,*t*]:

{I}_{\left(s,t\right]}\left(r\right)=\left\{\begin{array}{cc}\hfill 1\hfill & \hfill s<r\le t,\hfill \\ \hfill 0\hfill & \hfill \mathsf{\text{otherwise}}.\hfill \end{array}\right.

(11)

For each cell *m* ∈ {1, 2,..., *N*}, and each interval *k*, we calculate the number of cells in the annular region *r*_{
k
}*< r* ≤ *r*_{k}_{
+
}_{1} centred at **x**_{
m
}, and normalise this by the expected number of cells in an area of this size were the cells to be uniformly distributed. We then average this over all *N* cells. (Smooth estimates of *g*(*r*) can be obtained by using a smoothing kernel in place of the indicator function.) Whilst the above estimate is piecewise constant, in order to show the distribution more clearly, we plot the values calculated as above at the centres of each interval ((*r*_{k}_{
+
}_{1} + *r*_{
k
})/2) (this is linearly interpolated to give a continuous line).

The cross PCFs *g*_{
XY
} are calculated in a similar manner, but the sums for *m* and *n* in (10) run only over cells of types *X* and *Y* respectively, and the normalization constant is {L}^{2}\u2215\left[{N}_{X}{N}_{Y}\pi \left({r}_{k+1}^{2}-{r}_{k}^{2}\right)\right], where *N*_{
X
}and *N*_{
Y
}are the numbers of cells of type *X* and *Y*. As the simulations are initially symmetrical in the two cell fates, we will combine *g*_{
RR
} (*r*) and *g*_{
GG
} (*r*) to give the cross PCF for pairs of cells of the same type, *g*_{
S
} (*r*), defined by

{g}_{S}\left(r\right)=\frac{{\left({\rho}_{R}\right)}^{2}{g}_{RR}\left(r\right)+{\left({\rho}_{G}\right)}^{2}{g}_{GG}\left(r\right)}{{\left({\rho}_{R}\right)}^{2}+{\left({\rho}_{G}\right)}^{2}}.

(12)

We choose to weight the two cross PCFs in proportion to the number of pairs of cells of that type, as *g*_{
S
} (*r*)/*g*(*r*) is then the conditional probability that two randomly selected cells are of the same type, given that they are separated by a distance *r*, divided by the probability that any two randomly selected cells are of the same type \left(\left({\rho}_{R}^{2}+{\rho}_{G}^{2}\right)\u2215{\rho}^{2}\right). We take the arithmetic mean of PCFs over *M*_{sim} realisations with the same parameter values in order to better estimate them.

#### Quadrat histograms

To calculate this statistic, we partition the domain [0, *L*] × [0, *L*] into *M*_{
q
}× *M*_{
q
}squares (or quadrats) with side length *L/M*_{
q
}. We calculate the proportion *p*_{
R
}of cells of type *R* (those for which *f*_{
n
}> 0) in each quadrat, ignoring empty quadrats; we combine the results of *M*_{sim} simulations with the same parameter values to generate a histogram of the distribution of *p*_{
R
}over all quadrats and for all simulations.