The in vivo experiment
In the immunoprevention experiment, BALBneuT mice that have reached 6 weeks of age are exposed to a twice weekly intraperitoneal vaccination cycle followed by two weeks of rest (Chronic protocol) for one year. At the end of the experiment the number of detectable lesions is taken as the final outcome. In the therapeutic experiment, instead of waiting for breast cancer to develop into its later stages (that give rise to lung metastases), metastases are induced by the injection of metastatic cells. Mice in late tumor stages present various problems, such as immune system aging, the presence of a non surgically removable primary tumor and the inability of establishing when the metastatic process starts. Indeed the experimental induction of metastases in tumorfree mice clearly represents a typical scenario in human cancer, i.e., the scenario arising after the surgical removal of the primary tumor.
The therapeutic experiment lasts for 32 days. TuBo mammary carcinoma neu cells (referred to as Neu/H2) are used to induce experimental metastases in syngeneic BALBneuT mice. At day 0 all mice receive an intravenous injection of 2.5·10^{4} metastatic TuBo cells.
This experiment counts three different mice sets: the untreated or control set, "set I", where a protocol composed by a "twice a week" vaccination cycle is started at day 1 and repeated up to the end of the experiment (Triplex+1 protocol), and "set II", where the same cycle is started at day 7 and repeated up to the end of the experiment (Triplex+7 protocol).
Mice from the control set developed ≈ 200 metastatic nodules, whereas mice from set I and II shown a reduction > 99% and ≈ 87% of early lung metastasis formation respectively. Moreover, the structure of metastatic lesions was frequently cribriform and less compact than in the controls.
The three stimuli of the triplex vaccine stimulate the immune system responses in many ways. The IL12 enhances antigen presentation, helper T cell (TH) activation and secretion of interferonγ (IFNγ) by natural killer (NK) and TH cells. IFNγ also has a cytostatic activity on cancer cells (CC) and stimulates granulocytes and macrophages (MP) in infiltrating tumor cell nests in the lungs. TH cells have a major role at the systemic level releasing various cytokines such as interleukin2 (IL2) which enhances cytotoxic T cells (TC) activities and B cells antibodies (Ab) release. The allogeneic MHC favors the recognition by antigen presenting cells (macrophages, B and dendritic cells (DC)), as well as cytotoxic T cells.
General description of the MetastaSim computational model
The MetastaSim model uses an agent based approach to simulate the main features of the immune system. It takes inspiration from the SimTriplex model developed by Pappalardo et. al. [6]. Both the innate and adaptive immune responses are part of the model. In particular the core of the adaptive immune response is represented, and cellular and humoral responses as well as the presence of antigens in the host organism are implemented.
The space is discrete. MetastaSim uses an hexagonal lattice instead of the more familiar square (checkerboard) lattice. This is because the neighbors of a site in a square lattice are of two different types, edges and corners. To have only neighbors of one type we choose the hexagonal lattice where each site has six identical neighbors. The lattice represents the frontal ventral surface of the left lung of mice and it has periodic boundary conditions. The model does not include the presence of lymph nodes. However all the relevant entities and processes that occur in lymph nodes are represented.
Perelson and Oster (1979) [12] proposed a simple quantitative model to understand how large the immune repertoire should be. This model is based on the notion of shape space. In order for a receptor and the molecule that it binds, a ligand, to approach each other over an appreciable portion of their surfaces, there must be extensive regions of complementarity. The constellation of features important in determining binding among molecules is called the generalized shape of a molecule.
One of the key aspects of the model is represented by the use of bitstrings to model receptors. Bitstrings are excellent candidates to describe the concept of shape space. This fundamental modeling abstraction ignores nearly all of the physical details that determine receptor/ligand interactions. However, by adopting character strings, many binding events can be simulated quickly, making it feasible to study largescale properties of the immune system [13].
Although character strings are unphysical, they can produce surprisingly accurate models when benchmarked to experiment [14], suggesting that the abstraction captures important features of receptor/ligand binding.
The timestep for the simulation is Δt = 8 hours, since no other biologically relevant processes for the in vivo experiment can be observed in shorter times. At any time step the interaction and the diffusion processes hold.
During the interaction process the concept of physical proximity is modeled through the concept of latticesite: only entities that lie in the same latticesite can stochastically interact with each others in the same site, so that there is no correlation between entities residing on different sites at a fixed time.
After the interaction phase, entities can move from a latticesite to another one in the neighborhood. Major bone marrow and thymus functions, like the positive and negative selection of immature T lymphocytes before they get into the lymphatic system, are also represented.
Cells and molecules
The model includes the major cell types needed to represent the immune responses elicited by the vaccine. The main involved types of cells and molecules represented are dendritic cells (DC), macrophages (M), natural killer cells (NK), vaccine cells (VC), cancer cells (CC), b lymphocytes (B), cytotoxic lymphocytes (TC), thelper lymphocytes (TH), antibodies (Ab), antigens (Ag), immunocomplexes (IC), interleukin2 (IL2), interleukin12 (IL12) and interferonγ (IFNγ).
More than one entity can be present at the same time in a lattice site. Cells are followed individually throughout the course of an experimental run because their internal states and receptors follow their own, individual life histories.
All cells have MHC class I receptors. APCs (macrophages and dendritic cells) and B cells also have MHC class II receptors. Macrophages and dendritic cells have receptors (i.e., Tolllike receptors) used for recognizing antigens aspecifically. These aspecific receptors are not represented explicitly. B cells, TH cells and Tc cells are endowed with receptors used for binding specific antigens. The B cell receptor (BCR) binds antigen which it can then ingest and endocytose. The T cell receptors (TCR) only bind antigen in MHC/peptide complexes. When B cells become plasma cells, they have no specific receptor, however they produce antibodies with the same receptor shape as the B cell from which they are descended.
With respect to the internal states, it is possible to describe the cells as finite state machines. In particular, they can take a state from a certain set of suitable states and their dynamics are realized by means of statechanges. A state change takes place when a cell interacts with another cell or with a molecule. Molecules are represented in the model as sitespecific concentrations, since they have only a fixed set of properties that does not change throughout the simulation. These molecules include antigens, antibodies and various cytokines.
Antigens can be schematized as molecular structures containing essentially two different portions: epitopes and peptides. Epitopes represent the external part of an antigen that is recognized by, for example, a Bcell receptor whereas peptides represent the internal portion of the antigen that can be bound by an MHC molecule, expressed on a cell surface, and recognized by the appropriate T cell.
Epitopes and peptides are represented in the model by two distinct arrays of binary strings. The minimum antigen is constituted by just two strings, one for the epitope and one for the peptide.
Antibodies are represented in the same way as antigens, i.e., bitstrings. Antibodies contain both foreign (arising from the variable regions) and self (arising from the constant regions) peptides. Lastly, they contain the Fc portion, an epitope identical for all antibodies that is not explicitly represented since it is the same for all of them.
The model also represents some important cytokines which are represented by enumerating their concentration (number of molecules) for each latticesite.
Entities interactions
Every interaction is a complex process that usually involves entities state change. It is possible to distinguish between specific and aspecific interactions. Specific interactions require the presence of specialized receptors for the recognition phase, such as those of B and T lymphocytes. For these kinds of interactions the probability of recognition is defined as a function of the Hamming distance between the receptors, and the affinity of the interaction can be enhanced by adjuvants.
Aspecific interactions entail the use of aspecific receptors, so the recognition phase is not represented explicitly. When two entities that may interact, occur at the same lattice site, they interact probabilistically.
The network of interactions the model uses is represented in figure 1.
Vaccine cells (VC) can be directly recognized by TC cells and NK cells. The Allogenic MHCI molecules favor the interaction with TC cells that proceed to kill them. NK cells can recognize VCs bounded by antibodies. Killed VCs release the tumor associated antigen (TAA) and IL2.
The TAA can be captured by DC, MP and B cells that proceed to internalize the antigen and to present it with the MHCII to TH cells. DC cells can also crosspresent the antigen in conjunction with MHCI to TC cells.
TH cells are activated by the interaction and proceed to release IL2. B cells are also stimulated by interaction with TH cells to differentiate into Plasma cells (P) or, eventually, into Memory B cells. P cells release antibodies (Ab) which can bind both Vaccine and Cancer cells favoring NK killing activity. NK cells can also kill CCs that underexpress the MHC complex.
Abs can also bind the antigen giving rise to Immunocomplexes (IC) that are then phagocyted by MPs. In addition they can activate the complement system with lysis of CCs as result. IL2 promotes TC and TH activities, and B differentiation.
IL12 promotes NK activities and TC activation. Moreover it stimulates TH cells and (to a lesser extent) NK cells to release IFNγ. IFNγ has cytostatic effects on CCs and stimulates both the killing of CCs and the release of IL12 by MPs.
It is easily understandable that the complex network of interactions presented here has many cycles in it, i.e., TH ↔ IL2. These cycles usually involve nonlinearities that the model is however able to handle without any problem.
How simulation proceeds
The first step of the simulation consists of initialization. After cells generation and thymic selection processes, the grid is populated by randomly placing the various cell types in the lattice.
Interactions take place between entities that occupy the same site, and all entities have the opportunity to interact at every timestep. Ideally all the interactions should be executed at the same time. This is obviously not possible. Since the interactions dynamics of a latticesite is not influenced by those of the other sites, the only problem is to decide how to manage interactions that occur at the same site.
At any timestep and for every latticesite an interaction scheme is generated by choosing a random order for the interaction rules, with the interactions executed as defined by interaction scheme. Given a specific interaction A ↔ B between entities of type A and B, every entity of type A is compared with all the entities of type B in the same site until a successful interaction occurs. Then, the next entity of type A is taken into consideration and it is compared again with all the entities of type B in the same lattice site. When all the entities have had their chance to interact, the next interaction rule in the interaction scheme is examined.
It is here that the concentration dependence of the model arises. For example, an APC may only have an antigen binding strength of 0.001, but if there are 100 antigens in the site the APC will have an opportunity to bind each of them. The resulting probability that the APC will actually bind an antigen will be approximately 0.1.
The next step after the interaction phase is represented by entity propagation. All the entities have a chance to move from the lattice site where they are to another one in the neighborhood. All the lattice sites have normally the same probability of being chosen as new positions. However, in a first attempt to mimic chemotaxis, higher probabilities are given to sites containing a congruous number of cancer cells. This "simplifiedchemotaxis" is only active for some entities such as, for example, TCs and MPs.
Other minor processes related to non interactiondriven dynamics such as aging and natural death, differentiation and mutation, are then executed. Time step interactions will proceed up to a preselected final time.
Modeling of the metastatic growth pattern using the Gompertz growth law
The untreated mice scenario represents the development of the metastatic burden in untreated mice. It is the most computationally expensive scenario since, due to the lack of vaccination, the number of cancer cells grow enormously (≈ 10^{7} different cells in each simulation) resulting in a memory intensive application. For modeling the development of the metastatic burden two strictly connected problems arose:
In the in vivo experiment, keeping track of the temporal evolution of nodules sizes is not achievable, because to measure the nodules mice have to be killed and they cannot obviously continue in the experiment. Thus, to represent the temporal growth kinetics of nodules, some assumptions were made. Biologists firstly assumed that every metastatic nodule has originated from an individual "progenitor" cancer cell. This means that approximately only one cancer cell in 100 passes through lung capillary vessels and settles into the lungs showing, indeed, how inefficient the metastatic process is.
We note here that, since the positioning process is not relevant for the experiment, the simulation starts with cancer cells already settled in the lungs. At the beginning of the simulation n "progenitor" cancer cells are then randomly positioned on the lattice.
Preliminary statistical analyses (ChiSquare and AndersonDarling test) on the nodule diameters coming from the left lung of 8 mice under the nullhypothesis of normality or lognormality [15] showed that neither of the two distributions approaches in vivo data.
Observing in vivo results and taking into account the "one cellone nodule" hypothesis, we deduced that distinct groups of cells originating from the same progenitor represent different populations, each one with its own growth rate. Many factors, and in particular a nonhomogeneous distribution of nutrients in lungs, can determine these different growth rates for the nodules.
The reactiondiffusion model for the growth of tumors by Ferreira et. al. showed that using nutrients [16] the growth in time of the cancer approached to a Gompertzian growth. Results by Kendall [15] showed indeed that Gompertzian growth hypothesis fits well with experimental data coming from distributions of human metastases.
Since the distribution of nutrients in the lungs is neither homogeneous nor known, we therefore supposed that nutrients would lead to Gompertzian growths in time and hence we reproduced the growth kinetics of the nodules using the Gompertz growth law [17].
The Gompertz law, introduced in 1825 by Benjamin Gompertz, is a sigmoid function suitable for describing populations growths. The law uses two factors: a growth factor that decreases in time and a constant mortality factor. Thanks to the growth factor deceleration, the dimension of the population tends asymptotically to a certain threshold (the carrying capacity). This model is particularly suitable for the following phenomena:

Mobile phone uptake, where costs were initially high (so uptake was slow), followed by a period of rapid growth, followed by a slowing of uptake as saturation was reached.

Population in a confined space, as birth rates first increase and then slow as resource limits are reached.

Modeling of the growth of tumors, where the approaching of the asymptote usually represents the presence of antiangiogenic growth factors.
Let x(t) be the number of cancer cells at time t, the differential form of the law is given by:
\frac{dx(t)}{dt}=ax(t)bx(t)\cdot \mathrm{ln}\left(x(t)\right)
where a represents the intrinsic grow factor of the tumor, usually connected to the nutrient availability, and b is the mortality factor. Having as an initial condition x(0) = x_{0}, the solution of the equation is:
x(t)={e}^{\frac{a}{b}}{e}^{\left(\frac{a}{b}\mathrm{ln}{x}_{0}\right){e}^{bt}}={x}_{0}^{{e}^{bt}}{e}^{\frac{a}{b}}(1{e}^{bt})
Other formulations use the following form:
x(t)={x}_{0}{e}^{\frac{a}{b}}(1{e}^{bt})
As previously stated, biologists supposed for the problem we are dealing with that every metastatic nodule originates from a single "progenitor" cancer cell (x_{0} = 1). For this reason the coefficient x_{0} can be removed. The model can therefore be simplified to:
x(t)={e}^{\frac{a}{b}}(1{e}^{bt})
To model in silico the same nodule sizes distributions of the in vivo experiment we then decided to use the inverse transform sampling method [18]. Starting from a random variable u uniformly distributed in 0[1], the method allows us to obtain a random variable X distributed according to some desired experimental data.
The algorithm of the method is as follows:

Build the cumulative distribution function F(x) using experimental data;

Consider ϕ(y) = F^{1}(x);

Generate a random value u uniformly distributed in the range [0, 1];

Return r = ϕ(u) distributed according to experimental data.
Supposing that we want to simulate k metastatic nodules, we can now generate k random nodule measures distributed according to experimental data. nodules have a spherical form, we can estimate the number of cells x contained in a nodule with a diameter r as follows:
x=\frac{{\left(\frac{r}{2}\right)}^{3}}{{\left(\frac{d}{2}\right)}^{3}}
where d represents the mean diameter of a TuBo cancer cell.
Starting from x, we have then to find a and b in a such way that, using the Gompertz law, the diameter of the nodule at the end of the simulation should be near to the expected value.
The knowledge of x is not enough for our aim since we have two unknown variables and just one equation, and this leads to an infinite set of possibilities. For this reason Biologists supposed that, at the end of the experiment, the nodules reached only 50% of the maximum diameter they can have before approaching close to their carrying capacity. Applying the logarithm function to both members, the previous equation translates into:
\mathrm{ln}x(t)=\frac{a}{b}(1{e}^{bt})
Since each timestep is 8 hours and the experiment lasts for 32 days, we need t' = 96 timesteps to complete the simulation.
As previously stated, the law also tends asymptotically to its carrying capacity. If we call x* the number of cells of a nodule at its carrying capacity, the following limit holds:
\underset{t\to \infty}{\mathrm{lim}}{e}^{\frac{a}{b}(1{e}^{bt})}={x}^{*}
Moreover, since the term e^{bt}→ 0 for t → ∞, we can suppose that for t ≫ t' the following statement holds:
\frac{a}{b}\approx \mathrm{ln}{x}^{\ast}
Proceeding with substitution we have:
\frac{\mathrm{ln}x}{\mathrm{ln}{x}^{\ast}}=(1{e}^{b{t}^{\prime}})
and finally:
\begin{array}{l}b=\frac{\mathrm{ln}\left(1\frac{\mathrm{ln}x}{\mathrm{ln}{x}^{\ast}}\right)}{{t}^{\prime}}\\ a=\left[\frac{\mathrm{ln}(x)\cdot b}{1{e}^{b{t}^{\prime}}}\right]\end{array}
The obtained k parameters (a_{
k
}, b_{
k
}) are associated at timestep 0 to the k randomly positioned cancer cells. At any timestep the Gompertz law is used to compute the duplication rates that cancer cells belonging to the specific nodule should have.
In particular, starting from the differential form of the law
\frac{d{x}_{j}(t)}{dt}={a}_{j}x(t){b}_{j}{x}_{j}(t)\cdot \mathrm{ln}\left(x(t)\right)
we discretize the equation with the forward Euler method obtaining
{x}_{j}^{t+1}{x}_{j}^{t}=\Delta t({x}_{j}^{t}({a}_{j}{b}_{j}\cdot \mathrm{ln}({x}_{j}^{t}))).
Having discrete timesteps, we can suppose Δt = 1. If we divide all the members by {x}_{j}^{t} we have:
{w}_{j}=\frac{{x}_{j}^{t+1}{x}_{j}^{t}}{{x}_{j}^{t}}=({a}_{j}{b}_{j}\cdot \mathrm{ln}({x}_{j}^{t}))
Where the index j identifies the nodule (or the population) j, j ∈ 1 ... k, w_{
j
} represents the duplication ratio for all the cells belonging to the nodule j and {x}_{j}^{t+1}and {x}_{j}^{t} are the number of cancer cells of the population j at timestep t + 1 and t, respectively.
Tuning and validation of the model
Prior to use the model as in silico wet lab, it has to be tuned and validated against existing in vivo experiments. All models have a certain number of parameters which can be freely chosen in a certain range. Biological knowledge has been used to guess reasonable initial ranges. Then fine tuning has been done in such a way that in silico experiments fit in vivo ones. For the reproduction of the metastatic growth pattern it has been possible to utilize experimental data on the distribution of nodules in sizes (diameters) coming from untreated mice. Thus for all scenarios the number of nodules at the end of the "in vivo" experiment has been used.
MetastaSim describes the metastatic growth pattern and immune response elicited by the vaccine against it for a single mouse. Parameters tuning must entitle as result that the simulator, applied to mice with different vaccine protocols, gives as result a reliable representation of the "in vivo" experiment. Note that immune system behavior should agree with biological knowledge. For this reason major entities mean plots were also submitted to biologists and checked for their approval.
The tuning procedure was done using few, randomly selected, individual mice. Parameters are varied under a certain range and then simulations are executed on the sample set. Obtained results are then checked. When a reasonable tuning has been found, "in silico" validation of the model has been done using the following experimental procedure:

Generate a large population of individual mice, each one with a different random seed which will determine different probabilistic chain events.

Randomly extract from the population two statistical samples of 100 individual mice to perform numerical experiments.

Simulate all the scenarios on the two sample sets.
Note here that none of the mice used during the tuning procedure has been used also for validation.
The treated scenarios
The tests have been executed on two different mice samples, each one composed by 100 virtual mice. For every mice set the untreated mice as well as the two treated mice scenarios (vaccination started at day 1 with Triplex+1 protocol and at day 7 with Triplex+7 protocol respectively) have been reproduced. Since MetastaSim simulates the frontal ventral surface of the left lung of mice, the initial number of metastases already settled in the lungs at the beginning of the simulation has been estimated using experimental data and has been set to 35. The number metastases has been taken as the outcome of the simulations for all the scenarios.
Median values relative to the two treated mice scenarios have been then compared against those coming from the untreated mice scenario and the percentages of prevented metastases (in respect to the untreated scenario) have been estimated, thus following the same procedure used "in vivo" [8].
Evaluation of "insilico" predicted protocols
The MetastaSim simulator usually requires no more than 5 minutes to test a protocol on a single mouse. An upper bound for the required total CPU time is ≈ 100·512·5 ≈ 177.8 days on a single cpu computer. It is possible to overcome this time limit by running multiple simulations at the same time using an HPC infrastructure. Resources from the CINECA Italian supercomputing center (BCX cluster) have been then used for the purpose, launching 8 different MPI jobs (each one requiring 128 cores) for approximately 4 hours. The total employed time has been ≈ 4000 hours, which corresponds to 167.7 days on a single cpu computer.
In order to establish whether a protocol is better than another, an order criterion (a fitness or rank function) must be used. For this reason the following rank function has been used:
Let N* ⊂ R be the set which contains all the possible values N_{
i
} described before, and let T ⊂ R be the set of all possible number of injections T_{
i
} for each protocol i. Each protocol i can be described as a point (N_{
i
}, T_{
i
}) in the space (N*, T). Ideally the optimum protocol lies in the origin (0, 0), since it is able to entitle 0 metastatic nodules with 0 injections. Of course such a protocol does not exist. However the best protocols will be the ones whose distance from the origin is minimum. The following sort criterion is then used:
{r}_{i}=\sqrt{{N}_{i}^{2}+{({k}_{3}\cdot {T}_{i})}^{2}}
where k_{3} is a fixed constant used to properly scale T. The goal of this exhaustive search is to find protocols with the same protection rate as "Triplex+1" protocol but with less injections, so protocol prevention rates must be favored in respect to the number of injections. For this reason k_{3} has been empirically set to 15.