 Methodology Article
 Open Access
 Published:
pSpatiocyte: a highperformance simulator for intracellular reactiondiffusion systems
BMC Bioinformatics volume 21, Article number: 33 (2020)
Abstract
Background
Studies using quantitative experimental methods have shown that intracellular spatial distribution of molecules plays a central role in many cellular systems. Spatially resolved computer simulations can integrate quantitative data from these experiments to construct physically accurate models of the systems. Although computationally expensive, microscopic resolution reactiondiffusion simulators, such as Spatiocyte can directly capture intracellular effects comprising diffusionlimited reactions and volume exclusion from crowded molecules by explicitly representing individual diffusing molecules in space. To alleviate the steep computational cost typically associated with the simulation of large or crowded intracellular compartments, we present a parallelized Spatiocyte method called pSpatiocyte.
Results
The new highperformance method employs unique parallelization schemes on hexagonal closepacked (HCP) lattice to efficiently exploit the resources of common workstations and large distributed memory parallel computers. We introduce a coordinate system for fast accesses to HCP lattice voxels, a parallelized event scheduler, a parallelized Gillespie’s directmethod for unimolecular reactions, and a parallelized event for diffusion and bimolecular reaction processes. We verified the correctness of pSpatiocyte reaction and diffusion processes by comparison to theory. To evaluate the performance of pSpatiocyte, we performed a series of parallelized diffusion runs on the RIKEN K computer. In the case of fine lattice discretization with low voxel occupancy, pSpatiocyte exhibited 74% parallel efficiency and achieved a speedup of 7686 times with 663552 cores compared to the runtime with 64 cores. In the weak scaling performance, pSpatiocyte obtained efficiencies of at least 60% with up to 663552 cores. When executing the MichaelisMenten benchmark model on an eightcore workstation, pSpatiocyte required 45 and 55fold shorter runtimes than Smoldyn and the parallel version of ReaDDy, respectively. As a highperformance application example, we study the dual phosphorylationdephosphorylation cycle of the MAPK system, a typical reaction network motif in cell signaling pathways.
Conclusions
pSpatiocyte demonstrates good accuracies, fast runtimes and a significant performance advantage over wellknown microscopic particle methods in largescale simulations of intracellular reactiondiffusion systems. The source code of pSpatiocyte is available at https://spatiocyte.org.
Background
Intracellular space plays an important role in many biochemical systems operating in the timescales of minutes to hours such as cell signaling [1], division [2], polarization [3], morphogenesis [4] and chemotaxis [5, 6]. These systems are regulated by the spatiotemporal dynamics of molecules. Recent quantitative biology methods can obtain highresolution spatiotemporal measurements of the molecules. These disparate sources of measurements can be combined and interpreted using spatially resolved simulators to construct models of the systems that are realistic and consistent with physical principles. By simulating the models, detailed analysis of the systems can be conducted in silico and future experiments can be designed [7].
The choice of spatial simulators largely depends on the timescale and spatial resolution of the system of interest [8–13]. For example, highperformance molecular dynamics (MD) simulators [14] can accurately capture the atomistic behavior of up to several macromolecules in a system but are limited in their timescale, allowing only simulations for up to a few milliseconds [15, 16]. It is thus not feasible to use MD for example, to simulate cell signaling systems such as the mitogenactivated protein kinase (MAPK) cascade, which takes place at the cellular scale with timescales spanning minutes to hours [17, 18]. For these longer spatial and temporal scales, numerical methods that solve partial differential equations (PDEs) or coarsegrained stochastic particle simulation methods can be used. PDEbased tools such as the freely available Virtual Cell [19] and the commercially available COMSOL Multiphysics (COMSOL Inc.) are useful when the system of interest is deterministic. They are especially fast and convenient when simulating molecules with very high copy numbers. Conversely, particle methods are preferable when we need to account for the noise and fluctuations arising from low copy number of reacting molecules in the cell [20].
Latticebased particle methods based on the reactiondiffusion master equation (RDME) have the advantage to simulate a large number of diffusing molecules for extended spatiotemporal scales [21–25]. However, since RDME methods represent molecules as dimensionless point particles in lattice voxels, they do not directly capture the effects of excluded volume brought by intracellular macromolecular crowding [26]. About 20–30% of the total volume inside cells are occupied by macromolecules [27]. This amount of crowding has been shown to affect reaction equilibria both in vivo and in vitro, and alter protein binding and gene expression characteristics [28–31]. Moreover, crowded media can also cause nonintuitive effects such as molecules performing directed motion [32], and a change in the statistics of molecular number fluctuations in simple reactions [33]. To capture the effects of volume exclusion in systems that are in equilibrium, Cianci and colleagues [34] recently reported a modified version of the RDME method called vRDME. Despite this enhancement, RDMEbased methods are still constrained when simulating diffusionlimited reactions and rebinding events [35–37] because they assume molecules to be wellmixed in each voxel.
Offlattice microscopic particle methods such Smoldyn [38], eGFRD [36], SpringSaLaD [39] and ReaDDy [40] can capture the effects of crowding directly because each molecule is represented individually with spherelike physical dimensions. These simulators also support different sizes of volume excluding molecules. ReaDDy and SpringSalaD can also account for the coarse shape of molecules. However, because of these additional details, microscopic particle methods are more computationally demanding than RDME methods. In a recent performance benchmark of the microscopic methods [41], Smoldyn required the shortest runtime when simulating the wellknown MichaelisMenten reactiondiffusion kinetics.
Spatiocyte [42] is another microscopic method but molecules diffuse on lattice by hopping from one voxel to another. The current stable version of Spatiocyte accounts for volume exclusion by allowing only a single molecule to occupy a voxel at a time. The maximum size of a molecule is roughly equals to the size of a voxel [43]. The method therefore captures steric interactions most realistically when the size of volume excluding molecules is the same and almost equals to that of a voxel. To better simulate the effects of crowding, there is also a development version of Spatiocyte that allows a single molecule to occupy multiple voxels according to its size (unpublished). In Spatiocyte, fine and fastdiffusing molecules such as messengers, metabolites and ions are simulated at the compartment scale using the NextReaction method [42, 44]. The accuracy and consistency of Spatiocyte have been validated in detail recently in both volume [43] and surface [45] compartments. Spatiocyte achieves better execution time scaling behavior compared to other methods [43] because it resolves molecular collisions by looking only at the target voxel for occupancy. At the typical intracellular protein concentration range, the performance of Spatiocyte is comparable to Smoldyn when molecules are represented as point particles. On the other hand, when the molecules have physical dimensions, the runtime of Spatiocyte is at least two times faster than Smoldyn. These recent performance benchmarks [41, 43] imply that at present Spatiocyte is one of the fastest methods for simulating individual molecules in crowded systems.
There have been several efforts in the past to improve the performance of particle simulation methods using parallelization approaches. RDME methods have been accelerated on Graphics Processing Units (GPUs) [46–48] and CPU clusters [49]. The GPUbased implementations of RDME can simulate up to two orders of magnitude faster than the serial version on CPU. Chen and De Schutter [49] used Message Passing Interface (MPI) to run a neuron model and achieved 500fold speedup on a cluster with 1000 processes. Microscopic methods such as ReaDDy [50] and Smoldyn [51, 52] have also been parallelized on GPUs. The performance gain of ReaDDy was up to 115 times over its serial counterpart on CPU. The GPU versions of Smoldyn required between 135 to 200fold shorter runtimes than the original CPU implementation. Recently, the ReaDDy method was extended to run using multiple threads in parallel on CPU [53]. It showed 6fold reduction in the simulation time when running with 6 threads compared to the serial implementation.
Here we introduce a parallel implementation of the Spatiocyte method, called pSpatiocyte. The new algorithm was written bottomup in C++ and MPI to exploit the resources of conventional workstations and massively parallel computers for highperformance simulations of large or crowded cell models. We demonstrate efficient simulations of reactiondiffusion systems at the microscopic scale with volume occupying molecules to recapitulate the crowded nature of intracellular media. We achieve scalability over 500,000 CPU cores on a distributed memory architecture.
In the following section we describe the parallelization schemes and numerical implementation of pSpatiocyte. We then provide computational results that validate parallel diffusion and reaction processes. We also demonstrate the performance of pSpatiocyte on the RIKEN K computer with thousands of cores and on a common workstation with eight cores. We show the applicability of pSpatiocyte in actual biological problems by simulating the dual phosphorylationdephosphorylation cycle of MAPK. Finally, we conclude by providing a summary of the validation and performance results, and future directions of this work.
Methods
In a Spatiocyte model, a molecule of a species s_{i} diffuses in space by performing random walk on lattice from one voxel to a nearest neighbor voxel. The diffusion interval, \(\tau _{d}^{i}\) between two successive walks is determined by the diffusion coefficient D_{i}. When the molecule collides with a molecule of a reactant species s_{j} in the target voxel, they perform a bimolecular reaction with an acceptance probability, W_{ij}. W_{ij} captures the intrinsic reaction rate k_{ij} of the pair according to the SmoluchowskiCollinsKimball (SCK) model [54, 55]. The accuracy and consistency of the bimolecular reaction on lattice in both activationlimited and diffusionlimited regimes have been verified [43, 45]. To represent volume occupying molecules, a voxel can be occupied by only a single molecule at any given time. As a result, if the diffusing molecule meets a nonreactive molecule in the target voxel, a collision occurs and it remains in its original voxel. In the following subsections, we describe the parallelization schemes of the Spatiocyte method in detail.
Coordinate system
Spatiocyte adopts the hexagonal closepacked (HCP) lattice arrangement (Fig. 1a) as it supports the highest density of sphere voxels in a given volume [56]. For comparison, the average density of HCP voxels is 74.048%, whereas the more commonly used cubic lattice has a density of 52.359%. The highest density of voxels is preferable because it allows the simulator to represent highly packed and crowded regions in a compartment to its maximum theoretical limit. Moreover, it was recently demonstrated that the voxels in HCP lattice need to be only about 2% larger than the molecule for the simulations to be consistent with the SCK model [43]. This is in contrast to the cubic lattice, which requires the voxel size to be at least 8% larger. The HCP lattice therefore, can more closely represent hardsphere molecules in space. For twodimensional (2D) planar simulations, Spatiocyte employs the triangular lattice arrangement, which is a plane of the HCP lattice [45]. Grima and Schnell [57] have previously shown that simulations on a triangular lattice are closer to Brownian dynamics and produce less discretization error than on square lattice. Simulations on square lattice also overestimate macromolecular crowding effects compared to the triangular lattice.
Although HCP lattice has a regular grid arrangement of voxels, some considerations are necessary to define the coordinate axes to access the voxels. In Fig. 1b, the axes i and j aligned to the voxels in a plane of HCP lattice make up a rhomboid instead of a rectangle. The third axis k is also tilted when aligned to the voxels across the planes of the lattice. Such unusual axes arrangement requires substantial computational steps to convert the integer coordinates of a voxel into real coordinates. A data structure without a coordinate system can also be used to identify and access neighbor voxels. For example, a onedimensional array of voxels can be used, wherein each voxel has pointers to its 12 nearest neighbors. The serial version of Spatiocyte adopts this scheme [42]. From our performance profiling results, the additional memory required to store 12 pointers per voxel and indirect memory accesses to load neighbor voxel data adversely impacts performance because of increased memory bandwidth usage and cache misses. In addition, with an unconventional grid it can also be cumbersome to split the computational domain spatially into smaller subdomains (green, red, orange and blue regions in Fig. 1b) for parallel execution by processes.
To overcome these issues, we propose a coordinate system called twisted Cartesian as depicted in Fig. 1c. It comprises a straight and two zigzag axes. The figure illustrates how each of the 12 neighbors is identified with these axes. Depending on whether the voxel plane, k is even or odd numbered, one of two procedures is used in i and jaxes to identify a neighbor voxel. The two procedures are shown by the top and bottom panels of Fig. 1c. This coordinate system can be readily mapped onto a conventional Cartesian coordinate system without almost any modification and enables straightforward programmability. Despite being unconventional, the twisted Cartesian coordinate system works well for identifying and accessing neighbor voxels pointerfree by only using conditional statements. In a preliminary implementation of pSpatiocyte, we used the system successfully for parallelization [58]. Recently, a similar approach was adopted for cellular automata simulations in 2D space [59].
Parallelized Spatiocyte algorithm
The Spatiocyte method advances the simulation time, t_{s} in discrete steps using an event scheduler [42, 60]. The scheduler gets the next event, e to be executed from a priority queue Q, which contains events sorted according to scheduled times, t_{e}. The types of events that can be defined in a model include walk (performs diffusion and bimolecular reactions), unimolecular reaction, species numbers logger and molecule coordinates logger. Upon execution, an event returns the interval, τ_{e} for its next execution. The next execution time, t_{e}=t_{s}+τ_{e} is then passed to the priority queue to reschedule the event. The scheduler executes all events in a loop until the simulation end time, t_{end} is reached.
The pSpatiocyte method is a parallelized version of Spatiocyte and is completely written in C++. The method is parallelized using the domain decomposition approach illustrated in Fig. 2a. With this approach, the complete lattice space is divided equally into N subdomains to be executed concurrently by N processes. To minimize synchronization overheads between processes, the scheduler and events are duplicated across all processes at initialization, which is described in Algorithm 1. The scheduler of each process then executes the main loop of the simulation in parallel according to Algorithm 2. The simulation proceeds synchronously over all processes by ensuring that (1) the scheduled execution time of each event is identical across processes; and (2) each process synchronizes with adjacent processes when molecules from local subdomain walk or react across adjacent subdomains. We describe how these two conditions are satisfied by each event in the following subsections.
Parallelized walk event
Einstein [61] and von Smoluchowski [62] have independently shown that small particles in onedimensional (1D) system perform Brownian walk with a rootmeansquared displacement of \(\sqrt {2Dt}\), where t is the interval between walks and D is the diffusion coefficient of the particles. The 1D relation can be expressed in threedimensional (3D) space with the meansquared displacement (MSD) given as 6Dt. Similarly, in the 3D HCP lattice space, the displacement of a molecule of species s_{i} within a diffusion interval \(\tau _{d}^{i}\) must be consistent with its diffusion coefficient D_{i}. Since the molecule displacement over the interval is equivalent to the voxel diameter, we can use the MSD relation in 3D to obtain the interval, \(\tau _{d}^{i}=2r_{v}^{2}/(3D_{i})\), where r_{v} is the voxel radius. We have previously shown that this approach is accurate for modeling bimolecular reactions on HCP lattice when r_{v}≈1.0209R, where R is the molecule radius [43].
Bimolecular reactions are handled by the walk event because they take place upon the collision of two reactant molecules on lattice during diffusion. Since the reaction acceptance probability, \(W_{ij}=k_{ij}/(6\sqrt {2}(D_{i}+D_{j})r_{v})\) is inversely proportional to the diffusion coefficients of reactants, highly diffusionlimited reactions can cause W_{ij}>1 [42, 43]. To address this inaccurate condition, we first determine the maximum W_{ij} of all bimolecular reactions involving s_{i} and assign it as ρ_{i}. Then, we obtain the walk probability,
where P_{i} is a userdefined upper limit of reaction acceptance probabilities and 0<P_{i}≤1 (by default, P_{i}=1). Next, we rescale the reaction probabilities to W_{ij}=W_{ij}α_{i}. Finally, we obtain the walk interval as \(\tau _{e}^{i}=\tau _{d}^{i}\alpha _{i}\). This walk interval is fixed and identical across all processes throughout the simulation procedure. With the above bimolecular reaction scheme, we have previously shown that the rebindingtime probability distribution of a reactive molecule pair on HCP lattice agrees well with continuum theory (see Section III.A of [43]). We have also verified the accuracy of the reaction rate coefficient and its timedependent behavior by comparison to SCK theory. Consequently, reactions that are not highly diffusionlimited (W_{ij}≪1, α_{i}=1) will have a smaller impact on the simulation performance since they will be far fewer than diffusion steps in the microscopic lattice space.
One of the difficult problems in parallelizing stochastic diffusion and reaction events is maintaining consistency at subdomain boundaries during simulation time steps. Processes should take careful consideration when accessing or writing to voxels residing in adjacent subdomains since they are also simultaneously accessible to the adjacent processes. Figures 2a and b illustrate our scheme to achieve consistency during walk and reaction events. We define the voxels at the edge of a subdomain and adjoining other subdomains as out voxels. We add a virtual set of voxels called ghost voxels locally in each subdomain to represent the current state of out voxels residing in adjacent subdomains. With updated ghost voxels, molecules in a subdomain can walk and react across subdomains seamlessly in a time step without requiring many interprocess synchronization requests. At the end of a walk event, the state of out voxels in adjacent subdomains will be updated to reflect the state of local ghost voxels. Since in a walk event a molecule can at most hop to or react with a molecule in one of its immediate neighbor voxels, only a single layer of ghost voxels is necessary to encapsulate local out voxels (Fig. 2b).
To ensure the updated state of ghost voxels remains valid until the end of the walk event, we further divide the subdomain equally into eight subvolumes and execute each subvolume synchronously with all processes. In Fig. 2b, four of the subvolumes are shown for each subdomain. Only the ghost voxels belonging to the selected subvolume is updated before executing the molecules in the subvolume. This scheme ensures out voxels in adjacent subdomains are isolated and free from modification when their corresponding ghost voxels in the local subvolume are accessed.
Algorithm 3 provides the complete pseudocode of the walk and bimolecular reaction procedure in a subdomain. For the walk event, pSpatiocyte uses two random number generators. The first generator is initialized with a seed that is unique to each process, whereas the second generator is initialized with a global seed. With the globally seeded generator, a random number that is drawn locally will be the same for all processes. This scheme reduces communication cost when we need a common random number for all processes. Unless stated otherwise, all random numbers are drawn using the locally seeded generator from a uniform distribution with the interval [0,1). Both generators use the Mersenne Twister algorithm [63] to generate random numbers.
The walk event executes the random walk of all molecules of a species when it is called. For each molecule m of species s_{i}, a random target voxel, v_{1} out of 12 neighbor voxels is first selected. If v_{1} is a ghost voxel, m is appended to a list \(M_{i}^{g}\), containing molecules targeting ghost voxel, while v_{1} is added to V_{g}, a list of the targeted ghost voxels. However, if v_{1} is not a ghost voxel and is vacant, a random number r is drawn. If r is less than or equals to the species walk probability α_{i}, the walk is successful and m is moved to v_{1}. Otherwise, if v_{1} contains a reactant pair of species s_{j}, then a random number r is drawn. If r is less than or equals to the bimolecular reaction probability W_{ij}, then the reaction is performed. If v_{1} is instead occupied by a nonreactive molecule, a collision occurs and m stays in its current voxel.
After completing the above procedure for all molecules of s_{i}, a list containing the execution order of eight subvolumes is randomly shuffled using the globally seeded random number generator. For each subvolume V_{h} in the ordered list, we first update its ghost voxels by loading the state of corresponding out voxels from adjacent subdomains. The update is performed using the MPI Sendrecv function. Next, for each molecule m in \(M_{i}^{g}\) that is in the current subvolume V_{h}, we get its target ghost voxel v_{1} from V_{g}. If v_{1} is vacant, a random number r is drawn. If r is less than or equals to the species walk probability, α_{i} then the molecule is moved to the target voxel v_{1}. Otherwise, the walk fails and the molecule stays in its voxel. If v_{1} is instead occupied by a reactive pair of species s_{j}, then a random number r is drawn. If r is less than or equals to reaction acceptance probability W_{ij}, the reaction is executed. Otherwise, the reaction fails and the molecule stays in its voxel. After executing all the molecules in the subvolume, we update the out voxels in adjacent subdomains with the state of their corresponding ghost voxels using MPI Sendrecv function. The process then repeats the above procedure with the next subvolume in the ordered list and continues until all subvolumes have been executed.
Note that the walk event in each subvolume is performed locally at all times by each process. Although mutual exclusion is naturally realized by this scheme, interprocess communication is performed once at the beginning and again at the end of each subvolume execution. This scheme of mutual exclusion however does not necessarily require eightfold communication requests because the number of voxels to be sent or received is also reduced in proportion to the subvolume boundary surface area. We have confirmed the effectiveness of this method with at least a few thousand processes [58].
Another problem common in latticebased parallel computations is the communication at the subdomain vertices. Out voxels located at the vertices are accessed by many more processes than at other locations. Generally, latency has the most impact during the short communications required at these voxels. In addition, contention between requests tends to occur because of the limited bandwidth or the number of available communication channels. From the viewpoint of strong scaling, the communications will show poor performance especially when they involve such a large number of processes. To overcome these constraints, we employed the threestage communication scheme, which is wellestablished and described previously [64]. With this scheme, it is sufficient for each process to communicate with six directly adjacent processes in three consecutive stages. In our implementation, we adapted the scheme to update ghost and out voxels when executing each of the eight subvolumes. An example of the communication scheme is illustrated in Fig. 2b.
Parallelized unimolecular reaction event
The sequential Spatiocyte method employs the NextReaction method for unimolecular reaction events [42, 44]. However, in the pSpatiocyte method, we have adopted the Gillespie’s directmethod [65] to execute the events in parallel because of its simplicity. In the directmethod, the propensity for a unimolecular reaction, \(\mathrm {A} \overset {k_{j}}{\rightarrow } \mathrm {B}\) is a_{j}=k_{j}N_{A}, where N_{A} is the number of A molecules in the volume. If there are m unimolecular reaction channels, the total propensity is given by
The interval, τ_{e} for the next reaction event is expressed as
where r_{1} is a random number drawn from a uniform distribution between 0 and 1. At the end of the interval, the reaction channel u, out of the total m channels is selected to be executed such that
where r_{2} is another random number from the uniform distribution.
We have parallelized the directmethod using two simple schemes. First, in each process, we use MPI Allgather to get the local propensities, a_{l} from all subdomains and sum it locally to get the global propensity, a_{g}. Second, we use the globally seeded random number generator to draw r_{1} and r_{2} to determine τ_{e} and u, respectively. With these two schemes, τ_{e} and u will be identical for all processes without additional synchronization requests. The pseudocode of the parallelized directmethod is given in Algorithm 4.
At initialization, the scheduler gets the next time to execute the unimolecular reaction event, t_{e}=τ_{e} by calling the get_direct_method_new_interval function and schedules it in the priority queue, Q. The new interval is calculated from Eq. (3). In the main loop of the simulation, if a walk event of a unimolecular reactant species has been called, the number of reactant molecules may have changed. Therefore, at the end of the walk event, the next time of the reaction event is updated by calling the get_direct_method_next_interval function. It gets the updated interval from the scaling expression, τ_{e}=a_{0}(t_{e}−t_{s})/a_{g}, where a_{0} and a_{g} are the old and new global propensities, respectively, t_{e} is the old scheduled time and t_{s} is the the current simulation time. Finally, the scheduler executes the reaction event at the scheduled time by calling the react_direct_method function and reschedules it using a new interval from get_direct_method_new_interval. The react_direct_method function selects the reaction channel to be executed according to (4).
When the reaction channel is executed, if there are two product molecules, one of them will replace the reactant in its current voxel. Another random vacant voxel from the 12 nearest neighbors of the reactant will be selected to occupy the second product. When the compartment or the region near the reactant is highly crowded, no vacant voxel may be found for the product. In the original Spatiocyte method, this will result in a failed unimolecular reaction. We have also adopted this approach for the pSpatiocyte method. In addition, for such a highly crowded scenario, we have added an option in the model to randomly vacate one of the neighbor voxels of the reactant and place the second product in it. The voxel can only be vacated if the molecule occupying it is a mobile (diffusing) species. The voxel is first vacated by moving the molecule to a vacant neighbor voxel. If there is no vacant voxel available for the molecule, the procedure is repeated with another randomly selected neighbor molecule of the reactant. The reaction fails if none of the nearest neighbors of the reactant can be vacated for the second product.
Parallelized logger events
Two types of logger events are available to save the snapshots of pSpatiocyte simulation. The species numbers logger event saves the number of molecules of each species in a commaseparated values (CSV) file when called by the scheduler. At initialization, the log interval is fixed and duplicated across all processes, ensuring that the scheduled execution time of the logger event is always identical across processes. During simulation, each process writes the molecule numbers available in its subdomain into a local file, thus avoiding interprocess communication. A Python script is provided to gather and sum all the numbers from the process files into a single conventional CSV file after the simulation.
The coordinates logger is implemented the same way as the numbers logger but it saves the unique identity number (ID) and integer (voxel) coordinates of each molecule within its subdomain. The ID of a molecule is persistent across subdomains, hence it is possible to track the trajectory of each molecule throughout the simulation. The logger can also save the coordinates of out and ghost voxels for debugging purposes. A Python script gathers and converts the integer coordinates into real 3D coordinates before saving them into a CSV file.
Results and Discussion
To confirm the consistency of pSpatiocyte in terms of physical accuracy, we validated diffusion and reaction processes when running on all eight cores of a workstation with Intel Core i99900K CPU (8 cores, 5 GHz maximum processor frequency), 64 GB memory and Ubuntu 19.10 operating system. Parallel performances of diffusion were examined across thousands of computational cores of the K computer [66]. pSpatiocyte performance was also compared with Spatiocyte, Smoldyn and ReaDDy when simulating the benchmark enzymatic reaction model on the workstation. Finally, the parallelized simulation outcomes of the well known MAPK model are provided as an application example. All simulations were performed on lattices with reflective boundaries.
Validation of diffusion
We initially observed the trajectories of molecules diffusing across subdomains to verify the coordinates logger, interprocess communications and the overall simulation algorithm. Figure 3a displays the trajectories of five molecules diffusing with D=0.06 μm^{2}s^{−1} for 10 s. The simulation was executed with eight processes, employing all cores of the workstation. At the beginning of the simulation, the molecules were placed randomly in a compartment volume of 10 μm^{3} with lattice dimensions 476^{3}, divided into eight subdomains. All trajectories in Fig. 3a appear consistent with molecules performing Brownian motion.
We then validated the consistency of pSpatiocyte in reproducing the correct diffusion behavior in a dilute volume that is equally distributed to eight processes. The MSD of a molecule performing random walk in 3D space is given as 6Dt. We monitored the diffusion of a single molecule placed at the center of a 960^{3} lattice with voxels measuring 2.5 nm in radius. No other molecules were present on the lattice. We then performed random walks repeatedly with the same initial conditions aside from the random number generator seed. 100 random walks were performed for 40 ms and the average MSDs were computed from the ensembles. Figure 3b shows the loglog plots of the results for three different diffusion coefficients. The slopes, the vertical distances, and the absolute values coincide well with the expected theoretical lines.
When a compartment is crowded with obstacles as in the cell [67], the effective rate of diffusion is expected to decrease. To evaluate if pSpatiocyte is able to replicate the rate reduction when running with eight processes, we obtained the MSD of a diffusing molecule in a compartment occupied by immobile crowder molecules. The fraction of compartment voxels occupied by the crowder molecules is given by ϕ. We evaluated three different ϕ conditions, with 1000 independent simulation runs for each condition. Each run adopted a unique seed for drawing random numbers. Hence, the random placement of immobile crowders at initialization was different in each run. The averaged MSDs and the fitted effective diffusion rates are displayed in Fig. 3c. As expected, a significant decrease in the diffusion rate corresponding to an increase in ϕ is clearly observed. Further detailed analysis is required to compare the effective diffusion rates in crowded condition on HCP lattice with the rates in continuous space as reported by Novak et al. [68].
Validation of reactions
We validated parallelized irreversible and reversible reactions separately because they have distinct underlying physics.
Irreversible reaction
A unimolecular reaction given by
is irreversible. In pSpatiocyte, the reaction is executed according to the parallelized Gillespie’s directmethod. We applied three different reaction rates, k in an uncrowded volume and compared the results with that of an ordinary differential equation (ODE) solver. The simulation was executed on eight CPU cores with parameters D=10 μm^{2}s^{−1}, voxel radius r_{v}=5 nm and 960^{3} lattice size. The initial number of reactant molecules was 64,000. Figures 4a and b show the simulation results of the reactant and products, respectively. In all cases, the outcomes of simulation agree very well with the ODE solver.
We investigated the effects of crowding on the dissociation rate of (5), with the two different ways of finding a vacant voxel for the second product molecule. In the original approach, the reaction fails if all neighbor voxels of the reactant are occupied. In the second approach, if they are all occupied, the simulator attempts to vacate one of them. The reaction also fails if no voxels can be vacated for the product. Figure 4c shows the loglog plots of the reactant concentration using the two approaches when ϕ is between 0.5 and 1.0. At ϕ=1.0, all voxels of the HCP lattice are occupied, giving about 74% volume occupancy. With the original approach, the dissociation rates agree well with the ODE result up to ϕ=0.7, which translates to about 52% volume occupancy. In the second approach, the rates are comparable up to ϕ=0.9. In vitro results of crowding experiments showed that the dissociation rate of molecules are unaffected even when the volume occupancy reaches 30% [69]. However, it is still unknown if occupancies above 50% would affect the dissociation rate as we have found with our original approach. The results show that the original approach is sufficient for simulating the estimated 30% volume exclusion in the cytoplasm [70]. In cases where the dissociating molecules are much smaller that the voxel size, such as messengers, metabolites and ions, the sequential version of Spatiocyte simulates them at the compartment scale using the NextReaction method. Since this feature is not yet supported by pSpatiocyte, we leave it for future work
Reversible reaction
A bimolecular reaction is dependent on the diffusion of reactant molecules because the reaction takes place as the molecules meet and collide in space. For simplicity, we considered a forward bimolecular reaction with a single product and a reverse unimolecular reaction, as an example of reversible reaction,
where k_{f} and k_{r} denote the effective forward and reverse reaction rates, respectively. The effective forward rate can be converted to the intrinsic rate that is used by pSpatiocyte,
where k_{D}=8πr_{v}(D_{B}+D_{C}) [43]. D_{A} and D_{B} are the diffusion coefficients of B and C, respectively. The intrinsic reverse reaction rate is given as
The forward reaction is executed by the walk event when molecules of the reactant species collide on lattice, whereas the reverse reaction is performed by the unimolecular reaction event. The parameters of the simulation include k_{f}=2 μm^{3}s^{−1}, k_{r}=1.35 s^{−1}, D=10 μm^{2}s^{−1} and r_{v}=5 nm. The initial number of A and B molecules was 64,000 each and the lattice size was 960^{3}. The simulation volume was distributed to eight processes. As comparison, an ODE solver was used to generate the output of the reversible reaction with the effective rates, k_{f} and k_{r}. The results of the simulation are provided in Fig. 4d. The closely matching curves of pSpatiocyte and the ODE solver verify the parallel simulation accuracy of reversible bimolecular reactions.
Performance of parallelized 3D diffusion
In Spatiocyte simulations, diffusion of molecules typically takes place at step intervals that are several orders of magnitude smaller than that of reactions if they are not highly diffusionlimited. The fine intervals are needed for the very short displacements between voxels. Since diffusion computations at these fine intervals dominate the total computation cost of most simulations, we used a diffusion model to evaluate the parallel simulation performance of pSpatiocyte. The simulation parameters and conditions are the same as in the diffusion model in the previous section, except for the lattice resolution and occupancy.
To estimate the parallel performance of pSpatiocyte, we measured its strong and weak scaling efficiencies. Strong scaling measures how fast a program is able to process a fixed workload by splitting it into smaller sizes and distributing them to an increasing number of CPUs or cores. On the other hand, weak scaling measures how large of a problem a program can handle without the loss of speed. To measure the efficiency of weak scaling, the amount of workload given to each CPU or core is fixed and the total workload processed by the program is increased by adding CPUs or cores with the corresponding amount of workload.
For strong scaling, we used three different voxel radii to measure performance. Given that the physical dimensions of a compartment remain the same, smaller voxels would result in higher resolution and finer lattice. We denote voxels having 10, 5, and 2.5 nm radii as coarse, intermediate, and fine lattices, respectively. The molecule occupancy, ϕ was fixed at 0.3, whereas the voxel sizes determined the spatial resolution of the compartment. The compartment resolution was 512^{3} (coarse), 1024^{3} (intermediate) or 2048^{3} (fine). For computations using the maximum 663552 cores of the K computer, the resolution was set to 512×480×540 (coarse), 1024×960×1080 (intermediate) or 2048×1920×2160 (fine) to ensure that the model conforms to the physical configuration of the processes.
Note that we examined the relative speedups instead of the floating point operations per second (FLOPS) because on lattice, integer or logical instructions dominate the overall computational cost. The speedups measured from the elapsed times are shown in Fig. 5a. Here, we used the results of the coarse lattice with 64 cores as the baseline reference to calculate the speedups since simulations with fewer cores were not possible due to memory size limitation. For the intermediate and fine lattices, the results from 64 cores were extrapolated using the time consumed per voxel on the coarse lattice. The parallel efficiencies of strong scaling are summarized in Fig. 5b. For the fine lattice with 663552 cores, we obtained a speedup of 7686. The corresponding strong scaling efficiency was at 74.1%. In contrast to the coarse and intermediate lattices, the results from the fine lattice were the closest to the ideal curve. By extrapolating the results from Fig. 5a, we can predict that the speedup and efficiency on the fine lattice would be about 13000 times and at 40%, respectively if two million cores were utilized.
To identify the cause of the performance deterioration on the coarse lattice, we determined the major components of elapsed time as shown in Fig. 5c. We found that the times taken for initialization, computation, pack and unpack events were decreasing at least with up to 262144 cores, whereas the MPI time saturated and exceeded these times when it was over 32768 cores. To improve the performance, the constant duplication time due to redundant communicator objects should be eliminated by sophisticated programming. The saturation of MPI time is likely the most significant factor that needs to be addressed to improve the scaling performance further. However, such saturated timings generally originate from the latency of interprocess communication, which is dependent on the hardware and firmware. Therefore, the immediate approach for improving strong scaling is to reduce the computation time and ensure that it is as close to the latency as possible.
Figure 5d displays the weak scaling performance of pSpatiocyte in elapsed time per voxel per step. Here, the labels on the horizontal axis, smallest, smaller, medium, larger and largest, denote the subdomain lattice dimensions, 16^{3}, 32^{3}, 64^{3}, 128^{3} or 256^{3} on each process, respectively. In an ideal weak scaling performance setting, these times would be identical for all lattice sizes since they would be independent of the number of cores and voxel sizes. In spite of the variation in the absolute values, all lattice dimensions provided good scaling properties. We scrutinized the elapsed times of the smallest and smaller lattices and it revealed that the times are dominated by the constant communication latency. This explains the source of the larger absolute times in the smaller lattices. The efficiencies of parallel computation in terms of weak scaling are summarized in Fig. 5e. For each lattice size, the elapsed time with 64 cores was used as a reference to calculate the parallel efficiency. Although the efficiencies tend to deteriorate on higher number of cores, we were still able to achieve 60% efficiency with more than half a million cores.
Performance benchmark
We compared the runtime of pSpatiocyte with Spatiocyte (git 5e88f40), Smoldyn (v2.61) and ReaDDy (v2.0.2py37_55_g78bd07) when executing the benchmark MichaelisMenten enzymatic reaction model [41] on a common workstation as it is more accessible to general users than a supercomputer. ReaDDy has both serial and parallel versions. All simulators were evaluated on a single core. Additionally, the parallel simulators pSpatiocyte and ReaDDy were executed on two, four and eight cores. We used the same model parameters from [41, 71] but increased the size of the reaction volume tenfold to 909 μm^{3}. The larger volume raises the computational cost to an adequate level when running the model in parallel on all eight cores of the workstation. For Smoldyn and ReaDDy, we set the simulation interval, Δt to 1 ms. In pSpatiocyte and Spatiocyte models, the event with the smallest interval is the walk event and it was set to 0.5 ms. We also evaluated pSpatiocyte with a smaller walk event interval of 0.2 ms to see how the runtime scales with the interval. Smaller simulation interval results in an overall higher computational cost and up to a certain limit, better accuracy.
Figure 6a displays the results of the performance benchmark. The runtimes shown are the averages of three independent runs of each simulator to execute the model for 10 s. The resulting concentration profiles from each simulator are plotted in Fig. 6b. On a single core, pSpatiocyte with Δt=0.5 ms is about two times faster than with Δt=0.2 ms. It is also about 2.4 times faster than the sequential version of Spatiocyte. On a single core, the main difference between pSpatiocyte and its sequential counterpart is the new pointerfree voxel accessing scheme adopted by the former. This scheme has likely lowered the memory bandwidth usage and cache misses, contributing to the significant reduction in simulation runtime. The execution time of pSpatiocyte is about 7.7 times shorter than Smoldyn on a single core. It is also roughly 30 and 38fold times faster than the serial and parallel versions of ReaDDy, respectively.
The runtime of pSpatiocyte (Δt=0.5 ms) also scales favorably with the number of additional cores used in the simulation. When the number of cores was increased from one to two, the runtime was reduced by 1.87 times. Similary, the runtime was shorter by about 1.82 times when the number of cores increased from two to four. On eight cores, it is about 1.7 times faster than on four cores. Similar scaling behavior was also observed with Δt=0.2 ms. On eight cores, pSpatiocyte (Δt=0.5 ms) is roughly 55 times faster than the parallel version of ReaDDy. It also required about 45 and 14fold shorter runtimes than Smoldyn and Spatiocyte, respectively to complete the simulation on the workstation. Overall, the benchmark results show that pSpatiocyte has a significant performance advantage over other wellknown microscopic particle simulators.
Parallelized simulation of MAPK model
In the dual phosphorylationdephosphorylation cycle of the MAPK cascade, shown in Fig. 7a, molecular rebinding effects at the microscopic scale can alter the macroscopic dynamics of the system [36]. MAPK kinase (KK) phosphorylates MAPK (K) in a twostep process to generate a doubly phosphorylated MAPK (Kpp), whereas the phosphatase, P dephosphorylates Kpp twice to recover K. Upon unbinding from their products, the enzymes go through an inactive state (denoted by ^{∗}). The time required to reactivate the enzymes is given by τ_{rel}≃1/k_{a}, where
If the enzymesubstrate reactions are diffusionlimited and τ_{rel} is short, a newly dissociated enzyme can rebind to its product to catalyze it again, before escaping into the bulk. Takahashi et al. [36] have previously shown with eGFRD particle simulations that these rebinding events can change the response sensitivity of the phosphorylation state, which could result in the loss of bistability. We have also recently replicated the results with Spatiocyte [43]. These spatiotemporal correlations between enzyme and substrate molecules, and fluctuations at the molecular scale are difficult to be captured by RDME and PDEbased methods.
As an example of pSpatiocyte biological application and to further verify the method, we have simulated the MAPK model with the same parameters from [36] but increased the volume tenfold (10 μm^{3}) to raise the computational cost. The initial molecule numbers of K, KK and P were 1200, 300 and 300, respectively. There were no initial molecules for the remaining species. The model was simulated for 300 s with τ_{rel}=1 μs. For all species, D=4 μm^{2}s^{−1}. The lattice size was 476^{3} with r_{v}=2.5 nm. Figure 7b shows the concentration profiles in the first 100 s of the simulation. The total runtime of the simulation was 5466 s using all eight cores of the workstation. In contrast, it took 76650 s to complete the simulation with Spatiocyte. Thus, pSpatiocyte is about 14 times faster than its sequential counterpart. As comparison, in Fig. 7b we have also plotted the predictions of the corresponding meanfield (ODE) MAPK model [36]. The model describes the diffusion of molecules implicitly by renormalizing the reaction rates. pSpatiocyte concentration profiles coincide well with the ODE outcomes, although in the former, there were fluctuations from the small number of reacting molecules.
The total number of molecules in the MAPK model is 1800, which is a small number for highperformance simulations. To evaluate how well pSpatiocyte scales with such low number of molecules, we compared the runtimes with varying number of cores and diffusion coefficients. Figure 7c shows the runtimes when the model was executed for 10 s. With D=0.06 μm^{2}s^{−1}, pSpatiocyte is about twofold faster on four cores than on a single core. Increasing the number of cores from four to eight did not noticeably improve the runtime further. However, with D=4 μm^{2}s^{−1}, the speedup achieved with four cores is 2.6 times. With the addition of another four cores, the speedup increased to 3.4 times. This is notable because pSpatiocyte is still able to achieve a favorable speedup although each core only executed on average 225 molecules. Compared to the slower diffusion model, the shorter diffusion interval (\(\tau _{d}^{i}\)) when D=4 μm^{2}s^{−1} increases the computational cost more than the communication overhead, resulting in the improved speedups.
We also simulated the MAPK model with different initial ratios of KK_{0}/P_{0} and diffusion coefficients, and obtained the steadystate Kpp/K_{0} curves as shown in Fig. 7d. The outcomes of the corresponding meanfield models of a distributive (with D=4 μm^{2}s^{−1}) and a processive (with D=0.06 μm^{2}s^{−1}) system are also shown. In the distributive scheme, the enzyme needs to detach from the substrate before it can catalyze it the second time. The double encounters between enzyme and substrate molecules can lead to ultrasensitive switchlike response. Conversely, in the processive scheme, a single encounter between them is sufficient to generate the dual modifications of the substrate. At fast diffusion (D=4 μm^{2}s^{−1}), pSpatiocyte response curve agrees well with that of the distributive meanfield model. However, at much smaller diffusion coefficient (D=0.06 μm^{2}s^{−1}), it instead reproduces the graded response curve of the processive model.
How slower diffusion in the pSpatiocyte model weakens the switchlike response curve can be explained as follows. At D=4 μm^{2}s^{−1}, the resulting diffusion interval, \(\tau _{d}^{i}\) and walk event interval, \(\tau _{e}^{i}\) for the enzymes are the same (1 μs) because the walk probability, α_{i}=1. Since \(\tau _{d}^{i}\) and τ_{rel} have the same intervals, after catalyzing the substrates the first time, the enzymes KK^{∗} and P^{∗} can escape into the bulk before they can reactivate and rebind with the substrates. In constrast, at D=0.06 μm^{2}s^{−1}, it gives \(\tau _{d}^{i}=72\,\mu \mathrm {s}\) and \(\tau _{e}^{i}=2\,\mu \mathrm {s}\) for the enzymes because α_{i}=0.028. Since \(\tau _{d}^{i} \gg \tau _{rel}\), the enzymes have enough time to rebind with their substrates upon reactivation, before they can escape. This processivelike mechanism leads to the graded response curve as shown in Fig. 7d. Overall, we note that pSpatiocyte correctly reproduces the expected ultrasensitivity dynamics of MAPK [36].
Conclusion
We have developed a high spatiotemporal resolution parallel stochastic method to simulate intracellular reactiondiffusion systems on HCP lattice. To realize largescale parallel computations, we have introduced several advanced simulation schemes, including a twisted Cartesian coordinate system with pointerfree voxel access, a parallelized event scheduler with priority queue, synchronized random subvolume executions and parallelized Gillespie’s directmethod using globally seeded random number generators, and threestage interprocess data transfers.
We have also validated the physical correctness of the simulator. The simulated diffusion rates in dilute conditions showed very good agreement with theory. In crowded conditions, the diffusion rates decreased, as expected. Further work is required to compare the crowded diffusion behavior on lattice with the relation between diffusion and excluded volume fraction obtained in continuous space [68]. Both irreversible and reversible reaction curves coincided very well with predicted ODE results. Parallel performance of diffusion on the K computer was sufficiently high for largescale computations. From the viewpoint of strong scaling, pSpatiocyte achieved a 7686fold speedup with 663552 cores compared to the runtime with 64 cores on a 2048^{3} lattice. The efficiency was equivalent to 74.1%. In terms of weak scaling, efficiencies of at least 60% were obtained.
In the MichaelisMenten enzymatic reaction benchmark, pSpatiocyte performed significantly better than other wellknown microscopic particle simulators. On a workstation with eight CPU cores, pSpatiocyte is about 55 times faster than the parallel version of ReaDDy and 45 times faster than Smoldyn. In addition, the parallelized simulation of the MAPK model revealed that the program can correctly capture the weakening of ultrasensitive response by enzymesubstrate rebindings at very short timescales. The accurate simulation of the model also demonstrated that pSpatiocyte is applicable in real biological problems. On the same workstation with eight cores, pSpatiocyte required 14fold faster execution times than the sequential version of Spatiocyte to simulate the MAPK model. Notably, pSpatiocyte is able to achieve 3.4 times speedup with all cores on the workstation although the average number of molecules executed per core is only 225.
In recent papers by Smith and Grima [72] and Lawson et al. [73], RDME reactions with nonmassaction propensities such as Hilltype and MichaelisMenten were shown not converging to the chemical master equation. At present, both sequential and parallel versions of Spatiocyte only support elementary reactions, namely unimolecular and bimolecular reactions. Since all complex reactions, including Hilltype and MichaelisMenten, can be broken down to elementary (massaction) reactions, the currently supported reactions should be sufficient for most modeling purposes. Nonetheless, to conveniently model complex reactions, it would be beneficial to support them in the future. Further work would also be needed to solve the convergence problem.
Recently, several GPUbased highperformance simulators of reactiondiffusion systems have been reported [46, 48, 50–52, 74]. The ability to run fast simulations on common workstations equipped with GPUs supports wider application of the simulators. Implementing the Spatiocyte method to run on GPUs should be straightforward since most of the parallelization schemes presented in this work can also be applied on a GPU.
Availability of data and materials
The pSpatiocyte software and the complete scripts to generate the data in this study are available at https://github.com/satyaarjunan/pspatiocyte.
Abbreviations
 1D:

onedimensional
 2D:

twodimensional
 3D:

threedimensional
 CPU:

Central processing unit
 FLOPS:

Floating point operations per second
 GPU:

Graphics processing unit
 HCP:

Hexagonal closepacked
 MAPK:

Mitogenactivated protein kinase
 MD:

Molecular dynamics
 MPI:

Message passing interface
 MSD:

meansquared displacement
 PDE:

Partial differential equations
 RDME:

Reactiondiffusion master equation
 SCK:

SmoluchowskiCollinsKimball
References
 1
Li X, Holmes WR. Biophysical attributes that affect CaMKII activation deduced with a novel spatial stochastic simulation approach. PLoS Comput Biol. 2018; 14(2):e1005946.
 2
Denk J, Kretschmer S, Halatek J, Hartl C, Schwille P, Frey E. MinE conformational switching confers robustness on selforganized Min protein patterns. Proc Natl Acad Sci. 2018; 115(18):201719801.
 3
Trogdon M, Drawert B, Gomez C, Banavar SP, Yi TM, Campàs O, Petzold LR. The effect of cell geometry on polarization in budding yeast. PLoS Comput Biol. 2018; 14(6):e1006241.
 4
Du H, Wang Y, Haensel D, Lee B, Dai X, Nie Q. Multiscale modeling of layer formation in epidermis. PLoS Comput Biol. 2018; 14(2):1–25.
 5
Miao Y, Bhattacharya S, Edwards M, Cai H, Inoue T, Iglesias PA, Devreotes PN. Altering the threshold of an excitable signal transduction network changes cell migratory modes. Nat Cell Biol. 2017; 19(4):329–40.
 6
Tan RZ, Chiam KH. A computational model for how cells choose temporal or spatial sensing during chemotaxis. PLoS Comput Biol. 2018; 14(3):1–21.
 7
Berro J. Essentially, all models are wrong, but some are useful  a crossdisciplinary agenda for building useful models in cell biology and biophysics. Biophys Rev. 2018; 10(6):1637–47. https://doi.org/10.1007/s1255101804784.
 8
Takahashi K, Arjunan SNV, Tomita M. Space in systems biology of signaling pathways–towards intracellular molecular crowding in silico. FEBS Lett. 2005; 579(8):1783–8.
 9
Burrage K, Burrage PM, Marquezlago TM, Nicolau DV. Stochastic Simulation for Spatial Modelling of Dynamic Processes in a Living Cell In: Koeppl H, Setti G, di Bernardo M, Densmore D, editors. Design and Analysis of Biomolecular Circuits: Engineering Approaches to Systems and Synthetic Biology chapter 2. New York: Springer: 2011. p. 43–62.
 10
Klann M, Koeppl H. Spatial simulations in systems biology: from molecules to cells. Int J Mol Sci. 2012; 13(6):7798–827.
 11
Schöneberg J, Ullrich A, Noé F. Simulation tools for particlebased reactiondiffusion dynamics in continuous space. BMC Biophys. 2014; 7(1):11.
 12
Earnest T, Cole JA, LutheySchulten ZL. Simulating biological processes: Stochastic physics from whole cells to colonies. Rep Prog Phys. 2018; 81(5):052601.
 13
Smith S, Grima R. Spatial Stochastic Intracellular Kinetics: A Review of Modelling Approaches. Bull Math Biol. 2019; 81(8):2960–3009.
 14
Bottaro S, LindorffLarsen K. Biophysical experiments and biomolecular simulations: A perfect match?. Science. 2018; 361(6400):355–60.
 15
Shaw DE, Dror OR, Salmon JK, Grossman JP, Mackenzie KM, Bank JA, Young C, Deneroff MM, Batson B, Bowers KJ, Chow E, Eastwood MP, Ierardi DP, Klepeis JL, Kuskin JS, Larson RH, LLarsen K, Maragakis P, Moraes MA, Piana S, Shan Y, Towles B. Millisecondscale molecular dynamics simulations on Anton. In: International Conference for High Performance Computing, Networking, Storage and Analysis. Portland: ACM/IEEE: 2009.
 16
Kohlhoff KJ, Shukla D, Lawrenz M, Bowman GR, Konerding DE, Belov D, Altman RB, Pande VS. Cloudbased simulations on Google Exacycle reveal ligand modulation of GPCR activation pathways. Nat Chem. 2014; 6(1):15–21.
 17
Simon J, Arthur C, Ley SC. Mitogenactivated protein kinases in innate immunity. Nat Rev Immunol. 2013; 13(9):679–92.
 18
Burotto M, Chiou VL, Lee JM, Kohn EC. The MAPK pathway across different malignancies: A new perspective. Cancer. 2014; 120(22):3446–56.
 19
Blinov ML, Schaff JC, Vasilescu D, Moraru II, Bloom JE, Loew LM. Compartmental and Spatial RuleBased Modeling with Virtual Cell. Biophys J. 2017; 113(7):1365–72.
 20
Mahmutovic A, Fange D, Berg OG, Elf J. Lost in presumption: stochastic reactions in spatial models. Nat Methods. 2012; 9(12):1163–6.
 21
Baras F, Mansour MM. Reactiondiffusion master equation: A comparison with microscopic simulations. Phys Rev E. 1996; 54(6):6139.
 22
Hattne J, Fange D, Elf J. Stochastic reactiondiffusion simulation with MesoRD. Bioinformatics. 2005; 21(12):2923–24.
 23
Isaacson SA. The ReactionDiffusion Master Equation as an Asymptotic Approximation of Diffusion to a Small Target. SIAM J Appl Math. 2009; 70(1):77–111.
 24
Drawert B, Engblom S, Hellander A. URDME: a modular framework for stochastic simulation of reactiontransport processes in complex geometries. BMC Syst Biol. 2012; 6(1):76.
 25
Hepburn I, Chen W, Wils S, De Schutter E. STEPS: efficient simulation of stochastic reaction–diffusion models in realistic morphologies. BMC Syst Biol. 2012; 6(1):36.
 26
Mourão MA, Hakim JB, Schnell S. Connecting the dots: The effects of macromolecular crowding on cell physiology. Biophys J. 2014; 107(12):2761–6.
 27
Ellis RJ. Macromolecular crowding: obvious but underappreciated. Trends Biochem Sci. 2001; 26(10):597–604.
 28
Zimmerman SB, Minton AP. Macromolecular Crowding: Biochemical, Biophysical, and Physiological Consequences. Annu Rev Biophys Biomol Struct. 1993; 22(1):27–65.
 29
Zhou HX, Rivas G, Minton AP. Macromolecular Crowding and Confinement: Biochemical, Biophysical, and Potential Physiological Consequences. Annu Rev Biophys. 2008; 37(1):375–97.
 30
Rivas G, Minton AP. Macromolecular Crowding In Vitro, In Vivo, and In Between. Trends Biochem Sci. 2016; 41(11):970–81.
 31
Tan C, Saurabh S, Bruchez MP, Schwartz R, LeDuc P. Molecular crowding shapes gene expression in synthetic cellular nanosystems. Nat Nanotechnol. 2013; 8(8):602–8.
 32
Smith S, Cianci C, Grima R. Macromolecular crowding directs the motion of small molecules inside cells. J R Soc Interface. 2017; 14(131):20170047.
 33
Grima R. Intrinsic biochemical noise in crowded intracellular conditions. J Chem Phys. 2010; 132(18):185102.
 34
Cianci C, Smith S, Grima R. Molecular finitesize effects in stochastic models of equilibrium chemical systems. J Chem Phys. 2016; 144(8):084101.
 35
Lagerholm BC, Thompson NL. Theory for ligand rebinding at cell membrane surfaces. Biophys J. 1998; 74(3):1215–28.
 36
Takahashi K, TNicola S, ten Wolde PR. Spatiotemporal correlations can drastically change the response of a MAPK pathway. Proc Natl Acad Sci USA. 2010; 106(6):2473–8.
 37
Mugler A, Bailey AG, Takahashi K, ten Wolde PR. Membrane clustering and the role of rebinding in biochemical signaling. Biophys J. 2012; 102(5):1069–78.
 38
Andrews SS. Smoldyn: Particlebased simulation with rulebased modeling, improved molecular interaction and a library interface. Bioinformatics. 2017; 33(5):710–17.
 39
Michalski PJ, Loew LM. SpringSaLaD: A spatial, particlebased biochemical simulation platform with excluded volume. Biophys J. 2016; 110(3):523–29.
 40
Schöneberg J, Noé F. ReaDDya software for particlebased reactiondiffusion dynamics in crowded cellular environments. PLoS One. 2013; 8(9):e74261.
 41
Andrews SS. ParticleBased Stochastic Simulators In: Jaeger D, Jung R, editors. Encyclopedia of Computational Neuroscience. New York: Springer: 2018. p. 1–5.
 42
Arjunan SNV, Tomita M. A new multicompartmental reactiondiffusion modeling method links transient membrane attachment of E. coli MinE to Ering formation. Syst Synth Biol. 2010; 4(1):35–53.
 43
Chew WX, Kaizu K, Watabe M, Muniandy SV, Takahashi K, Arjunan SNV. Reactiondiffusion kinetics on lattice at the microscopic scale. Phys Rev E. 2018; 98:032418.
 44
Gibson MA, Bruck J. Efficient exact stochastic simulation of chemical systems with many species and many channels. J Phys Chem A. 2000; 104(9):1876–89.
 45
Chew WX, Kaizu K, Watabe M, Muniandy SV, Takahashi K, Arjunan SNV. Surface reactiondiffusion kinetics on lattice at the microscopic scale. Phys Rev E. 2019; 99:042411.
 46
Vigelius M, Lane A, Meyer B. Accelerating reactiondiffusion simulations with generalpurpose graphics processing units. Bioinformatics. 2011; 27(2):288–90.
 47
Roberts E, Stone JE, LutheySchulten Z. Lattice microbes: Highperformance stochastic simulation method for the reactiondiffusion master equation. J Comput Chem. 2013; 34(3):245–55.
 48
Hallock MJ, Stone JE, Roberts E, Fry C, LutheySchulten Z. Simulation of reaction diffusion processes over biologically relevant size and time scales using multiGPU workstations. Parallel Comput. 2014; 40(56):86–99.
 49
Chen W, De Schutter E. Parallel STEPS: Large Scale Stochastic Spatial ReactionDiffusion Simulation with High Performance Computers. Front Neuroinformatics. 2017; 11(February):1–15.
 50
Biedermann J, Ullrich A, Schöneberg J, Noe F. ReaDDyMM: Fast interacting particle reactiondiffusion simulations using graphical processing units. Biophys J. 2015; 108(3):457–61.
 51
Gladkov DV, Alberts S, D’Souza RM, Andrews S. Accelerating the Smoldyn Spatial Stochastic Biochemical Reaction Network Simulator Using GPUs. In: Proceedings of the 19th High Performance Computing Symposia. San Diego: Society for Computer Simulation International: 2011. p. 151–8.
 52
Dematte L. Smoldyn on graphics processing units: Massively parallel brownian dynamics simulations. IEEE/ACM Trans Comput Biol Bioinformatics. 2012; 9(3):655–67.
 53
Hoffmann M, Fröhner C, Noé F. Readdy 2: Fast and flexible software framework for interactingparticle reaction dynamics. PLOS Comput Biol. 2019; 15(2):1–26.
 54
Smoluchowski MV. Mathematical theory of the kinetics of the coagulation of colloidal solutions. Z Phys Chem. 1917; 92:129–168.
 55
Collins FC, Kimball GE. Diffusioncontrolled reaction rates. J Colloid Sci. 1949; 4(4):425–437.
 56
Szpiro GG. Kepler’s Conjecture: How Some of the Greatest Minds in History Helped Solve One of the Oldest Math Problems in the World. New York: Wiley; 2003.
 57
Grima R, Schnell S. A systematic investigation of the rate laws valid in intracellular environments. Biophys Chem. 2006; 124(1):1–10.
 58
Arjunan SNV, Miyauchi A, Takahashi K. A highperformance microscopic lattice reactiondiffusion method for biochemical network simulation. In: The Second Biosupercomputing Symposium. Tokyo: RIKEN: 2010.
 59
Szkoda S, Koza Z, Tykierko M. Accelerating cellular automata simulations using AVX and CUDA. arXiv:1208.2428. 2012.
 60
Takahashi K, Kaizu K, Hu B, Tomita M. A multialgorithm, multitimescale method for cell simulation. Bioinformatics. 2004; 20(4):538–46.
 61
Einstein A. Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Ann Phys. 1905; 322(8):549–60.
 62
von Smoluchowski M. Zur kinetischen Theorie der Brownschen Molekularbewegung und der Suspensionen. Ann Phys. 1906; 326(14):756–80.
 63
Matsumoto M, Nishimura T. Mersenne twister: A 632dimensionally equidistributed uniform pseudorandom number generator. ACM Trans Model Comp Sim. 1998; 8:3–30.
 64
Golub GH, Ortega JM. Scientific Computing: An Introduction with Parallel Computing. 1993.
 65
Gillespie DT. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys. 1977; 22:403–34.
 66
Yonezawa A, Watanabe T, Yokokawa M, Sato M, Hirao K. Advanced institute for computational science (AICS): Japanese national highperformance computing research institute and its 10petaflops supercomputer K. Seattle: ACM/IEEE; 2011.
 67
Minton A. The influence of macromolecular crowding and macromolecular confinement on biochemical reactions in physiological media. J Biol Chem. 2001; 276(14):10577–80.
 68
Novak IL, Kraikivski P, Slepchenko BM. Diffusion in Cytoplasm: Effects of Excluded Volume Due to Internal Membranes and Cytoskeletal Structures. Biophys J. 2009; 97(3):758–67.
 69
Phillip Y, Sherman E, Haran G, Schreiber G. Common Crowding Agents Have Only a Small Effect on ProteinProtein Interactions. Biophys J. 2009; 97(3):875–85.
 70
Zimmerman SB, Trach SO. Estimation of macromolecule concentrations and excluded volume effects for the cytoplasm of Escherichia coli. J Mol Biol. 1991; 222(3):599–620.
 71
Andrews SS, Addy NJ, Brent R, Arkin AP. Detailed Simulations of Cell Biology with Smoldyn 2.1. PLoS Comput Biol. 2010; 6(3):e1000705.
 72
Smith S, Grima R. Breakdown of the reactiondiffusion master equation with nonelementary rates. Phys Rev E. 2016; 93(5):052135.
 73
Lawson MJ, Petzold L, Hellander A. Accuracy of the Michaelis–Menten approximation when analysing effects of molecular noise. J R Soc Interface. 2015; 12(106):20150054.
 74
Holmen JK, Foster DL. Accelerating single iteration performance of CUDAbased 3D reactiondiffusion simulations. Int J Parallel Prog. 2014; 42(2):343–63.
Acknowledgements
We thank Steven Andrews and Frank Noé for their help with the benchmark models of Smoldyn and ReaDDy, respectively. We are grateful to the four anonymous reviewers for their constructive comments. We thank WeiXiang Chew for helpful discussions. We also thank Peter Karagiannis and Kylius Wilkins for reading the initial version of the manuscript and providing valuable suggestions. AM wishes to thank Yukihiro Eguchi for continuous encouragement and Hisashi Nakamura of RIST for giving him the opportunity to start this study.
Funding
This research is part of the HPCI Strategic Program for Innovative Research in Supercomputational Life Science (SPIRE Field 1), which is funded by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan. Part of the results were obtained by using the K computer at RIKEN Advanced Institute for Computational Science (Proposal number hp120309). The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Author information
Affiliations
Contributions
AM, KT and SNVA conceived the project. AM and SNVA conceived the parallelization schemes and simulation algorithm. AM and KI wrote the initial version of the simulator. SNVA wrote, tested and optimized the simulation algorithm and software, and created the build system. AM and SNVA generated data and wrote the paper. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Arjunan, S.N., Miyauchi, A., Iwamoto, K. et al. pSpatiocyte: a highperformance simulator for intracellular reactiondiffusion systems. BMC Bioinformatics 21, 33 (2020). https://doi.org/10.1186/s1285901933388
Received:
Accepted:
Published:
Keywords
 Cell simulation
 Monte Carlo method
 Particle reactiondiffusion
 Hexagonal closepacked lattice
 Mitogenactivated protein kinase
 Message passing interface
 parallelized Gillespie’s directmethod