A two-way interface between limited Systems Biology Markup Language and R

Background Systems Biology Markup Language (SBML) is gaining broad usage as a standard for representing dynamical systems as data structures. The open source statistical programming environment R is widely used by biostatisticians involved in microarray analyses. An interface between SBML and R does not exist, though one might be useful to R users interested in SBML, and SBML users interested in R. Results A model structure that parallels SBML to a limited degree is defined in R. An interface between this structure and SBML is provided through two function definitions: write.SBML() which maps this R model structure to SBML level 2, and read.SBML() which maps a limited range of SBML level 2 files back to R. A published model of purine metabolism is provided in this SBML-like format and used to test the interface. The model reproduces published time course responses before and after its mapping through SBML. Conclusions List infrastructure preexisting in R makes it well-suited for manipulating SBML models. Further developments of this SBML-R interface seem to be warranted.


Background
Systems biology markup language (SBML) is a standard for representing dynamical systems of biological interest [1,2]. Interfaces between SBML and high level computational environments are currently being developed for Mathematica [3] and Matlab [4], but to the author's knowledge, no such efforts are being carried forth for R/Splus. This brief paper presents the author's initial developments toward a two-way SBML-R interface. The interface is currently limited in the range of SBML input files that it can handle. For example, it only handles SBML level 2 and does not handle "Events" and "FunctionDefinitions." The interface can nevertheless be used for some models, examples [5,6] of which are included under "demo" in the SBMLR package [7]. This paper provides an explicit exam-ple of one approach to an SBML-R interface. It is assumed throughout that the reader is already quite familiar with both SBML [8] and R [9].

Implementation
The software exists completely in R. It is comprised of four functions and is currently being distributed as a developmental package called "SBMLR" through Bioconductor [10]. The software was written subject to two constraints: 1) models expressed in SBML-like R must be exchangeable with a range of SBML models; and 2) models must be amenable to simulation in R. The first subsection that follows defines an SBML-like R model structure, the second illustrates how it can be used in simulations, and the third describes its conversions into and out of SBML.

SBML-like Model Execution in R
Model definition codes such as that given in Figure 1, when placed in a separate file (e.g. Curto.r), can be sourced into a parent script to become globally available for simulations. For example, the purine metabolism model of Curto et al. [6] can be simulated using the execution code shown in Figure 2.
This code simulates the response to a 10-fold increase in phosphoribosylpyrophosphate (PRPP) at time t = 0 and plots the responses of inosine monophosphate (IMP) and hypoxanthine (HX) as shown in Figure 3. Two functions called by this script are defined in the SBMLR package and shown in Figure 4. They are, getIncidenceMatrix(), which computes the incidence/stoichiometry matrix used by the second function, fderiv(), which computes state derivatives for integration by the function lsoda() of the "odesolve" package. In getIncidenceMatrix(), the incidence matrix is generated automatically using an i loop over the rows (i.e. state variables) and a j loop over the columns (i.e. reactions). If a state is a product of a reaction, the corresponding matrix element becomes a positive integer equal to its stoichiometry [factor() converts string names to factors so that summary() can count them], and similarly for reactants, though with negative numbers entering the matrix in this case (or possibly zero, if a reactant of a reaction happens to also be a product of the same reaction).

A Two-Way Interface between SBML and R
Two functions comprise the SBML-R interface: write.SBML() converts SBML-like R models (e.g. Curto.r) into SBML models (e.g. Curto.xml), and read.SBML() converts SBML models (e.g. Curto.xml) into an SBML-like R model (e.g. CurtoX.r). A key component of these two interface functions is a locally defined recursive function named recurs(). This function converts arbitrary R expressions into arbitrary MathML expressions, and vice-versa; it is defined differently, locally, in each of the two functions. In write.SBML(), shown in Figure 5, recurs() initially takes as its input argument the last component of the body of the kinetic rate law function definition, which is the entire rate law expression (as mentioned above, rate laws involving multiple R statements are not supported). In R, expressions are LISP like in that they contain a first element, the operator, and the remaining elements, the argu-ments, any of which can be an expression. If the operator is the parentheses operator, the action taken is that of a unary identity operator, and we simply skip it and move on to its argument since parentheses are not needed in MathML. Each nested call to the function recurs() sends "<apply>" and the converted operator to the output file on its way in, and a matching "</apply>" on its way out. Nested calling continues until all nodes of the expression tree are of class "name" or "numeric," i.e. when all found objects are leaves of the tree rather than "expressions" that require further parsing. Leaves are then sent to the output file bracketed by <ci> and </ci>.
The second of the two SBML-R interface functions, read.SBML(), maps a limited range of SBML level 2 files (function definitions and events are not handled) into SBML-like R model files. Portions of read.SBML() are given in Figure 6. The main difference between this function, read.SBML(), and the previous function, write.SBML(), is that here, rather than using parse() to decompose the list-of-lists structure of the model defined in R, the SBML model is instead decomposed as an XML object using xmlTreeParse() of the XML package available to R [11]. In read.SBML(), the locally defined recursive function recurs() uses an overkill of parentheses to avoid operator precedence issues. This recursive function is passed a MathML reaction rate law which it parses recursively until the leaves of the tree (the "ci") are all found. During the recursion a corresponding R expression is built as a vector of character strings which, upon exit from the last of the recursive calls, is collapsed into a single string and sent to the output file as the last line of the current rate law function definition.

Results
The function write.SBML() was applied to Curto.r to generate Curto.xml and the function read.SBML() was then applied to Curto.xml to generate CurtoX.r. Execution of the script given in Figure 2 with line 4 of the execution code changed to act on CurtoX.r instead of Curto.r generated the same plots as before (Figure 3). This shows that the R model was successfully converted into an SBML file that can be reconverted back into a properly functioning R model. The intermediate file Curto.xml was successfully validated as an SBML level 2 file [12]. The SBML file could thus be imported into visualization packages such as JDesigner [13].

Discussion
If the model of Curto et al. [4] were implemented in R without any knowledge of SBML, a form that it might take is that given in Appendix B (Figure 7). Compared to its SBML-like counterparts, this code is more compact and easier to understand, e.g. the system's network connectivity is clearly visible. The disadvantage of such code is that it is not readily converted into SBML. Since the benefits of SBML are compelling, this disadvantage alone warrants the use of SBML-like model structures.
As SBML evolves to handle a broader range of dynamical systems, it will become more and more challenging for simulation packages to handle all possible SBML models. It is envisioned here that the development of this SBML-R interface will be driven by its users, and not by the model representation capabilities of SBML, i.e. it is expected that the users of this interface will be programmers who are capable of modifying it as their needs require.

Conclusions
Compared to Matlab, which may be better equipped than R to simulate arbitrarily complex dynamical systems, R has the advantage of list handling infrastructure in parse() and xmlTreeParse(), and it also has the advantage of indexing by names instead of numbers. A further advantage, though not exploited here, is that R is object-oriented; in future versions of this interface, a print() method might be defined for objects of class SBMLR (i.e. models) to generate more readable renderings of models in R. Another advantage of R over Matlab is that it provides access to a much broader collection of microarray analysis tools, e.g. see Bioconductor [10]. This aspect is important for those individuals who are interested in biochemical systems analyses of microarray data [14,15]. For statisticians already familiar with R, there are also the obvious economies of maintaining system familiarity. Finally, perhaps the biggest advantage of R over Matlab is that it is freely available. On balance, there seems to be ample motivation for further developments of this interface between SBML and R.  write.SBML<-function(filename) { # takes R model in filename.r and maps it to an SBML model in filename.xml r2ml<-function(type) # maps R operator symbols into MathML switch(type, "*" = "<times/>", "/" = "<divide/>", "+" = "<plus/>", "-" = "<minus/>", "^" = "<power/>", "exp" = "<exp/>", "log" = "<ln/>", "not found") # end of r2ml sub-function definition.

Authors' contributions
TR is the sole contributor.

Appendix A
The SBMLR package is available through Bioconductor as a developmental package [7]. It has been developed and tested only under Windows XP. To install, do NOT unzip the file SBMLR.zip after downloading to a local directory, rather, within the R GUI, click packages and install from local zip. The XML package installs similarly [11]. Note that an error message from library(XML) can be resolved by copying the *.dll files of the XML package libs directory into the "C:\windows" directory. The ODESOLVE package must be installed before running simulations. This package is installed from the R GUI by clicking packages and install from CRAN. R code for the function read.SBML() Figure 6 R code for the function read.SBML().