Software | Open | Published:
A two-way interface between limited Systems Biology Markup Language and R
BMC Bioinformaticsvolume 5, Article number: 190 (2004)
Systems Biology Markup Language (SBML) is gaining broad usage as a standard for representing dynamical systems as data structures. The open source statistical programming environment R is widely used by biostatisticians involved in microarray analyses. An interface between SBML and R does not exist, though one might be useful to R users interested in SBML, and SBML users interested in R.
A model structure that parallels SBML to a limited degree is defined in R. An interface between this structure and SBML is provided through two function definitions: write.SBML() which maps this R model structure to SBML level 2, and read.SBML() which maps a limited range of SBML level 2 files back to R. A published model of purine metabolism is provided in this SBML-like format and used to test the interface. The model reproduces published time course responses before and after its mapping through SBML.
List infrastructure preexisting in R makes it well-suited for manipulating SBML models. Further developments of this SBML-R interface seem to be warranted.
Systems biology markup language (SBML) is a standard for representing dynamical systems of biological interest [1, 2]. Interfaces between SBML and high level computational environments are currently being developed for Mathematica  and Matlab , but to the author's knowledge, no such efforts are being carried forth for R/S-plus. This brief paper presents the author's initial developments toward a two-way SBML-R interface. The interface is currently limited in the range of SBML input files that it can handle. For example, it only handles SBML level 2 and does not handle "Events" and "FunctionDefinitions." The interface can nevertheless be used for some models, examples [5, 6] of which are included under "demo" in the SBMLR package . This paper provides an explicit example of one approach to an SBML-R interface. It is assumed throughout that the reader is already quite familiar with both SBML  and R .
The software exists completely in R. It is comprised of four functions and is currently being distributed as a developmental package called "SBMLR" through Bioconductor . The software was written subject to two constraints: 1) models expressed in SBML-like R must be exchangeable with a range of SBML models; and 2) models must be amenable to simulation in R. The first subsection that follows defines an SBML-like R model structure, the second illustrates how it can be used in simulations, and the third describes its conversions into and out of SBML.
An SBML-Like Model Structure in R
To facilitate mappings between SBML and R, an SBML-like list structure is defined in this subsection using the purine metabolism model of Curto et al.  as a specific example (Figure 1). In this figure and elsewhere, ellipses (...) indicate missing code not critical to current discussions; complete source codes are available through the SBMLR package . The essential components of an SBML model, namely, its compartments, species and reactions, are all present in this R analog of an SBML model. In the model of Curto et al. , there is one compartment, be it the cell or the entire human body, and 18 species: 2 boundary conditions (bc = True) and 16 state variables (bc = False), each with an initial condition (ic) or value. Each reaction is a list that includes a reaction id, the names of species that are reactants (reacts), the names of species that are reaction rate modulators (mods), the names of species that are produced by the reaction (prods), parameter values (params), and the reaction rate law (law) function definition. In this framework, only state variables need be listed as products, boundary condition reactants can equivalently be listed as modulators, and missing terms (e.g. mods in reactions 1 and 37) are equivalent to a NULL assignment. The rate law function has as its input arguments two vectors, one carrying the concentrations of reactants and modulators (r), the other carrying reaction parameter values (p). If the body of the rate law function contains n statements, the first n-1 trivially convert input vector components into variables with the same names. The n th statement then contains the complete reaction rate law. It can occupy multiple lines, but it must be a single statement, i.e. it cannot depend on substitution variables temporarily defined in preceding statements.
SBML-like Model Execution in R
Model definition codes such as that given in Figure 1, when placed in a separate file (e.g. Curto.r), can be sourced into a parent script to become globally available for simulations. For example, the purine metabolism model of Curto et al.  can be simulated using the execution code shown in Figure 2.
This code simulates the response to a 10-fold increase in phosphoribosylpyrophosphate (PRPP) at time t = 0 and plots the responses of inosine monophosphate (IMP) and hypoxanthine (HX) as shown in Figure 3. Two functions called by this script are defined in the SBMLR package and shown in Figure 4. They are, getIncidenceMatrix(), which computes the incidence/stoichiometry matrix used by the second function, fderiv(), which computes state derivatives for integration by the function lsoda() of the "odesolve" package. In getIncidenceMatrix(), the incidence matrix is generated automatically using an i loop over the rows (i.e. state variables) and a j loop over the columns (i.e. reactions). If a state is a product of a reaction, the corresponding matrix element becomes a positive integer equal to its stoichiometry [factor() converts string names to factors so that summary() can count them], and similarly for reactants, though with negative numbers entering the matrix in this case (or possibly zero, if a reactant of a reaction happens to also be a product of the same reaction).
The function fderiv() creates the current species vector by overriding initial states with current states clipped to positive values, and by overriding any time varying boundary conditions defined by rules (SBML rules are not needed for the purine model, but are needed to implement other models ). The function fderiv() then computes the reaction rate flux vector (v) based on the current species vector (St) and multiplies it by the incidence matrix to produce the current state derivative vector (xp). The names of xp and v are reset at the end of each function call to override the problem of variables gaining new composite names from the names of their expression arguments.
A Two-Way Interface between SBML and R
Two functions comprise the SBML-R interface: write.SBML() converts SBML-like R models (e.g. Curto.r) into SBML models (e.g. Curto.xml), and read.SBML() converts SBML models (e.g. Curto.xml) into an SBML-like R model (e.g. CurtoX.r). A key component of these two interface functions is a locally defined recursive function named recurs(). This function converts arbitrary R expressions into arbitrary MathML expressions, and vice-versa; it is defined differently, locally, in each of the two functions. In write.SBML(), shown in Figure 5, recurs() initially takes as its input argument the last component of the body of the kinetic rate law function definition, which is the entire rate law expression (as mentioned above, rate laws involving multiple R statements are not supported). In R, expressions are LISP like in that they contain a first element, the operator, and the remaining elements, the arguments, any of which can be an expression. If the operator is the parentheses operator, the action taken is that of a unary identity operator, and we simply skip it and move on to its argument since parentheses are not needed in MathML. Each nested call to the function recurs() sends "<apply>" and the converted operator to the output file on its way in, and a matching "</apply>" on its way out. Nested calling continues until all nodes of the expression tree are of class "name" or "numeric," i.e. when all found objects are leaves of the tree rather than "expressions" that require further parsing. Leaves are then sent to the output file bracketed by <ci> and </ci>.
The second of the two SBML-R interface functions, read.SBML(), maps a limited range of SBML level 2 files (function definitions and events are not handled) into SBML-like R model files. Portions of read.SBML() are given in Figure 6. The main difference between this function, read.SBML(), and the previous function, write.SBML(), is that here, rather than using parse() to decompose the list-of-lists structure of the model defined in R, the SBML model is instead decomposed as an XML object using xmlTreeParse() of the XML package available to R . In read.SBML(), the locally defined recursive function recurs() uses an overkill of parentheses to avoid operator precedence issues. This recursive function is passed a MathML reaction rate law which it parses recursively until the leaves of the tree (the "ci") are all found. During the recursion a corresponding R expression is built as a vector of character strings which, upon exit from the last of the recursive calls, is collapsed into a single string and sent to the output file as the last line of the current rate law function definition.
The function write.SBML() was applied to Curto.r to generate Curto.xml and the function read.SBML() was then applied to Curto.xml to generate CurtoX.r. Execution of the script given in Figure 2 with line 4 of the execution code changed to act on CurtoX.r instead of Curto.r generated the same plots as before (Figure 3). This shows that the R model was successfully converted into an SBML file that can be reconverted back into a properly functioning R model. The intermediate file Curto.xml was successfully validated as an SBML level 2 file . The SBML file could thus be imported into visualization packages such as JDesigner .
If the model of Curto et al.  were implemented in R without any knowledge of SBML, a form that it might take is that given in Appendix B (Figure 7). Compared to its SBML-like counterparts, this code is more compact and easier to understand, e.g. the system's network connectivity is clearly visible. The disadvantage of such code is that it is not readily converted into SBML. Since the benefits of SBML are compelling, this disadvantage alone warrants the use of SBML-like model structures.
As SBML evolves to handle a broader range of dynamical systems, it will become more and more challenging for simulation packages to handle all possible SBML models. It is envisioned here that the development of this SBML-R interface will be driven by its users, and not by the model representation capabilities of SBML, i.e. it is expected that the users of this interface will be programmers who are capable of modifying it as their needs require.
Compared to Matlab, which may be better equipped than R to simulate arbitrarily complex dynamical systems, R has the advantage of list handling infrastructure in parse() and xmlTreeParse(), and it also has the advantage of indexing by names instead of numbers. A further advantage, though not exploited here, is that R is object-oriented; in future versions of this interface, a print() method might be defined for objects of class SBMLR (i.e. models) to generate more readable renderings of models in R. Another advantage of R over Matlab is that it provides access to a much broader collection of microarray analysis tools, e.g. see Bioconductor . This aspect is important for those individuals who are interested in biochemical systems analyses of microarray data [14, 15]. For statisticians already familiar with R, there are also the obvious economies of maintaining system familiarity. Finally, perhaps the biggest advantage of R over Matlab is that it is freely available. On balance, there seems to be ample motivation for further developments of this interface between SBML and R.
Availability and requirements
Project name: SBMLR
Project home page: http://www.bioconductor.org/repository/devel/package/html/SBMLR.html
Operating system(s): Windows XP
Programming language: R 2.0
Other requirements: R packages: XML and ODESOLVE
License: GNU GPL
Any restrictions to use by non-academics: no restrictions
The SBMLR package is available through Bioconductor as a developmental package . It has been developed and tested only under Windows XP. To install, do NOT unzip the file SBMLR.zip after downloading to a local directory, rather, within the R GUI, click packages and install from local zip. The XML package installs similarly . Note that an error message from library(XML) can be resolved by copying the *.dll files of the XML package libs directory into the "C:\windows" directory. The ODESOLVE package must be installed before running simulations. This package is installed from the R GUI by clicking packages and install from CRAN.
The implementation of Curto et al.'s model shown in Figure 7 is independent of any knowledge of SBML. It is included here to illustrate what comes "naturally" when implementing a model in R, see Discussion.
Systems Biology Markup Language
extensible markup language
Mathematical Markup Language
ordinary differential equation.
Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A, Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V, Goryanin II, Hedley WJ, Hodgman TC, Hofmeyr JH, Hunter PJ, Juty NS, Kasberger JL, Kremling A, Kummer U, Le Novere N, Loew LM, Lucio D, Mendes P, Minch E, Mjolsness ED, Nakayama Y, Nelson MR, Nielsen PF, Sakurada T, Schaff JC, Shapiro BE, Shimizu TS, Spence HD, Stelling J, Takahashi K, Tomita M, Wagner J, Wang J: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003, 19: 524–531. 10.1093/bioinformatics/btg015
Finney A, Hucka M: Systems biology markup language: Level 2 and beyond. Biochem Soc Trans 2003, 31: 1472–1473.
Shapiro BE, Hucka M, Finney A, Doyle J: MathSBML: a package for manipulating SBML-based biological models. Bioinformatics 2004, 20: 2829–2831. 10.1093/bioinformatics/bth271
Keating SM: SBMLToolbox.[http://sbml.org/software/sbmltoolbox/]
Morrison PF, Allegra CJ: Folate cycle kinetics in human breast cancer cells. J Biol Chem 1989, 264: 10552–10566.
Curto R, Voit EO, Sorribas A, Cascante M: Mathematical models of purine metabolism in man. Math Biosci 1998, 151: 1–49. 10.1016/S0025-5564(98)10001-9
Systems Biology Markup Language[http://sbml.org]
The R Project for Statistical Computing[http://www.r-project.org/]
SBML Online Tools[http://sbml.org/tools/htdocs/sbmltools.php]
Sauro HM, Hucka M, Finney A, Wellock C, Bolouri H, Doyle J, Kitano H: Next generation simulation tools: the Systems Biology Workbench and BioSPICE integration. Omics 2003, 7: 355–372. 10.1089/153623103322637670
Radivoyevitch T: Sphingoid base metabolism in yeast: Mapping gene expression patterns into qualitative metabolite time course predictions. Comparative & Functional Genomics 2001, 2: 289–294. 10.1002/cfg.106
Voit EO, Radivoyevitch T: Biochemical systems analysis of genome-wide expression data. Bioinformatics 2000, 16: 1023–1037. 10.1093/bioinformatics/16.11.1023
This research was supported by the Biostatistics Core Facility of the Comprehensive Cancer Center of Case Western Reserve University and University Hospitals of Cleveland (P30 CA43703), by the American Cancer Society (IRG-91-022-09), and by the National Cancer Institute's Integrative Cancer Biology Program (P20 CA112963-01).
TR is the sole contributor.