The construction of mathematical models of metabolic networks, which involves the integration of distributed data, can be implemented as Taverna workflows. Automating these processes provides systematic support for model creation, parameterisation, calibration and simulation, and thus reduces the errors and inconsistencies that arise from manually mapping and tracking data between information repositories and models. These workflows rely on reaction data provided by a community effort to develop a consensus network of yeast metabolism that meets established systems biology standards, namely SBML and MIRIAM.
The construction of models is normally a lengthy and labour-intensive process requiring the manual input of data for each biochemical reaction. This is also true when using applications such as CellDesigner and COPASI, which support the modelling of biological systems. Parameterised models can be created semi-automatically using online tools such as SYCAMORE (Systems biology's Computational Analysis and Modeling Research Environment), based on a set of reactions selected from SABIO-RK, and can then be used in simulations. Our workflows construct models differently: they remove the need for manual data entry by automatically building an SBML model from criteria supplied by the user, such as a list of metabolic enzymes (Figure 3). The resulting SBML model is annotated according to MIRIAM guidelines, which makes it possible for the parameterisation workflow to integrate kinetics from SABIO-RK systematically into the model (Figure 4). These SBML models provide a starting point for the construction of mathematical models of biological systems. Adherence to standards means that the workflows can consume models developed using other approaches, and that the models they produce can be consumed by existing tools.
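As a rough illustration of the selection step described above, the sketch below filters a small reaction table by a user-supplied list of enzymes and attaches a MIRIAM-style identifier URI to each selected reaction. The reaction records, the function names and the use of identifiers.org URIs with the `ec-code` namespace are assumptions made for illustration; the workflows themselves obtain reactions from the consensus yeast network through web services and write SBML.

```python
from dataclasses import dataclass, field


@dataclass
class Reaction:
    """Hypothetical reaction record standing in for an entry in the consensus network."""
    name: str
    ec_number: str
    reactants: list
    products: list
    annotations: list = field(default_factory=list)


def miriam_uri(namespace, identifier):
    """Build an identifiers.org-style URI (assumed form of the annotation)."""
    return f"https://identifiers.org/{namespace}:{identifier}"


def select_reactions(reactions, enzyme_ec_numbers):
    """Keep reactions catalysed by the requested enzymes and annotate each one."""
    wanted = set(enzyme_ec_numbers)
    selected = []
    for rxn in reactions:
        if rxn.ec_number in wanted:
            rxn.annotations = [miriam_uri("ec-code", rxn.ec_number)]
            selected.append(rxn)
    return selected


# Toy network: two glycolytic reactions and one reaction outside glycolysis.
NETWORK = [
    Reaction("hexokinase", "2.7.1.1",
             ["glucose", "ATP"], ["glucose 6-phosphate", "ADP"]),
    Reaction("phosphoglucose isomerase", "5.3.1.9",
             ["glucose 6-phosphate"], ["fructose 6-phosphate"]),
    Reaction("citrate synthase", "2.3.3.1",
             ["acetyl-CoA", "oxaloacetate", "H2O"], ["citrate", "CoA"]),
]

if __name__ == "__main__":
    for rxn in select_reactions(NETWORK, ["2.7.1.1", "5.3.1.9"]):
        print(rxn.name, rxn.annotations)
```

Keying the selection on a standard identifier rather than on a free-text enzyme name is what allows a later step to look up kinetics for the same reaction in another resource.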
Previously, manual assembly of models has been preferred in systems biology because of difficulties in combining distributed data sources and tools. However, online and downloadable applications can integrate tools and data. For example, the BioModels database can run simulations of the SBML models it stores via an interface to JWS online. Models constructed using SYCAMORE can also be simulated through its interoperation with COPASI and ProMOT. A set of Java programs has been developed by Radrich et al. (2010) to integrate data from KEGG and AraCyc in order to reconstruct qualitative genome-scale models of Arabidopsis thaliana. In addition, a Java application called MetaCrop has been developed by Weise et al. (2009) to reconstruct quantitative models of plant metabolic pathways, which can then be simulated using COPASI. Furthermore, a software tool called GRaPe can parameterise the kinetics of reactions and integrate gene expression and protein levels into models for simulation using the SBML ODE Solver in CellDesigner. The present work appears to be a novel application of computational workflows to the construction, parameterisation, calibration and simulation of metabolic models. The advantage of workflows is that tools and databases interoperate through the loose coupling afforded by computational resources deployed as web services. Moreover, workflows provide an explicit record of the steps involved in constructing and parameterising a model, which can be shared with the systems biology community.
The enactment of a workflow by Taverna generates provenance, a record of the intermediate data that have been integrated into an SBML model; such a record is generally not kept during the manual construction of models. Using this provenance, we examined the performance of our workflows. The execution times of both the qualitative modelling and parameterisation workflows increased in a broadly linear fashion with the number of reactions (Additional file 3). Using glycolysis as a test case, the parameterisation workflow took the longest to execute at 3 min 42 s, followed by the qualitative modelling workflow at 44.9 s on average. The calibration workflow required approximately 22 s, whilst the simulation workflow was the fastest at 6 s. The parameterisation workflow is the bottleneck because a large number of queries must be made to the SABIO-RK database to retrieve reaction identifiers for each metabolite and enzyme of every reaction in the qualitative SBML model. These reaction identifiers are then used to query SABIO-RK for stored reaction kinetics that can be mapped onto reactions in the qualitative SBML model.
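To put the reported timings in proportion, the arithmetic below totals the four glycolysis run times quoted above and expresses each as a share of the whole. The timing values are taken from the text. The second function is only a hypothetical illustration of why the number of SABIO-RK lookups grows with model size: it assumes one lookup per metabolite and per enzyme of each reaction, and the toy model it is applied to is invented.

```python
# Execution times for the glycolysis test case, in seconds (values from the text).
TIMINGS_S = {
    "qualitative modelling": 44.9,
    "parameterisation": 3 * 60 + 42,  # 3 min 42 s
    "calibration": 22.0,              # reported as approximate
    "simulation": 6.0,
}


def shares(timings):
    """Return each workflow's fraction of the combined execution time."""
    total = sum(timings.values())
    return {name: seconds / total for name, seconds in timings.items()}


def count_lookups(model):
    """Count identifier lookups, assuming one per metabolite and per enzyme
    of every reaction (an assumption, not a measured figure)."""
    return sum(len(r["metabolites"]) + len(r["enzymes"]) for r in model)


# Invented two-reaction model used only to exercise count_lookups.
TOY_MODEL = [
    {"metabolites": ["glucose", "ATP", "glucose 6-phosphate", "ADP"],
     "enzymes": ["hexokinase"]},
    {"metabolites": ["glucose 6-phosphate", "fructose 6-phosphate"],
     "enzymes": ["phosphoglucose isomerase"]},
]

if __name__ == "__main__":
    print(f"total: {sum(TIMINGS_S.values()):.1f} s")
    for name, share in shares(TIMINGS_S).items():
        print(f"{name}: {share:.1%}")
    print("lookups for toy model:", count_lookups(TOY_MODEL))
```

On these figures the parameterisation workflow accounts for roughly three quarters of a combined run time of about 295 s. Under the stated assumption the lookup count is a sum over reactions, which is consistent with the broadly linear scaling reported above.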
Implementing data integration processes as workflows highlighted various data integration issues in systems biology. For example, enzyme kinetics data were not available for every reaction, even in a well-studied system such as yeast glycolysis. The parameterisation workflow therefore includes a failsafe that substitutes mass action kinetics for these reactions. Discrepancies were also found between the lists of reactants and products of reactions in the consensus model of yeast metabolism and those in SABIO-RK. These appear to have arisen from the charge balancing of reactions in the consensus model, and they caused problems when integrating data from SABIO-RK in our workflows. Inconsistent referencing of metabolites with database identifiers between web services can also hinder the automatic assembly of models. This can lead to anomalous models being built, so results must be checked carefully after each workflow enactment. Future work will enhance the current set of workflows. The criteria by which models can be constructed will be expanded to include, for example, terms from the Gene Ontology, so that models for specific biological processes can be generated. A set of workflows will also be developed to validate the results of systems biology models by comparison with experimental data.
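The substitution of mass action kinetics for reactions without retrieved kinetics, described earlier in this section, can be sketched as follows. This is a minimal illustration rather than the workflow's implementation: the function names, the dictionary standing in for kinetics retrieved from SABIO-RK, the example rate law, the rate-constant naming scheme and the choice of irreversible mass action are all assumptions.

```python
def mass_action_rate_law(reaction_id, reactants):
    """Return an irreversible mass-action rate law as a formula string.

    `reactants` maps species identifiers to stoichiometric coefficients.
    """
    terms = [f"k_{reaction_id}"]
    for species, stoich in reactants.items():
        terms.append(species if stoich == 1 else f"{species}^{stoich}")
    return " * ".join(terms)


def assign_rate_law(reaction_id, reactants, retrieved_kinetics):
    """Use a retrieved rate law when one exists, otherwise fall back to mass action."""
    law = retrieved_kinetics.get(reaction_id)
    if law is not None:
        return law, "retrieved"
    return mass_action_rate_law(reaction_id, reactants), "mass action fallback"


# Hypothetical lookup result: only one of the two reactions has a stored rate law.
RETRIEVED = {
    "HXK": "Vmax * GLC * ATP / ((Km_GLC + GLC) * (Km_ATP + ATP))",
}

if __name__ == "__main__":
    print(assign_rate_law("HXK", {"GLC": 1, "ATP": 1}, RETRIEVED))
    print(assign_rate_law("PGI", {"G6P": 1}, RETRIEVED))
```

Returning the source of each rate law alongside the formula makes it easy to list which reactions were filled in by the fallback, which supports the checking of results between enactments.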