Meta-All: a system for managing metabolic pathway information
© Weise et al; licensee BioMed Central Ltd. 2006
Received: 22 May 2006
Accepted: 23 October 2006
Published: 23 October 2006
Many attempts are being made to understand biological subjects at a systems level. A major resource for these approaches are biological databases, storing manifold information about DNA, RNA and protein sequences including their functional and structural motifs, molecular markers, mRNA expression levels, metabolite concentrations, protein-protein interactions, phenotypic traits or taxonomic relationships. The use of these databases is often hampered by the fact that they are designed for special application areas and thus lack universality. Databases on metabolic pathways, which provide an increasingly important foundation for many analyses of biochemical processes at a systems level, are no exception from the rule. Data stored in central databases such as KEGG, BRENDA or SABIO-RK is often limited to read-only access. If experimentalists want to store their own data, possibly still under investigation, there are two possibilities. They can either develop their own information system for managing that own data, which is very time-consuming and costly, or they can try to store their data in existing systems, which is often restricted. Hence, an out-of-the-box information system for managing metabolic pathway data is needed.
We have designed META-ALL, an information system that allows the management of metabolic pathways, including reaction kinetics, detailed locations, environmental factors and taxonomic information. Data can be stored together with quality tags and in different parallel versions. META-ALL uses Oracle DBMS and Oracle Application Express. We provide the META-ALL information system for download and use. In this paper, we describe the database structure and give information about the tools for submitting and accessing the data. As a first application of META-ALL, we show how the information contained in a detailed kinetic model can be stored and accessed.
META-ALL is a system for managing information about metabolic pathways. It facilitates the handling of pathway-related data and is designed to help biochemists and molecular biologists in their daily research. It is available on the Web at http://bic-gh.de/meta-all and can be downloaded free of charge and installed locally.
Modern molecular biological research produces large amounts of data. One main problem remains the elucidation and representation of relationships between the compounds of biological systems. A large number of information systems holding data from molecular biology has been developed by public and private research projects . A rapidly growing field is the collection of data associated to metabolic pathways, available from information systems such as KEGG , BRENDA , UM-BBD  or Reactome . Depending on the requirements defined upon the initiation of these projects, the features and implementations differ significantly among these systems. For example, BRENDA holds detailed kinetic information about enzymes, whereas KEGG contains maps of metabolic pathways and detailed descriptions about their elements. Often, the existing systems are not well structured, because they have been grown over the years. Hence, they are not able to store all necessary information . For example, many of the systems are not capable of describing the complexity of higher organisms. Especially the differentiation between loci inside of organisms, the consideration of developmental stages and stress factors, and detailed representation of kinetics and regulation is only partially possible. Additionally, there are errors resulting from faulty text-mining procedures or genome-based pathway predictions, which make the data only conditionally useful. Hence, existing systems often represent collections of reference pathways. This is useful for getting an idea of the metabolic processes as in the case of the well-known Boehringer charts [7, 8], however, in many cases it is insufficient. Only rudimentary details are stored about the real location of these processes, such as the organism, tissue, cell type and compartment, while comprising comprehensive information about pathway structures in general. Also, only sparse information is given about circumstances under which the data points were determined, such as growth conditions or sampling time. Hence, some of the represented pathways are non-occurring in reality. As an exception the SABIO-RK system  should be mentioned, which focuses on the storage of fine-grained and high-quality data in a central instance.
Another important problem biochemists and molecular biologists are faced with is the management of their own data. Currently, experimentalists can develop an own information system for managing that own data, but this is very time-consuming and costly. An alternative would be to try to get their data into existing systems. Often, this is not possible. In case that scientists have different opinions about certain items, pathway data needs to be stored in different parallel versions. However, this is not possible with most of the existing systems.
In an attempt to overcome these problems and to provide the scientific community with a software system that meets their requirements, we initiated the META-ALL project. META-ALL is designed for management of detailed information about metabolic pathways, including reactions, translocations, substances, pathways, locations and kinetic parameters. The system contains a versioning system and a Web interface for entering and querying of data.
Meta-All is a software package that is initially distributed with an example data-set only. It can be installed locally to help biologists and biochemists in their daily research. The user can enter own experimental data as well as data sets from other sources such as publications or databases. Once the user's instance of Meta-All is populated with sufficient amounts of data, a variety of complex queries will be possible due to the detailed structure of the database schema. Examples for such queries are dependent on the filling of the system, and could be: "What is the K m value for enzyme x in compartment y of organ z of organism w?" or "If a value for organism x is missing, what value was measured in the closest relative (in taxonomical sense) of that organism?"
In this paper, at first we give an overview about the technical background of the system. In the second part we describe the database schema, before we provide details about the user-interface we developed for managing the contents of the database. Finally, we discuss our system in relation to other systems.
Technical design of Meta-All
META-ALL uses the Oracle database management system, which is extensively used in academia and industry. Furthermore, Oracle provides several useful tools for easy creation of user interfaces and a security concept for protecting selected data. For the user-frontend, we use Oracle Application Express . It allows the wizard-aided creation of Web-interfaces based on the underlying database schema with the help of a set of components, such as graphical and non-graphical reports, forms and trees, which can be easily integrated into different areas (regions) of Web pages. We use the user management provided by Oracle Application Express for allowing the creation of users with different rights such as read/write or read-only access. Oracle Application Express addresses also the concerns of multiple-user access with built-in check constraints. Hence, it is possible for multiple users to use META-ALL at the same time; it is ensured that two users cannot edit the same data simultaneously and that one user cannot overwrite the respective changes of the other user.
META-ALL can be accessed in two ways. First, we provide a demo instance of META-ALL on our server which is accessible via our project Web page . This instance is intended to provide an overview about the abilities of META-ALL. A guest user account with the username "guest" and the password "meta-all" allows the user to see a pathway from sucrose breakdown in potato as a test data set . The demo instance will be reset at regular intervals. The second possibility is to download META-ALL and to install it locally. That requires Oracle DBMS and Oracle Application Express running. This can be either a commercial version of Oracle software or an Oracle Express Edition , which is a freely available lightweight Oracle DBMS coming along with an integrated Application Express installation. The META-ALL application including the initial data set and the installation procedure is available for download free of charge at our project Web page.
Conversions and substances
The central parts of the schema are conversions and substances. A conversion is a reaction or a translocation, both of them either actively (enzyme-catalysed or transporter-mediated, respectively) or passively (spontaneous or by diffusion, respectively). All necessary information can be stored with every conversion: name and synonym(s), formula, type (reaction, translocation) and subtype (kinetic reaction type for enzymes, type of transporter). Conversions are categorised into pathways and pathways into super-pathways. Furthermore, some specific information can be included, e.g. for the layout of cycles (if there is a clockwise or an anti-clockwise cycle, or if it is an open or closed one). The substance is an umbrella term for transporters, enzymes, metabolites and macromolecules, the latter of which consists of one or more types of metabolite units. Each substance plays a certain role in a conversion, which can be catalyst in case the substance is an enzyme, or substrate (= input substance) in case the substance is a metabolite. Furthermore, a metabolite can be a modulator (activator or inhibitor) of a reaction, the product (= output substance) or act as a coenzyme. Each of the role information can be enriched by detailed information such as V max values, affinity constants, etc. Each piece of information is assigned to the location information and publications.
Locations and taxonomy
Conversions take place at different locations inside an organism depending on the developmental state and environmental effects, and the database schema reflects as many of these parameters as possible. We focused on the use of existing ontologies in order to have a controlled vocabulary allowing the comparison of data from different sources. For our instance of META-ALL we decided to use PlantOntology  terms to distinguish cytological aspects of plants and also for developmental stages, because the scientific focus of our institute is plant research. The META-ALL system can be easily extended by other ontologies or user-defined terms, respectively. To determine the taxonomy of the organisms, we use the NCBI taxonomy ID  enriched by established attributes such as family, genus or species. Additionally, META-ALL allows the storage of genetic distance matrices of different organisms. The user can infer these matrices for example by comparison of gene sequences from the organisms stored in the database. Thus, in case a pathway is poorly investigated in one organism, this feature allows the recruitment of data from a phylogenetically closely related species.
As publications represent probably the most important knowledge source, it is crucial to store reference information with as many data points as possible. We decided to use the widely accepted standard of PubMed IDs instead of developing an integrated publication management. Hence, META-ALL stores the PubMed ID for each publication and a text field is provided to store necessary information for yet unpublished data. The record has to be updated manually once a PubMed ID is available. If using the META-ALL Web-Interface described below, the user is shown the PubMed ID and, if existing, the remark field. The NCBI Web page with the abstract of the publication can be opened in a new window using a provided link.
Quality tags and data versioning
The data for which META-ALL is intended usually has different quality levels. Hence, the storage of additional quality information and the possibility to store curated data, as it was pointed out in , is required. In addition, there is a strong need for a versioning system for a pathway database like META-ALL, because biochemical data is sometimes ambiguous and even long-established views are subject to change. We need to be able to store different revisions of pathways as scientists tend to have different opinions about certain issues. There are several possibilities for data versioning [18, 19]. Linear versioning could be used, which means that each version has only one successor. In contrast, as one of our requirements to the system was to enable parallel instances of pathways, a hierarchical versioning would be a benefit. Both linear and hierarchical versioning can be subdivided into continuous versioning and discrete versioning. Continuous versioning means that every simple change is a new version. In contrast, if using discrete versioning, which was used for META-ALL, a new version is created at a certain interval. In the context of META-ALL, the decision to create a new version is made by the user. All versioned values are stored in the same table distinguished by the status. We do not work with shadow tables archiving older values, because then only one value would actually be available. The term "status" as used in META-ALL is a label comprising the time-stamp at which a new version was created. In addition, it contains information such as the author of the version, the originality of the data (e.g. in-house data, public database), the quality of the data (e.g. hand-curated, imported from external source, putative) and the ID of the status record the newly generated status (or version) is based on. The labelled data sets can be improved gradually. Additionally, old pathways can be kept for publication purposes (in case the pathway has been improved in the meanwhile).
The Meta-All Web-Interface
META-ALL is an information system for managing data about metabolic pathways, which is available for download and local installation, in order to help experimentalists in their daily research. We provide both the database schema (together with an initial set of data from sucrose breakdown in the potato tuber) and the user interface. It is available from the META-ALL Web page. Additionally, we provide a demo instance of the META-ALL system on our server.
The pathway by which sucrose is transformed to starch in developing potato tubers has been subject of numerous studies over several decades (for a review see ). A detailed kinetic model has been created  in an attempt to better understand the dynamics of this pathway. The model consists of 14 reactions with their according rate-laws, which are further defined by a total of 74 constants. The data contained in the model represents a useful test data set for META-ALL and was thus entered into the system. It is possible to represent all data from the model in META-ALL, including:
The enzyme name, EC number and stoichiometric formula.
The kinetic type of the reaction, together with the organism this was determined in.
The name of each metabolite, the role it plays in which reaction, and the binding- and dissociation order in these reactions.
The kinetic parameters of each reaction such as binding or inhibitory constants, maximal velocity and equilibrium constant; their value and unit; the organism, tissue, cell, and compartment the constant was measured in.
For all data, the reference and the according PubMed ID (if available) is given, together with a tag if this kinetic type was measured or estimated.
The structure of the metabolic network has been exported from META-ALL as an SBML file and imported into the data visualisation system VANTED for a visualisation of the pathway. Figure 3 shows the result of the visualisation. While this procedure shows the potential of the system, it should be pointed out that the connection of META-ALL to VANTED is only one of numerous possibilities that could be implemented depending on the preferences of the user.
Although there exist several information systems for storing metabolic pathway data, we decided to develop a new one, because many of the existing systems are not capable of meeting the requirements of experimentalists working on higher organisms. We analysed a variety of existing information systems concerning their advantages and disadvantages and made these information the fundament of our requirement analysis. An exciting study in this field is published in .
An information system of high interest is SABIO-RK . It focuses on the curation of data about biochemical reactions, including their kinetic properties. The limitations of SABIO-RK are that data is stored in a central instance with a read-only access and that there is no user interface for inserting data. A system with similar goals as Meta-All is aMAZE . It consists of a WorkBench for storing pathway data and a front-end (LightBench) for accessing this data. Additional components, namely the Snow Workbench and a direct SQL access to the data, are available. To the best of our knowledge, aMAZE cannot store information about reaction kinetics, and is not able to write SBML files. In the current version the aMAZE LightBench can be used for querying data in the central installation in Belgium only. The intension of META-ALL is to support experimentalists in their daily work. While most of the existing systems focus on storing and curating data in a central instance, e. g. KEGG, and can just be accessed read-only, META-ALL can be installed locally, and data can be inserted and edited. Research groups or even whole institutes can manage their "fresh" data, or even data subject to current investigations. The management is supported by versioning and quality tagging. Furthermore, not only reaction data, but also translocation data can be managed with META-ALL.
META-ALL uses an Oracle database management system for persistently storing metabolic pathway data and an Oracle Application Express user interface. An installation instruction containing database schema, user interface and initial data set can be obtained from the META-ALL project Web page. It can be used with either a commercial Oracle license or the Oracle Database 10 g Express Edition, which is available free of charge from Oracle Corporation.
We prepared several reports for the META-ALL user interface and used our system in conjunction with VANTED for pathway visualisation. We plan to connect META-ALL to the Systems Biology Modelling Environment (SYBME)  to further support the user in the kinetic metabolic modelling. Within SYBME, a user will be able to browse through the information of metabolites and reactions available in his/her META-ALL instance, may combine this information into a kinetic model and can finally visualise and simulate the model with the two connected simulators, GEPASI  and JARNAC .
For the future, we plan to facilitate the following application using META-ALL: the semi-automatic generation of kinetic models of a certain pathway. The procedure for this task will be as follows: the user enters the pathway that should be modelled and defines the boundaries of the model (substrates, products). An application of META-ALL then queries the database for all information about the enzymes present in this pathway at a specific location in a given organism. If a certain kinetic parameter is not available for an enzyme, it will be possible to take the data from the taxonomically nearest neighbour species for which the data is available with the help of the genetic distance matrix included in META-ALL. This is a common approach for constructing kinetic models, which is normally done through tedious literature searches. The data can then be presented to the user who can decide which data to keep and which to change. Subsequently, the model could be exported in standard formats such as SBML. We are aware that this is a future application implying that the database is filled with a sufficient amount of data, which is in the responsibility of the user. The user could, for example, import large datasets from existing sources and improve the data step by step. This procedure is supported by the quality and versioning tagging described in subsection Database schema.
For the daily work, the existing Web-Interface should be appropriate. In the future, we plan to expand META-ALL by an SBML importer allowing the import of larger amounts of data at once.
In this paper, we presented META-ALL, a metabolic pathway information system, which can be downloaded and installed locally. It is intended to support biochemists and molecular biologists in their daily research by providing a platform for the management of detailed information about metabolic pathways, including reactions, translocations, substances, pathways, locations and kinetic parameters. META-ALL contains a versioning system, quality tags and a Web interface for entering and querying of data. Pathways can be exported into SBML files for use in visualisation or simulation tools.
Availability and requirements
Project name: META-ALL
Project home page: http://bic-gh.de/meta-all
Operating system(s): Client: only a Web browser required; Server: OS depending on the Oracle installation, e. g. Microsoft Windows, Linux
Programming language: User-interface using the Oracle Application Express technology and SQL, PL/SQL
Other requirements: Oracle DBMS 9i/10g (version 188.8.131.52 and higher) and Oracle Application Express 2.0 license if available, or Oracle Database 10g Express Edition (free of charge entry-level Oracle DBMS, including Application Express)
License: Apache License, Version 2.0
Any restrictions to use by non-academics: According to license
We thank Sophia Biemelt, Frederik Börnke, Mohammad Hajirezaei, Hardy Rolletschek and Uwe Sonnewald for valuable discussions, the anonymous reviewers for valuable comments and the German Federal Ministry for Education and Research (BMBF) for financial support.
- Galperin MY: The Molecular Biology Database Collection: 2006 update. Nucleic Acids Research 2006, (34 Database):D3-D5. 10.1093/nar/gkj162Google Scholar
- Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Research 2006, (34 Database):D354-D357. 10.1093/nar/gkj102Google Scholar
- Schomburg I, Chang A, Schomburg D: BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Research 2004, (32 Database):D431–433. 10.1093/nar/gkh081Google Scholar
- Ellis LBM, Roe D, Wackett LP: The University of Minnesota Biocatalysis/Biodegradation Database: the first decade. Nucleic Acids Research 2006, (34 Database):D517-D521. 10.1093/nar/gkj076Google Scholar
- Joshi-Tope G, Gillespie M, Vastrik I, D'Eustachio P, Schmidt E, de Bono B, Jassal B, Gopinath GR, Wu GR, Matthews L, Lewis S, Birney E, Stein L: Reactome: a knowledgebase of biological pathways. Nucleic Acids Res 2005, (33 Database):D428–432.Google Scholar
- Philippi S, Köhler J: Addressing the problems with life-science databases for traditional uses and systems biology. Nature Reviews Genetics 2006, 7(6):482–488. 10.1038/nrg1872View ArticlePubMedGoogle Scholar
- Michal G, (Ed): Biochemical Pathways (wall charts). Boehringer Mannheim, Penzberg Third edition. 1993.Google Scholar
- Michal G, (Ed): Biochemical Pathways – An Atlas of Biochemistry and Molecular Biology. A John Wiley & Sons, Inc. and Spektrum Akademischer Verlag Co-Publication; 1999.Google Scholar
- Wittig U, Golebiewski M, Kania R, Krebs O, Mir S, Weidemann A, Anstein S, Saric J, Rojas I: SABIO-RK: Integration and Curation of Reaction Kinetics Data. In Data Integration in the Life Sciences: Third International Workshop, DILS 2006, Hinxton, UK, July 20–22, 2006. Proceedings, Volume 4075 of Lecture Notes in Bioinformatics. Edited by: Leser U, Naumann F, Eckman B. Springer Berlin/Heidelberg; 2006:94–103.View ArticleGoogle Scholar
- Oracle Application Express[http://www.oracle.com/technology/products/database/application_express]
- Meta-All Project Web Page[http://bic-gh.de/meta-all]
- Junker BH: Sucrose breakdown in the potato tuber. PhD thesis. Potsdam University; 2004.Google Scholar
- Oracle Database 10g Express Edition[http://www.oracle.com/technology/products/database/xe/index.html]
- Boehm B: A spiral model of software development and enhancement. IEEE Computer 1988, 21: 61–72.View ArticleGoogle Scholar
- The Plant Ontology Consortium: The Plant Ontology Consortium and Plant Ontologies. Comp Funct Genomics 2002, 3: 137–142. 10.1002/cfg.154PubMed CentralView ArticleGoogle Scholar
- Wheeler DL, Chappey C, Lash AE, Leipe DD, Madden TL, Schuler GD, Tatusova TA, Rapp BA: Database resources of the National Center for Biotechnology Information. Nucleic Acids Research 2000, 28: 10–14. 10.1093/nar/28.1.10PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang P, Foerster H, Tissier CP, Mueller S, Paley S, Karp PD, Rhee SY: MetaCyc and AraCyc. Metabolic Pathway Databases for Plant Research. Plant Physiol 2005, 138: 27–37. 10.1104/pp.105.060376PubMed CentralView ArticlePubMedGoogle Scholar
- Dadam P, Lum VY, Werner HD: Integration of Time Versions into a Relational Database System. In 10th International Conference on Very Large Data Bases (VLDB). Edited by: Dayal U, Schlageter G, Seng LH. Morgan Kaufmann; 1984:509–522.Google Scholar
- Date CJ, Darwen H, Lorentzos N: Temporal Data and the Relational Model. Morgan Kaufmann; 2003.Google Scholar
- Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A, Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V, Goryanin I, Hedley WJ, Hodgman TC, Hofmeyr JH, Hunter PJ, Juty NS, Kasberger JL, Kremling A, Kummer U, Novere NL, Loew LM, Lucio D, Mendes P, Minch E, Mjolsness ED, Nakayama Y, Nelson MR, Nielsen PF, Sakurada T, Schaff JC, Shapiro BE, Shimizu TS, Spence HD, Stelling J, Takahashi K, Tomita M, Wagner J, Wang J: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003, 19: 524–531. 10.1093/bioinformatics/btg015View ArticlePubMedGoogle Scholar
- W3C Math Home[http://www.w3.org/Math]
- Junker BH, Klukas C, Schreiber F: VANTED: A System for Advanced Data Analysis and Visualization in the Context of Biological Networks. BMC Bioinformatics 2006, 7: 109. [EPub] [EPub] 10.1186/1471-2105-7-109PubMed CentralView ArticlePubMedGoogle Scholar
- COPASI Web Page[http://www.copasi.org]
- Geigenberger P: Regulation of sucrose to starch conversion in growing potato tubers. Journal of Experimental Botany 2003, 54: 457–465. 10.1093/jxb/erg074View ArticlePubMedGoogle Scholar
- Wittig U, De Beuckelaer A: Analysis and comparison of metabolic pathway databases. Brief Bioinformatics 2001, 2: 126–142. 10.1093/bib/2.2.126View ArticlePubMedGoogle Scholar
- Lemer C, Antezana E, Couche F, Fays F, Santolaria X, Janky R, Deville Y, Richelle J, Wodak S: The aMAZE LightBench: a web interface to a relational database of cellular processes. Nucleic Acids Research 2004, 32: D443-D448. 10.1093/nar/gkh139PubMed CentralView ArticlePubMedGoogle Scholar
- Junker BH, Koschützki D, Schreiber F: Kinetic Modelling with the Systems Biology Modelling Environment SyBME. Journal of Integrative Bioinformatics 2006, 1: 18. [EPub] [EPub]Google Scholar
- Mendes P: GEPASI: a software package for modelling the dynamics, steady states and control of biochemical and other systems. Computer Applications in the Biosciences 1993, 9: 563–571.PubMedGoogle Scholar
- Sauro HM: Jarnac: a system for interactive metabolic analysis. In Animating the Cellular Map: Proceedings of the 9th International Meeting on BioThermoKinetics. Edited by: Hofmeyr JH, Rohwer JM, Snoep JL. Stellenbosch University Press; 2000:221–228.Google Scholar
- Fowler M, Scott K: UML Distilled: A Brief Guide to the Standard Object Modeling Language: AND The Unified Process Explained. 3rd edition. Addison Wesley; 2004.Google Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.