BNDB – The Biochemical Network Database
© Küntzer et al. 2007
Received: 02 July 2007
Accepted: 02 October 2007
Published: 02 October 2007
Technological advances in high-throughput techniques and efficient data acquisition methods have resulted in a massive amount of life science data. These data are stored in numerous databases that have been established over the last decades and are now essential resources for scientists. However, the diversity of the databases and of their underlying data models makes it difficult to combine this information for solving complex problems in systems biology. Currently, researchers typically have to browse several, often highly focused, databases to obtain the required information. Hence, there is a pressing need for more efficient systems for integrating, analyzing, and interpreting these data. A major challenge is the standardization and virtual consolidation of the databases, which would result in unified access to a variety of data sources.
We present the Biochemical Network Database (BNDB), a relational database platform that allows a complete semantic integration of an extensive collection of external databases. BNDB is built upon a comprehensive and extensible object model called BioCore, which is expressive enough to model most known biochemical processes and can readily be adapted to new biological concepts. Besides a web interface for searching and curating the data, a Java-based viewer (BiNA) provides platform-independent, interactive visualization and navigation of BNDB using sophisticated graph layout algorithms.
BNDB provides simple, unified access to a variety of external data sources. Its tight integration with the biochemical network library BN++ supports the import, integration, analysis, and visualization of the data. BNDB is freely accessible at http://www.bndb.org.
The development of high-throughput technologies has generated an extensive quantity of -omics data over the last decades. Despite this technological progress, improvements in application areas such as drug discovery have failed to keep pace with increased research and development spending, as demonstrated by Nightingale et al. One of the main reasons for this discrepancy is the increasing number of highly focused databases that differ in both their data models and their interfaces. The databases are often developed independently, overlap substantially, and are not well standardized. This lack of standardization limits the usability of the databases and creates a demand for unified access to the data.
Hence, a large number of systems addressing this problem with different approaches have been developed. These approaches can be classified by their architecture into three main categories: navigators, mediators, and warehouses. The first category, navigators, is based on the idea of a navigational or link-based integration of several data sources. Such a portal normally does not integrate the data itself, but provides the user with pages navigating to external data sources. Well-established examples of portal systems are SRS, BioNavigator, and Entrez. A mediator gives access to distributed data by reformulating the user's queries at runtime into queries on external data sources. However, availability and efficiency are major drawbacks of such solutions. Examples of this category are Discovery Link, TAMBIS, and BioMediator. Systems of the third category, warehouses, require a complete semantic integration of the data from various external data sources into a single local database via an integrative data model. Such approaches allow for an efficient execution of queries since they avoid typical problems of the other methods, such as network bottlenecks, temporary unavailability of the external data sources, and changes in the external data sources. However, data warehouses usually require complex data models and regular updates of the integrated data sources in order to avoid returning outdated query results. BNDB is a representative of this category, as are other systems like GUS, ONDEX, cPath, and Biozon.
In the current state, BNDB represents a comprehensive collection of biological data integrated from the following data sources:
For the horizontal data integration [29, 30] of these data we implemented comprehensive merging heuristics. The key concept behind these methods is the integration of complementary data sources and the elimination of redundancy in the data. We use two fundamental approaches for the merging of the data:
(1) object matching based on unambiguous external identifiers and (2) structural matching based on identical object relations.
The merging process itself consists of several steps: In an initial step, we merge most of the database objects by their identifiers and remove redundancy in their attributes through the first approach. Then, in the second step we collect and merge all equivalent events in BNDB through the second structural approach.
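The two merging steps can be illustrated with a minimal Python sketch. It is not the BNDB or BN++ implementation: the function names, the record layout, and the toy identifiers (`ext:1`, `evA`, and so on) are hypothetical and were chosen only for illustration. Step 1 groups entities that share at least one external identifier. Step 2 treats two events as equivalent when, after step 1, they relate the same merged entities in the same roles.

```python
from collections import defaultdict


def merge_entities(entities):
    """Step 1 (identifier matching): map each local id to a canonical id.

    `entities` maps a local id to a set of external identifiers.
    Entities sharing at least one identifier end up in the same group.
    """
    parent = {eid: eid for eid in entities}

    def find(x):
        # Follow parent links to the group root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # external identifier -> first entity carrying it
    for eid in sorted(entities):
        for xref in entities[eid]:
            if xref not in first_seen:
                first_seen[xref] = eid
                continue
            root_a, root_b = find(first_seen[xref]), find(eid)
            if root_a != root_b:
                # Attach the larger root to the smaller one (deterministic result).
                small, large = sorted((root_a, root_b))
                parent[large] = small
    return {eid: find(eid) for eid in entities}


def merge_events(events, canonical):
    """Step 2 (structural matching): group events with identical relations.

    `events` maps an event id to a list of (role, entity id) pairs.
    """
    groups = defaultdict(list)
    for event_id in sorted(events):
        signature = frozenset(
            (role, canonical[entity]) for role, entity in events[event_id]
        )
        groups[signature].append(event_id)
    return sorted(groups.values())


# Toy data: "a*" and "b*" stand for records imported from two sources.
entities = {
    "a1": {"ext:1", "ext:2"},
    "b1": {"ext:2"},
    "a2": {"ext:3"},
    "b2": {"ext:3", "ext:4"},
    "b3": {"ext:5"},
}
events = {
    "evA": [("substrate", "a1"), ("product", "a2")],
    "evB": [("substrate", "b1"), ("product", "b2")],
    "evC": [("substrate", "b1"), ("product", "b3")],
}

canonical = merge_entities(entities)
equivalent_events = merge_events(events, canonical)
print(canonical)
print(equivalent_events)
```

In this toy data set, `evA` and `evB` become equivalent only after their participants have been merged by identifier, which is why the structural step runs second.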
BNDB can be accessed in three different ways: through a web interface, a network visualizer, and a programming interface.
The graph and visualization capabilities of our application are comparable to those of visualization systems such as Cytoscape, PathSys, and VisANT, or of commercial tools such as MetaDrug or PathwayStudio. Additionally, BiNA offers a multifunctional workbench, which is easily extensible. The viewer itself can be regarded as a collection of modules that depend on each other. The hierarchical plugin system automatically resolves dependencies between plugins through a well-defined interface. The plugin structure of BiNA makes it easy for users to integrate their own analysis routines. Currently, several plugins exist, e.g. for mapping gene expression data onto the network, for pathway search algorithms, and for exporting pathways into SBML and BioPAX.
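Automatic dependency resolution between plugins is commonly implemented as a topological ordering of the dependency graph. The following Python sketch shows that general technique only. It does not reproduce BiNA's plugin interface (BiNA itself is written in Java), and the function name `load_order` and the plugin names are hypothetical.

```python
def load_order(plugins):
    """Return plugin names ordered so that dependencies come first.

    `plugins` maps a plugin name to the list of plugins it depends on.
    Raises KeyError for an unknown dependency and ValueError for a cycle.
    """
    for name, deps in plugins.items():
        for dep in deps:
            if dep not in plugins:
                raise KeyError(f"{name} depends on unknown plugin {dep}")

    remaining = {name: set(deps) for name, deps in plugins.items()}
    order = []
    while remaining:
        # Plugins whose dependencies are all loaded can be loaded next.
        ready = sorted(name for name, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(
                "cyclic dependency among: " + ", ".join(sorted(remaining))
            )
        for name in ready:
            order.append(name)
            del remaining[name]
        for deps in remaining.values():
            deps.difference_update(ready)
    return order


# Hypothetical plugin set, for illustration only.
plugins = {
    "core": [],
    "graph_view": ["core"],
    "expression_mapping": ["graph_view"],
    "sbml_export": ["core"],
}
print(load_order(plugins))
```

Sorting the plugins that are ready in each round keeps the load order reproducible, and a round without any ready plugin signals a dependency cycle.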
BNDB is fully integrated with the Biochemical Network Library BN++ [15, 16] providing a sophisticated programming interface. Hence, arbitrary data like a complete pathway can be serialized and deserialized from C++ by a single line of code. This speeds up the development process of analysis routines, since a programmer can concentrate on the implementation of the algorithm. In addition, the BN++ software framework offers a comprehensive collection of implemented analysis routines.
The C++ programming interface provides a convenient yet flexible way to merge the data. With a few lines of code it is possible to construct a customized local meta-database containing only the data the user requires.
With BNDB we present a data warehouse system integrating a large number of different biological databases. Access to these data is provided through a generic web interface allowing for adding, editing, and searching the data in BNDB. In addition, we have developed BiNA, a powerful and extensible tool for visualizing biochemical networks directly from BNDB. Through the BN++ software framework BNDB is easily accessible for software developers and can be integrated into tailor-made applications and customized to user needs. All tools and methods described herein, BNDB, BiNA, the source code, the web interface to BNDB, and the underlying data model are freely available from our website.
A major advantage of BNDB is its underlying data model BioCore. This comprehensive and extensible object model can represent most currently known biochemical entities and processes. Therefore, BNDB is able to store a wide variety of biochemical data. Researchers can easily adapt it to their own needs and build customized databases. Another benefit is the full integration of BNDB into the visualizer BiNA. Other systems often present only a database with an analysis tool (e.g. Biozon), or a database with a web interface (e.g. Entrez). For the graphical representation of the networks, many of these systems use a standard visualizer (e.g. Cytoscape). However, we think that the full integration of a dedicated visualization tool facilitates the visualization and presentation of the stored data.
We have developed several applications based on BNDB that show the usefulness of the approach, e.g. an efficient gene set analysis tool, GeneTrail, which enables the user to identify enriched functional categories in protein or gene sets. GeneTrail has been successfully applied to detect a molecular target of the antimicrobial metabolite kendomycin.
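One common way to decide whether a functional category is enriched in a gene set is an over-representation test based on the hypergeometric distribution. The Python sketch below illustrates that general idea only. It is an assumption that such a test is a suitable example here, the text above does not specify GeneTrail's statistics, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of at least `hits` annotated genes in a random sample.

    population: number of genes in the reference set
    annotated:  number of reference genes belonging to the category
    sample:     number of genes in the test set
    hits:       number of test-set genes belonging to the category
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for hits, hits + 1, ..., upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 reference genes, 4 in the category, test set of 3 with 2 hits.
p = enrichment_p_value(population=10, annotated=4, sample=3, hits=2)
print(p)
```

A small p-value indicates that the category occurs in the test set more often than expected by chance. When many categories are tested, the p-values additionally need a multiple-testing correction.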
In summary, BNDB is a comprehensive database system that not only makes it easy to retrieve the combined information of the integrated data sources, but can also be customized and extended to meet the needs of different users.
Project name: BNDB;
Project home page: http://www.bndb.org;
Operating system(s): Platform independent;
Programming language: Java;
Other requirements: Java 1.6.0 or higher;
Licence: GNU GPL;
BNDB is freely accessible at http://www.bndb.org. The current versions of BN++ and BiNA are distributed under the GNU GPL license and available from the website http://www.bnplusplus.org/downloads.
BN++: Biochemical Network Library
BiNA: Biological Network Analysis
DBMS: Database Management System
NCBI: National Center for Biotechnology Information
BGL: Boost Graph Library
SQL: Structured Query Language
SBML: Systems Biology Markup Language
BioPAX: Biological Pathways Exchange
GEO: Gene Expression Omnibus database
The project was funded by the Deutsche Forschungsgemeinschaft (BIZ4:1-4) and the Klaus Tschira Foundation.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.