DIPSBC - data integration platform for systems biology collaborations
© Dreher et al.; licensee BioMed Central Ltd 2012
Received: 20 December 2011
Accepted: 1 May 2012
Published: 8 May 2012
Modern biomedical research is often organized in collaborations involving labs worldwide. In particular in systems biology, complex molecular systems are analyzed that require the generation and interpretation of heterogeneous data for their explanation, for example ranging from gene expression studies and mass spectrometry measurements to experimental techniques for detecting molecular interactions and functional assays. XML has become the most prominent format for representing and exchanging these data. However, besides the development of standards there is still a fundamental lack of data integration systems that are able to utilize these exchange formats, organize the data in an integrative way and link it with applications for data interpretation and analysis.
We have developed DIPSBC, an interactive data integration platform supporting collaborative research projects, based on Foswiki, Solr/Lucene, and specific helper applications. We describe the main features of the implementation and highlight the performance of the system with several use cases. All components of the system are platform independent and open-source developments and thus can be easily adopted by researchers. An exemplary installation of the platform which also provides several helper applications and detailed instructions for system usage and setup is available at http://dipsbc.molgen.mpg.de.
DIPSBC is a data integration platform for medium-scale collaboration projects that has already been tested within several research collaborations. Because of its modular design and the incorporation of XML data formats it is highly flexible and easy to use.
Keywords: Data integration, XML, Data visualization
Systems biological research is frequently carried out within collaborations connecting multiple labs each conducting a specific type of experimental work. The ultimate goal of these research collaborations is the integrated analysis of the data generated within the consortium. Data integration involves the storage and cross-linking of initially independent and heterogeneous data sets. This allows for the simultaneous analysis of data sets and therefore enhances the overall functional interpretation, which provides additional information compared to the sequential analysis of single data sets [1–4]. An important prerequisite for data integration is the standardization of storage and exchange formats, both within data domains (e.g. mass spectrometers of different manufacturers) and across different data domains (e.g. mass spectrometry and DNA microarrays), since such data typically show a lack of coherence [5, 6].
In this article we describe a data integration platform that provides a flexible representation of collaborative data based on XML. It is designed for research collaborations, typically involving heterogeneous 'omics' data along with functional data from validation experiments, genetic and phenotypic data. The introduction of new data types or the modification of existing data types can be easily accomplished, thus providing high format extensibility. This data representation approach takes advantage of a growing number of XML data formats in biotechnology [7–15].
The system is built upon three components: a) the web-server (Foswiki), providing a convenient user interface; b) the search index (Solr/Lucene), which can be accessed through the user interface, providing a fast full-text search engine; and c) helper applications (Java applets), providing interactive, data specific analysis functionality.
All components of the system are platform-independent, open-source developments, and thus can be easily adopted by researchers. An example installation of the collaboration platform with prototypical public data sets is provided at http://dipsbc.molgen.mpg.de.
The functionality of the data integration system is realized by a combination of four components: XML, Solr/Lucene, Foswiki, and Java applets. In the following we describe the implementation and interplay of these components.
Integration of existing and user-defined XML formats
Standards initiatives and XML formats for different experimental technologies
MGED society (http://www.mged.org)
mzML, mzData, mzXML
HUPO PSI-MS (http://www.psidev.info)
HUPO PSI-MI (http://www.psidev.info)
In situ hybridization / Immunohistochemistry
MGED society (http://www.mged.org)
MIACA Standards Initiative (http://miaca.sf.net)
RDML consortium (http://www.rdml.org)
Genomics standards consortium (http://gensc.org)
Systems biology / Pathways
SBML, CellML, BioPAX
Data normalization and indexing
In a first data integration step in DIPSBC, all experimental data are converted to XML. The conversion can be done either by publicly available or by custom parsers. We use XML Schema Definition (XSD) to define the structure of the XML files syntactically and to ensure their data integrity. If available, community-compliant XSDs such as mzData (for mass spectrometry), MAGE-ML (for DNA microarrays), or PSI-MI (for molecular interactions) are used. This is a major benefit since it ensures wide acceptance and compatibility of the data formats. For more specific data sets that lack community standards, custom schemas can be easily developed.
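As a minimal sketch of such a custom conversion step, the following Python example turns tabular analysis results into a small XML document. The element and attribute names (`analysisResult`, `gene`, `log2ratio`, `pvalue`) and the gene symbol are hypothetical and are not taken from the DIPSBC schemas; validation against an XSD would be a separate step with a schema-aware validator and is not shown here.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(study_id, rows):
    """Convert result rows (dicts) into a small custom XML document.

    All element names are illustrative assumptions, not DIPSBC formats.
    """
    root = ET.Element("analysisResult", {"study": study_id})
    for row in rows:
        gene = ET.SubElement(root, "gene", {"symbol": row["symbol"]})
        ET.SubElement(gene, "log2ratio").text = f'{row["log2ratio"]:.3f}'
        ET.SubElement(gene, "pvalue").text = f'{row["pvalue"]:.2e}'
    return ET.tostring(root, encoding="unicode")

# Hypothetical input row, as it might come from a statistics result table.
rows = [{"symbol": "GENE_A", "log2ratio": 1.5, "pvalue": 0.001}]
xml_text = rows_to_xml("study 4", rows)
print(xml_text)
```

Keeping the converter this small makes it easy to adapt when a data type changes, which is the kind of format extensibility the platform aims for.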
Index contents of the current DIPSBC example installation
Nr. of entries
Protein mass spectra
PRIDE acc. 8538
Peptide tandem mass spectra (Homo sapiens) with identifications
GEO acc. GSE3325
Prostate cancer study; chip platform: Affymetrix U133 Plus 2.0 arrays (Homo sapiens)
GEO acc. GSE1133
Novartis gene atlas 2004 (mouse and human arrays)
GEO acc. GSE10204, GSE11193
Genetic functional basics of water-binding capacity in pork; chip platform: Affymetrix Porcine Whole Genome Array
Summary tables of statistical analyses
Test result tables
Results of statistical analyses of microarray experiments
Microsatellite markers / phenotypes
Pig marker and trait values
Yeast-2-hybrid datasets from Rual et al. and Stelzl et al.
Interactions involving genes, proteins, and compounds; source: ConsensusPathDB
Mathematical models of gene regulatory pathways
Pig genome annotations
Homo sapiens genome annotations
Protein sequences (FASTA format)
Publications in PubMed starting from 1970
Web pages within the DIPSBC platform
Total nr. of entries
Foswiki collaboration platform and incorporation of helper applications
DIPSBC uses the Foswiki content management software as a browser-independent user interface due to its advanced features for managing collaborations. For example, users can create or edit web-pages within their browser and directly upload and share data. All modifications made to the website or attached files are tracked by a built-in revision control system. Therefore, different document versions can be compared; moreover, e-mail notifications that automatically inform users about document changes can be enabled. Additionally, the Foswiki technology provides a fine-grained user management system, which can be used to define rights for viewing or editing web pages for different users or user groups. For more general data privacy, password-protection or IP-range checks at the web server level can be applied as well.
An important feature of the proposed platform is the straightforward incorporation of helper applications associated with the different data types. For this purpose we take advantage of the Foswiki plug-in interface to integrate specialized programs as Java applets. This results in minimal installation effort on the client side, as the applets are started automatically within the user's web browser. Currently the system includes three applets: the Argo Genome Browser and two custom-developed applets, namely an mzData viewer, which provides a graphical representation of peptide spectra, and a graph browser, which reads molecular interaction data stored in PSI-MI files and dynamically visualizes the underlying protein-protein interaction networks (see Results and Discussion).
Here, we illustrate the usage of the system with several archetypical use cases that incorporate different levels of integrated primary data.
Integration of experimental results from proteomic and transcriptomic data
Large-scale profiling at the mRNA and proteome levels has become routine, and increasing numbers of large-scale data sets have become available. A combination of these different experimental approaches will help to gain a more comprehensive view of biological processes and molecular networks [23–25]. Observing evidence of genes (proteins) in different heterogeneous data sets might lead to better disease markers. Data integration systems offer a first entry point for searching through these data sets. As an example, we used our data integration system to screen a prostate cancer gene expression study together with a mass spectrometry study of the Human Plasma Proteome Project II. In general, plasma proteome data sets could be used to identify biomarkers for certain disease states, as proteins up-regulated in diseased tissues may enter the blood stream in higher concentrations than usual. Outside of the platform, we statistically analyzed mRNA expression differences between primary and metastatic prostate cancer cells from the transcriptomic study (GEO accession GSE3325) with R and identified 5,142 differentially expressed genes (Wilcoxon test, P < 0.01; GCRMA normalization).
We then added these study results to the Solr index. An overview of the results, including download links to the test result tables, can be found by entering 'vindex:studies study 4' in the index search field. Likewise, genes differentially expressed in the study can most easily be found by entering 'study 4' in the index search field. Because the genes' search score is boosted according to the respective log2ratio and p-value, the most significant genes will be listed on top of the result page.
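The exact boost formula is not given here. Purely as an assumption, one plausible scoring multiplies the absolute log2 ratio by the negative decadic logarithm of the p-value, so that genes with a strong and significant change rank first. The function name `boost`, the formula itself and the gene identifiers below are hypothetical.

```python
import math

def boost(log2ratio, pvalue, floor=1e-300):
    """Hypothetical boost: larger effect size and smaller p-value rank higher.

    The floor avoids log10(0) for p-values reported as exactly zero.
    """
    return abs(log2ratio) * -math.log10(max(pvalue, floor))

# Illustrative (made-up) entries: (identifier, log2 ratio, p-value).
genes = [("GENE_A", 2.0, 1e-4), ("GENE_B", -0.5, 1e-2), ("GENE_C", 1.0, 1e-6)]

# Sort so that the highest boost comes first, as on a result page.
ranked = sorted(genes, key=lambda g: boost(g[1], g[2]), reverse=True)
for name, ratio, p in ranked:
    print(name, round(boost(ratio, p), 2))
```

A value computed this way could be attached to each document at indexing time; how DIPSBC actually passes the boost to Solr is not specified in this text.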
Characterization of candidate genes for an animal genome with sparsely known functional information
Querying protein-protein interactions based on two different network datasets
Linking genes to interaction networks and computational models
Comparison to related data integration systems
ISA Infrastructure consists of several Java desktop applications and a relational database, built around the ISA-Tab format. Among others, the system provides tools for metadata structure definition (ISAconfigurator) and for data input and processing by collaborating experimentalists (ISAcreator). Experiment metadata is stored in the generic ISA-Tab format and can be exported to XML-based, community-compliant formats to meet the standards of public repositories such as ArrayExpress, PRIDE or the European Nucleotide Archive (ENA). The system is well suited for the production of standardized, richly annotated experimental data and its formal validation. However, in comparison to DIPSBC, the system's data analysis and visualization options are still rather limited. It also places less emphasis on the collaboration platform, which DIPSBC realizes by incorporating the Foswiki system and its features.
BioMart is a data management system aiming at the integration of disparate, geographically distributed data sources. Typically the latter are relational databases, each maintained independently and with its own data structure. BioMart provides a consistent graphical user interface for the unified query of all contained sources. These can be filtered by different attributes, e.g. genomic region or gene ontology term. BioMart is used by several large-scale research consortia, e.g. the International Cancer Genome Consortium (ICGC). In general, the system is best suited for readily processed, i.e. finished, data, and its decentralized structure leads to a lightweight installation. However, it is less well suited to integrating complex, evolving data types that change frequently, as is the case in particular at the start of new collaborative projects. Furthermore, it features neither a document sharing option nor an index-based full-text search.
Overall, both ISA infrastructure and BioMart are systems that are well suited for rather large collaboration projects, at the cost of increased time consumption and man-power. In comparison, our system has the advantage of being very flexible and extensible. In contrast to systems based on relational databases, our XML index based platform offers a straightforward way of integrating data sets via common keywords, supported by a very fast full-text search.
We presented a data integration system that utilizes and indexes XML-based data representation formats. Thus, the basic unit of data stored within the DIPSBC platform is the 'XML document'. This unit is very generic and can range from genes and pathways to whole-genome microarray experiment results, implying a very high variability in data granularity. We use XML as the central data format in order to capture this granularity and to make heterogeneous data compatible, a prerequisite for the coordinated integration of the various data sets.
As a result, the document management of our system is highly flexible, community compliant and well suited for data collaborations. On the one hand, the adoption of community standards enables cross-referencing of proprietary data with publicly available data sets and the use of data visualization applets such as the genome browser, as was demonstrated in the use cases. On the other hand, for data types that are not yet standardized or that are too heterogeneous to be standardized, for example very specific data analysis results, the system offers full format flexibility and imposes essentially no restrictions, as was demonstrated by introducing a custom standard for data analysis results (Figure 1).
Currently the procedure of adding new data to the system involves two steps: first, the member of the consortium who generated the data set (e.g. from a microarray experiment) transfers it to the administrator. Second, the administrator checks the data for integrity by XSD schema validation and then adds the normalized XML to the index. Although this procedure ensures improved data integrity by manual curation, it would still be favourable to automate the procedure of XML transformation, validation, normalization and indexing, for example by implementing custom Perl plugins. These plugins could provide data upload interfaces, enabling members of a given collaboration to directly add their experimental data to the system. A corresponding interface is currently under development and will be provided in a future version of DIPSBC.
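The final indexing step of such an automated procedure can be sketched as follows. The example builds a Solr XML update message (the `<add><doc><field name="...">` structure accepted by Solr's update handler) from a dictionary of fields. The field names `id`, `type` and `keyword` and their values are assumptions for illustration; the real names depend on the index schema, which is not described here.

```python
import xml.etree.ElementTree as ET

def solr_add_message(doc_fields):
    """Build a Solr XML update message from a dict of field values.

    List values produce one <field> element per entry (multi-valued fields).
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in doc_fields.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            ET.SubElement(doc, "field", {"name": name}).text = str(v)
    return ET.tostring(add, encoding="unicode")

# Hypothetical document describing one analysis result entry.
message = solr_add_message({
    "id": "study4-GENE_A",
    "type": "analysis_result",
    "keyword": ["GENE_A", "study 4"],
})
print(message)
```

An upload interface could generate such a message after schema validation and post it to the index, leaving the administrator to review only failed validations.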
In the age of 'omics' data, researchers are faced with ever-growing data set sizes. While the proposed XML structure is feasible for most of the functional genomic data types, it cannot be applied to high-throughput sequencing experiments. Using XML to represent such data might be counterproductive, because XML is a verbose, human-readable format that adds a large amount of redundant text to the actual data. Therefore, in practice we do not transform such data sets to XML, but rather create metadata XML files for the search index that store processed data. The raw files (e.g. BAM files in the case of next-generation sequencing or CEL files in the case of microarrays) are stored in the file system and are only referenced by the indexed metadata XML file.
One important issue within collaborative research groups is data security. Experimentalists need to remain in control of their raw data, and study results need to be handled confidentially before they are published in a research journal. This can best be accomplished by securing the system with password protection and possibly also IP range restrictions in the web server configuration.
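The overhead of a textual XML encoding can be illustrated with a rough sketch that stores the same 1,000 numeric values once as packed binary doubles and once as XML elements. The element names are arbitrary, and the resulting ratio is only indicative: real ratios depend on the data, the chosen markup and any compression applied.

```python
import struct
import xml.etree.ElementTree as ET

# 1,000 example measurements (arbitrary values for illustration).
values = [float(i) for i in range(1000)]

# Binary encoding: 8 bytes per little-endian double.
binary = struct.pack(f"<{len(values)}d", *values)

# XML encoding: one <value> element per measurement.
root = ET.Element("values")
for v in values:
    ET.SubElement(root, "value").text = repr(v)
xml_bytes = ET.tostring(root)

overhead = len(xml_bytes) / len(binary)
print(len(binary), len(xml_bytes), round(overhead, 2))
```

For large raw files this difference grows with the data size, which is one reason to index only a compact metadata record and keep the raw file in the file system.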
Also a more fine-grained user management can be realized by using the Foswiki user group functionality. Then, certain pages of the web site can be restricted to certain users or groups. Additionally, this concept could easily be extended to the central Solr index search so that particular search results would be restricted to specific users. For this purpose, the Solr-Search-Plugin would need to read the current user ID via the respective Foswiki variable and then filter the index results according to the logged in user. An overview of corresponding current and planned developments can be found at the DIPSBC homepage under the section 'Roadmap'.
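A minimal sketch of such a per-user restriction is shown below. It assumes that every indexed document carries a hypothetical multi-valued field `allowed_groups` and that the plugin knows the groups of the logged-in user; the restriction is then passed to Solr as a filter query (`fq`). The server URL, field name and group names are illustrative assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def restricted_query(base_url, query, user_groups):
    """Build a Solr select URL limited to documents visible to the user's groups."""
    if not user_groups:
        raise ValueError("user must belong to at least one group")
    # Quote each group name and combine them into one OR clause.
    group_clause = " OR ".join(f'"{g}"' for g in sorted(user_groups))
    params = {"q": query, "fq": f"allowed_groups:({group_clause})"}
    return f"{base_url}/select?{urlencode(params)}"

url = restricted_query(
    "http://localhost:8983/solr", "study 4", {"public", "proteomics_lab"}
)
# Decode the query string again to inspect what would be sent to Solr.
params = parse_qs(urlsplit(url).query)
print(params["fq"][0])
```

Applying the restriction as a filter query keeps it separate from the user's search terms, so it does not influence the relevance scores of the matching documents.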
Another advantage of the Foswiki collaboration platform worth mentioning is its intuitive data exchange function. At each page, users can upload files by clicking the 'Attach' button. Other users can then download the respective files. This has two important advantages compared to data sharing via e-mail: first, files that are too large for e-mail transmission can be shared; second, the reference file is stored only once at a central location, and if the file is changed, it can be downloaded again from the same location.
An important part of the proposed data integration system is the incorporation of data analysis results that add additional value to the raw experimental data and aid in the interpretation of these data. Currently, data analyses which lie beyond the capabilities of the Java applets need to be generated outside of the platform (see above use case 'Integration of experimental results from proteomic and transcriptomic data'). However, for future development steps it might be worth considering the integration of an R interface that could enable the direct statistical processing of experimental data.
Our data integration system has already been applied within several research projects, typically involving between 5 and 15 collaboration partners located at different sites. These small to medium-sized projects likely represent the typical size of the majority of research projects. However, the system might also be suited for larger collaborations, because the web server and the Foswiki collaboration platform can handle far more simultaneous accesses than would be generated by tens or even hundreds of participating users. This is supported by the fact that many companies use Foswiki as their intranet system, sometimes including thousands of web pages and high access rates.
As for the scalability of the index machine, its search and indexing performance naturally decreases with increasing numbers of stored documents. Nevertheless, the Solr/Lucene software library is optimized for very fast text queries on large amounts of data. For example, the current index of our data integration system comprises almost 35 million indexed documents or 22.1 GB of physical storage, with PubMed and UniProt records representing the major part. While smaller indices can typically be queried within a fraction of a second, query times for this rather large index are below one second for general queries and up to a few seconds for very complex queries. Therefore the system can conveniently handle quite large numbers of documents. However, if larger index sizes are needed, as might be the case e.g. with metadata of next-generation sequencing experiments, Solr/Lucene offers native support for distributed searches. For this purpose, a large index is split into several smaller indices on different machines, and thereby fast response times can be maintained.
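The idea of splitting an index can be sketched in two parts: documents are assigned to one of several smaller indices, and a query is sent to one server together with Solr's `shards` request parameter, which lists all partial indices to be searched. The host names and the hash-based assignment below are assumptions for illustration and do not describe the DIPSBC installation.

```python
import zlib
from urllib.parse import urlencode

# Hypothetical servers, each holding one part of the index.
SHARDS = ["host1:8983/solr", "host2:8983/solr", "host3:8983/solr"]

def shard_for(doc_id, shards=SHARDS):
    """Pick a shard deterministically from the document ID (CRC32 modulo)."""
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]

def distributed_select_url(query, shards=SHARDS):
    """Build a select URL that asks the first server to search all shards."""
    params = {"q": query, "shards": ",".join(shards)}
    return f"http://{shards[0]}/select?{urlencode(params)}"

print(shard_for("pubmed-12345"))
print(distributed_select_url("study 4"))
```

A deterministic assignment ensures that a document updated later is written to the same partial index again, so that no duplicates appear in merged result lists.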
All parts of the introduced system can be straightforwardly implemented. The basic system setup with the Foswiki user interface and the Solr backend can be achieved in less than one day by an experienced programmer. Also, an important advantage of the system is the fact that its components are open source. Therefore it can be modified and adjusted for specific functions.
Because of its flexibility, the system can easily incorporate additional or new data types like patient data, high-throughput sequencing data, or any other data types that will occur during future developments of experimental techniques. Adequate helper applications that make use of the underlying XML files can be developed or adapted efficiently in order to support the analysis of such new data. Therefore, the combination of a fast indexing machine with a web-based collaboration platform makes this system highly flexible, evolvable, scalable and easy to use for research collaborations.
We developed DIPSBC, a systems biology data integration platform that utilizes a large number of XML-based exchange formats and connects primary data with higher-level data. The combination of a fast indexing machine with an online content management platform makes this system highly flexible and easy to use for research collaborations. Furthermore, the incorporation of helper applications is a powerful feature of the system, which distinguishes it from a mere data repository. Since all parts of the platform are open source, it can easily be modified and adjusted for specific functions.
Availability and requirements
Project name: DIPSBC
Project home page: http://dipsbc.molgen.mpg.de
Operating system(s): Platform independent
Programming language(s): Perl, Java
Other requirements: Perl 5.8 or higher, Java 1.5 or higher, Foswiki web server, Solr/Lucene index
License: GNU GPL
Any restrictions to use by non-academics: None.
This work was supported by the German Research Foundation (grants HE 4607/2-1 and HE 4607/3-1), the BMBF under its NGFN-Plus (01GS08111), NGFN-transfer (01GR0809) and MEDSYS programs (0315428A), and by the Max Planck Society.
The authors would like to thank Wasco Wruck, Jevgeni Erehman, and André Rauschenbach for valuable comments and technical support. All authors read and approved the final manuscript.
- Smith et al.: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007, 25: 1251–1255. 10.1038/nbt1346
- Goble C, Stevens R: State of the nation in data integration for bioinformatics. J Biomed Inform. 2008, 41: 687–693. 10.1016/j.jbi.2008.01.008
- Lee et al.: Incorporating collaboratory concepts into informatics in support of translational interdisciplinary biomedical research. Int J Med Inform. 2009, 78: 10–21. 10.1016/j.ijmedinf.2008.06.011
- Cochrane et al.: Petabyte-scale innovations at the European Nucleotide Archive. Nucleic Acids Res. 2009, 37: D19–25. 10.1093/nar/gkn765
- Brazma et al.: Standards for systems biology. Nat Rev Genet. 2006, 7: 593–605.
- Taylor et al.: Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nat Biotechnol. 2008, 26: 889–896. 10.1038/nbt.1411
- Brazma A: Minimum Information About a Microarray Experiment (MIAME): successes, failures, challenges. ScientificWorldJournal 2009, 9: 420–423.
- Taylor et al.: The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol. 2007, 25: 887–893. 10.1038/nbt1329
- Orchard et al.: The minimum information required for reporting a molecular interaction experiment (MIMIx). Nat Biotechnol. 2007, 25: 894–898. 10.1038/nbt1324
- Deutsch et al.: Minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE). Nat Biotechnol. 2008, 26: 305–312. 10.1038/nbt1391
- Lefever et al.: RDML: structured language and reporting guidelines for real-time quantitative PCR data. Nucleic Acids Res. 2009, 37: 2065–2069. 10.1093/nar/gkp056
- Field et al.: The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol. 2008, 26: 541–547. 10.1038/nbt1360
- Le Novère et al.: Minimum information requested in the annotation of biochemical models (MIRIAM). Nat Biotechnol. 2005, 23: 1509–1515. 10.1038/nbt1156
- Le Novère et al.: The Systems Biology Graphical Notation. Nat Biotechnol. 2009, 27(8): 735–741. 10.1038/nbt.1558
- Whirl-Carrillo et al.: An XML-based interchange format for genotype-phenotype data. Hum Mutat. 2008, 29: 212–219. 10.1002/humu.20662
- Ball CA, Brazma A: MGED standards: work in progress. OMICS 2006, 10: 138–144. 10.1089/omi.2006.10.138
- Jones et al.: The Functional Genomics Experiment model (FuGE): an extensible framework for standards in functional genomics. Nat Biotechnol. 2007, 25: 1127–1133. 10.1038/nbt1347
- Sansone et al.: The first RSBI (ISA-TAB) workshop: "can a simple format work for complex studies?". OMICS 2008, 12: 143–149. 10.1089/omi.2008.0019
- Orchard et al.: Proteomic data exchange and storage: the need for common standards and public repositories. Methods Mol Biol. 2007, 367: 261–270.
- Spellman et al.: Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol. 2002, 3: research0046.1–0046.9.
- Kerrien et al.: Broadening the horizon: level 2.5 of the HUPO-PSI format for molecular interactions. BMC Biol. 2007, 5: 44. 10.1186/1741-7007-5-44
- Engels R, Yu T, Burge C, Mesirov JP, DeCaprio D, Galagan JE: Combo: a whole genome comparative browser. Bioinformatics 2006, 22(14): 1782–1783. 10.1093/bioinformatics/btl193
- Gry J, et al.: Correlations between RNA and protein expression profiles in 23 human cell lines. BMC Genomics 2009, 10: 365. 10.1186/1471-2164-10-365
- Seliger et al.: Combined analysis of transcriptome and proteome data as a tool for the identification of candidate biomarkers in renal cell carcinoma. Proteomics 2009, 9(6): 1567–1581. 10.1002/pmic.200700288
- Rogers et al.: Investigating the correspondence between transcriptomic and proteomic expression profiles using coupled cluster models. Bioinformatics 2008, 24(24): 2894–2900. 10.1093/bioinformatics/btn553
- Yu et al.: Stable isotope dilution multidimensional liquid chromatography-tandem mass spectrometry for pancreatic cancer serum biomarker discovery. J Proteome Res. 2009, 8(3): 1565–1576. 10.1021/pr800904z
- Varambally et al.: Integrative genomic and proteomic analysis of prostate cancer reveals signatures of metastatic progression. Cancer Cell 2005, 8(5): 393–406. 10.1016/j.ccr.2005.10.001
- Richiardi et al.: Promoter methylation in APC, RUNX3, and GSTP1 and mortality in prostate cancer patients. J Clin Oncol. 2009, 27(19): 3161–3168. 10.1200/JCO.2008.18.2485
- Sawhney et al.: A novel role of ERK5 in integrin-mediated cell adhesion and motility in cancer cells via Fak signaling. J Cell Physiol. 2009, 219(1): 152–161. 10.1002/jcp.21662
- Drake et al.: ZEB1 enhances transendothelial migration and represses the epithelial phenotype of prostate cancer cells. Mol Biol Cell. 2009, 20(8): 2207–2217. 10.1091/mbc.E08-10-1076
- Perkins et al.: Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20: 3551–3567. 10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2
- Dreher F, Kamburov A, Herwig R: Construction of a pig physical interactome using sequence homology and a comprehensive reference human interactome. Evol Bioinform Online, 8: 119–126.
- Liu G, et al.: A genome scan reveals QTL for growth, fatness, leanness and meat quality in a Duroc-Pietrain resource population. Anim Genet. 2007, 38: 241–252. 10.1111/j.1365-2052.2007.01592.x
- Comparative map at PigQTLdb. [http://www.animalgenome.org/cgi-bin/QTLdb/SS/link_rh2hs?chromos=5]
- Tsai S, et al.: Annotation of the Affymetrix porcine genome microarray. Anim Genet. 2006, 37(4): 423–424. 10.1111/j.1365-2052.2006.01460.x
- Kamburov A, Wierling C, Lehrach H, Herwig R: ConsensusPathDB: a database for integrating human functional interaction networks. Nucleic Acids Res. 2009, 37: D623–628. 10.1093/nar/gkn698
- Rual et al.: Towards a proteome-scale map of the human protein-protein interaction network. Nature 2005, 437(7062): 1173–1178. 10.1038/nature04209
- Stelzl et al.: A human protein-protein interaction network: a resource for annotating the proteome. Cell 2005, 122(6): 957–968. 10.1016/j.cell.2005.08.029
- Kerrien et al.: IntAct - Open Source Resource for Molecular Interaction Data. Nucleic Acids Res. 2006, 35(Database issue): D561–565.
- Ryo et al.: Stable suppression of tumorigenicity by Pin1-targeted RNA interference in prostate cancer. Clin Cancer Res. 2005, 11(20): 7523–7531. 10.1158/1078-0432.CCR-05-0457
- Li et al.: BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models. BMC Systems Biology 2010, 4: 92. 10.1186/1752-0509-4-92
- Kim et al.: A hidden oncogenic positive feedback loop caused by crosstalk between Wnt and ERK pathways. Oncogene 2007, 26(31): 4571–4579. 10.1038/sj.onc.1210230
- Rocca-Serra P, et al.: ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Bioinformatics 2010, 26: 2354–2356. 10.1093/bioinformatics/btq415
- Zhang J, et al.: BioMart: a data federation framework for large collaborative projects. Database (Oxford) 2011, 2011: bar038.
- Zhang J, et al.: International Cancer Genome Consortium Data Portal - a one-stop shop for cancer genomics data. Database (Oxford) 2011, 2011: bar026.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.