Skip to main content

Gauging triple stores with actual biological data



Semantic Web technologies have been developed to overcome the limitations of the current Web and conventional data integration solutions. The Semantic Web is expected to link all the data present on the Internet instead of linking just documents. One of the foundations of the Semantic Web technologies is the knowledge representation language Resource Description Framework (RDF). Knowledge expressed in RDF is typically stored in so-called triple stores (also known as RDF stores), from which it can be retrieved with SPARQL, a language designed for querying RDF-based models. The Semantic Web technologies should allow federated queries over multiple triple stores. In this paper we compare the efficiency of a set of biologically relevant queries as applied to a number of different triple store implementations.


Previously we developed a library of queries to guide the use of our knowledge base Cell Cycle Ontology implemented as a triple store. We have now compared the performance of these queries on five non-commercial triple stores: OpenLink Virtuoso (Open-Source Edition), Jena SDB, Jena TDB, SwiftOWLIM and 4Store. We examined three performance aspects: the data uploading time, the query execution time and the scalability. The queries we had chosen addressed diverse ontological or biological questions, and we found that individual store performance was quite query-specific. We identified three groups of queries displaying similar behaviour across the different stores: 1) relatively short response time queries, 2) moderate response time queries and 3) relatively long response time queries. SwiftOWLIM proved to be a winner in the first group, 4Store in the second one and Virtuoso in the third one.


Our analysis showed that some queries behaved idiosyncratically, in a triple store specific manner, mainly with SwiftOWLIM and 4Store. Virtuoso, as expected, displayed a very balanced performance - its load time and its response time for all the tested queries were better than average among the selected stores; it showed a very good scalability and a reasonable run-to-run reproducibility. Jena SDB and Jena TDB were consistently slower than the other three implementations. Our analysis demonstrated that most queries developed for Virtuoso could be successfully used for other implementations.


Semantic Web (SW) technologies [1, 2] are increasingly being adopted by the scientific community, and Life Sciences researchers are no exception [3, 4]. Our own contribution to this emerging field has been the development of two semantically integrated knowledge bases - the Cell Cycle Ontology (CCO) and BioGateway [5, 6]. SW technologies open a new dimension to data integration, one of the main current challenges in biological knowledge management [4, 7, 8]. These technologies provide a sound framework to share and combine information via standard data exchange formats. This enables the categorization of information (using ontologies [9]) and fosters the scalability of the integrated system (adaptability to data growth). Moreover, SW technologies provide a sophisticated means to interrogate the originally integrated facts as well as the "hidden" bits of information (via automated reasoning tools [10]). An increasing number of principal biological data providers, such as UniProt [11], have started to make their data available in the SW formats, first of all in the RDF language [12]. In RDF the information is represented in the form of triples subject-predicate-object, e.g. "BMC Bioinformatics -has_type- journal". Unlike the current Web, which utilizes just a single semantically undefined predicate (the hyper-link), SW makes use of multiple predicate types with clearly defined semantics.

Data in RDF format is typically stored in dedicated database management systems called triple stores (also known as RDF stores) which expose so-called SPARQL endpoints. Those endpoints allow querying of the store through SPARQL [13], a standard query language recommended by W3C. The triples present in disparate stores form a unified cloud of triples which is accessible for querying from any individual endpoint. A paradigmatic example of such cloud of triples is constituted by the project Linked Open Data [14].

Triple stores

Currently, there are a number of solutions [15] to store information as RDF triples and they are increasingly becoming adopted by the biological community for the purpose of overcoming some of the limitations (see above) of classical storage solutions (mainly based on relational database management systems).

The development of triple stores has flourished during the last 5 years. Currently, there are more than 20 systems available [16]. Both the academic and private sectors have been involved in developing these triple stores. This race has created a healthy competition leading to rapid progress in key aspects of database management system performance -- querying and loading efficiency, scalability, and stability. The SW community has contributed to this progress by promoting open contests and providing proof-cases for SW applications [17]. It is encouraging for the scientific community that many of these triple stores are freely available for academic use. Obviously, the performance of triple stores is an important issue, especially the response time experienced with SPARQL. The time that a user needs to wait before an answer is returned is a simple but crucial metric that determines the acceptance by the prospective users of the knowledge base in question and the SW technology as a whole. As the number of available triple store implementations is steadily growing, it becomes increasingly difficult to decide which one to use. This necessitates a systematic comparison of the available triple store implementations with respect to their performance (known as benchmarking).

Benchmarking efforts

Much of the benchmarking done previously on triple stores was based on computationally generated sets of triples that could at best only mimic an actual domain specific knowledge base. Among the "standard" sets used are: the Lehigh University Benchmark (LUBM [18]) and the Berlin SPARQL Benchmark (BSBM [19]), which respectively emulate an organization and e-commerce knowledge bases. In the life sciences domain, studies performed by UniProt [11], demonstrated the current limitations of some triple stores [20]. In this paper we present the 'NTNU SPARQL benchmark', which is based on a Life Science use case, and we report the outcome of the benchmarking of five popular triple store implementations. We tested two additional stores (SwiftOWLIM and 4Store) not included in the previous benchmark experiments [21] and instead of (artificial) computationally generated data, we used biologically relevant real life data from our knowledge base CCO [5], as well as queries which evolved from the direct interaction with the user community.

Our benchmark fills the need for an empirical testing of the performance of triple stores with respect to queries and data whose characteristics are representative of Life Sciences information.


The NTNU dataset

The dataset used in our analysis consists of ten RDF graphs constituting the Cell Cycle Ontology (CCO) [5] (a total of 11.315.866 triples). There are four taxon-specific graphs (for H. sapiens, A. thaliana, S. cerevisiae, S. pombe) and an integrated graph. All of them share a core set of triples (ontologies) and a set of taxon-specific triples. The integrated graph contains additional orthology relations for proteins. Each of the five graphs has a counterpart graph augmented with pre-computed transitive and reflexive closures.

More details on the characteristics of this dataset can be found in Table 1.

Table 1 Characteristics of the NTNU dataset.

The set of queries

In our analysis we used 24 queries which were defined to answer real life questions identified in close collaboration with the end users of CCO (Additional file 1). Our query set encompasses a broad range of SPARQL features and combinations thereof, as summarised in Table 2. As shown in this table, queries used in our analysis ensure a comprehensive assessment of the performance of the triple stores with respect to a real Life Science use case.

Table 2 Overview of the query features.

Overall performance

A summary of the results of our analysis on the five selected triple stores is reported in Table 3, which summarize loading times and query answering.

Table 3 Overall performance.

First of all, we compared the loading performance of the five selected stores. Loading performance is of paramount importance in the Life Sciences domain, as primary data sources are frequently updated and the content of triple stores needs to be maintained up-to-date. In this respect 4Store shows an exceptionally good performance, followed by SwiftOWLIM.

In order to get a bird's-eye view on the querying performance of the selected stores, we aggregated the averaged response times (ART, see Methods) into a single cumulative total response time and estimated the average relative standard errors for each of the stores. (Note that the version of SwiftOWLIM used in our experiment did not support the COUNT operator, therefore the values for this store do not include data for queries Q17, Q19, Q20 and Q21). The total query execution time varied in a very broad range over the triple stores (Table 3).

Load times have been found consistent across test-runs, whereas query answering times have presented some significant variation (most notably 4Store and SwiftOWLIM), even though individual test runs were performed in equivalent conditions (see Methods). Variability could be explained by the different environment of the operating system during different runs.

On the basis of these data, Virtuoso emerges as an overall winner, however, the picture changes significantly when we look into the query-specific behaviour.

Query-specific performance

Table 4 makes clear that all the stores behave in a query-specific manner. A highly query-specific behaviour has also been observed by Bizer and Schultz [22]. Nevertheless, a couple of common trends are discernible. SwiftOWLIM is by far the best performer with the relatively short ART queries; 4Store shows the best performance with the moderate ART queries; and Virtuoso is doing best of all with the long ART queries. Jena SDB is consistently the slowest store with all the short and moderate ART queries. Virtuoso is the only store whose ARTs are better than average for all the 24 queries.

Table 4 Query-specific performance.

The three slowest queries, Q14, Q3 and Q18, display extremely idiosyncratic behaviour and defy any generalization. Two of them affect specifically the performance of SwiftOWLIM - Q14 (mildly) and Q3, the only two cases where SwiftOWLIM shows the slowest response of all the stores. The impact of Q18 is more severe and it affects the performance of Jena TDB, SwiftOWLIM and 4Store in the order of magnitude. In the case of 4Store the ART for Q18 is 9141 (sic !) times longer than the ART of this store averaged over all the other 23 queries. Both Q14 and Q18 include the ORDER BY modifier, not used by any other query, which might suggest that the implementation of this feature is suboptimal in the affected stores. However, this proposition is in conflict with the response of 4Store to these two queries (17859 fold difference in the ARTs). On the other hand, Q3 does not make use of any unique features; instead it includes a rather wide range of features found as well in other queries. At present it is not possible to determine whether any particular combination of these features is responsible for the long execution time.


Finally, we wanted to see how well the ARTs scale up with respect to the graph sizes. The ARTs were summed over all the queries (except for the queries Q17, Q19, Q20. Q21 for SwiftOWLIM) and plotted against the total number of triples (Figure 1). The figure shows that SwiftOWLIM scales up exceptionally well, followed by Virtuoso. In contrast, 4Store demonstrated the poorest performance with respect to scalability. However, as pointed out earlier, the behaviour of SwiftOWLIM and 4Store is strongly affected by a few outliers. Therefore, to eliminate the impact of the outliers we excluded the three slowest queries Q3, Q14 and Q18 from the plot (Figure 2). Although the mutual arrangement of the individual graphs on the plot changed in favour of SwiftOWLIM and 4Store, the conclusion about the scalability drawn above did not change. It should be noted that the fact that queries were developed using Virtuoso should not have any impact on the scalability.

Figure 1

The scalability of the five stores. The ARTs were summed over all the 24 queries (except for the queries Q17, Q19, Q20, Q21 for SwiftOWLIM) and plotted against the total number of triples in the graphs (Y axis is logarithmic).

Figure 2

The scalability of the five stores in the absence of outliers. Cumulative ARTs are shown, plotted against the total number of triples in the graphs. This figure is similar to Fig. 1 but omitting the queries Q3, Q14 and Q18.


Our benchmark is intended to represent a realistic usage of triple stores in the Life Sciences context. It is designed to measure loading and query answering times, which are key parameters in the choice of a triple store. However, there are other considerations to be made when selecting a triple store for a specific use case.

One of these considerations is the architecture for which a triple store is designed, and in particular whether this can be deployed on a cluster-based architecture. Some of the systems we have benchmarked (4Store, Virtuoso) can be deployed on a cluster making an efficient use of this affordable parallel architecture, which promises better scalability for very large datasets. As a note, SwiftOWLIM, which we have tested, is designed for optimal performance for datasets below 100M triples, whereas BigOWLIM is designed for large datasets and offers improved performance on a cluster-based architecture.

Triple stores vary also in the programmatic access they offer and in the features they support. As an example, the big difference in performance between TDB and SDB (which are both from the same software suite) are compensated by the different functionalities they offer. SDB performs less well in queries, but provides transactional support and concurrent access, which are not offered by TDB.

It should also be noted that the query performance of triple stores is dependent on the efficiency of their query optimization engines. The relevance of such optimization depends on the usage conditions of triple stores. Where a few queries are routinely executed, optimization can be performed by users at design time and the lack of an optimizer is not penalizing. TDB is a peculiar case in that it offers an optimizer which is statistics based, and which needs to be updated after a change in the data content.

Query performance may also depend on the characteristics of datasets in ways which can be difficult to detect. For example, we have observed that 4Store loading times scale well with respect to the total number of triples, while the number of different properties in the dataset could be a limiting factor.

It should be noted that as our queries have evolved from real use cases on the CCO ontology, which has been mostly accessed via a Virtuoso-based SPARQL endpoint. It is not possible to rule out that queries displaying idiosyncratic behaviour towards Virtuoso were naturally avoided or optimized. For this reason, we have provided figures which don't take such outliers in consideration. At the same time, it should be noted that neither SDB nor TDB show any particular outlier behaviour even though our queries were not optimised for these stores. We hope that the identification of such idiosyncratic behaviour will anyway be useful for the respective development communities.

Finally, we did not include in our testing any element directly related to reasoning performance, which is a relevant issue in the Life Sciences, where simple chains of transitive properties are common in biomedical terminologies. Testing reasoning features would have introduced an additional layer of complexity in the interpretation of results and we have preferred to focus on testing basic query-answering performance. For this reason, we have simulated transitive properties via a materialization of inferred triples (see section Methods).


We have compared the performance of a set of biologically relevant queries in five popular triple stores, OpenLink Virtuoso (Open-Source Edition), Jena SDB, Jena TDB, SwiftOWLIM and 4Store. We have used a dataset and a set of queries which are representative of a real life sciences application case. We have focused in our analysis on loading time, query time and its reproducibility, under "common" deployment and usage conditions.

In general, the performance proved to be quite query-specific. Nevertheless, it was possible to identify three groups of queries displaying similar behaviour with respect to the different stores:

queries with relatively short ART,

queries with moderate ART,

queries with relatively long ART.

SwiftOWLIM proved to be a winner in the first group, 4store in the second one and Virtuoso in the third one. Virtuoso emerged from our analysis as a very balanced performer in our application case - its upload time and response time for all the 24 queries were better than average among the tested stores and it showed a very good scalability. Even though in our study we used only moderately large triple stores (~11 M triples), others demonstrated that Virtuoso excels when confronted with much larger stores, up to 100-200 M triples [19, 22]. From our experience we conclude that Virtuoso is well suited for managing large volumes of biological data as is the case of our BioGateway project where it gracefully supports querying over ~1.8 billion triples [23].

Note added in proof

After the completion of this work Berlin Benchmark v.3 has been released. A line-up comparable to the one in our work (Virtuoso, Jena TDB, 4store, BigData, and BigOWLIM) was used for benchmarking and the conclusions reached are similar to ours [24].



The set of triple store implementations included Virtuoso OpenSource 6.0.0, SwiftOWLIM 2.9.1, 4Store 1.0.2, Jena SDB 1.3.1, Jena TDB 0.8.2. The stores were run under CentOS 5.5 operating system.


Default configurations were used except the parameters listed below.


FileExtend = 100000

MaxCheckpointRemap = 1048576

ServerThreads = 100

CheckpointInterval = 600


NumberOfBuffers = 500000

MaxDirtyBuffers = 18000

MaxStaticCursorRows = 50000

TransactionAfterImageLimit = 50000000

FreeTextBatchSize = 10000000


Segment1 = 1536M, db-seg1-1.db, db-seg1-3.db, db-seg1-2.db


Segment32 = 1536M, db-seg32-1.db, db-seg32-3.db, db-seg32-2.db

MaxQueryCostEstimationTime = 84400

MaxQueryExecutionTime = 7200

Swift Owlim

$JAVA_HOME/bin/java -Xmx4096m -cp "$CP_TESTS:$EXT_LIBS" 10

query.language: sparql

preload.format: rdfxmls


owlim:entity-index-size "200000"

owlim:jobsize "200"

Jena SDB

sdb:layout         "layout2/index"


The analysis was performed on a Dell R900 machine with 24 Intel(R) Xeon(R) CPUs (2.66GHz). The machine was equipped with 132G main memory and 14x500GB 15K SAS hard drives.

Testing procedure


for each triplestore {

         repeat 3 times {

                  delete db files

                  switch on triplestore

                  load data

                  run all queries (in exactly the same order)

                  switch off triplestore



The ten RDF graphs constituting the CCO [5] were used for the analysis (11.315.866 triples in total, Table 1 for more details).

The graphs were queried with the 24 SPARQL queries from the library of queries on the CCO web site [25], (see Additional File 2). The queries were executed on each of the graphs consecutively from query Q1 through Q24. The experiments were replicated three times. Prior to each experiment the contents of the stores were completely cleared and uploaded anew.

Data analysis

The average response times and the corresponding standard errors for these three observations were computed for all the data points (24 queries and 10 graphs, available at [23]) and used to aggregate the data for Tables 3 and 4 and Figures 1 and 2. The relative standard errors (RSE) were produced by dividing standard errors by the corresponding ART. Also, the total data load times (10 graphs) were averaged over the three replicates and presented in Table 3.

Data availability

The RDF files used for uploading the triple stores are downloadable from the CCO web site [26], the queries used in our analysis are avaiable as Additional File 2.



averaged response time


Cell Cycle Ontology


Resource Description Framework


Relative Standard Error


Semantic Web


World Wide Web Consortium


Norwegian University for Science and Technology


the Lehigh University Benchmark


the Berlin SPARQL Benchmark


SPARQL Protocol and Query Language.


  1. 1.

    Berners-Lee T, Hendler J, Lassila O: The Semantic Web - a new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Sci Am 2001, 284: 34.

    Article  Google Scholar 

  2. 2.

    Shadbolt N, Hall W, Berners-Lee T: The Semantic Web revisited. Ieee Intell Syst 2006, 21: 96–101.

    Article  Google Scholar 

  3. 3.

    Jenssen TK, Hovig E: The semantic web and biology. Drug Discov Today 2002, 7: 992–992.

    Article  PubMed  Google Scholar 

  4. 4.

    Antezana E, Kuiper M, Mironov V: Biological knowledge management: the emerging role of the Semantic Web technologies. Brief Bioinform 2009, 10: 392–407. 10.1093/bib/bbp024

    Article  PubMed  Google Scholar 

  5. 5.

    Antezana E, Egana M, Blonde W, Illarramendi A, Bilbao I, De Baets B, Stevens R, Mironov V, Kuiper M: The cell cycle ontology: an application ontology for the representation and integrated analysis of the cell cycle process. Genome Biol 2009, 10: R58. 10.1186/gb-2009-10-5-r58

    PubMed Central  Article  PubMed  Google Scholar 

  6. 6.

    Antezana E, Blonde W, Egana M, Rutherford A, Stevens R, De Baets B, Mironov V, Kuiper M: BioGateway: a semantic systems biology tool for the life sciences. BMC Bioinformatics 2009, 10(Suppl 10):S11. 10.1186/1471-2105-10-S10-S11

    PubMed Central  Article  PubMed  Google Scholar 

  7. 7.

    Attwood TK, Kell DB, McDermott P, Marsh J, Pettifer SR, Thorne D: Calling International Rescue: knowledge lost in literature and data landslide! Biochem J 2009, 424: 317–333. 10.1042/BJ20091474

    PubMed Central  CAS  Article  PubMed  Google Scholar 

  8. 8.

    Hodgson J: The headache of knowledge management. Nat Biotechnol 2001, 19(Suppl):BE44–46.

    CAS  PubMed  Google Scholar 

  9. 9.

    Chandrasekaran B, Josephson JR, Benjamins VR: What are ontologies, and why do we need them? Ieee Intell Syst App 1999, 14: 20–26.

    Article  Google Scholar 

  10. 10.

    Automated reasoning[]

  11. 11.

    UniProt Consortium: Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res 2011, 39: D214–219.

    Article  Google Scholar 

  12. 12.

    Resource Description Framework (RDF)[]

  13. 13.

    SPARQL Query Language for RDF[]

  14. 14.

    Linked data - connect distributed data across the Web[]

  15. 15.


  16. 16.

    List of Triplestore Implementations[]

  17. 17.

    The Semantic Web Challenge[]

  18. 18.

    Guo YB, Pan ZX, Heflin J: LUBM: a benchmark for OWL knowledge base systems. J Web Semant 2005, 3: 158–182. 10.1016/j.websem.2005.06.005

    Article  Google Scholar 

  19. 19.

    Bizer C, Schultz A: The Berlin SPARQL Benchmark. Int J Semant Web Inf 2009, 5: 1–24.

    Google Scholar 

  20. 20.

    Weekend Triple Billionaire - maintaining a large RDF data set in the life sciences[]

  21. 21.

    RDF Store Benchmarking[]

  22. 22.

    BSBM Results for Virtuoso, Jena TDB, BigOWLIM (November 2009)[]

  23. 23.

    Semantic Systems Biology - BioGateway - Querying[]

  24. 24.

    BSBM V3 Results (February 2011)[]

  25. 25.

    Semantic System Biology - CCO - Querying - SPARQL[]

  26. 26.

    Semantic Systems Biology - CCO - Download - Ontologies[]

Download references


The authors appreciate the indispensable technical assistance of Einar Jensen and Bjørn Lindi (High Performance Computing Group, NTNU). VM was supported by FUGE Midt Norge.

This article has been published as part of BMC Bioinformatics Volume 13 Supplement 1, 2012: Semantic Web Applications and Tools for Life Sciences (SWAT4LS) 2010. The full contents of the supplement are available online at

Author information



Corresponding author

Correspondence to Martin Kuiper.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

VM conceived and designed the experiment, oversaw its execution, analysed and interpreted the results and drafted the manuscript; NS carried out the experiment; WB developed the queries and contributed to the drafting of the manuscript; EA and AS contributed to the analysis and interpretation of the test results and to the drafting of the manuscript; MK was responsible for the project, oversaw its execution and oversaw and contributed to the drafting of the manuscript. All authors approved the final version of the paper.

Electronic supplementary material

benchmarking results

Additional file 1: . This file contains the raw and derived data values. There is one sheet with the upload data and one separate sheet for each of the stores with the query response times. Additionally there is a sheet with the aggregated data which were used to generate the tables and figures in the manuscript. (XLS 364 KB)

SPARQL queries

Additional file 2: . This file contains the SPARQL code of the 24 queries used for testing the stores. (TXT 17 KB)

Rights and permissions

Reprints and Permissions

About this article

Cite this article

Mironov, V., Seethappan, N., Blondé, W. et al. Gauging triple stores with actual biological data. BMC Bioinformatics 13, S3 (2012).

Download citation


  • Resource Description Framework
  • Response Time Query
  • Triple Store
  • Resource Description Framework Graph
  • SPARQL Endpoint