SPARQL Assist language-neutral query composer

Background: SPARQL query composition is difficult for the layperson, and even for the experienced bioinformatician when the data model is unfamiliar. Moreover, established best practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of writing queries manually.
Results: We present SPARQL Assist, a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multilingual labels and descriptions, leveraging existing support for internationalization and language neutrality. The system also uses the semantics embedded in ontologies, and within the query itself, to prioritize the most likely suggestions.
Conclusions: To succeed, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and by simplifying the manual construction of SPARQL queries through controlled-natural-language interfaces, we believe we have taken some early steps towards simplifying access to Semantic Web resources.


Introduction
The health care and life science sectors have been some of the most enthusiastic adopters of Semantic Web technologies. The benefits of the RDF/OWL data model are well understood by bioinformaticians, who have for too long had to deal with the problem of integrating data from multiple sources with wildly different underlying schemas. These benefits are less obvious, however, to clinicians and researchers who merely see one mysterious query language (SQL) exchanged for another (SPARQL). Even a Semantic Web-savvy informatician can be daunted when faced with the challenge of querying an unfamiliar data source whose particular RDF vocabulary is initially unknown.
The issue is compounded by the growing use of opaque, semantic-free URIs for ontological classes and properties (OBO [1], SIO [2], CWA [3]). Where the meaning of rdf:type or dc:title in a SPARQL query is relatively clear, the meaning of, for example, sio:SIO_000253 is considerably harder to glean without looking up its ontological definition. Nevertheless, there are many valid reasons for designing ontologies this way, not the least of which is language neutrality.
RDF/XML provides built-in language neutrality by way of the xml:lang attribute; an ontology can easily be internationalized by providing multiple rdfs:label or rdfs:comment properties with appropriate xml:lang attributes. However, even projects that have, in principle, adopted language neutrality for their classes (e.g., OBO) have not done so for their properties (OBO Relationship Ontology [4]). This is no doubt due, at least in part, to the difficulty of composing SPARQL queries in which predicates have opaque identifiers. Nevertheless, it is crucial that we do not allow convenience to direct the development of a core global resource, the Semantic Web, and thus the problem should be solved at the level of the tools provided, rather than the resources themselves.
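As a minimal sketch of the labeling principle described above, the following Python fragment resolves an opaque identifier to a label in the user's preferred language, with a fallback. The identifier `ex:TERM_0001`, its labels, and the helper `label_for` are all hypothetical and are not taken from any of the ontologies cited here.

```python
# Hypothetical labels for one opaque term, keyed by language tag.
# This mirrors an ontology carrying several rdfs:label values,
# each with a different xml:lang attribute.
LABELS = {
    "ex:TERM_0001": {
        "en": "has source",
        "fr": "a pour source",
        "es": "tiene origen",
    },
}


def label_for(term, preferred, fallback="en"):
    """Return the label in the preferred language, else the fallback
    language, else the opaque identifier itself."""
    by_lang = LABELS.get(term, {})
    return by_lang.get(preferred) or by_lang.get(fallback) or term


print(label_for("ex:TERM_0001", "fr"))
print(label_for("ex:TERM_0001", "de"))
```

The identifier stays stable and language-free; only the presentation layer changes with the user's locale.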

SPARQL Assist
In this demonstration, we present SPARQL Assist: a web application that facilitates the construction of SPARQL queries by providing context-sensitive type-ahead completion. In addition to assistance with basic syntax, ontological terms are indexed by their labels, allowing a query to be composed in a user's preferred language, assuming appropriate labels are present in the ontology. Terms are read on-the-fly from any ontology specified in a FROM clause, but SPARQL Assist can also be configured to pre-load terms from particular ontologies or SPARQL endpoints.
The entire query, as it is being constructed, is used to provide context for the type-ahead suggestions. Previously declared variables or known individuals are suggested in the subject or object position of a clause, and known properties are suggested in the predicate position. If a clause specifies an individual, properties that individual is known to have are displayed preferentially. Similarly, if a clause contains a variable that can ultimately be connected to a known individual in another part of the query, that connection is used to find the most likely properties in the current clause.
Terms are cached on the client side to speed up repeated look-ups, but most of the processing is done on the server side in Java to take advantage of the mature OWL toolkit on that platform. In the future, as much computation as possible will be transferred to the client side to improve both performance and flexibility of deployment.
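Label-based completion of this kind can be sketched as a prefix search over a language-tagged label index. The example below is illustrative only: the `Label` record, the `ex:` terms, their labels, and the function `complete_label` are assumptions, and the actual SPARQL Assist index may be organized quite differently.

```python
from typing import NamedTuple


class Label(NamedTuple):
    term: str  # opaque identifier of the ontological term
    lang: str  # language tag of the label
    text: str  # human-readable label


# Hypothetical index, as might be built from rdfs:label values.
INDEX = [
    Label("ex:P_0001", "en", "has participant"),
    Label("ex:P_0001", "fr", "a pour participant"),
    Label("ex:P_0002", "en", "has part"),
    Label("ex:C_0001", "en", "protein"),
]


def complete_label(prefix, lang, index=INDEX, limit=10):
    """Return (label, term) pairs whose label in `lang` starts with
    `prefix`, ignoring case, sorted alphabetically by label."""
    needle = prefix.casefold()
    hits = [
        (entry.text, entry.term)
        for entry in index
        if entry.lang == lang and entry.text.casefold().startswith(needle)
    ]
    return sorted(hits)[:limit]


print(complete_label("has pa", "en"))
print(complete_label("A POUR", "fr"))
```

The user types a label fragment in their own language, and the composer inserts the opaque identifier paired with the chosen label.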
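The position-dependent behaviour described above might be sketched as follows. This simplified example covers only the direct cases (declared variables and known individuals for the subject or object position, and preferential ranking of properties a named individual is known to have); it does not attempt the transitive case of a variable connected to an individual elsewhere in the query. The data and the names `KNOWN`, `ALL_PROPERTIES`, and `suggest_terms` are hypothetical.

```python
# Hypothetical knowledge: properties each known individual has.
KNOWN = {"ex:gene1": {"ex:encodes", "ex:locatedOn"}}
ALL_PROPERTIES = ["ex:encodes", "ex:hasFunction", "ex:hasName", "ex:locatedOn"]


def declared_variables(clauses):
    """Collect variables already used in the query, in order of appearance."""
    seen = []
    for clause in clauses:
        for term in clause:
            if term.startswith("?") and term not in seen:
                seen.append(term)
    return seen


def suggest_terms(position, clauses, subject=None):
    """Suggest terms for the given position of the clause being typed."""
    if position in ("subject", "object"):
        # Previously declared variables first, then known individuals.
        return declared_variables(clauses) + sorted(KNOWN)
    # Predicate position: properties the subject is known to have come first.
    preferred = KNOWN.get(subject, set())
    return sorted(ALL_PROPERTIES, key=lambda p: (p not in preferred, p))


query_so_far = [("ex:gene1", "ex:encodes", "?protein")]
print(suggest_terms("subject", query_so_far))
print(suggest_terms("predicate", query_so_far, subject="ex:gene1"))
```

Ranking rather than filtering keeps every property reachable, so an unexpected but valid choice is never hidden from the user.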
For this demonstration, SPARQL Assist has been implemented in the context of creating queries that will be resolved by the Semantic Health and Research Environment (SHARE [5]).

SHARE
SHARE is an advanced SPARQL query client built on top of the SADI Framework [6] for Semantic Web Services. In SADI, services attach properties to input OWL instances and are indexed in a central registry based on the properties they attach. SHARE maps the triple patterns of a SPARQL query onto these indexed properties, allowing a user to query the entire virtual graph of registered SADI services. The RDF data required to answer a given query is thus dynamically generated in response to that query.
In the context of this demonstration, this infrastructure makes SPARQL query composition even more difficult, since there is no pre-existing database to inspect for candidate properties and individuals. The specialized SPARQL Assist provider for SHARE, therefore, uses the SADI registry, in addition to any loaded ontologies, to suggest properties to be used in a query. As in the generic case, if a clause contains a named individual or a variable previously connected to an individual, that information is used to further refine the suggestions; in this case, by filtering out services (and the properties they would have suggested) that cannot accept that particular individual.
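The mapping from triple patterns to registered services can be illustrated with a small in-memory registry keyed by attached property. This is a sketch under stated assumptions, not the SADI or SHARE implementation: the service names, the `ex:` properties, and the functions `build_registry` and `services_for_query` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical service descriptions: service name -> properties it attaches.
SERVICES = {
    "getProteinForGene": ["ex:encodes"],
    "getFunctionForProtein": ["ex:hasFunction"],
    "getAnnotations": ["ex:hasFunction", "ex:hasName"],
}


def build_registry(services):
    """Index services by the properties they attach to their input."""
    registry = defaultdict(list)
    for name, properties in services.items():
        for prop in properties:
            registry[prop].append(name)
    return registry


def services_for_query(patterns, registry):
    """Map each (subject, predicate, object) pattern to the services
    that could generate triples with that predicate."""
    return {p: sorted(registry.get(p[1], [])) for p in patterns}


registry = build_registry(SERVICES)
patterns = [
    ("ex:gene1", "ex:encodes", "?protein"),
    ("?protein", "ex:hasFunction", "?function"),
]
for pattern, names in services_for_query(patterns, registry).items():
    print(pattern, "->", names)
```

A pattern whose predicate no service attaches maps to an empty list, which in this sketch means no data could be generated for it.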
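The registry-based refinement could look roughly like the sketch below, which suggests only properties attached by services whose input class matches a known type of the individual. It is deliberately simplified: class membership is a plain lookup rather than OWL reasoning, and the service records, the `TYPES` table, and `properties_for` are hypothetical.

```python
# Hypothetical registry entries: each service declares an input class
# and the properties it attaches to instances of that class.
SERVICES = [
    {"name": "getProteinForGene", "input": "ex:Gene", "attaches": ["ex:encodes"]},
    {"name": "getFunctionForProtein", "input": "ex:Protein", "attaches": ["ex:hasFunction"]},
    {"name": "getNameForGene", "input": "ex:Gene", "attaches": ["ex:hasName"]},
]

# Hypothetical asserted types of known individuals.
TYPES = {"ex:gene1": {"ex:Gene"}}


def properties_for(individual):
    """Suggest properties from services that can accept the individual.
    In this sketch, an individual with no known type matches no service."""
    types = TYPES.get(individual, set())
    suggested = set()
    for service in SERVICES:
        if service["input"] in types:
            suggested.update(service["attaches"])
    return sorted(suggested)


print(properties_for("ex:gene1"))
```

A production system would need to decide what to do when nothing is known about an individual; falling back to the unfiltered property list is one plausible option.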

Conclusion
SPARQL Assist provides prototype solutions for two important problems. First, to hasten the uptake of Semantic Web technologies, it is important to improve access to, and usability of, Semantic Web resources for the lay end user while still maintaining best practices in the way these resources are modeled. Opaque identifiers for both classes and properties are important, as they allow us to avoid "churn" as an ontology evolves over time. We must therefore support the end user in constructing queries over resources formatted in this way. Second, the Semantic Web is intended to be a global resource, of use to all. As such, a respect for internationalization is also critical, even at these early stages in Semantic Web evolution. We believe that SPARQL Assist provides motivation to more widely adopt what are clearly best practices in Semantic Web data provision.