Volume 10 Supplement 10
Bio-jETI: a framework for semantics-based service composition
© Lamprecht et al; licensee BioMed Central Ltd. 2009
Published: 01 October 2009
The development of bioinformatics databases, algorithms, and tools over recent years has led to a highly distributed world of bioinformatics services. Without adequate management and development support, in silico researchers are hardly able to exploit the potential of building complex, specialized analysis processes from these services. The Semantic Web aims at thoroughly equipping individual data and services with machine-processable meta-information, while workflow systems support the construction of service compositions. However, even in this combination, in silico researchers currently would have to deal manually with the service interfaces, the adequacy of the semantic annotations, type incompatibilities, and the consistency of service compositions.
In this paper, we demonstrate by means of two examples how Semantic Web technology together with an adequate domain modelling frees in silico researchers from dealing with interfaces, types, and inconsistencies. In Bio-jETI, bioinformatics services can be graphically combined to complex services without worrying about details of their interfaces or about type mismatches of the composition. These issues are taken care of at the semantic level by Bio-jETI's model checking and synthesis features. Whenever possible, they automatically resolve type mismatches in the considered service setting. Otherwise, they graphically indicate impossible/incorrect service combinations. In the latter case, the workflow developer may either modify his service composition using semantically similar services, or ask for help in developing the missing mediator that correctly bridges the detected type gap. Newly developed mediators should then be adequately annotated semantically, and added to the service library for later reuse in similar situations.
We show the power of semantic annotations in an adequately modelled and semantically enabled domain setting. Using model checking and synthesis methods, users may orchestrate complex processes from a wealth of heterogeneous services without worrying about interfaces and (type) consistency. The success of this method strongly depends on a careful semantic annotation of the provided services and on its consistent exploitation for analysis, validation, and synthesis. We are convinced that these annotations will become standard, as they will become preconditions for the success and widespread use of (preferred) services in the Semantic Web.
Research projects in modern molecular biology rely on increasingly complex combinations of computational methods to handle the data that is produced in the life science laboratories. A variety of bioinformatics databases, algorithms and tools is available for specific analysis tasks. Their combination to solve a specific biological question defines more or less complex analysis workflows or processes. Software systems that facilitate their systematic development and automation [1–7] have become very popular in the community.
More than in other domains, the heterogeneous services world in bioinformatics demands a methodology to classify and relate resources in a manner that is accessible to both humans and machines. The Semantic Web [8, 9], which is meant to address exactly this challenge, is currently one of the most ambitious projects in computer science. Collective efforts have already led to a basis of standards for semantic service descriptions and meta-information.
Most importantly, the World Wide Web Consortium (W3C) set up a number of working groups addressing different technological aspects of the Semantic Web vision. Among their outcomes are the Semantic Annotations for WSDL (SAWSDL) recommendation, the Resource Description Framework (RDF) specification, and the Web Ontology Language (OWL). While SAWSDL is designed to equip single entities with predicates, RDF and the more powerful OWL formally define relationships between the resources of a domain.
Without a reasonably large set of semantically annotated (web) services, it is, however, difficult to evaluate the Semantic Web technologies with significant results and to develop practical software for the client side. On the other hand, providers are not willing to put effort into annotating their services as long as they cannot be confident which technologies will finally become established. Community initiatives like the Semantic Web Services (SWS) Challenge or the Semantic Service Selection Contest (S3C) address this problem. They provide collections of services, domain information and concrete scenarios that the different participants, being developers of methodologies for different Semantic Web aspects, have to deal with. In the scope of the S3 Contest, OPOSSum [15, 16], an "online portal to collect and share SWS descriptions", was set up. It aims at collecting, sharing, editing, and comparing SWS descriptions within a community infrastructure in order to collaboratively evaluate and improve SWS formalisms. As of March 2009, however, OPOSSum does not list any bioinformatics services.
An example of a knowledge base particularly capturing bioinformatics data types and services is the set of constantly evolving namespace, object and service ontologies of the BioMoby service registry [17, 18]. BioMoby's aim is to "achieve a shared syntax, shared semantic, and discovery infrastructure suitable for bioinformatics" as a part of the Semantic Web. Originating from the early 2000s, the 1.0 MOBY-S(ervices) specifications, however, do not adhere to the Semantic Web standards that have been developed in recent years. Consequently, the S(emantic)-MOBY branch of the project came into being to migrate to common technologies. It has recently been merged into the SSWAP (Simple Semantic Web Architecture and Protocol) [20, 21] project, which aims at providing life science knowledge using standard RDF/OWL technology. SSWAP provides a number of its own ontologies, but also incorporates third-party domain knowledge like the MOBY-S object and service ontologies.
Generally, the development of ontologies in the bioinformatics community is already very promising. Projects like the Gene Ontology (GO) and the Open Biomedical Ontologies (OBO) have already become widely used and have also, for instance, been incorporated by the SSWAP project. The majority of publicly available ontologies in the bioinformatics domain is, however, designed for the classification of scientific terms and the description of actual data sets, and not for (technical) descriptions of service interfaces and data types.
The lack of properly semantically annotated services has evidently already been recognized by the community, as different projects are commencing to address the issue. For instance, major service providers like the European Bioinformatics Institute (EBI) plan to extend their service infrastructure to provide meta-information conforming to Semantic Web standards. Other initiatives aim at setting up stand-alone collections of service URIs and corresponding annotations, without influencing the service infrastructures as such.
While the provision of semantically annotated services is mainly the service providers' task, on the client side software is needed that fully utilizes the available semantic information in order to provide helpful tools to the in silico researcher. The challenge for user-side software is to abstract from the underlying Semantic Web technology again and provide the achievements in an intuitive fashion.
More advanced examples of utilizing semantic information about services are, for instance, available in the scope of the SWS Challenge. Among others, projects like SWE-ET (Semantic Web Engineering Environment and Tools) and WSMX participate in the challenge, addressing both discovery and mediation scenarios for Semantic Web Services. However, these solutions demand quite some technical understanding from the user, which hampers the uptake by a larger biological user community.
As an example from the bioinformatics domain, the BioMoby project provides a simple composition functionality for its services [17, 18]. With the MOBY-S Web Service Browser it is, e.g., possible to search for an appropriate next service, while in addition the sequence of actually executed tools is recorded and stored as a Taverna workflow. A substantial drawback of this approach is, however, its restriction to the services that are registered in the respective platform.
In this paper, we present our approach to semantics-based service composition in the Bio-jETI platform [7, 28]. By integration of automatic service composition functionality into an intuitive, graphical process management framework, we are able to maintain the usability of the latter for semantically aware workflow development. Furthermore, we can integrate services and domain knowledge from any kind of heterogeneous resource at any location, and are not restricted to any semantically annotated services of a particular platform.
This manuscript is structured as follows: In the next section, Results and Discussion, we discuss two examples that we developed in Bio-jETI with the help of a semantics-aware workflow synthesis method and model checking: a simple phylogenetic analysis workflow and a more sophisticated, highly customized phylogenetic analysis process based on Blast and ClustalW. Subsequently, the Conclusion deals with directives for the future development of our approach. Finally, the Methods section describes the applied techniques in greater detail.
Results and discussion
Workflow development in Bio-jETI is already supported by several plugins of the jABC framework, for instance providing functionality for component validation or step-wise execution of the process model for debugging purposes. Now we are going to exploit further jABC technology, such as model checking and workflow synthesis, in order to enable Bio-jETI to support the development of processes in terms of service semantics.
Model checking [38, 39] can be used for reasoning about properties of process models. This can help to detect problems like undefined data identifiers, missing computations, or type mismatches. Solving these problems might require the introduction of further computational steps, for instance a series of conversion services in case of a data type mismatch. The approach here is to automate the creation of such process parts via workflow synthesis methodology [40–43] that allows for the automatic creation of (linear) workflows according to high-level, logical specifications. Figure 3 (top) illustrates the relationship between our specification language SLTL (Semantic Linear Time Logic) and the actual Bio-jETI workflow models, the SLGs: Provided with a logical specification of the process and semantically annotated services, the workflow synthesis algorithm generates linear sequences of services, which can be further edited and combined into complex process models on the SLG level.
Exemplary set of services. Fragment of a component library that we used in the examples. The table lists the names of the building blocks (SIBs) along with function descriptions and selected service predicates.
Function description | Service predicates
Displays a phylogenetic tree. | type:visualization, location:local, contributor:forester.org
BLAST against a DDBJ database. | type:analysis, location:ddbj, contributor:ddbj
Runs ClustalW. | type:analysis, location:ddbj, contributor:ddbj
EMBOSS interface to ClustalW. | type:analysis, location:ebi, contributor:emboss
Extracts all parts of a string that match a regular expression. | type:stringprocessing, location:local, contributor:jabc
Fetches an entry in flat file format from a DDBJ database. | type:dataretrieval, location:ddbj, contributor:ddbj
Fetches an entry in FASTA format from a DDBJ database. | type:dataretrieval, location:ddbj, contributor:ddbj
Concatenates all entries of a list. | type:stringprocessing, location:local, contributor:jabc
Tries to match a string against a regular expression pattern. | type:condition, location:local, contributor:jabc
Stores a user-supplied context expression or its value into the execution context. | type:definition, location:local, contributor:jabc
Provides an integer value. | type:definition, location:local, contributor:jabc
Realizes a counting loop. | type:loop, location:local, contributor:jabc
Replaces substrings of a string with another character sequence. | type:stringprocessing, location:local, contributor:jabc
Input dialog, provides a string. | type:definition, location:local, contributor:jabc
- | type:dataretrieval, location:ebi, contributor:ebi
In the jABC, the SIBs are displayed to the user in a taxonomic view, classified according to their position in the file system (by default) or to any other useful criterion, like the provider or the kind of service. The SIBs have user-level documentation, explaining what the underlying tool or algorithm does, that is derived directly from the provider's service descriptions. In addition, the SIBs provide information about their input and output types via a specific interface. This is already an integral part of the semantic information that helps to systematically survey large SIB libraries, and it is used by our process synthesis and model checking methods. It is, in addition, possible to add arbitrary annotations to the SIB instances, thereby providing further (semantic) information that is taken into account by our formal methodologies.
The knowledge base that is needed for the process synthesis consists, furthermore, of service and type taxonomies that classify the services and types, respectively. Taxonomies are simple ontologies that relate entities in terms of is-a and has-a relations. These classifications provide sufficient information for our synthesis methodologies.
Exemplary set of types. The set of data types that was used in the example processes.
Single accession number.
Iterable (java.util.)List of accession numbers.
Concatenation of accession numbers, separated by some character.
Multiple sequence alignment.
Tool output of BLAST.
Tool output of ClustalW.
Counter, i.e. positive integer value.
DDBJ entry in flat file format.
Limit, i.e. positive integer value.
Single or multiple nucleic or amino acid sequences.
Example 1: a simple phylogenetic analysis workflow
An experienced bioinformatician might be aware of the problem immediately, due to his familiarity with the involved tools. This is, however, only a small workflow. An automatic, semantically supported detection of misconfigurations and modeling errors unfolds its full potential when processes become more complex, and it is not feasible for the in silico researcher to dive into the documentations of all services or to explore their behaviour by trial-and-error executions.
Once the problem is detected, there are different ways to fix it. One can look for replacements for one of the involved SIBs that essentially compute the same results, but provide them in a data format that fits into the surrounding process. Another approach, assuming that the user has chosen these services for good reason, is to search for a sequence of additional services that resolve the mismatch and insert them into the process. Such data mediation sub-workflows are usually linear. They can consist of type conversions that simply adapt the involved data, or also of real computational services when the match cannot be realized so easily.
Extract the IDs of the hits from the BLAST result (using a regular expression).
Turn the matches into a comma-separated list.
Call DBFetch (fetching the corresponding sequences from a database).
Run emma (computing a multiple sequence alignment and phylogenetic tree).
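The four mediation steps above can be sketched in Python as follows. This is a minimal illustration and not the Bio-jETI implementation: the layout of the BLAST report, the regular expression for the hit IDs, and the two stub functions `fetch_sequences` and `run_emma`, which stand in for the remote DBFetch and emma services, are all assumptions made for this example.

```python
import re

# Hypothetical excerpt of a BLAST report; real reports differ in layout.
BLAST_RESULT = """\
AB012345|Escherichia coli 16S rRNA   1200  0.0
AB067890|Shigella flexneri 16S rRNA  1180  0.0
"""

# Assumed ID pattern: two capital letters followed by six digits.
ID_PATTERN = re.compile(r"\b[A-Z]{2}\d{6}\b")


def extract_ids(blast_output: str) -> list[str]:
    """Step 1: extract the IDs of the hits with a regular expression."""
    return ID_PATTERN.findall(blast_output)


def join_ids(ids: list[str]) -> str:
    """Step 2: turn the matches into a comma-separated list."""
    return ",".join(ids)


def fetch_sequences(id_list: str) -> str:
    """Step 3 (stub for DBFetch): return placeholder FASTA records."""
    return "".join(f">{acc}\nACGT\n" for acc in id_list.split(","))


def run_emma(fasta: str) -> str:
    """Step 4 (stub for emma): return a placeholder Newick tree."""
    names = [line[1:] for line in fasta.splitlines() if line.startswith(">")]
    return "(" + ",".join(names) + ");"


def mediate(blast_output: str) -> str:
    """Chain the four steps into one linear mediation sequence."""
    return run_emma(fetch_sequences(join_ids(extract_ids(blast_output))))
```

Each function consumes exactly the type its predecessor produces, which is the property that makes such a linear sequence a valid bridge between the BLAST output and the tree viewer.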
Example 2: Blast-ClustalW workflow
A simple phylogenetic analysis like in the previous example is an often recurring element of complex in silico experiments. In many cases, however, a customized, more specific processing of intermediate results is required, like in the Blast-ClustalW workflow that is one of the DDBJ's sample workflows for the Web API for bioinformatics. It is the archetype for our second example.
Call the Blast web service to search the DDBJ database for homologues of a nucleic acid sequence. The input is a 16S RNA sequence in FASTA format, the output lists the database IDs of the similar sequences and basic information about the local alignment, e.g. its range within the sequences.
Call the GetEntry web service with a database ID from the Blast output to retrieve the corresponding database entry.
Extract accession number, organism name and sequence from the database entry. Trim the sequence to the relevant region using the start and end positions of the local alignment that are available from the BLAST result.
Call the ClustalW web service to compute a global alignment and a phylogenetic tree for the prepared sequences.
Due to the loop that is required for repeating steps 2 and 3 a certain number of times, this process cannot be created completely by our current synthesis algorithm, which is restricted to producing linear sequences of services. It is, however, possible to predefine a sparse process model in which the looping behaviour and other crucial parts are manually predefined, and to subsequently fill in linear parts of the process automatically.
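The idea of a manually predefined loop with pluggable linear parts can be illustrated by the following Python sketch. It is an assumption-laden illustration, not the SLG from the paper: the function names, the dictionary layout of hits and entries, and the 1-based inclusive alignment coordinates are hypothetical, and the stubs replace the real Blast and GetEntry web services.

```python
def trim(sequence, start, end):
    """Cut a sequence to an alignment range (assumed 1-based, inclusive)."""
    return sequence[start - 1:end]


def prepare(entry, hit):
    """Step 3: keep the accession number and the trimmed sequence."""
    return (entry["accession"], trim(entry["sequence"], hit["start"], hit["end"]))


def blast_clustalw(query, blast, get_entry, prepare_entry, clustalw, limit):
    """Sparse process model: the loop is fixed, the services are plugged in."""
    hits = blast(query)                              # step 1
    prepared = []
    for hit in hits[:limit]:                         # manually modelled counting loop
        entry = get_entry(hit["id"])                 # step 2
        prepared.append(prepare_entry(entry, hit))   # step 3
    return clustalw(prepared)                        # step 4


def stub_blast(query):
    """Stand-in for the Blast service, returning made-up hits."""
    return [
        {"id": "AB000001", "start": 2, "end": 4},
        {"id": "AB000002", "start": 1, "end": 3},
        {"id": "AB000003", "start": 3, "end": 6},
    ]


def stub_get_entry(accession):
    """Stand-in for the GetEntry service, returning a made-up entry."""
    return {"accession": accession, "organism": "unknown", "sequence": "ACGTAC"}
```

Passing the services as parameters mirrors the division of labour described above: the control structure is written once by hand, while the linear parts can be exchanged or generated.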
At this state of the process, the local checking of the components detects no errors, but the model checker reveals problems (overlay icons top right): As in the previous example, the SIB Archaeopteryx uses a variable tree, which is not defined before. Moreover, the SIBs extract organism and extract sequence use a variable ddbjentry, which is defined with an incompatible type. Details on the model checking procedure can be found in the Methods section.
To resolve the first problem, we proceed as in example 1, by providing the synthesis algorithm with a temporal formula that asks for a sequence of services that takes a set of sequences as input (which is the last intermediate result that is computed before Archaeopteryx in the process) and produces a phylogenetic tree (the input that Archaeopteryx expects). As Figure 9 (center) shows, a single call to emma is one of the (shortest) sequences that fulfils this request.
The second problem is the presence of a type ddbjfasta where the type ddbjentry is expected. To solve this mismatch, we ask our synthesis algorithm for a way to derive the latter from the former. It returns an empty result (see Figure 9, center), which means that our SIB collection cannot provide an appropriate sequence of services. We exclude the type ddbjfasta and the SIB getFASTA_DDBJEntry, by which it is produced, and try our luck with the type ddbjaccession, which has been defined last, as the starting point for the synthesis. The answer is a service sequence consisting of the SIB getDDBJEntry (center), by which we can now substitute the improper data retrieval SIB from above.
The bottom of Figure 9 shows the completely assembled process. We do not demonstrate its execution behaviour, as it is very similar to that of example 1.
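The three synthesis requests of this example can be mimicked with a simple breadth-first search over services labelled with input and output types. Note that this is a deliberately simplified stand-in: the actual synthesis algorithm works on SLTL specifications and taxonomies, and the input/output types assigned to the SIBs below, as well as the type names `sequences` and `tree`, are assumptions for illustration.

```python
from collections import deque

# Hypothetical service library: SIB name -> (input type, output type).
SERVICES = {
    "getFASTA_DDBJEntry": ("ddbjaccession", "ddbjfasta"),
    "getDDBJEntry": ("ddbjaccession", "ddbjentry"),
    "emma": ("sequences", "tree"),
}


def synthesize(start, goal, services, excluded=frozenset()):
    """Return a shortest service sequence from `start` to `goal`, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        current, path = queue.popleft()
        if current == goal:
            return path
        for name, (source, target) in services.items():
            # Skip excluded services, non-applicable ones, and known types.
            if name in excluded or source != current or target in seen:
                continue
            seen.add(target)
            queue.append((target, path + [name]))
    return None  # corresponds to the empty synthesis result
```

With this library, `synthesize("sequences", "tree", SERVICES)` finds the single call to emma, `synthesize("ddbjfasta", "ddbjentry", SERVICES)` finds nothing, and starting from `ddbjaccession` with getFASTA_DDBJEntry excluded yields getDDBJEntry.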
Discussion and perspectives
By means of two examples, the previous sections demonstrated the local checking, model checking and workflow synthesis methodology that is currently available in the jABC framework and thus part of Bio-jETI. The Local Checker plugin provides domain-independent functionality and is already conveniently integrated in the framework. We are now working on a user-friendly integration of the domain-specific model checking and synthesis techniques, especially with regard to the bioinformatics application domain. This ongoing work spans three dimensions, which are discussed in the following sections: domain modeling, model checking, and model synthesis.
Domain modeling is at the heart of making information technology available to biologists, as it enables them to express their problems in their own terms, on the basis of adequately designed ontologies. It raises the issue of where the domain knowledge ideally comes from. It is, of course, possible for each user to define custom service and type taxonomies, allowing for exactly the generalization and refinement that is required for the special case. However, as the tools and algorithms that are used are mostly third-party services, it is desirable to automatically retrieve domain information from a public knowledge repository as well. Therefore we plan to incorporate knowledge from different publicly available ontologies, like BioMoby [17, 18] and SSWAP [20, 21], and to integrate it into the service and type taxonomies for use by our synthesis methodology.
It is, of course, also necessary that the services themselves are equipped with meta-information in terms of these ontologies. Again, we are looking at BioMoby with interest: numerous institutions have registered their web services at Moby Central, describing functionality and data types in pre-defined structures using a common terminology. Although BioMoby does not yet use standardized description formalisms like SAWSDL, it is already clear that there is semantic information available that we can use as predicates for automatic service classification.
Furthermore, it will be interesting to consider the incorporation of more content-oriented ontologies like the Gene Ontology or the OBO (Open Biomedical Ontologies) into our process development framework. This would allow the software to support the process development not only on a technical level, but also in terms of the underlying biological and experimental questions. Additional sources of information, like provenance ontologies, could also be easily exploited by our synthesis and verification methods.
Model checking is meant to systematically and automatically provide biologists with the required IT knowledge in a seamless way, similar to a spell checker which hints at orthographical mistakes, perhaps already indicating a proposal for correction. Immediate concrete examples of detectable issues are (cf. the examples presented earlier):
Missing resources: a process step is missing, so that a required resource is not fetched/produced.
Mismatching data types: a certain service is not able to work on the data format provided by its predecessor.
However, this is only a first step. Based on adequate domain modeling, made explicit via ontologies/taxonomies, model checking can capture semantic properties to guarantee not only the executability of the biological analysis process but also a good deal of its purpose, and rules of best practice, like:
All experimental data will eventually be stored in the project repository.
Unexpected analysis results will always lead to an alert.
Chargeable services will not be called before permission is given by the user.
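As a small illustration of the last rule, the following Python sketch checks it on a single linear execution trace. A model checker would verify the property on all paths of the process model at once; this simplification, the function name, and the step name `askPermission` are assumptions made for the example.

```python
def permission_precedes_charges(trace, chargeable, permission="askPermission"):
    """Check that no chargeable service occurs before the permission step."""
    granted = False
    for step in trace:
        if step == permission:
            granted = True
        elif step in chargeable and not granted:
            return False  # a chargeable service was called too early
    return True
```

For instance, with `chargeable = {"paidBlast"}`, the trace `["readInput", "askPermission", "paidBlast"]` satisfies the rule, while `["readInput", "paidBlast", "askPermission"]` violates it.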
On a more technical side, model checking also allows us to apply the mature process analysis methodology that has been established in programming language compilers in the last decades and has been shown to be realizable via model checking [54, 55]. By providing a predefined set of desirable process properties to the model checker we plan to achieve a thorough monitoring of safety and liveness properties within the framework. Similar to the built-in code checks that most Integrated (Software) Development Environments provide, this would help Bio-jETI users to avoid the most common mistakes at process design time. In addition, the list of verified properties is extendable by the user, and can thus be easily adapted to specific requirements of the application domain.
Model synthesis can be seen as a step beyond model checking: The biologist does not have to care about data types at all, because the synthesis automatically makes the match by inserting the required transformation programs. This is similar to a spell checker which automatically corrects the text, thus freeing the writer from dealing with orthography at all. (In our model-based framework, things are well-founded, without the uncertainties of natural language. Please do not be put off by this example because of annoying experiences with spell checkers!)
The potential of this technology goes even further: ultimately, biologists will be able to specify their requests in a very sparse way, e.g. by just giving the essential cornerstones, and the synthesis will complete this request to a running process. In our text writing analogy, this might look like a mechanism that automatically generates syntactically and intentionally correct text from text fragments according to predefined rules that capture syntax and intention. For instance, the fragments "ten cars", "1000 Euro for shipping", "19% value added tax", "four days" and "Mercedes" may be sufficient to synthesize a letter in which a logistics company offers its services to Mercedes according to a specific request.
Back to biology, the fragments "DNA sequences", "phylogenetic tree", and "visualization" may automatically lead to a process that fetches EBI sequence data, sends them in adequate form to a tool that is able to produce a phylogenetic tree, and then transfers the result to an adequate viewer. Typically there are many processes that solve such a request. Thus our synthesis algorithm offers the choice of producing a default solution according to a predefined heuristic, or of proposing sets of alternative solutions for the biologist to select from.
We demonstrated by means of two examples how Semantic Web technology together with an adequate domain modelling frees in silico researchers from dealing with interfaces, types, and inconsistencies. In Bio-jETI, bioinformatics services can be graphically combined to complex services without worrying about details of their interfaces or about type mismatches of the composition. These issues are taken care of at the semantic level by Bio-jETI's model checking and synthesis features. Whenever possible, they automatically resolve type mismatches in the considered service setting. Otherwise, they graphically indicate impossible/incorrect service combinations. In the latter case, the workflow developer may either modify his service composition using semantically similar services, or ask for help in developing the missing mediator that correctly bridges the detected type gap. Newly developed mediators should then be adequately annotated semantically, and added to the service library for later reuse in similar situations.
In the first example we developed a simple phylogenetic analysis workflow. The model checker detected a SIB trying to access a data item that has not been defined previously in the workflow, which indicates that necessary computation steps are missing. We used the synthesis algorithm to generate the sequence of these missing steps.
The second example dealt with a more complex phylogenetic analysis workflow, involving several local steps processing intermediate data. Here, the model checker did not only detect missing computations, but also a type mismatch that led to an incorrect process model. Again, the synthesis algorithm was used to find an appropriate intermediate sequence of services and an alternative to the erroneous part of the workflow, respectively.
We believe that our model checking and synthesis technologies have great potential with respect to making highly heterogeneous services accessible to in silico researchers that need to design and manage complex bioinformatics analysis processes. Our approach aims at lowering the required technical knowledge according to the "easy for the many, difficult for the few" paradigm. After an adequate domain modeling, including the definition of the semantic rules to be checked by the model checker or to be exploited during model synthesis, biologists should ultimately be able to profitably and efficiently work with a world-wide distributed collection of tools and data, using their own domain language. This goal differentiates us from other workflow development frameworks like Kepler or Triana, which can be seen as middleware systems that facilitate the development of grid applications in a workflow-oriented fashion. They require quite some technical knowledge. In Kepler, for instance, the workflow design involves choosing an appropriate Director for the execution, depending on, e.g., whether the workflow depends on time, requires multiple threads or distributed execution, or performs simple transformations. These aspects have to be taken into account for efficient execution of complex computations, but not necessarily when dealing with the actual composition of services. Thus, these frameworks address a bioinformatics user, and not the biologists themselves.
We believe that Bio-jETI's control flow-oriented approach is suitable for addressing non-IT personnel: it allows them to continue to think in "Dos" and "Don'ts", and steps and sequences of action in their own terms at their level of domain knowledge. In contrast, dataflow-oriented tools like Kepler, Taverna, or Triana require their users to change the perspective to a resource point of view, which, in fact, requires implicit (technical) knowledge to profitably use them.
The challenge for us is now to integrate the available semantic information and the semantically aware technologies into our process development framework in the most user-convenient way. One central issue is to find an appropriate level of abstraction from the underlying technology: we would like to provide a set of general, pre-defined analyses and synthesis patterns, but at the same time give experienced users a way to add specialized specifications. Another issue is how to integrate semantic information about the application domain and its services into this (partly) automated workflow development process, since such knowledge is essential to achieve adequate results.
On the one hand, this requires predicates characterizing the single services, i.e. their function and their input/output behaviour. On the other hand, taxonomies or ontologies are required which provide the domain knowledge against which the services (their predicates) are classified. The majority of this information has to be delivered by the tool and database providers, covering semantics of services as well as semantics of data. The convenience on the client side will increase as the Semantic Web spreads and new standards become established.
This section describes the methodologies for process model verification and synthesis that we used for developing the presented examples.
Process model verification via model checking
where ϕ is expressed in terms of a modal or temporal logic. Applying model checking to process models can help to detect problems in the design phase. It is particularly useful for analyzing aspects of the whole model, where syntax or type checking at the component level is not sufficient. Examples of errors whose detection requires awareness of the whole model are manifold, ranging from undefined variables or simple type mismatches to computational gaps and incomplete processes. The list of properties against which the model is evaluated is easily extendable, since including a new constraint in the verification only requires writing a modal or temporal formula expressing the property of interest.
GEAR extends this variant of CTL further and includes additional overlined modalities representing a backward view, i.e. considering the paths that end at a given state. We apply it to our (bioinformatics) process models, the Service Logic Graphs (SLGs), where the entire processes are the models, the individual activities (the services, in the form of SIBs) are the nodes, and the edges express the conditional flow of control. As both nodes and edges are labeled, these models are formally so-called Kripke Transition Systems.
While this is sufficient to ensure that the variable x has been defined at all, it does not say anything about type correctness. Since the name x, however, could be used to refer to different data throughout the process, it is reasonable to extend the above constraint and to include the type of the used variable. In example 1, we considered, for instance, a variable tree of type Tree:
The model checking reveals a property violation, as can be seen in Figures 6 (top left) and 10 (top): the rightmost SIB is marked by a red overlay icon in the upper right corner, indicating that the property is violated at that node. The reason is that the process does not provide the appropriate input type for the tree visualizer. The same formula can be applied analogously to other variables with other types, as we did, for instance, in our second example.
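The kind of check described above can be illustrated with a small sketch. It is not the GEAR model checker and does not evaluate temporal formulas; it is a plain reachability analysis, written under the assumption that each node of a process lists the (variable, type) pairs it defines and uses. All node names, variable names, and the function `violating_nodes` are hypothetical.

```python
from collections import deque


def violating_nodes(nodes, edges, start, var, typ):
    """Return the nodes that use (var, typ) but can be reached from `start`
    along at least one path on which (var, typ) was never defined before."""
    successors = {name: [] for name in nodes}
    for source, target in edges:
        successors[source].append(target)

    # Nodes that can be entered while (var, typ) is still undefined.
    undefined_on_entry = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if (var, typ) in nodes[node]["defines"]:
            continue  # paths leaving this node carry the definition
        for nxt in successors[node]:
            if nxt not in undefined_on_entry:
                undefined_on_entry.add(nxt)
                queue.append(nxt)

    return sorted(n for n in undefined_on_entry if (var, typ) in nodes[n]["uses"])


# Hypothetical process: the tree visualizer is reached without a Tree input.
process = {
    "read_sequences": {"defines": {("sequences", "Sequences")}, "uses": set()},
    "align": {"defines": {("alignment", "Alignment")},
              "uses": {("sequences", "Sequences")}},
    "show_tree": {"defines": set(), "uses": {("tree", "Tree")}},
}
flow = [("read_sequences", "align"), ("align", "show_tree")]

print(violating_nodes(process, flow, "read_sequences", "tree", "Tree"))
# → ['show_tree']
```

In this toy process the visualizer node is reported because no preceding node defines a variable tree of type Tree, which mirrors the violation marked at the rightmost SIB.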
By process synthesis we refer to techniques that construct workflows from sets of services according to logical specifications. The algorithm that we use for our approach is based on a modal logic that combines relative time with descriptions and taxonomic classifications of types and services. It was implemented for the ABC and ETI platforms [43, 59], and more recently also used within the jABC framework. We applied it, for instance, in the SWS Challenge to synthesize a mediator process converting between different message formats that were used by the web service providers in the scenario of [60, 61].
In the following we describe how to apply our synthesis method, i.e. 1) how the domain knowledge forms a configuration universe, 2) how a modal logic can be used for workflow specification, and 3) what the algorithm can finally derive from this information. Note that we focus on usage here; details on the underlying logics and algorithms can be found in [40, 59].
The configuration universe
In addition, the domain knowledge can be extended further by hierarchically organizing types and services in taxonomies, i.e. simple ontologies that relate entities in terms of is-a and has-a relations. The type and service taxonomies for our examples are given in Figures 4 and 5. The taxonomies are considered by the synthesis algorithm when evaluating type or service constraints.
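How a taxonomy can be consulted when a constraint is evaluated may be sketched as follows. This is a minimal illustration that covers only is-a relations and assumes that every entity has at most one direct parent. The type names are invented for the example and are not taken from Figures 4 and 5; the function `is_a` is likewise hypothetical.

```python
def is_a(parent_of, entity, category):
    """Return True if `entity` equals `category` or lies below it in the
    taxonomy given as a child -> parent mapping."""
    seen = set()
    while entity is not None and entity not in seen:
        if entity == category:
            return True
        seen.add(entity)  # guard against accidental cycles in the mapping
        entity = parent_of.get(entity)
    return False


# Hypothetical type taxonomy (child -> parent).
type_taxonomy = {
    "FastaSequence": "Sequence",
    "Sequence": "BioData",
    "NewickTree": "Tree",
    "Tree": "BioData",
}

print(is_a(type_taxonomy, "NewickTree", "Tree"))      # → True
print(is_a(type_taxonomy, "NewickTree", "Sequence"))  # → False
```

With such a check, a constraint that asks for a Tree is also satisfied by a service that delivers a more specific kind of tree.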
The specification language
where t_c and s_c express type and service constraints, respectively.
Thus, SLTL combines static, dynamic, and temporal constraints. The static constraints are the taxonomic expressions (boolean connectives) over the types or classes of the type taxonomy. Analogously, the dynamic constraints are the taxonomic expressions over the services or classes of the service taxonomy. The temporal constraints are covered by the modal structure of the logic, suitable to express the order in which services can be combined.
A formal definition of the semantics of SLTL can be found in . Intuitively, true is satisfied by every sequence of services, and t_c by every sequence whose first component has an input interface satisfying t_c. Negation and disjunction are interpreted in the usual fashion. The construct ⟨s_c⟩ϕ is satisfied if the first component satisfies s_c and the continuation of the service sequence satisfies ϕ. A formula of the form Gϕ requires that ϕ is satisfied Generally, and ϕUψ expresses that the property ϕ holds for all services of the sequence Until a position is reached where the corresponding continuation satisfies the property ψ.
are two common examples.
As we have seen in the workflow examples, even this simple query has real practical impact, as it allows type mismatches to be resolved automatically.
Note that the service constraints in the formula are not concrete service names, but terms from the service taxonomy that define higher-order service categories. The synthesis algorithm takes care of instantiating the result with concrete services.
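The intuitive semantics given above can be made concrete with a small evaluator over finite service sequences. This is a simplified sketch, not the formal SLTL semantics: it assumes that each service is a record with a set of labels satisfied by its input interface and a set of service classes it belongs to, that constraints are single labels, and that Gϕ holds trivially on the empty sequence. The tuple encoding of formulas, the function `holds`, and all service names are hypothetical.

```python
def holds(formula, seq):
    """Evaluate a formula, encoded as a nested tuple, on a finite service sequence."""
    op = formula[0]
    if op == "true":
        return True
    if op == "type":   # t_c: input interface of the first service satisfies the label
        return bool(seq) and formula[1] in seq[0]["input_types"]
    if op == "not":
        return not holds(formula[1], seq)
    if op == "or":
        return holds(formula[1], seq) or holds(formula[2], seq)
    if op == "next":   # <s_c> phi: first service matches, continuation satisfies phi
        return bool(seq) and formula[1] in seq[0]["classes"] and holds(formula[2], seq[1:])
    if op == "G":      # phi holds on every non-empty suffix
        return all(holds(formula[1], seq[i:]) for i in range(len(seq)))
    if op == "U":      # phi holds until a suffix satisfies psi
        for i in range(len(seq) + 1):
            if holds(formula[2], seq[i:]):
                return True
            if i == len(seq) or not holds(formula[1], seq[i:]):
                return False
    raise ValueError(f"unknown operator: {op}")


# Hypothetical two-step workflow.
workflow = [
    {"name": "blast", "input_types": {"Sequence"}, "classes": {"search"}},
    {"name": "clustalw", "input_types": {"Sequences"}, "classes": {"alignment"}},
]

# "Eventually an alignment service is used", written as true U <alignment> true.
eventually_alignment = ("U", ("true",), ("next", "alignment", ("true",)))
# "A visualization service is never used".
never_visualization = ("G", ("not", ("next", "visualization", ("true",))))

print(holds(eventually_alignment, workflow))  # → True
print(holds(never_visualization, workflow))   # → True
```

The two formulas at the end illustrate how the derived "eventually" pattern can be obtained from U, and how G together with negation excludes a service category from the whole sequence.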
The synthesis algorithm
The synthesis algorithm interprets SLTL formulas over paths of the configuration universe, i.e. provided with a specification, it searches the configuration universe for (finite) corresponding paths. The algorithm is based on a tableau method, of which a detailed description is given in . It automatically generates all, all minimal, or all shortest service compositions that satisfy a specification, according to the selected synthesis mode. The algorithm's output is the basis for the final assembly of the corresponding SLG.
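To give an impression of what searching for shortest service compositions means, the following sketch enumerates all shortest sequences of services that lead from a start type to a goal type. It deliberately uses a plain breadth-first search instead of the tableau method, handles only a "reach this type" goal rather than full SLTL formulas, and assumes that every service has exactly one input and one output type. The function `shortest_compositions` and the service and type names are hypothetical.

```python
from collections import deque


def shortest_compositions(services, start_type, goal_type):
    """Return all shortest service sequences that turn start_type into goal_type.

    `services` maps a service name to an (input_type, output_type) pair.
    """
    results = []
    best_length = None
    queue = deque([(start_type, [])])
    while queue:
        current_type, path = queue.popleft()
        if best_length is not None and len(path) > best_length:
            break  # breadth-first order: all remaining paths are longer
        if current_type == goal_type:
            best_length = len(path)
            results.append(path)
            continue
        for name, (input_type, output_type) in sorted(services.items()):
            # Use each service at most once per path, which keeps the search finite.
            if input_type == current_type and name not in path:
                queue.append((output_type, path + [name]))
    return results


# Hypothetical service library.
library = {
    "blast": ("Sequence", "BlastResult"),
    "extract_ids": ("BlastResult", "IdList"),
    "fetch_sequences": ("IdList", "Sequences"),
    "clustalw": ("Sequences", "Alignment"),
    "alt_align": ("Sequences", "Alignment"),
}

print(shortest_compositions(library, "BlastResult", "Sequences"))
# → [['extract_ids', 'fetch_sequences']]
print(shortest_compositions(library, "Sequences", "Alignment"))
# → [['alt_align'], ['clustalw']]
```

The first call corresponds to bridging a type gap between two services with a two-step mediator sequence; the second shows that several alternative solutions of the same length are all returned.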
We are currently re-implementing the algorithm in Java, making it suitable for seamless integration into the jABC framework. We will also add functionality that facilitates the synthesis procedure for the user, for instance a graphical interface supporting the domain modeling, and formula patterns for the specification of workflows. Furthermore, we plan to incorporate alternative methods for the composition of services, such as an algorithm based on MoSeL or different tools that are available in the Plan-jETI collection of planning algorithms.
List of abbreviations
API: Application Programming Interface
BLAST: Basic Local Alignment Search Tool
BiBiServ: Bielefeld Bioinformatics Server
CTL: Computation Tree Logic
DDBJ: DNA Data Bank of Japan
EBI: European Bioinformatics Institute
EMBOSS: European Molecular Biology Open Software Suite
GEAR: Game-based Easy And Reversed model checking tool
GUI: Graphical User Interface
jABC: Application Building Center (Java implementation)
jETI: Electronic Tool Integration platform (Java implementation)
LTL: Linear Time Logic
MoSeL: Monadic Second order Logic
OBO: Open Bioinformatics Ontologies
OWL: Web Ontology Language
RDF: Resource Description Framework
RNA: RiboNucleic Acid
S3: Semantic Service Selection Contest
SAWSDL: Semantic Annotations for WSDL
SIB: Service-Independent Building block
SLG: Service Logic Graph
SLTL: Semantic Linear Time Logic
SSWAP: Simple Semantic Web Architecture and Protocol
SWS: Semantic Web Services
URI: Uniform Resource Identifier
W3C: World Wide Web Consortium
WSDL: Web Service Description Language.
Many thanks to Stefan Naujokat for technical support with the synthesis tools and algorithms.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 10, 2009: Semantic Web Applications and Tools for Life Sciences, 2008. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S10.
- Bausch W, Pautasso C, Alonso G: BioOpera: Cluster-aware Computing. Proceedings of the 4th IEEE International Conference on Cluster Computing (Cluster) 2002, 99–106.
- Eker J, Janneck J, Lee E, et al.: Taming heterogeneity – the Ptolemy approach. Proceedings of the IEEE 2003, 91: 127–144.
- Altintas I, Berkley C, Jaeger E, et al.: Kepler: An Extensible System for Design and Execution of Scientific Workflows. 16th Intl Conf on Scientific and Statistical Database Management (SSDBM'04) 2004, 21–23.
- Oinn T, Addis M, Ferris J, et al.: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics 2004, 20(17):3045–3054.
- Taylor I, Shields M, Wang I, Harrison A: The Triana Workflow Environment: Architecture and Applications. In Workflows for e-Science. Springer, New York; 2007:320–339.
- Tang F, Chua CL, Ho L, et al.: Wildfire: distributed, Grid-enabled workflow construction and execution. BMC Bioinformatics 2005, 6: 69.
- Margaria T, Kubczak C, Steffen B: Bio-jETI: a Service Integration, Design, and Provisioning Platform for Orchestrated Bioinformatics Processes. BMC Bioinformatics 2008, 9(Suppl 4):S12.
- Berners-Lee T, Hendler J, Lassila O: The Semantic Web – A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American 2001, 284(5):34–43.
- W3C Semantic Web Activity [http://www.w3.org/2001/sw/]
- Semantic Annotations for WSDL Working Group [http://www.w3.org/2002/ws/sawsdl/]
- Resource Description Framework (RDF)/W3C Semantic Web Activity [http://www.w3.org/RDF/]
- Web Ontology Language OWL/W3C Semantic Web Activity [http://www.w3.org/2004/OWL/]
- SWS Challenge Website [http://sws-challenge.org]
- International Contest S3 on Semantic Service Selection [http://www-ags.dfki.uni-sb.de/~klusch/s3/index.html]
- OPOSSum Online Portal for Semantic Services [http://fusion.cs.uni-jena.de/opossum/]
- Küster U, König-Ries B, Krug A: OPOSSum – An Online Portal to Collect and Share SWS Descriptions. In Proceedings of the 2008 IEEE International Conference on Semantic Computing. IEEE Computer Society; 2008:480–481.
- Wilkinson MD, Links M: BioMOBY: an open source biological web services proposal. Briefings in Bioinformatics 2002, 3(4):331–41.
- Wilkinson MD, Senger M, Kawas E, et al.: Interoperability with Moby 1.0-it's better than sharing your toothbrush! Briefings in Bioinformatics 2008, 9(3):220–31.
- Wilkinson MD, Gessler D, Farmer A, Stein L: The BioMOBY Project Explores Open-Source, Simple, Extensible Protocols for Enabling Biological Database Interoperability. Proceedings of the Virtual Conference on Genomics and Bioinformatics 2003, 3: 17–27.
- Gessler D: SSWAP – Simple Semantic Web Architecture and Protocol. 2009. [http://sswap.info/docs/SSWAP.pdf]
- Simple Semantic Web Architecture and Protocol. 2009. [http://sswap.info]
- Ashburner M, Ball CA, Blake JA, et al.: Gene ontology: tool for the unification of biology. Nature Genetics 2000, 25: 25–9.
- Smith B, Ashburner M, et al.: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotech 2007, 25(11):1251–1255.
- Garvey TD, Lincoln P, Pedersen CJ, Martin D, Johnson M: BioSPICE: access to the most current computational tools for biologists. Omics: A Journal of Integrative Biology 2003, 7(4):411–420.
- Brambilla M, Celino I, Ceri S, Cerizza D, Valle ED, Facca F: A Software Engineering Approach to Design and Development of Semantic Web Service Applications. In The Semantic Web – ISWC. Springer Berlin/Heidelberg; 2006:172–186.
- Haselwanter T, Kotinurmi P, Moran M, Vitvar T, Zaremba M: WSMX: A Semantic Service Oriented Middleware for B2B Integration. In Service-Oriented Computing – ICSOC. Springer Berlin/Heidelberg; 2006:477–483.
- Dibernardo M, Pottinger R, Wilkinson M: Semi-automatic web service composition for the life sciences using the BioMoby semantic web framework. Journal of Biomedical Informatics 2008.
- Bio-jETI Website [http://biojeti.cs.tu-dortmund.de/]
- Margaria T, Kubczak C, Njoku M, Steffen B: Model-based Design of Distributed Collaborative Bioinformatics Processes in the jABC. Proceedings of ICECCS, IEEE Computer Society 2006, 169–176.
- Kubczak C, Margaria T, Fritsch A, Steffen B: Biological LC/MS Preprocessing and Analysis with jABC, jETI and xcms. Leveraging Applications of Formal Methods, Verification and Validation, ISoLA 2006 2006, 303–308.
- Lamprecht A, Margaria T, Steffen B, et al.: GeneFisher-P: variations of GeneFisher as processes in Bio-jETI. BMC Bioinformatics 2008, 9(Suppl 4):S13.
- Lamprecht A, Margaria T, Steffen B: Seven Variations of an Alignment Workflow – An Illustration of Agile Process Design and Management in Bio-jETI. In Bioinformatics Research and Applications. Volume 4983. LNBI, Atlanta, Georgia: Springer; 2008:445–456.
- Steffen B, Margaria T, Nagel R, Jörges S, Kubczak C: Model-Driven Development with the jABC. Hardware and Software, Verification and Testing 2006, 92–108.
- jABC Website [http://www.jabc.de]
- Margaria T, Nagel R, Steffen B: jETI: A Tool for Remote Tool Integration. In Tools and Algorithms for the Construction and Analysis of Systems. Volume 3440/2005. LNCS, Springer Berlin/Heidelberg; 2005:557–562.
- Margaria T, Kubczak C, Steffen B, Naujokat S: The FMICS-jETI Platform: Status and Perspectives. In ISoLA 2nd IEEE-EASST International Symposium On Leveraging Applications of formal methods, verification, and validation, Paphos (CY), Proceedings. IEEE Computer Science Press; 2006:414–418.
- Jörges S, Margaria T, Steffen B: Genesys: Service-Oriented Construction of Property Conform Code Generators. Innovations in System and Software Engineering 2009, 4(4):361–384.
- Clarke EM, Grumberg O, Peled DA: Model Checking. The MIT Press; 1999.
- Bakera M, Margaria T, Renner CD, Steffen B: Verification, Diagnosis and Adaptation: Tool supported enhancement of the model-driven verification process. ISoLA 2007, 85–97. [(Journal version to appear in ISSE)]
- Steffen B, Margaria T, Freitag B: Module Configuration by Minimal Model Construction. Tech rep University of Passau; 1993.
- Steffen B, Margaria T, Beeck M: Automatic synthesis of linear process models from temporal constraints: An incremental approach. ACM/SIGPLAN Int. Workshop on Automated Analysis of Software (AAS'97) 1997.
- Margaria T, Steffen B: Backtracking-Free Design Planning by Automatic Synthesis in METAFrame. Fundamental Approaches to Software Engineering 1998, 188.
- Margaria T, Steffen B: LTL Guided Planning: Revisiting Automatic Tool Composition in ETI. In Proceedings of the 31st IEEE Software Engineering Workshop. IEEE Computer Society; 2007:214–226.
- Pillai S, Silventoinen V, Kallio K, et al.: SOAP-based services provided by the European Bioinformatics Institute. Nucleic Acids Research 2005, (33 Web Server):W25–8.
- Labarga A, Valentin F, Anderson M, Lopez R: Web Services at the European Bioinformatics Institute. Nucleic Acids Research 2007, (35 Web Server):W6–11.
- Hartmeier S, Krüger J, Giegerich R: Webservices and Workflows on the Bielefeld Bioinformatics Server: Practices and Problems. Proceedings of the Workshop on Network Tools and Applications in Biology (NETTAB), Pisa, Italy 2007.
- Miyazaki S, Sugawara H, Ikeo K, Gojobori T, Tateno Y: DDBJ in the stream of various biological data. Nucleic Acids Research 2004, (32 Database):D31–4.
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends in Genetics: TIG 2000, 16(6):276–7.
- Zmasek CM, Eddy SR: ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics (Oxford, England) 2001, 17(4):383–4.
- Shigemoto Y, Kuwana Y, Sugawara H: Blast-ClustalW workflow. 2009. [http://xml.nig.ac.jp/workflow/blast_clustal.html]
- Web API for Biology (WABI) [http://xml.nig.ac.jp/]
- Sahoo SS, Sheth A, Henson C: Semantic Provenance for eScience: Managing the Deluge of Scientific Data. IEEE Internet Computing 2008, 12(4):46–54.
- Aho AV, Lam MS, Sethi R, Ullman JD: Compilers: Principles, Techniques, and Tools. Addison Wesley; 2007.
- Steffen B: Data Flow Analysis as Model Checking. In TACS '91: Proceedings of the International Conference on Theoretical Aspects of Computer Software. Springer-Verlag; 1991:346–365.
- Schmidt DA, Steffen B: Program Analysis as Model Checking of Abstract Interpretations. In Proceedings of the 5th International Symposium on Static Analysis. Springer-Verlag; 1998:351–380.
- Margaria T: Service is in the Eyes of the Beholder. IEEE Computer 2007.
- Clarke EM, Grumberg O, Peled DA: Model Checking. The MIT Press chap. Temporal Logics; 1999:27–32.
- Manna Z, Wolper P: Synthesis of Communicating Processes from Temporal Logic Specifications. ACM Trans Program Lang Syst 1984, 6: 68–93.
- Freitag B, Steffen B, Margaria T, Zukowski U: An Approach to Intelligent Software Library Management. In Proceedings of the 4th International Conference on Database Systems for Advanced Applications (DASFAA). World Scientific Press; 1995:71–78.
- Margaria T, Bakera M, Raffelt H, Steffen B: Synthesizing the Mediator with jABC/ABC. In EON. Volume 359. CEUR Workshop Proceedings, Tenerife, Spain: CEUR-WS.org; 2008.
- Kubczak C, Margaria T, Kaiser M, Lemcke J, Knuth B: Abductive Synthesis of the Mediator Scenario with jABC and GEM. In EON. Volume 359. CEUR Workshop Proceedings, Tenerife, Spain: CEUR-WS.org; 2008.
- Kelb P, Margaria T, Mendler M, Gsottberger C: MOSEL: A flexible toolset for monadic second-order logic. Proceedings of CAV'97, LNCS 1254 1997, 183–202.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. Journal of Molecular Biology 1990, 215(3):403–10.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 1994, 22(22):4673–80.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.