BioWMS: a web-based Workflow Management System for bioinformatics
© Bartocci et al; licensee BioMed Central Ltd. 2007
Published: 8 March 2007
An in-silico experiment can be naturally specified as a workflow of activities implementing, in a standardized environment, the process of data and control analysis. A workflow has the advantage of being reproducible, traceable and compositional, since it can reuse other workflows. To support the daily work of bioscientists, several Workflow Management Systems (WMSs) have been proposed in bioinformatics. Generally, these systems centralize the workflow enactment and do not exploit standard process definition languages to describe workflows, which limits their reusability. Moreover, almost all WMSs require heavy stand-alone applications to specify new workflows, and only a few provide a web-based process definition tool.
We have developed BioWMS, a Workflow Management System that supports, through a web-based interface, the definition, the execution and the results management of an in-silico experiment. BioWMS has been implemented over an agent-based middleware. It dynamically generates, from a user workflow specification, a domain-specific, agent-based workflow engine. Our approach exploits the proactiveness and mobility of agent-based technology to embed the application domain features inside the agents' behaviour. Agents are workflow executors and the resulting workflow engine is a multiagent system – a distributed, concurrent system – typically open, flexible, and adaptive. A demo is available at http://litbio.unicam.it:8080/biowms.
BioWMS, supported by the Hermes mobile computing middleware, guarantees the flexibility, scalability and fault tolerance required for workflow enactment over distributed and heterogeneous environments. BioWMS is funded by the FIRB project LITBIO (Laboratory for Interdisciplinary Technologies in Bioinformatics).
Over the past few years, new high-throughput methods for data collection in the life sciences have greatly increased data generation. In parallel with the rise of biological databases, many bioinformatics tools for querying and analyzing biological data have been made available. As a consequence, the traditional scientific process has become computationally intensive, and in-silico experiments – described as processes of activities to test hypotheses, derive a summary and search for patterns – are laboriously executed in a large, distributed and dynamic environment.
The execution of an in-silico experiment may simultaneously demand data integration from several application domains (e.g. biology, biochemistry, oncology) and tool integration, where analysis techniques (e.g. data mining and text mining) and computational methods are often offered as services that are dynamically updated, added or removed. Such an experiment can therefore be naturally specified as a workflow of activities that implement data and control analysis processes in standardized but dynamic environments. A workflow has the advantage of being reproducible and traceable and of allowing intermediate results to be reused, all fundamental features for validating a scientific experiment.
The software component that "defines, manages and executes workflows through the execution of software whose order of execution is driven by a computer representation of the workflow logic" is named, according to the Workflow Management Coalition (WfMC) Reference Model, a Workflow Management System (WMS). In bioinformatics, several WMSs [4–7] have already been developed and adopted to support the daily work of bioscientists.
Taverna, a part of the MyGrid project, mainly aims to integrate Web Services through workflows specified in a choreography language, the XML Simple conceptual unified flow language (XScufl). Because the Taverna editor is embedded with its engine in a stand-alone Java application, it is a rather heavy download for an end-user.
The Biopipe framework, instead, provides a set of wrappers to directly interface resources like executable programs and data adaptors. It does not support synchronization operators, like fork and join, because a bioinformatics experiment is considered to be just a sequential pipeline.
The Pegasys system enables bioscientists to create and manage sequence analysis workflows. It includes numerous analytical tools and provides database capabilities to maximize the information captured during the execution of a workflow.
All the above-mentioned WMSs describe an in-silico experiment, often with different expressive power, without using standard workflow specification languages. Only recently has Garcia proposed a meta-analysis of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Also noteworthy are standard process definition languages, e.g. XPDL and BPEL, which are well studied from the control-flow perspective: they allow activities and their execution ordering to be described through constructs such as sequence, choice, parallelism and join synchronization. Standard languages also come with several tools for graphically editing workflow specifications, a useful feature for users who are not expert programmers.
Another significant feature that characterizes most WMSs is their client/server architectural style, in which the workflow enactment is entrusted to a central component that acts as a server and is responsible for the correct workflow execution. A monolithic architecture, such as that of the above-mentioned WMSs, does not allow a workflow, or parts of it, to be executed over distributed and heterogeneous environments, because it lacks the flexibility, scalability and fault tolerance required for a distributed cross-organizational workflow.
An attempt to overcome these limitations has been proposed by Cichocki with the Migrating Workflow Model (MWM). In this model, instances of a workflow, or parts of it, can migrate among the sites participating in the workflow execution; that is, the code and the whole execution state, including all data gathered during the execution, may be transferred to another site. This model provides two main benefits. The first is a decrease in network traffic: it is often cheaper to transfer the code implementing the workflow specification than the data needed during its execution. The second is that the workflow can be executed even in mobile and weakly connected networks of devices. However, the MWM requires a suitable middleware that guarantees code mobility support.
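The idea of moving the execution state between sites can be illustrated with a minimal Python sketch. It is not the MWM or Hermes implementation: the class `WorkflowInstance`, the function `migrate`, the activity names and the site names are all hypothetical, and only the state (not the code) is serialized here, whereas a mobile-agent middleware would also move the code.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class WorkflowInstance:
    """Execution state that travels with the workflow (hypothetical shape)."""
    remaining: list                               # activities still to run
    results: dict = field(default_factory=dict)   # data gathered so far

    def step(self, site):
        """Run the next activity at the given site and record where it ran."""
        activity = self.remaining.pop(0)
        self.results[activity] = f"done at {site}"


def migrate(instance):
    """Simulate a transfer: serialize at the source, rebuild at the target."""
    payload = json.dumps(asdict(instance))        # text sent over the network
    return WorkflowInstance(**json.loads(payload))


wf = WorkflowInstance(remaining=["query_db", "align"])
wf.step("site_A")      # first activity runs where the data lives
wf = migrate(wf)       # state moves to the next site
wf.step("site_B")      # execution resumes from the restored state
```

After migration the instance resumes exactly where it stopped, because the list of remaining activities and the results gathered so far travel together.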
In this paper, we present BioWMS, a Workflow Management System which, on the one hand, provides a web-based user interface for the definition, the execution and the results management of an in-silico experiment and, on the other, exploits agent-oriented technology to implement a Migrating Workflow Model. BioWMS dynamically generates the workflow engine associated with a single user workflow specification. Agent-based technology allows application domain features to be embedded in the agents' knowledge, and proactiveness and mobility in the agents' behaviour. The resulting workflow engine is a multiagent system – a distributed, concurrent system – typically open, flexible, adaptive and mobile.
Results and Discussion
Process definition tools are used to create the process description of a workflow in a computer processable form. The workflow editor is the program that supports the workflow specification by composing activities in a graphical environment. BioWMS provides a web-based editor, called WebWFlow, that enables the definition of a workflow in the XML Process Definition Language (XPDL), a WfMC standard. As a consequence, the workflow specification can also be edited by other applications compliant with this standard, for example the JaWE editor.
Workflow client applications are used to execute existing or previously saved workflows, to check the workflow execution state and to manage the produced results. WebWFlow, the web-based editor of BioWMS, also provides these facilities. Another workflow client application developed for BioWMS is BioWEP, a web portal suitable for the ontology-based selection and enactment of predefined and annotated workflows.
The workflow enactment service provides the run-time environment in which process instantiation and activation occur, utilising one or more workflow management engines, responsible for interpreting and activating part, or all, of the process definition and interacting with the external resources (the invoked applications) necessary to process the various activities. The workflow enactment in BioWMS is supported by the Hermes agent-based middleware. The XpdlCompiler is a special Hermes component that generates a mobile multiagent system, the workflow executors, from the user workflow specification. The workflow specification is the coordination model that describes how the generated agents cooperate with each other to reach a particular goal. The middleware provides the necessary software infrastructure that allows the workflow executors to migrate to different sites transparently to users.
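To make the notion of a workflow specification as a coordination model more concrete, the following Python sketch reads a simplified XPDL-like fragment and derives, for each activity, the activities that follow it. This is not the XpdlCompiler: the fragment omits namespaces and most XPDL attributes, and the activity identifiers, names and the function `successors` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified XPDL-like fragment (namespaces and most attributes omitted).
# Activity ids and names are hypothetical.
XPDL = """
<WorkflowProcess Id="demo">
  <Activities>
    <Activity Id="a1" Name="blast"/>
    <Activity Id="a2" Name="filter"/>
    <Activity Id="a3" Name="clustalw"/>
  </Activities>
  <Transitions>
    <Transition Id="t1" From="a1" To="a2"/>
    <Transition Id="t2" From="a2" To="a3"/>
  </Transitions>
</WorkflowProcess>
"""


def successors(xpdl_text):
    """Map each activity id to the list of activity ids that follow it."""
    root = ET.fromstring(xpdl_text)
    graph = {a.get("Id"): [] for a in root.iter("Activity")}
    for t in root.iter("Transition"):
        graph[t.get("From")].append(t.get("To"))
    return graph


graph = successors(XPDL)
```

A generator of workflow executors could use such a successor map to decide which agent must hand its result to which other agent.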
The Hermes Graphical User Interface (GUI) is an administration and monitoring tool that allows the available computational resources, the memory consumption and the running agents to be checked at any time.
An example of workflow in BioWMS
To get the global alignment we choose an activity – activity 8 – that receives as a parameter a list of sequences in FASTA format and calculates the global alignment using the ClustalW tool. For further details of this example, a video clip of the workflow definition and execution phases in BioWMS is available.
Designing a workflow
The activities used in a workflow can be configured with several parameters, so that the same activity can be reused in different workflows. In the previous example, the BLAST activity takes as input a sequence and BLAST parameters such as the tool to use (e.g. blastn) and the database to search (e.g. ddbj). This activity can be reused in any workflow that needs a local alignment of a particular sequence against others in a specific database.
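Parameterized reuse of an activity can be sketched in Python as follows. The `Activity` class, its `configure` method, the parameter keys `tool` and `database` and the value `my_local_db` are hypothetical; only the values blastn and ddbj come from the example above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Activity:
    """A reusable activity: a name plus its configuration parameters."""
    name: str
    params: dict = field(default_factory=dict)

    def configure(self, **overrides):
        """Return a new activity with some parameters replaced."""
        return Activity(self.name, {**self.params, **overrides})


# BLAST activity configured as in the example workflow.
blast = Activity("BLAST", {"tool": "blastn", "database": "ddbj"})

# The same activity reused in another workflow against a different
# (hypothetical) database; the original configuration is left untouched.
blast_local = blast.configure(database="my_local_db")
```

Returning a new object from `configure` keeps each workflow's configuration independent, so that reusing an activity in one workflow cannot alter another.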
Application domain libraries available
Application domain library
A set of primitive activities to convert a format to XML
A set of primitive and complex activities for sequence alignment
A set of primitive activities to store and to log messages in BioWMS
A set of primitive activities to send emails
A set of primitive activities to exploit Hermes middleware
A set of complex activities wrapping O2I SoapLab services
A set of primitive activities to store data in Taverna format
A set of primitive and complex activities to manage data
A set of primitive activities to invoke a Web Service
A set of primitive and complex activities using XSLT and XQuery
Loading and executing a workflow
Workflows stored in BioWMS can be public or private. While a public workflow is provided by BioWMS and is visible to all users, a private one is available only to the user who defined it. The web interface allows a workflow to be viewed both in a graphical notation and as the corresponding XPDL workflow specification. After a user has defined and submitted a workflow, BioWMS compiles it into a multiagent system by exploiting the agent-based middleware. BioWMS supports long-running activities, and the execution proceeds independently of user control.
Conclusions and future work
We have developed BioWMS, a Workflow Management System for the design of an activity-based in-silico experiment. BioWMS provides a web-based editor interface and is supported by mobile agent-based technology; it guarantees the flexibility, scalability and fault tolerance required for workflow enactment over distributed and heterogeneous systems.
As future work, we aim to extend BioWMS with a component called Resourceome, on which we are working in parallel. Resourceome is a system that keeps "alive" an index of resources in the bioinformatics domain using a specific ontology of resource information. The Resourceome directly assists scientists in the hard task of navigating the ocean of bioinformatics resources. The integration of BioWMS with Resourceome might support the final user both in selecting the most suitable activity and in setting up a new one. The Resourceome itself would dynamically support workflow enactment, providing all the required resources available at runtime. We believe that workflow technology, together with an effective and efficient resource management system, could be a good start in facing the complexity that surrounds scientists' work and in helping them take advantage of the huge amount of available resources.
A workflow is a distributed application that involves the coordinated execution of human and system activities, usually in a heterogeneous environment. We consider a workflow as a coordination model for a pool of mobile agents (the workflow executors) that implement the workflow engine for a specific workflow instance. In this context, agents are autonomous active entities that encapsulate the execution of independent activities and execute their tasks concurrently with the work of the other agents. A collection of agents able to cooperate, in their autonomy, for a common goal forms a multiagent system (MAS).
In our approach, the generation of the workflow engine is performed by a compiler in a two-phase agent-generation procedure. In the first phase, a User Level Workflow (ULW), specified in a workflow specification language, is mapped to an Agent Level Workflow (ALW). This mapping is performed by recursively substituting the activities of the user-level specification with a workflow of primitive agent-level activities. A User-Level Activity Database (ULAD) maintains the correspondence between user-level activities and ALW. The ALW specifies all entities involved in the execution of a workflow; thus the constraints of spatially and temporally coupled communication can be respected, since the compiler knows exactly when communication takes place and which agents are the senders and which the receivers. In the second phase, the compiler concretely generates agents from the ALW specification. To achieve this result, the compiler uses the User-Level Activity Implementation Database (ULAID) and the Database of Skeleton (DoS). The ULAID stores the implementation of agent-level activities, and the DoS stores empty implementations of agents (the skeletons). A workflow executor (WE) is obtained by plugging the specific behaviour into the skeleton. The resulting set of WEs gives rise to an agent-based workflow engine whose role is compliant with the WMS architecture described earlier in Figure 1. The above approach has been implemented on the Hermes architecture, whose detailed description is given in the next section.
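The two-phase procedure can be sketched in Python under strong simplifications. This is not the Hermes compiler: workflows are reduced to plain sequences, the generated executors run one after another rather than concurrently, and the dictionary `ULAD`, the class `Skeleton`, the activity names and the placeholder behaviours are all hypothetical.

```python
# Hypothetical ULAD: maps a user-level activity to a workflow (here, a
# sequence) of lower-level activities. Names that are not keys are
# treated as primitive agent-level activities.
ULAD = {
    "align_homologs": ["blast", "global_align"],
    "blast": ["migrate", "run_blast", "send_result"],
    "global_align": ["receive_result", "run_clustalw"],
}


def to_agent_level(workflow, ulad):
    """Phase 1: recursively replace user-level activities with primitives."""
    expanded = []
    for activity in workflow:
        if activity in ulad:
            expanded.extend(to_agent_level(ulad[activity], ulad))
        else:
            expanded.append(activity)
    return expanded


class Skeleton:
    """An empty agent implementation; its behaviour is plugged in later."""

    def __init__(self, name):
        self.name = name
        self.behaviour = None

    def run(self, data):
        return self.behaviour(data)


def generate_executors(alw, implementations):
    """Phase 2: plug each activity implementation into a fresh skeleton."""
    executors = []
    for activity in alw:
        agent = Skeleton(activity)
        agent.behaviour = implementations[activity]
        executors.append(agent)
    return executors


alw = to_agent_level(["align_homologs"], ULAD)

# Placeholder behaviours that only record which activity handled the data.
implementations = {name: (lambda data, n=name: data + [n]) for name in alw}
executors = generate_executors(alw, implementations)

trace = []
for agent in executors:
    trace = agent.run(trace)
```

Here the dictionary `implementations` plays the role of the activity implementation database and `Skeleton` that of an entry in the DoS; a real ALW would also carry the communication structure between agents, which this sketch leaves out.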
Hermes [16, 25] is an agent-based mobile middleware for the design and execution of activity-based applications in distributed environments. It is structured as a component-based, agent-oriented system with a 3-layer software architecture: user layer, system layer and run-time layer. Each layer is customizable and independent of the others.
Hermes can be configured for specific application domains by adding domain-specific component libraries, and can thus be properly customized through service agents. It represents a flexible environment suitably designed to support the bioscientist's activities during an in-silico experiment. The main functionalities of BioWMS are provided by Hermes through a set of cooperative bio-service agents (SA). These are shown in Figure 11 and are described as follows:
Data and Tools Integration
AIXO SA provides a set of wrappers able to access and present any data source as a collection of XML documents. AIXO (Any Input XML Output) is flexible and modular; it can manage many kinds of input data sources, from HTML to XML, databases, flat files, CGI and command line programs;
WSIF SA allows other agents to dynamically invoke a Web Service;
SoapLab SA can control a set of Web Services providing programmatic access to many bioinformatics applications on remote computers;
Distributed Annotation System (DAS)
DAS SA enables access to DAS sources;
Input and Output Management
BioWMS SA allows the storage of partial results during the execution. Furthermore, Email SA allows users to receive the final and intermediate results by email.
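The wrapper idea behind a service agent such as AIXO, which presents an arbitrary input source as XML, can be illustrated with a small Python sketch that converts FASTA text into an XML document. This is not AIXO code: the function `fasta_to_xml` and the element and attribute names `sequences`, `sequence` and `id` are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def fasta_to_xml(fasta_text):
    """Wrap FASTA records as an XML document (element names are made up)."""
    root = ET.Element("sequences")
    record = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The first word of the header line is used as the identifier.
            header = line[1:].split()
            record = ET.SubElement(
                root, "sequence", id=header[0] if header else ""
            )
            record.text = ""
        elif record is not None:
            # Sequence lines are concatenated into a single string.
            record.text += line
    return ET.tostring(root, encoding="unicode")


FASTA = """>seq1 example record
ACGT
TTGA
>seq2
GGCC
"""

xml_text = fasta_to_xml(FASTA)
```

Once every source is exposed as XML in this way, downstream activities can process heterogeneous data with the same XML tools, for instance XSLT or XQuery.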
This work is supported by the Investment Funds for Basic Research (FIRB) project Laboratory of Interdisciplinary Technologies in Bioinformatics (LITBIO).
This article has been published as part of BMC Bioinformatics Volume 8, Supplement 1, 2007: Italian Society of Bioinformatics (BITS): Annual Meeting 2006. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/8?issue=S1.
- Stevens R, Glover K, Greenhalgh C, Jennings C, Pearce S, Li P, Radenkovic M, Wipat A: Performing in silico experiments on the Grid: a users perspective. In Proceedings of UK e-Science All Hands Meeting: 2–4 September 2003; Nottingham. Edited by: Cox SJ. 2003, 43–50.
- Stein L: Creating a bioinformatics nation. Nature 2002, 417(6885):119–120. 10.1038/417119a
- Hollingsworth D: The Workflow Reference Model. Tech Rep TC00-Workflow Management Coalition 1994.
- Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, Carver T, Glover K, Pocock M, Wipat A, Li P: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics 2004, 20(17):3045–3054. 10.1093/bioinformatics/bth361
- Hoon S, Ratnapu K, Chia J, Kumarasamy B, Juguang X, Clamp M, Stabenau A, Potter S, Clarke L, Stupka E: Biopipe: a flexible framework for protocol-based bioinformatics analysis. Genome Res 2003, 13(8):1904–1915.
- Shah S, He D, Sawkins J, Druce J, Quon G, Lett D, Zheng G, Xu T, Ouellette B: Pegasys: software for executing and integrating analyses of biological sequences. BMC Bioinformatics 2004, 5: 40. 10.1186/1471-2105-5-40
- Tang F, Chua C, Ho L, Lim Y, Issac P, Krishnan A: Wildfire: distributed, Grid-enabled workflow construction and execution. BMC Bioinformatics 2005, 6: 69. 10.1186/1471-2105-6-69
- Stevens R, Robinson A, Goble C: myGrid: personalised bioinformatics on the information grid. Bioinformatics 2003, 19(Suppl 1):i302–4. 10.1093/bioinformatics/btg1041
- Carver T, Bleasby A: The design of Jemboss: a graphical user interface to EMBOSS. Bioinformatics 2003, 19(14):1837–1843. 10.1093/bioinformatics/btg251
- Garcia Castro A, Thoraval S, Garcia L, Ragan M: Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator. BMC Bioinformatics 2005, 6: 87. 10.1186/1471-2105-6-87
- XML Process Definition Language [http://xml.coverpages.org/XPDL20010522.pdf]
- OASIS WSBPEL Specification Draft [http://docs.oasis-open.org/wsbpel/2.0/wsbpel-v20-rddl.html]
- Cichocki A, Rusinkiewicz M: Providing Transactional Properties for Migrating Workflows. MONET 2004, 9(5):473–480.
- Enhydra JaWE [http://www.enhydra.org/workflow/jawe/index.html]
- Romano P, Bartocci E, Bertolini G, De Paoli F, Marra D, Mauri G, Merelli E, Milanesi L: Biowep: a workflow enactment portal for bioinformatics applications. BMC Bioinformatics 2006, 8(Suppl 1):S19. 10.1186/1471-2105-8-S1-S19
- Corradini F, Merelli E: Hermes: agent-based middleware for mobile computing. Mobile Computing, LNCS 2005, 3465: 234–270.
- Ye J, McGinnis S, Madden T: BLAST: improvements for better sequence analysis. Nucleic Acids Res 2006, 34: W6-W9. 10.1093/nar/gkl164
- DNA Data Bank of Japan [http://www.ddbj.nig.ac.jp/]
- FASTA format [http://en.wikipedia.org/wiki/Fasta_format]
- Thompson J, Higgins D, Gibson T: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22(22):4673–4680. 10.1093/nar/22.22.4673
- A flash presentation of BioWMS [http://litbio.cs.unicam.it/biowms/video.html]
- XSLT: XSL Transformations [http://www.w3.org/TR/xslt]
- Cannata N, Merelli E, Altman R: Time to Organize the Bioinformatics Resourceome. PLoS Computational Biology 2005, 1(7). [http://dx.doi.org/10.1371/journal.pcbi.0010076]
- Jennings NR: On Agent based Software Engineering. Artificial Intelligence 2000, 117(2):277–296. 10.1016/S0004-3702(99)00107-1
- Site of HermesV2 [http://hermes.cs.unicam.it/]
- Bartocci E, Mariani L, Merelli E: An XML view of the "World". ICEIS (1) 2003, 19–27.
- Senger M, Rice P, Oinn T: Soaplab – a unified Sesame door to analysis tools pages. In Proceedings of UK e-Science All Hands Meeting: 2–4 September 2003; Nottingham. Edited by: Cox SJ. 2003, 515–519.
- XQuery 1.0: An XML Query Language [http://www.w3.org/TR/xquery/]
- Dowell R, Jokerst R, Day A, Eddy S, Stein L: The distributed annotation system. BMC Bioinformatics 2001, 2: 7. 10.1186/1471-2105-2-7
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.