We designed biowep, a workflow enactment portal (a web-based client application, as defined in the WfMC Reference Model) that allows for the selection and execution of a set of predefined, annotated workflows. The system is available on-line [17].
Biowep has been implemented starting from the architecture designed in the Oncology over Internet – O2I project, but it is:
- not restricted to oncology: all bioinformatics applications can be included,
- not limited to workflows created with the Taverna Workbench: two workflow management systems and enactment engines are presently used and more can be added,
- not limited to internally created workflows, since submission of workflows for insertion in biowep is allowed.
This generalization required changes to the initial architecture, which were taken into account during the implementation of the portal.
Furthermore, end-user support has been implemented and the software has been made available to interested researchers under the GNU Lesser General Public License (LGPL).
Selection of a workflow
In biowep, users are authenticated. The system stores information on the workflows executed by each user and is therefore able to list the workflows that he/she has executed. The most recently executed workflows are listed first and, for each workflow executed in the past, its version number and execution date are also shown.
The system also supports retrieval of lists of available workflows on the basis of the role of the user in his/her organization (e.g., researcher, clinician, computer scientist) and of the domains of interest (e.g., mutation analysis, gene prediction). In the latter case, workflows are listed by date (last executed first), while, in the former, they can also be listed by number of executions carried out by all users (i.e., by popularity among users of the system). In figure 1, the web page listing all workflows available in the workflow repository is shown.
The list includes the name and a short description of the workflows, together with their current version number and the last execution date. In these pages, two buttons are always available for enacting the workflow (button 'run') or retrieving its details (button 'details').
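The two orderings described above (by last execution date and by popularity) can be sketched as follows. This is only an illustration: the `WorkflowEntry` record, its field names and the sample entries are hypothetical and are not taken from the biowep database structure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WorkflowEntry:
    """Hypothetical row of the workflow list (not the biowep schema)."""
    name: str
    version: str
    last_execution: date
    executions: int  # total number of runs by all users


def by_date(entries):
    """Return entries with the most recently executed workflow first."""
    return sorted(entries, key=lambda e: e.last_execution, reverse=True)


def by_popularity(entries):
    """Return entries with the most frequently executed workflow first."""
    return sorted(entries, key=lambda e: e.executions, reverse=True)


# Made-up sample data, for illustration only.
entries = [
    WorkflowEntry("wf_a", "1.0", date(2006, 3, 1), 12),
    WorkflowEntry("wf_b", "1.2", date(2006, 5, 9), 4),
    WorkflowEntry("wf_c", "2.0", date(2006, 4, 20), 30),
]

recent_first = [e.name for e in by_date(entries)]
popular_first = [e.name for e in by_popularity(entries)]
```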
In figure 2, details of a workflow are shown. These include its overall annotation and the annotation of its main steps. Also available is a link to a diagram of the workflow.
Search of a workflow through its annotation
Workflows of interest can also be searched for and identified by means of their annotation. Figure 3 shows the web page that allows this kind of search. Conditions can be defined on the application domain of the workflow, on its type (the kind of elaboration or analysis that it performs) and on the type of its input and output fields. Conditions can be set on each column (see figure 3) and conditions on different columns are combined by a logical AND. When multiple conditions are set on the same column, they are combined by a logical OR. An example query could be: find all workflows in the molecular biology domain (application domain) including at least one elaboration step that retrieves (retrieval task) DNA sequences (output) on the basis of a GenBank accession number (input). End users are not obliged to set conditions on every field: any field can be left undetermined. A search that does not impose any condition on any field returns a list of all annotated steps and workflows.
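The way conditions are combined (AND across columns, OR within a column, no constraint for an empty column) can be sketched as follows. This is a minimal illustration under assumed names: the dictionary keys (`domain`, `task`, `input`, `output`), the sample annotations and the `matches` and `search` helpers are hypothetical, and the portal itself presumably evaluates such queries against its database rather than in memory.

```python
def matches(annotation, conditions):
    """AND across columns, OR among the values accepted for one column."""
    return all(
        annotation.get(column) in accepted
        for column, accepted in conditions.items()
        if accepted  # an empty set of values leaves the column unconstrained
    )


def search(annotations, conditions):
    """Return every annotation that satisfies the combined conditions."""
    return [a for a in annotations if matches(a, conditions)]


# Made-up annotations, for illustration only.
annotations = [
    {"name": "get_dna", "domain": "molecular biology", "task": "retrieval",
     "input": "GenBank accession number", "output": "DNA sequence"},
    {"name": "get_protein", "domain": "molecular biology", "task": "retrieval",
     "input": "GenBank accession number", "output": "protein sequence"},
    {"name": "predict_gene", "domain": "gene prediction", "task": "analysis",
     "input": "DNA sequence", "output": "gene model"},
]

# The example query from the text: one accepted value per column.
query = {
    "domain": {"molecular biology"},
    "task": {"retrieval"},
    "input": {"GenBank accession number"},
    "output": {"DNA sequence"},
}
hits = [a["name"] for a in search(annotations, query)]

# Two values on the same column are alternatives (logical OR).
either = [a["name"] for a in search(
    annotations, {"output": {"DNA sequence", "protein sequence"}})]

# No condition at all returns everything.
everything = search(annotations, {})
```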
Results are listed in the same page and include the annotation. Also included is a note that specifies whether the retrieved data refers to the overall task performed by a workflow or to the task performed by a single step in a workflow. In the former case, the workflow can be enacted, while, in the latter, a list of all workflows including that step can be requested.
Workflow enactment
In figure 4, the input form for the execution of a workflow is shown. In this page, input fields are described in detail and suggestions for possible input values are reported, so that the required data syntax is clearly shown. Required and optional fields are pointed out.
The enactment of workflows created with Taverna is carried out, as already said, by using Freefluo. In this case, the execution is performed on the server and results are stored in the system and made available to the user as soon as they are ready. If the execution takes longer than a predefined period (usually 30 seconds, but this value can be changed by modifying a parameter in the configuration file), the workflow is executed in the background and the user is invited to retrieve the results later in the results section. In this case, results are also returned by email. Workflows created with BioWMS, instead, are executed by issuing a request to the Hermes server that is available at the University of Camerino. In this case, results are only returned by email.
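The "wait for a predefined period, then continue in the background" behaviour can be sketched as follows. This is not the Freefluo API and not the biowep implementation: the `enact` function, the thread pool and the two stand-in workflow functions are assumptions made purely for illustration, and the short timeouts are chosen only so that the example runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Hypothetical worker pool standing in for the server-side enactment engine.
executor = ThreadPoolExecutor(max_workers=4)


def enact(run_workflow, inputs, timeout_seconds=30.0):
    """Wait up to timeout_seconds; after that, leave the run in background."""
    future = executor.submit(run_workflow, inputs)
    try:
        return "completed", future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # The run keeps going in its worker thread; the caller can collect
        # the result later through the returned future.
        return "background", future


def quick(inputs):
    """Stand-in for a workflow that finishes immediately."""
    return inputs.upper()


def slow(inputs):
    """Stand-in for a workflow that outlasts the waiting period."""
    time.sleep(0.3)
    return inputs.lower()


status_quick, result_quick = enact(quick, "acgt", timeout_seconds=1.0)
status_slow, pending = enact(slow, "ACGT", timeout_seconds=0.05)
late_result = pending.result()  # retrieved later, once the run has finished
executor.shutdown()
```

In a portal setting, the `"background"` branch would correspond to storing the result once it arrives and notifying the user, for example by email as described above.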
Visualization and management of results
In biowep, workflow executions and related results can be saved, either temporarily or permanently, and later retrieved, analysed and used for further analyses. In figure 5, the web page listing all saved results and allowing for their further visualization is shown. Results can currently be displayed on the user's computer by using a Java library that must be downloaded from the portal and installed locally. A Java virtual machine must also be available and running on the user's side. The visualization library is derived from the Taverna Workbench and includes some extra Java classes.
Available workflows
Biowep currently includes a set of workflows that are devoted to the retrieval of data from the IARC TP53 Mutation Database [18, 19] and from the CABRI catalogues of biological resources [20, 21]. Some of these workflows were first created within the Oncology over Internet – O2I project and have been presented in [22]. Some workflows have been made available in both Scufl and XPDL formats. More workflows are being created and tested in various application domains.
Support for users and developers
Support for users and developers is available in the associated site [23], from where interested researchers can retrieve all available documentation (user and installation manuals, presentations, papers) and download software, database structure and workflows (see figure 6). Archives of mailing lists are also available at the support web site. Three mailing lists have been created and will soon be announced and started: biowep-announce, biowep-forum and biowep-dev. The first is an announcements list for informing users about the availability of new versions of biowep and new workflows. The second is an open discussion list on biowep features, also intended for answering users' questions. The third list is restricted to developers and is the designated list for discussions about improvements, new features and bug fixes.
Finally, researchers willing to submit their workflows for inclusion in the biowep repository can upload them through the ad-hoc form. Software download and workflow upload are limited to registered users of the portal. A single registration is therefore required for accessing both the portal and the support web site.
Comparison with workflow engines
Biowep is not a workflow management system itself. It does not allow researchers to create their own workflows. Instead, it allows all researchers to enact predefined workflows. Biowep significantly simplifies access for non-expert researchers to automated in-silico procedures implemented by using external workflow management systems (WMS). This spares them deep and continuous training on the best WMS, on the available Web Services and on their specific features and requirements. Such training would indeed be needed in order to use either WMS or Web Services directly. Also, since the portal is able to enact workflows defined by different standards (currently, Scufl and XPDL) and created by different WMS (currently, Taverna and BioWMS), it offers researchers the possibility of benefiting from the best features and interoperability capacities of all included WMS.