Wildfire allows the user to visually construct workflows. For execution, Wildfire exports the workflow as a GEL script and then calls a GEL interpreter to execute it. The GEL interpreter can run either on the same machine as Wildfire or on a remote compute server. Figure 1 summarises the interaction between Wildfire and GEL.
Wildfire is implemented in Java and has been tested on Windows and Linux platforms. On a Linux platform, the user can run workflows directly on the same machine. This is ideal for developing and testing small examples on a laptop, while reserving multi-processor servers and clusters for running the workflow on real data.
We next describe the two main activities enabled by Wildfire: construction and execution of workflows.
Workflow construction
When constructing workflows, the user does not need to work directly with the syntax of scripting languages such as GEL or Perl. Rather, the user is presented with a graphical workflow canvas. On the canvas, a workflow component can be (i) an atomic component, (ii) a subworkflow or (iii) a loop (either parallel or sequential). An atomic component approximately corresponds to an EMBOSS application; in particular, each atomic component has an ACD (Ajax Command Definition [8]) description of its parameters and options. The user can select atomic components from a customisable list of templates, which by default includes all the EMBOSS 2.8.0 applications (see Availability and requirements section). Components are rendered on the canvas as yellow rectangles labelled with the component name (e.g. the EMBOSS program name) and a unique numerical identifier, which can be used to distinguish instances of the same component template. Sequential dependencies between components are created by drawing an arrow between them. By default, components not linked by arrows are assumed to be independent (and so can be executed in parallel).
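The dependency rule described above (arrows impose an order, unlinked components are independent) can be illustrated with a short Python sketch. It is not taken from Wildfire or GEL: the function name parallel_waves and the component labels are hypothetical, and the sketch covers only acyclic arrows, not loops. Grouping components into "waves" is also a simplification, since an actual scheduler could start a component as soon as its own predecessors have finished.

```python
from collections import defaultdict


def parallel_waves(components, arrows):
    """Group components into successive waves.

    `arrows` is a list of (source, destination) pairs, one per arrow drawn
    on the canvas. All components in one wave have every predecessor in an
    earlier wave, so they could be executed in parallel.
    """
    indegree = {c: 0 for c in components}
    successors = defaultdict(list)
    for src, dst in arrows:
        successors[src].append(dst)
        indegree[dst] += 1

    waves = []
    # Components with no incoming arrow can start immediately.
    ready = sorted(c for c in components if indegree[c] == 0)
    while ready:
        waves.append(ready)
        next_ready = []
        for c in ready:
            for s in successors[c]:
                indegree[s] -= 1
                if indegree[s] == 0:
                    next_ready.append(s)
        ready = sorted(next_ready)

    # Any component never released belongs to a cycle of arrows.
    if sum(len(w) for w in waves) != len(components):
        raise ValueError("arrows contain a cycle")
    return waves


# Hypothetical labels in the "name_identifier" style used on the canvas.
example = parallel_waves(
    ["seqret_1", "blastall_2", "blastall_3", "merge_4"],
    [("seqret_1", "blastall_2"), ("seqret_1", "blastall_3"),
     ("blastall_2", "merge_4"), ("blastall_3", "merge_4")],
)
print(example)
```

In this example the two blastall instances are not linked to each other, so they fall into the same wave and could run concurrently.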
Double-clicking on an atomic component in the workflow brings up a properties window resembling that of Jemboss (see Fig. 2). Wildfire uses Jemboss code to parse the ACD [8] description of the application, construct the form from it, and fill in default values where they are defined. These forms simplify configuration by replacing command-line flags and switches with graphical user interface elements such as drop-down menus. Help text annotations for the input fields save the user the effort of looking up UNIX man pages or EMBOSS tfm pages.
Wildfire extends the Jemboss interface by allowing the user to use expressions (similar to spreadsheet formulae) in the text fields. For example, in Fig. 2, the query file for blastall is = $file. The first character is an equals symbol (=), which indicates that this is not a literal string but an expression. The remainder is the expression itself, meaning "the value of variable $file". Here the value of $file is determined by the pforeach container, shown in the background window, which denotes a parallel composition of blastall instances with $file set to each of the files matching *_dice*.fna. The output file is
= $file . ".out"
which is an expression meaning "the value of variable $file with .out appended". Another example of an expression is
= $f % ".fasta"
which means "the value of variable $f without the .fasta extension". The % and . operators can be mixed, for example
= $f % ".fasta" . ".pep"
which replaces the .fasta extension with .pep.
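The behaviour of these expressions can be mimicked with a minimal Python sketch. It is an illustration only, not Wildfire's implementation: the function evaluate is hypothetical, the grammar is limited to variables, double-quoted strings and the two operators shown above, and two points are assumptions not stated in the text, namely that operators are applied strictly left to right and that % leaves the value unchanged when the suffix is absent.

```python
import re

# One token: a $variable, a double-quoted string, or an operator (. or %).
_TOKEN = re.compile(r'\s*(\$\w+|"[^"]*"|[.%])')


def evaluate(field, variables):
    """Evaluate the contents of a text field.

    A field starting with '=' is treated as an expression; anything else
    is returned unchanged as a literal string.
    """
    if not field.startswith("="):
        return field

    text = field[1:].strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()

    def operand(token):
        if token.startswith("$"):
            return variables[token[1:]]   # look up the variable value
        if token.startswith('"'):
            return token[1:-1]            # strip the surrounding quotes
        raise ValueError(f"expected a variable or string, got {token!r}")

    # A well-formed expression alternates operand, operator, operand, ...
    if len(tokens) % 2 == 0:
        raise ValueError("malformed expression")

    value = operand(tokens[0])
    for op, token in zip(tokens[1::2], tokens[2::2]):
        rhs = operand(token)
        if op == ".":
            value = value + rhs               # append
        elif op == "%":
            if rhs and value.endswith(rhs):   # assumption: no-op otherwise
                value = value[: -len(rhs)]    # remove the suffix
        else:
            raise ValueError(f"expected an operator, got {op!r}")
    return value


print(evaluate('= $file . ".out"', {"file": "sample_dice1.fna"}))
print(evaluate('= $f % ".fasta" . ".pep"', {"f": "seq.fasta"}))
```

Under the left-to-right assumption, the second call first removes .fasta and then appends .pep, matching the replacement behaviour described above.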
In addition, users can add their own command-line programs to the list of atomic components by providing a description of each program's command-line options in an extended ACD syntax. The Wildfire user interface includes a facility that helps the user write ACD files for new atomic components and shields the user from the complex ACD syntax.
Apart from defining the dependencies between components and their invocation arguments, the user can place input files required by the workflow in subdirectories within the workflow directory. Wildfire can instruct GEL to copy files from these subdirectories into the working directory before a component is executed. Any instance can specify input files, which allows files to be staged in just in time. However, a common workflow pattern is to have all input files copied by the first instance in the workflow.
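The stage-in step can be pictured with the following Python sketch. It is a stand-in for what GEL does, not its actual code: the function stage_in and its parameters are hypothetical, and it assumes the simplest case of copying every regular file from one subdirectory of the workflow directory into the working directory.

```python
import shutil
from pathlib import Path


def stage_in(workflow_dir, subdir, working_dir):
    """Copy all regular files from workflow_dir/subdir into working_dir.

    Returns the names of the copied files in sorted order.
    """
    source = Path(workflow_dir) / subdir
    target = Path(working_dir)
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for path in sorted(source.iterdir()):
        if path.is_file():                  # skip nested directories
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

Calling such a function immediately before a component runs corresponds to just-in-time staging; calling it once before the first component corresponds to the common pattern of copying all inputs up front.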
Workflow execution
For execution, Wildfire exports a programmatic description of the workflow in a scripting language called GEL [7], which is passed to a GEL interpreter for execution. GEL is a scripting language with parallel constructs characterising common parallel workflow execution patterns. It is designed to be a generic parallel scripting language which can be executed on different types of homogeneous and heterogeneous parallel hardware such as shared-memory SMP servers, clusters with a shared disk image, and Grids without a shared disk image. There currently exist interpreters that can run GEL scripts on SMP servers, clusters with Platform LSF, PBS or Sun GridEngine (SGE), and on Condor Grids [9, 10]. GEL is similar to APST [11], NIMROD [12] and DAGMan (part of Condor) but also allows for cyclic dependencies between jobs. The reader is referred to [7] for a more thorough description of GEL.
When developing small workflows, the user can run the workflow on the same machine (see Availability and requirements section). In this way, Wildfire can be used as a stand-alone application without access to the network.
Alternatively, the user can choose to send the workflow to a remote server and run it there. In this case, Wildfire uses the secure shell (SSH) protocol to send the necessary files to the remote server and then run the GEL interpreter there (see Fig. 3). If the server has multiple processors, the GEL interpreter can execute the atomic components directly. If the server is a cluster, GEL can submit the atomic components as jobs to the queue manager. In either case, the GEL interpreter will try to use multiple processors where possible, which makes remote execution useful for workflows with large data sets. It is also useful if the atomic components are not installed on the local machine.
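The two steps of remote execution (transfer the files, then start the interpreter) can be sketched in Python as follows. The text does not describe how Wildfire drives SSH internally, so this sketch uses the standard scp and ssh command-line clients as stand-ins and only builds the command lines without running them. The function remote_run_commands is hypothetical, the interpreter command is supplied by the caller because its actual name is not given here, and the remote parent directory is assumed to exist already.

```python
import posixpath
import shlex
from pathlib import Path


def remote_run_commands(host, local_dir, remote_parent, interpreter_argv):
    """Build (copy_cmd, run_cmd) argument lists for a remote run.

    copy_cmd copies local_dir recursively into remote_parent on `host`;
    run_cmd changes into the copied directory and starts the interpreter.
    Nothing is executed here; the lists are suitable for subprocess.run.
    """
    name = Path(local_dir).name
    remote_dir = posixpath.join(remote_parent, name)

    # Trailing slash: copy *into* the (assumed existing) parent directory.
    copy_cmd = ["scp", "-r", str(local_dir),
                f"{host}:{remote_parent.rstrip('/')}/"]

    # Quote every piece that the remote shell will interpret.
    remote_shell = "cd {} && {}".format(
        shlex.quote(remote_dir),
        " ".join(shlex.quote(arg) for arg in interpreter_argv),
    )
    run_cmd = ["ssh", host, remote_shell]
    return copy_cmd, run_cmd


# "gel-interpreter" and "workflow.gel" are placeholder names, not the
# documented command or file name.
copy_cmd, run_cmd = remote_run_commands(
    "user@server", "workflows/blast_demo", "/home/user/runs",
    ["gel-interpreter", "workflow.gel"],
)
print(copy_cmd)
print(run_cmd)
```

Because both steps go through SSH, the only service the server has to offer is an SSH daemon.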
Wildfire and GEL do not require super-user privileges to install: they can be installed in the "home" directory. For the client-server mode of operation, only an SSH service on the server is required; there is no need to configure other services such as SOAP over HTTP or Web-/Grid-Services, and the firewall is only required to allow incoming SSH connections. Most modern UNIX-style configurations already provide an SSH service.
Wildfire can also use GEL to break up the workflow and run parts of it concurrently on different supercomputers using Condor. (Note: GEL 1.0 uses the Globus [13] protocols to provide Grid support. GEL 2.0 uses Condor for Grid execution and future support for Globus Grids will be via Condor-G.) This is useful for very large workflows which require as many compute resources as possible. In practice, it is more useful when not all components are available on any one machine, for example, because of licence availability.
Wildfire monitors the execution of the individual atomic components and provides feedback via annotations on the canvas, which are updated in real time.
The exported GEL script can also be run directly from the command line using an interpreter. This allows a workflow to be run in batch mode independently of Wildfire, and is useful for very long-running workflows or those that have to be run repeatedly.