Figure 1 | BMC Bioinformatics

From: A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

A generic PaPy workflow. Any workflow that can be expressed as a directed graph can be implemented as a PaPy pipeline (A). As the pipes linking separate processing streams in (A) indicate, workflow construction in PaPy is flexible rather than restrictive. Using the methods that PaPy's NuMap objects provide to parallelize or distribute calculations (text, Table 3), a workflow can draw on a variety of computational resources, such as threads, multi-processor architectures, and remote resources (B). PaPy's Dagger objects, which represent the entire pipeline, are composed of Piper nodes (colored squares) interconnected via pipes (black arrows); equivalently, pipes can be considered edges that represent data-flow dependencies (gray arrows 'pulling' data through the left branch of (A)). Colors match sample Pipers (A) with their NuMap instances (B), and (C) shows the conceptual relationship between Pipers, Workers, and NuMaps. Parallelism is achieved by pulling data through the pipeline in adjustable batches, and overall performance may be improved by collapsing unbranched linear segments into a single node (D).
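The pull-based, batched data flow described in the caption can be illustrated with a minimal sketch. This is not PaPy's API: the names `Pipeline`, `batched_map`, `compose`, and the `stride` batch-size parameter are hypothetical, and a standard-library `ThreadPoolExecutor` stands in loosely for a thread-based NuMap. Nodes play the role of Pipers, edges are data-flow dependencies, a node with two parents receives one tuple per item, and `compose` shows how an unbranched linear segment might be collapsed into a single node (D).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice, tee
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple


def batched_map(func: Callable, items: Iterable, pool: ThreadPoolExecutor,
                stride: int) -> Iterator:
    """Lazily apply func to items, evaluating `stride` items at a time in parallel."""
    it = iter(items)
    while True:
        batch = list(islice(it, stride))  # pull the next batch from upstream
        if not batch:
            return
        # pool.map submits the whole batch; results come back in input order
        yield from pool.map(func, batch)


def compose(*funcs: Callable) -> Callable:
    """Collapse an unbranched chain of functions into a single callable."""
    def composed(x):
        for f in funcs:
            x = f(x)
        return x
    return composed


class Pipeline:
    """Hypothetical DAG of functions; each node pulls items from its parents."""

    def __init__(self, pool: ThreadPoolExecutor, stride: int = 2):
        self.pool = pool
        self.stride = stride  # hypothetical batch size
        self.funcs: Dict[str, Callable] = {}
        self.parents: Dict[str, Tuple[str, ...]] = {}

    def add(self, name: str, func: Callable, parents: Sequence[str] = ()) -> None:
        # Each edge parent -> name is a data-flow dependency.
        self.funcs[name] = func
        self.parents[name] = tuple(parents)

    def run(self, source: Sequence, sink: str) -> Iterator:
        # Count consumers of each node so branching outputs can be duplicated.
        fanout = {name: 0 for name in self.funcs}
        for ps in self.parents.values():
            for p in ps:
                fanout[p] += 1
        fanout[sink] += 1  # the caller consumes the sink's output
        copies: Dict[str, List[Iterator]] = {}

        def pull(name: str) -> Iterator:
            if name not in copies:
                ps = self.parents[name]
                if not ps:
                    inputs: Iterator = iter(source)  # a root node reads the source
                elif len(ps) == 1:
                    inputs = pull(ps[0])
                else:
                    # A merge node receives one tuple per item, one field per parent.
                    inputs = zip(*[pull(p) for p in ps])
                out = batched_map(self.funcs[name], inputs, self.pool, self.stride)
                # tee lets several downstream nodes consume the same stream.
                copies[name] = list(tee(out, fanout[name]))
            return copies[name].pop()

        return pull(sink)  # nothing is computed until the caller iterates


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        p = Pipeline(pool, stride=2)
        p.add("clean", compose(str.strip, str.upper))  # collapsed linear segment
        p.add("length", len, parents=["clean"])
        p.add("gc", lambda s: s.count("G") + s.count("C"), parents=["clean"])
        p.add("report", lambda t: f"{t[0]} nt, {t[1]} GC", parents=["length", "gc"])
        for line in p.run([" acgt ", "ggc", " t"], sink="report"):
            print(line)
```

Because every stage is a generator, iterating the sink pulls items upstream one batch at a time, so memory use is bounded by the batch size and any `tee` buffering rather than by the input length. Swapping the executor (for example, a `ProcessPoolExecutor` with picklable functions) would change where work runs without changing the graph, which mirrors, in simplified form, the separation between Pipers and NuMaps shown in (C).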