Table 1 Comparison of workflow management systems and HTS analysis pipelines based on essential features, particularly regarding reproducibility

From: uap: reproducible and robust HTS data analysis

 
  1. Tools were selected from https://github.com/pditommaso/awesome-pipeline, retaining only tools that were considered to be actively developed by at least five contributors (latest commit after 31.01.2018), are licensed as open source, have a scope of application that is clearly bioinformatics, and support cluster batch systems. Modular: Workflows are assembled from reusable steps; Customizable: Configuration of the analysis is separated from the code defining the steps; Flexible: Workflows can easily be altered without programming skills; Failure recovery: Failure during execution is detected and subsequent analyses are halted to avoid corrupted results; Reproducibility - Dependencies: WMS enforces that dependencies between analysis steps and intermediate results are correctly maintained; Consistency: WMS safeguards that analysis steps are successfully completed prior to execution of subsequent steps; Linking code and result: WMS ensures consistency between the code defining the analysis and the currently available results; Logging: WMS logs Stdout/Stderr, exit status, tool versions, WMS version, executed commands, execution date, and in-/output files (see corresponding Additional file 1: Tables S1 and S2 for details); Data authenticity: WMS records information about the creator and creating process(es) of the data; Data integrity: WMS records information that allows the integrity of created data to be verified (e.g. hash sums); Supp. platforms - Cluster: WMS supports compute cluster management systems to run jobs; Docker: WMS can run jobs using Docker images; Cloud: WMS can be deployed in a compute cloud; Supp. CWL: WMS supports the Common Workflow Language; Local: WMS can be executed locally without depending on a cluster management system or a (web)server. Results are given as (not met), (partially met), (fully met) and – (not stated). Ratings are based on information provided in the papers, documentation, and manuals.
A more detailed comparison is presented in Additional file 1: Table S1
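
The "Data integrity" criterion in the notes above (recording hash sums so that created data can be verified later) can be illustrated with a minimal Python sketch. This is not taken from uap or any of the compared systems; the function names `sha256_of`, `record_hashes`, and `verify_hashes` are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_hashes(paths):
    """Map each output file path to its digest (hypothetical helper)."""
    return {str(p): sha256_of(p) for p in paths}


def verify_hashes(recorded):
    """Return the paths whose current digest differs from the recorded one."""
    return [p for p, d in recorded.items() if sha256_of(p) != d]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "result.txt"  # stand-in for a workflow output file
        out.write_text("aligned reads\n")
        recorded = record_hashes([out])
        print(verify_hashes(recorded))  # unchanged file: empty list
        out.write_text("modified\n")
        print(verify_hashes(recorded))  # changed file: its path is reported
```

A workflow system would typically store such a record alongside its logs, so that a later run can detect output files that were altered after they were created.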