Skip to main content
Fig. 1 | BMC Bioinformatics

Fig. 1

From: PyBDA: a command line tool for automated analysis of big biological data sets

Fig. 1

Using PyBDA. (1) To use PyBDA, the user only requires to create a short config file that lists the different methods to be executed. (2) From the config file, PyBDA creates an abstract Petri net, i.e., a bipartite directed graph with data nodes (gray squares) and operation nodes (analysis methods, green rectangles). (3) PyBDA traverses the net and creates triples, i.e., subgraphs consisting of an input file, an associated analysis method, and an output file. It then uses Snakemake for execution of each triple. The associated method of every triple is implemented as a Python module, each developed against the DataFrame API from Apache Spark. Spark uses a master to chunk a method into several tasks and distributes these on worker nodes on a distributed HPC cluster

Back to article page