Skip to main content
Fig. 4 | BMC Bioinformatics

Fig. 4

From: PyGMQL: scalable data extraction and analysis for heterogeneous genomic datasets

Fig. 4

Schematic representation of the deployment strategies adopted in the three applications. a Local/Remote system interaction for the analysis of ENCODE histone marks signal on promotorial regions. The gene dataset is stored in the local file system, the ENCODE BroadPeak database is hosted in the GMQL remote repository, deployed on the Hadoop file system with three slaves. b Configuration for the interactive analysis of the GWAS dataset against the whole set of enhancers from ENCODE. The library interacts directly with the YARN cluster and the data is stored in the Google Cloud File System with a fixed configuration of three slaves, accessed through the Hadoop engine. The gwas.tsv file is downloaded from the web and stored in the file system before executing the query. c Distributed setup for running the TICA query. Three datasets (from ENCODE and GENCODE) are in GDM format and stored in HDFS and the query runs on Amazon Web Services with a variable number of slave nodes, for evaluating the scalability of the system

Back to article page