Table 1 Mapping between PyGMQL methods and GMQL operators or utilities

From: PyGMQL: scalable data extraction and analysis for heterogeneous genomic datasets

| PyGMQL function | Description | GMQL operator |
| --- | --- | --- |
| load_from_path | UTIL, loads a dataset from local repository | SELECT |
| load_from_remote | UTIL, loads a dataset from remote repository | SELECT |
| load_from_file | UTIL, loads a BED file from local repository | |
| select, reg_select, meta_select | UNOP, filters samples using region and/or metadata predicates | SELECT |
| project, reg_project, meta_project | UNOP, projects (in/out) attributes of regions or metadata. Creates new attributes by means of expressions | PROJECT |
| extend | UNOP, creates a new metadata attribute by aggregation of region data | EXTEND |
| cover, normal_cover, flat_cover, summit_cover, histogram_cover | UNOP, collapses regions from several samples into regions of a single sample, based on min/max accumulation indexes | COVER |
| order | UNOP, orders the samples of a dataset based on regions and/or metadata attributes | ORDER |
| merge | UNOP, merges all the samples of a dataset into a single one | MERGE |
| group, meta_group, reg_group | UNOP, groups regions and/or metadata with the same values | GROUP |
| join | BINOP, joins the regions of two datasets based on distance-based predicates | JOIN |
| map | BINOP, computes aggregate values from overlapping regions of two datasets | MAP |
| union | BINOP, builds the union of regions and metadata of two datasets | UNION |
| difference | BINOP, keeps the regions of a dataset not intersecting with regions of another one | DIFFERENCE |
| materialize | UTIL, triggers the query execution for the specified dataset and stores the result after query completion | MATERIALIZE |
| head | UTIL, shows the first lines of a dataset | |
 
  1. For every method we provide a concise explanation (UNOP stands for unary operator, BINOP stands for binary operator, and UTIL identifies a utility function).
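
For readers who want to query this mapping programmatically, the table can be encoded as plain data. The following is a minimal sketch, not part of PyGMQL: `TABLE_1`, `KINDS`, `gmql_operator` and `methods_of_kind` are hypothetical names introduced here for illustration, and the method names are copied from the table as published rather than checked against any particular PyGMQL release. Rows for which the table lists no GMQL operator carry `None`.

```python
# Table 1 encoded as (PyGMQL methods, kind tag, GMQL operator or None).
# All names defined in this block are hypothetical helpers for illustration;
# none of them belongs to the PyGMQL API.
TABLE_1 = [
    (("load_from_path",), "UTIL", "SELECT"),
    (("load_from_remote",), "UTIL", "SELECT"),
    (("load_from_file",), "UTIL", None),
    (("select", "reg_select", "meta_select"), "UNOP", "SELECT"),
    (("project", "reg_project", "meta_project"), "UNOP", "PROJECT"),
    (("extend",), "UNOP", "EXTEND"),
    (("cover", "normal_cover", "flat_cover", "summit_cover",
      "histogram_cover"), "UNOP", "COVER"),
    (("order",), "UNOP", "ORDER"),
    (("merge",), "UNOP", "MERGE"),
    (("group", "meta_group", "reg_group"), "UNOP", "GROUP"),
    (("join",), "BINOP", "JOIN"),
    (("map",), "BINOP", "MAP"),
    (("union",), "BINOP", "UNION"),
    (("difference",), "BINOP", "DIFFERENCE"),
    (("materialize",), "UTIL", "MATERIALIZE"),
    (("head",), "UTIL", None),
]

# Meaning of the kind tags, as given in the table footnote.
KINDS = {
    "UNOP": "unary operator",
    "BINOP": "binary operator",
    "UTIL": "utility function",
}


def gmql_operator(method):
    """Return the GMQL operator listed for a method, or None if the row has none.

    Raises KeyError when the method does not appear in the table.
    """
    for methods, _kind, operator in TABLE_1:
        if method in methods:
            return operator
    raise KeyError(method)


def methods_of_kind(kind):
    """Return, in table order, every method whose row carries the given kind tag."""
    return [m for methods, k, _op in TABLE_1 if k == kind for m in methods]
```

With this encoding, `gmql_operator("reg_select")` returns `"SELECT"`, and `methods_of_kind("BINOP")` lists the four binary operators in table order. Keeping the method variants of one row in a single tuple mirrors the table layout, where several PyGMQL methods share one GMQL operator.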