
Table 1 Mapping between PyGMQL methods and GMQL operators or utilities

From: PyGMQL: scalable data extraction and analysis for heterogeneous genomic datasets

| PyGMQL function | Description | GMQL operator |
| --- | --- | --- |
| load_from_path | UTIL, loads a dataset from local repository | SELECT |
| load_from_remote | UTIL, loads a dataset from remote repository | SELECT |
| load_from_file | UTIL, loads a bed file from local repository | |
| select, reg_select, meta_select | UNOP, filters samples using region and/or metadata predicates | SELECT |
| project, reg_project, meta_project | UNOP, projects (in/out) attributes of regions or metadata. Creates new attributes by means of expressions | PROJECT |
| extend | UNOP, creates a new metadata attribute by aggregation of region data | EXTEND |
| cover, normal_cover, flat_cover, summit_cover, histogram_cover | UNOP, collapses regions from several samples into regions of a single sample, based on min/max accumulation indexes | COVER |
| order | UNOP, orders the samples of a dataset based on regions and/or metadata attributes | ORDER |
| merge | UNOP, merges all the samples of a dataset into a single one | MERGE |
| group, meta_group, reg_group | UNOP, groups regions and/or metadata with the same values | GROUP |
| join | BINOP, joins the regions of two datasets based on distance-based predicates | JOIN |
| map | BINOP, computes aggregate values from overlapping regions of two datasets | MAP |
| union | BINOP, builds the union of regions and metadata of two datasets | UNION |
| difference | BINOP, keeps the regions of a dataset not intersecting with regions of another one | DIFFERENCE |
| materialize | UTIL, triggers the query execution for the specified dataset and stores the result after query completion | MATERIALIZE |
| head | UTIL, Shows the first lines of a dataset | |
  1. For every method we provide a concise explanation (UNOP stands for unary operator, BINOP stands for binary operator, and UTIL identifies a utility function).
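
To make the descriptions of two binary operators more concrete, the following is a minimal plain-Python sketch of what `difference` and `map` compute on region coordinates. It does not use PyGMQL and is not how the library implements them: the `Region` type, the `overlaps`, `difference` and `map_count` helpers, and the half-open interval convention are all assumptions made for illustration, and each dataset is reduced to a single sample with no metadata.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """Hypothetical region record; not a PyGMQL type."""
    chrom: str
    start: int  # assumed inclusive
    stop: int   # assumed exclusive


def overlaps(a: Region, b: Region) -> bool:
    """True when both regions lie on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start < b.stop and b.start < a.stop


def difference(left: List[Region], right: List[Region]) -> List[Region]:
    """Keep the regions of `left` that intersect no region of `right`."""
    return [r for r in left if not any(overlaps(r, o) for o in right)]


def map_count(reference: List[Region], experiment: List[Region]) -> List[Tuple[Region, int]]:
    """For each reference region, count the experiment regions overlapping it.

    Counting is only one possible aggregate; it is used here as the simplest case.
    """
    return [(r, sum(overlaps(r, e) for e in experiment)) for r in reference]


# Toy data, invented for this example.
genes = [Region("chr1", 100, 200), Region("chr1", 500, 600), Region("chr2", 100, 200)]
peaks = [Region("chr1", 150, 160), Region("chr1", 190, 250), Region("chr2", 300, 400)]

kept = difference(genes, peaks)    # genes with no overlapping peak
counts = map_count(genes, peaks)   # number of peaks over each gene

print(kept)
print(counts)
```

In this toy input the first gene overlaps two peaks, so `difference` drops it and `map_count` reports 2 for it, while the other two genes are kept with a count of 0. The actual operators work across all samples of the two datasets and also carry metadata, which this sketch leaves out.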