Skip to main content
Fig. 2 | BMC Bioinformatics

Fig. 2

From: GenAp: a distributed SQL interface for genomic data

Fig. 2

Our distributed range join architecture. In this picture, distributed table A joins with distributed table B on genomic interval overlapping. Table A goes through a number of transformations to enable the Spark driver to create an interval forest which stores index pointers to the original data. Next it propagates the interval forest to workers which transform table B by performing interval lookups on the forest. The result of this operation is table T1, which contains tuples of data from table B and pointers to data of table A. To materialize this we join it with table A1 on those pointers and we obtain table T, which is the final result of the operation. The text under each table shows the data type of the contents of each table

Back to article page