Fig. 2
From: GenAp: a distributed SQL interface for genomic data
Our distributed range join architecture. In this picture, distributed table A is joined with distributed table B on overlapping genomic intervals. Table A goes through a number of transformations that enable the Spark driver to create an interval forest, which stores index pointers to the original data. Next, the driver propagates the interval forest to the workers, which transform table B by performing interval lookups on the forest. The result of this operation is table T1, which contains tuples of data from table B and pointers to data of table A. To materialize the result, we join T1 with table A1 on those pointers and obtain table T, the final result of the operation. The text under each table shows the data type of its contents.
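
The data flow in the caption can be illustrated with a small single-process sketch. It is not the GenAp implementation and does not use Spark: plain Python lists stand in for the distributed tables and their partitions, a per-chromosome sorted list stands in for the interval forest (a real interval tree would give faster lookups), and a pointer is simply a row index into table A. The table contents, the function names (`build_forest`, `lookup`, `worker_transform`, `range_join`), and the closed-interval convention are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical toy tables: (chromosome, start, end, payload), closed intervals.
table_a = [
    ("chr1", 100, 200, "geneX"),
    ("chr1", 150, 300, "geneY"),
    ("chr2", 50, 80, "geneZ"),
]
# Table B is split into two lists to mimic partitions held by two workers.
table_b_partitions = [
    [("chr1", 180, 190, "read1"), ("chr2", 10, 40, "read2")],
    [("chr1", 250, 400, "read3"), ("chr2", 60, 70, "read4")],
]


def build_forest(rows):
    """Driver side: index intervals per chromosome, keeping only row pointers."""
    per_chrom = defaultdict(list)
    for pointer, (chrom, start, end, _payload) in enumerate(rows):
        per_chrom[chrom].append((start, end, pointer))
    forest = {}
    for chrom, entries in per_chrom.items():
        entries.sort()  # sorted by start so lookups can bisect on it
        forest[chrom] = ([s for s, _, _ in entries], entries)
    return forest


def lookup(forest, chrom, q_start, q_end):
    """Return pointers of indexed intervals overlapping [q_start, q_end]."""
    if chrom not in forest:
        return []
    starts, entries = forest[chrom]
    # Only intervals that start at or before q_end can overlap the query.
    upper = bisect_right(starts, q_end)
    return [ptr for _s, end, ptr in entries[:upper] if end >= q_start]


def worker_transform(forest, partition):
    """Worker side: map a partition of table B to (B row, pointer to A) pairs."""
    return [
        (row, ptr)
        for row in partition
        for ptr in lookup(forest, row[0], row[1], row[2])
    ]


def range_join(rows_a, partitions_b):
    forest = build_forest(rows_a)  # built once, then shared with every worker
    # T1: tuples of table B data plus pointers into table A.
    t1 = [pair for part in partitions_b for pair in worker_transform(forest, part)]
    # Materialize T: resolve each pointer back to the original row of table A.
    return [(rows_a[ptr], row_b) for row_b, ptr in t1]


table_t = range_join(table_a, table_b_partitions)
for a_row, b_row in table_t:
    print(a_row[3], "overlaps", b_row[3])
```

With the toy data above, `table_t` holds four pairs: `read1` overlaps both `geneX` and `geneY`, `read3` overlaps `geneY`, `read4` overlaps `geneZ`, and `read2` matches nothing. The index carries only coordinates and pointers, so the structure shared with the workers stays small regardless of how wide the rows of table A are; the full rows are brought back only in the final pointer-resolution step.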