Skip to main content
Figure 2 | BMC Bioinformatics

Figure 2

From: Fragment assignment in the cloud with eXpress-D

Figure 2

Method overview. The top portion (A) shows the procedure for running the distributed batch EM algorithm with Spark, ignoring sequence-specific bias. First, blocks of partitioned alignments (yellow) and targets (magenta) are distributed to the slave nodes, stored in memory as RDDs, and cached. An initial set of parameter estimates (green with black symbols) are broadcast to the slaves with alignments. The alignments on each slave are probabilistically assigned and new parameter estimates are partially accumulated (green with white symbols) on each node. These are then sent back to the master to be fully combined and re-broadcast for the next round. When sequence-specific bias is enabled, additional processing (B) takes place between some rounds. The parameter estimates are sent to the slaves with targets and the expected sequence bias given the current abundance estimates are accumulated and normalized on the master node to produce updated weights [9]. Auxiliary parameters (error, bias, and fragment lengths) are fixed after 1000 rounds, and the estimation procedure stops when convergence of the abundance parameters is reached.

Back to article page