Fig. 1 | BMC Bioinformatics

Fig. 1

From: Accurate determination of node and arc multiplicities in de bruijn graphs using conditional random fields

Illustration of the issues that arise when multiplicities are assigned using only a cutoff derived from the k-mer histogram.

(a) A k-mer histogram that consists of a mixture of negative binomial distributions. Each component models the k-mer coverage variability for a particular multiplicity. The two-sided arrows delineate the coverage intervals corresponding to a multiplicity estimate. Note how large areas under the curve of a particular multiplicity fall in an interval of a different estimated multiplicity.

(b) Example genome sequence with corresponding reads and de Bruijn graph. Nodes and arcs are labelled with their read coverage. Nodes are also labelled with their corresponding fragment in the genome sequence. Sequencing errors cause spurious nodes and arcs in the de Bruijn graph, such as nodes e′ and j. Nodes are encircled according to their true multiplicity (cf. the patterns in Fig. 1a); all correct arcs have true multiplicity 1.

(c) Most likely multiplicity assignment to nodes and arcs based on the k-mer histogram in Fig. 1a. This assignment leads to inconsistencies: nodes where conservation of flow of multiplicity holds are marked with ✓, and nodes where it is violated are marked with ✗.

(d) Multiplicity assignment such that conservation of flow of multiplicity holds in each node. These assignments are correct for all nodes and arcs, and they reveal sequencing errors (nodes/arcs with multiplicity zero) as well as the repeat structure of the genome sequence.
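The contrast between assigning each multiplicity independently from the histogram and checking conservation of flow can be sketched in a few lines of Python. This is a minimal illustration only, not the article's conditional random field method. The function names, the per-copy mean coverage `lam`, the error-component mean `err_mean`, the shared `dispersion`, the uniform mixture weights, and the choice to skip the flow check on a side of a node that has no arcs are all assumptions made for the example.

```python
import math


def nb_logpmf(x, mean, dispersion):
    """Log-probability of count x under a negative binomial distribution
    parameterised by its mean and dispersion r (variance = mean + mean^2 / r)."""
    r = dispersion
    p = r / (r + mean)
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(p) + x * math.log1p(-p))


def most_likely_multiplicity(coverage, lam=20.0, err_mean=1.0,
                             dispersion=10.0, max_mult=4):
    """Pick the multiplicity whose mixture component best explains the coverage.

    Assumptions (hypothetical values): multiplicity m >= 1 has mean coverage
    m * lam, multiplicity 0 (sequencing errors) has mean err_mean, all
    components share one dispersion and have equal mixture weights.
    """
    best_mult, best_score = 0, -math.inf
    for mult in range(max_mult + 1):
        mean = err_mean if mult == 0 else mult * lam
        score = nb_logpmf(coverage, mean, dispersion)
        if score > best_score:
            best_mult, best_score = mult, score
    return best_mult


def flow_conserved(node_mult, in_arc_mults, out_arc_mults):
    """Check that the incoming and outgoing arc multiplicities each sum to the
    node multiplicity. A side without arcs is skipped (assumed sequence end)."""
    ok_in = not in_arc_mults or sum(in_arc_mults) == node_mult
    ok_out = not out_arc_mults or sum(out_arc_mults) == node_mult
    return ok_in and ok_out


# A node with true multiplicity 1 whose coverage happens to be high (31).
node_estimate = most_likely_multiplicity(31)
arc_estimates = [most_likely_multiplicity(20)]  # one incoming, one outgoing arc
print(node_estimate, flow_conserved(node_estimate, arc_estimates, arc_estimates))
```

Under these assumed parameters the node's coverage falls in the interval of a neighbouring component, so the independent estimate disagrees with its arcs and the flow check fails, which mirrors the ✗ marks in panel c. Any method that enforces the check across the whole graph has to reconsider such a node together with its neighbours rather than in isolation.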
