Fig. 1 | BMC Bioinformatics

Fig. 1

From: PAPerFly: Partial Assembly-based Peak Finder for ab initio binding site reconstruction

A: overview of the PAPerFly algorithm. B: illustration of a node-centric de Bruijn graph for \(k=6\). C: illustration of bottleneck processing in a single iteration. B: consider two k-mers \(K_1, K_2\) that are seen in the sequencing data. These k-mers constitute nodes of the graph. The nodes \(K_1, K_2\) are connected by a directed edge if and only if the two k-mers overlap by \(k-1\) (in this case, 5) characters and, furthermore, the \((k+1)\)-mer obtained by merging \(K_1\) and \(K_2\) occurs in the sequencing data. C: first, the longest path in the directed acyclic graph (DAG) is identified using a topological sort of its vertices (step 1, path drawn in blue). On this path, we identify a bottleneck (step 2, orange vertex). Depending on the bottleneck k-mer abundance (in this case, 3), we enumerate as many longest paths as possible while the abundance of each path remains a non-zero integer. Here, we enumerate three paths (step 3, depicted in blue, green and magenta) and assign each of them an abundance of one. Finally, the consumed abundances are subtracted from the k-mer counts, and vertices whose k-mer count drops to zero are removed; at least one such k-mer exists in every iteration (step 4, removed edges and vertex drawn in gray). The process is discussed in more detail in Additional file 1: section S1.
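
The graph construction in panel B and the iteration in panel C can be sketched in Python. This is a minimal, hypothetical illustration, not PAPerFly's implementation: the function names (`count_kmers`, `build_graph`, `longest_path`, `assemble`) are invented, reverse complements are ignored, the graph is assumed to be acyclic, and the bottleneck step is simplified. Here the bottleneck is assumed to be the lowest-count k-mer on the longest path, and one path receives that k-mer's whole abundance, instead of enumerating several distinct longest paths of abundance one as in panel C.

```python
from collections import Counter, defaultdict, deque


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (reverse complements ignored)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_graph(reads, k):
    """Node-centric de Bruijn graph: add K1 -> K2 only when the (k+1)-mer
    formed by merging them (a k-1 character overlap) occurs in a read."""
    edges = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            edges[read[i:i + k]].add(read[i + 1:i + k + 1])
    return edges


def longest_path(nodes, edges):
    """Longest path (counted in vertices) of a DAG via Kahn's topological sort."""
    alive = set(nodes)
    indeg = {v: 0 for v in nodes}
    for u in nodes:
        for v in edges.get(u, ()):
            if v in alive:
                indeg[v] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in sorted(edges.get(u, ())):  # sorted only to make ties reproducible
            if v in alive:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("cycle detected; this sketch assumes a DAG")
    # Dynamic programming over the topological order.
    best = {v: 1 for v in order}
    prev = {v: None for v in order}
    for u in order:
        for v in sorted(edges.get(u, ())):
            if v in alive and best[u] + 1 > best[v]:
                best[v], prev[v] = best[u] + 1, u
    node = max(order, key=best.get)
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def assemble(reads, k):
    """Repeatedly peel the longest path off the graph (simplified bottleneck step)."""
    counts = count_kmers(reads, k)
    edges = build_graph(reads, k)
    contigs = []
    while counts:
        path = longest_path(list(counts), edges)
        # Assumption: the bottleneck is the lowest-count k-mer on the path.
        abundance = min(counts[v] for v in path)
        for v in path:
            counts[v] -= abundance
            if counts[v] == 0:
                del counts[v]  # the bottleneck always reaches zero
        sequence = path[0] + "".join(v[-1] for v in path[1:])
        contigs.append((sequence, abundance))
    return contigs


print(assemble(["AACGTT", "AACGTT", "AACGTA"], k=3))
# → [('AACGTA', 1), ('AACGTT', 2)]
```

Because the bottleneck count is subtracted from every k-mer on the path, at least one vertex is removed per iteration, so the loop always terminates. Sorting successors only makes tie-breaking between equally long paths reproducible; how PAPerFly resolves such ties is not stated in this caption.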
