Let us assume that all local alignments between scaffold and reference genome have been computed, and we have a set of tuples

*V* = {(

*x*,

*y*,

*c*,

*d*)} such that

*T*[

*x*,

*y*] matches

*P*[

*c*,

*d*], where

*T*[1,

*n*] is the genome and

*P*[1,

*m*] the scaffold. In

*co-linear chaining* the goal is to find a sequence of tuples

*S* =

*s*
_{1}
*s*
_{2}⋯

*s*
_{
p
}∈

*V*
^{
p
} such that

*s*
_{
j
}
*.y* >

*s*
_{
j−1}
*.y*,

*s*
_{
j
}
*.d* >

*s*
_{
j−1}
*.d*, for all 1 ≤

*j* ≤

*p*, and |{

*i*|

*i* ∈ [

*s*
_{
j
}
*.c*
*s*
_{
j
}
*.d]* for some 1 ≤

*j* ≤

*p*}| is maximized. That is, find tuples preserving order in both

*T* and

*P* such that the resulting

*ordered* coverage of

*P* is maximized. We review an efficient solution given in
[

5] and extend it for our purposes. First, sort tuples in

*V* by the coordinate

*y* into sequence

*v*
_{1}
*v*
_{2}⋯

*v*
_{
N
}. Then, fill a table

*C*[1…

*N*] such that

*C*[

*j*] gives the maximum ordered coverage of

*P*[1,

*v*
_{
j
}
*.d*] over the choice of any subset of tuples from {

*v*
_{1}
*v*
_{2},…

*v*
_{
j
}}. Hence max

_{
j
}
*C*[

*j*] gives the total maximum ordered coverage of

*P*. Then one can derive the following formulae for

*C*
*j*[

5] depending on the case: (a) Either the previous tuple

does not overlap

*v*
_{
j
} in

*P*; or (b) the previous tuple

overlaps

*v*
_{
j
}in

*P*. For (a) it holds

Then the final value is *C*
[*j*] = max(*C*
^{a}
[*j*],*C*
^{b}
[*j*]). Now we can modify the formulae taking the invariant values out from the maximizations to obtain *range maximum queries*. These can be solved using a search tree
that supports in *O*(log*n*) time operations *Insert*(*v*,
*i*) to add value *v* to the tree with key *i* (if key *i* is already in the tree, replace its value
with max(*v*,
*v*
^{
′
})); *Delete*(*i*) for deleting node with key *i*; and *v* = *Max*(*l*,*r*) to return the maximum value *v* from nodes {*i*} that belong to the interval *l* ≤ *i* ≤ *r*. Since there are two different maximizations, we need to maintain two different search trees. Notice that applying the recurrence directly would yield a trivial *O*(*N*
^{2}) time algorithm, whereas the use of invariant and search tree gives *O*(*N* log *N*) time. The resulting pseudocode (analogous to one in
[5]) is given below.

**Algorithm**
CoLinearChaining(
*V*
sorted by
*y*
-coordinate:
*v*
_{1},

*v*
_{2},…,

*v*
_{
N
}
)
- (1)
;
;

- (2)

- (3)
;

- (4)
;

- (5)
*C*[*j*]←max(*C*
^{a}[*j*],*C*
^{b}[*j*]);

- (6)
;

- (7)
;

- (8)

The alignment given by applying the above algorithm allows arbitrary long gaps, which is not a desirable feature. The gaps between consecutive contigs in scaffolds are restricted by the mate pair insert size, which also tells that in a correct alignment to the genome the gaps should not deviate much from this value. It is easy to modify co-linear chaining to restrict gaps: Replacing
with
at line (3) in the pseudocode restricts the gap in the scaffold by maxgap. To obtain analogous effect simultaneously in the reference genome, is a bit more tricky. Let us first describe a method that works in the special case that *v*
_{
j
}
*.y*−*v*
_{
j
}
*.x* are equal for all *j* and then consider the modifications required to handle the general case. For the special case, one can deploy
as follows: At step *j* of the algorithm, maintain the invariant that
only contains all tuples
having
and *j*
^{
′
}< *j*. This is accomplished by adding the following code between lines (2) and (3) and initializing *j*
^{
′
}= 1:

(3’) **while**
**do**

(3”)
; *j*
^{
′
}←*j*
^{
′
} + 1

(For simplicity of exposition, this assumes values
are unique keys. One can use e.g. tuples
as keys to ensure uniqueness.) The correctness for the special case follows as *v*
_{
j−1}
*.x* ≤ *v*
_{
j
}
*.x* for all *j* > 2, and one can thus delete values incrementally so that the invariant is satisfied. The method fails in the general case since we can have *v*
_{
j−1}
*.x* > *v*
_{
j
}
*.x* and tuples with *y*-coordinate between [*v*
_{
j
}
*.x*−maxgap,*v*
_{
j−1}
*.x*−maxgap] are deleted. To overcome this, one can modify the algorithm as follows. Duplicate tuples and use *x*-coordinate for one copy and *y*-coordinate for the other as the sorting key. Now each tuple has *left* and *right* occurrence in sorted *V* . Apply the above algorithm, but do deletions only on left occurrences. In addition, on left occurrences, compute *C*[*j*] with lines 3-5 in the algorithm above, add the pair (*v*
_{
j
},*C*[*j*]) in a list of *active tuples*
instead of applying lines 6-7 above. On right occurrences, update *C*[*j*] again but before lines 6-7 above, take the maximum of that value and the one stored for active tuple *v*
_{
j
} in
. Then remove *v*
_{
j
} from
and recompute *C*[*j*
^{
′
}] for all active tuples
in
choosing as *C*[*j*
^{
′
}] the maximum of its previous value and the value computed applying lines 3-5 in the algorithm above. The correctness now follows from the facts that (a) when *v*
_{
j
} is added to the active tuples
, *C*[*j*] is the maximum value without overlapping tuples
taken into account, and (b) all the overlapping tuples
with
have their right occurrence before that of *v*
_{
j
} and hence trigger the update of active tuple (*v*
_{
j
},*C*[*j*]).