The CSM problem is a many-to-many generalization of the classical min-cost bipartite matching problem [12]. We describe the problem in an abstract setting, and cast it as a read alignment problem in the next section.
Consider arbitrary sets X and Y. A many-to-many matching (henceforth a matching) between X and Y is a set of pairs M ⊆ X × Y (see Figure 2a, b, c). The coverage of an element x ∈ X with respect to a matching M is c_{M}(x) = |{y : (x, y) ∈ M}|, that is, the number of elements matched to x. Symmetrically, c_{M}(y) = |{x : (x, y) ∈ M}| for y ∈ Y.
A coverage sensitive matching cost function (henceforth a cost function) w for X and Y assigns matching costs w_{m}(x, y) to every pair (x, y) ∈ X × Y, and coverage costs w_{c}(z, i) to every z ∈ X ∪ Y and every integer i ≥ 0. The cost of a matching M between X and Y with respect to w is given by
w\left(M\right)=\sum _{\left(x,y\right)\in M}{w}_{m}\left(x,y\right)+\sum _{z\in X\cup Y}{w}_{c}\left(z,{c}_{M}\left(z\right)\right)
(1)
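As an illustration of Eq. 1, the following minimal Python sketch evaluates w(M) for a small instance. The function name `matching_cost`, the representation of w_m and w_c as Python callables, and the toy costs are assumptions made for this example only; the sketch also assumes that X and Y use disjoint labels.

```python
import math
from collections import Counter

def matching_cost(M, X, Y, w_m, w_c):
    """Evaluate Eq. 1: matching costs of the pairs in M plus coverage costs of all elements."""
    coverage = Counter()
    for x, y in M:
        coverage[x] += 1
        coverage[y] += 1
    pair_cost = sum(w_m(x, y) for x, y in M)
    # Elements that appear in no pair have coverage 0 and still pay w_c(z, 0).
    cover_cost = sum(w_c(z, coverage[z]) for z in list(X) + list(Y))
    return pair_cost + cover_cost

# Toy instance (hypothetical values): every element prefers coverage exactly 1.
X, Y = ["x1", "x2"], ["y1"]
finite = {("x1", "y1"): 1, ("x2", "y1"): 2}
w_m = lambda x, y: finite.get((x, y), math.inf)
w_c = lambda z, i: (i - 1) ** 2

cost_full = matching_cost({("x1", "y1"), ("x2", "y1")}, X, Y, w_m, w_c)
cost_empty = matching_cost(set(), X, Y, w_m, w_c)
```

In this toy instance the full matching pays 3 in matching costs and 1 for covering y1 twice, while the empty matching pays only the three zero-coverage costs.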
The CSM problem
Input: A Matching Instance (X, Y, w) consisting of sets X, Y, and cost function w.
Output: Compute CSM\left(X,Y,w\right)=\underset{M\subseteq X\times Y}{\text{min}}w\left(M\right).
Note that CSM generalizes several classical problems in combinatorics. For example, consider the problem of finding a maximum (partial one-to-one) matching in a bipartite graph G with vertex shores X, Y and an edge set E. This problem can be solved by solving CSM on the input X, Y using the following costs: set w_{c}(z, 0) = w_{c}(z, 1) = 0 and w_{c}(z, i) = ∞ for all z ∈ X ∪ Y, i > 1; set w_{m}(x, y) = −1 for (x, y) ∈ E, and otherwise set w_{m}(x, y) = ∞. Under these costs every finite-cost matching is one-to-one and uses only edges of E, and its cost is the negation of its size, so a minimum cost matching is a maximum matching. Similarly, CSM can also be used for solving the minimum/maximum weight variants of the bipartite matching problem. However, CSM is NP-hard in general (see Additional File 1), and therefore we do not expect to solve general instances efficiently.
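The reduction from maximum bipartite matching can be checked on a tiny graph with an exhaustive search. The sketch below is not an efficient algorithm (it enumerates all subsets of the finite-cost pairs), and the name `brute_force_csm` and the example graph are hypothetical. It uses matching cost −1 on graph edges, so that minimizing the cost maximizes the number of matched pairs.

```python
import math
from itertools import combinations

def brute_force_csm(X, Y, w_m, w_c):
    """Enumerate every subset of the finite-cost pairs; exponential, for tiny inputs only."""
    pairs = [(x, y) for x in X for y in Y if w_m(x, y) < math.inf]
    best_cost, best_M = math.inf, None
    for k in range(len(pairs) + 1):
        for M in combinations(pairs, k):
            cov = {z: 0 for z in list(X) + list(Y)}
            cost = 0
            for x, y in M:
                cost += w_m(x, y)
                cov[x] += 1
                cov[y] += 1
            cost += sum(w_c(z, i) for z, i in cov.items())
            if cost < best_cost:
                best_cost, best_M = cost, set(M)
    return best_cost, best_M

# Hypothetical bipartite graph whose maximum matching has size 2.
X, Y = ["x1", "x2", "x3"], ["y1", "y2"]
E = {("x1", "y1"), ("x1", "y2"), ("x2", "y1"), ("x3", "y1")}
w_m = lambda x, y: -1 if (x, y) in E else math.inf
w_c = lambda z, i: 0 if i <= 1 else math.inf  # coverage above 1 is forbidden

best_cost, best_M = brute_force_csm(X, Y, w_m, w_c)
```

Here the optimal cost is the negated size of a maximum matching; which of the two optimal matchings is returned depends on enumeration order.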
CSM with convex coverage costs
Let (X, Y, w) be a matching instance. We say that w has convex coverage costs if for every element z ∈ X ∪ Y and every integer i > 0, w_{c}(z, i) \le \frac{w_{c}(z, i-1) + w_{c}(z, i+1)}{2}. We show here that CSM with convex coverage costs can be reduced to the min-cost integer flow problem [11], which is solvable in polynomial time.
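The convexity condition is straightforward to test numerically. The following sketch checks it only up to a caller-supplied coverage bound, which is an assumption of the example (the definition itself quantifies over all i > 0); the function name and the three sample cost functions are hypothetical.

```python
import math

def has_convex_coverage_costs(w_c, elements, max_coverage):
    """Check 2*w_c(z, i) <= w_c(z, i-1) + w_c(z, i+1) for all 0 < i < max_coverage."""
    return all(
        2 * w_c(z, i) <= w_c(z, i - 1) + w_c(z, i + 1)
        for z in elements
        for i in range(1, max_coverage)
    )

elements = ["x1", "y1"]
quadratic = lambda z, i: (i - 1) ** 2                 # convex
one_to_one = lambda z, i: 0 if i <= 1 else math.inf   # costs of the maximum matching reduction
flat_fee = lambda z, i: min(i, 1)                     # concave step: not convex

convex_quadratic = has_convex_coverage_costs(quadratic, elements, 6)
convex_one_to_one = has_convex_coverage_costs(one_to_one, elements, 6)
convex_flat_fee = has_convex_coverage_costs(flat_fee, elements, 6)
```

Note that the coverage costs used above for maximum bipartite matching pass this check, so that special case falls within the convex setting.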
For x ∈ X, denote d_{x} = |{y : w_{m}(x, y) < ∞}|, and similarly d_{y} = |{x : w_{m}(x, y) < ∞}| for y ∈ Y. Denote d_{X} = \max_{x \in X} d_{x} and d_{Y} = \max_{y \in Y} d_{y}. The reduction builds the flow network N = (G, s, t, c, w'), where G is the network graph, s and t are the source and sink nodes respectively, and c and w' are the edge capacity and cost functions respectively. The graph G = (V, E) is defined as follows (Figure 2d).

♦ V = X ∪ Y ∪ C^{X}∪ C^{Y}∪ {s, t}, where the sets {C}^{X}=\left\{{c}_{1}^{X},{c}_{2}^{X},...,{c}_{{d}_{X}}^{X}\right\}, {C}^{Y}=\left\{{c}_{1}^{Y},{c}_{2}^{Y},...,{c}_{{d}_{Y}}^{Y}\right\}, and {s, t} contain unique nodes different from all nodes in X and Y . Note that we use the same notations for elements in X and Y and their corresponding nodes in V, where ambiguity can be resolved by the context.

♦ E = E_{1} ∪ E_{2} ∪ E_{3} ∪ E_{4} ∪ E_{5}, where

{E}_{1}=\left\{\left(s,\phantom{\rule{2.77695pt}{0ex}}{c}_{i}^{X}\right):{c}_{i}^{X}\in {C}^{X}\right\},

{E}_{2}=\left\{\left({c}_{i}^{X},x\right):{c}_{i}^{X}\in {C}^{X},x\in X,i\le {d}_{x}\right\},

{E}_{3}=\left\{\left(x,\phantom{\rule{2.77695pt}{0ex}}y\right)\phantom{\rule{2.77695pt}{0ex}}:\phantom{\rule{2.77695pt}{0ex}}x\in X,\phantom{\rule{2.77695pt}{0ex}}y\in Y,\phantom{\rule{2.77695pt}{0ex}}{w}_{m}\left(x,\phantom{\rule{2.77695pt}{0ex}}y\right)<\infty \right\},

{E}_{4}=\left\{\left(y,{c}_{i}^{Y}\right):y\in Y,{c}_{i}^{Y}\in {C}^{Y},i\le {d}_{y}\right\},
and
{E}_{5}=\left\{\left({c}_{i}^{Y},t\right):{c}_{i}^{Y}\in {C}^{Y}\right\}.
The capacity function c assigns infinite capacities to all edges in E_{1} and E_{5}, and unit capacities to all edges in E_{2}, E_{3} and E_{4}. The cost function w' assigns zero costs to edges in E_{1} and E_{5}, costs w_{c}(x, i) − w_{c}(x, i − 1) to edges (c_{i}^{X}, x) ∈ E_{2}, costs w_{c}(y, i) − w_{c}(y, i − 1) to edges (y, c_{i}^{Y}) ∈ E_{4}, and costs w_{m}(x, y) to edges (x, y) ∈ E_{3}. For E' ⊆ E, denote w'(E') = \sum_{e \in E'} w'(e). An integer flow in N is a function f : E → {0, 1, 2, . . .} satisfying f(e) ≤ c(e) for every e ∈ E (capacity constraints), and \sum_{u : (u,v) \in E} f(u, v) = \sum_{u : (v,u) \in E} f(v, u) for every v ∈ V \ {s, t} (flow conservation constraints). The cost of a flow f in N is defined by w'(f) = \sum_{e \in E} f(e) w'(e).
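The construction of N can be sketched as a function that returns the edge list with capacities and costs. This is an illustrative sketch rather than the authors' implementation: the node encoding (`"s"`, `"t"`, `("cX", i)`, `("x", x)`, `("y", y)`, `("cY", i)`) and the function name are hypothetical. It assumes that the edge between c_i^X and x exists for 1 ≤ i ≤ d_x (the index range used in Observation 1, and likewise for Y), that coverage costs are finite, and that X and Y use disjoint labels.

```python
import math

def build_network(X, Y, w_m, w_c):
    """Return the edges of N as (tail, head, capacity, cost) tuples."""
    E3 = [(x, y) for x in X for y in Y if w_m(x, y) < math.inf]
    d = {z: 0 for z in list(X) + list(Y)}  # d[z] = number of finite-cost partners of z
    for x, y in E3:
        d[x] += 1
        d[y] += 1
    d_X = max((d[x] for x in X), default=0)
    d_Y = max((d[y] for y in Y), default=0)
    edges = []
    for i in range(1, d_X + 1):
        edges.append(("s", ("cX", i), math.inf, 0))                               # E1
        for x in X:
            if i <= d[x]:
                edges.append((("cX", i), ("x", x), 1, w_c(x, i) - w_c(x, i - 1)))  # E2
    for x, y in E3:
        edges.append((("x", x), ("y", y), 1, w_m(x, y)))                          # E3
    for i in range(1, d_Y + 1):
        for y in Y:
            if i <= d[y]:
                edges.append((("y", y), ("cY", i), 1, w_c(y, i) - w_c(y, i - 1)))  # E4
        edges.append((("cY", i), "t", math.inf, 0))                               # E5
    return edges

# Toy instance (hypothetical values): two elements of X can both be matched to y1.
X, Y = ["x1", "x2"], ["y1"]
w_m = lambda x, y: {"x1": 1, "x2": 2}[x]
w_c = lambda z, i: (i - 1) ** 2
edges = build_network(X, Y, w_m, w_c)
```

For this instance d_X = 1 and d_Y = 2, so the network has one node in C^X, two nodes in C^Y, and nine edges in total.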
In what follows, let (X, Y, w) be a matching instance where w has convex coverage costs, and let N be its corresponding network. Due to the convexity requirement, for every x ∈ X and every integer i > 0, \begin{array}{c} w'(c_{i+1}^{X}, x) - w'(c_{i}^{X}, x) = (w_{c}(x, i+1) - w_{c}(x, i)) - (w_{c}(x, i) - w_{c}(x, i-1)) \\ = w_{c}(x, i+1) + w_{c}(x, i-1) - 2 w_{c}(x, i) \ge 0. \end{array} Similarly, for every y ∈ Y and every integer i > 0, w'(y, c_{i+1}^{Y}) - w'(y, c_{i}^{Y}) \ge 0, and we get the following observation:
Observation 1. Series of the form w'(c_{1}^{X}, x), w'(c_{2}^{X}, x), ... and w'(y, c_{1}^{Y}), w'(y, c_{2}^{Y}), ... are nondecreasing. Consequently, for every x ∈ X, every E' ⊆ {(c_{i}^{X}, x) : 1 ≤ i ≤ d_{x}} and E'' = {(c_{i}^{X}, x) : 1 ≤ i ≤ |E'|}, we have w'(E'') ≤ w'(E'), since E'' consists of the |E'| lowest-indexed, and hence cheapest, of these edges. Similarly, for every y ∈ Y, every E' ⊆ {(y, c_{i}^{Y}) : 1 ≤ i ≤ d_{y}} and E'' = {(y, c_{i}^{Y}) : 1 ≤ i ≤ |E'|}, we have w'(E'') ≤ w'(E').
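Observation 1 rests on an elementary fact: in a nondecreasing sequence, the first k entries sum to no more than any other choice of k entries. The sketch below checks this exhaustively for the marginal costs induced by one convex coverage cost; the function name and the cost w_c(x, i) = (i − 1)^2 are hypothetical choices for the example.

```python
from itertools import combinations

def prefix_is_cheapest(costs):
    """For a nondecreasing list, check that each length-k prefix sums to at most any k-subset."""
    assert all(a <= b for a, b in zip(costs, costs[1:])), "costs must be nondecreasing"
    for k in range(len(costs) + 1):
        prefix = sum(costs[:k])
        if any(prefix > sum(subset) for subset in combinations(costs, k)):
            return False
    return True

# Marginal costs w_c(x, i) - w_c(x, i - 1) for w_c(x, i) = (i - 1)^2 and i = 1..4.
marginals = [(i - 1) ** 2 - (i - 2) ** 2 for i in range(1, 5)]
holds = prefix_is_cheapest(marginals)
```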
Given a flow f in N, define the matching M_{f} = {(x, y) : (x, y) ∈ E_{3}, f(x, y) = 1}. Denote E_{x}^{f} = {(c_{i}^{X}, x) : f(c_{i}^{X}, x) = 1} and E_{y}^{f} = {(y, c_{i}^{Y}) : f(y, c_{i}^{Y}) = 1}. Since for edges e ∈ E_{1} ∪ E_{5} we have that w'(e) = 0, and since for edges e ∈ E_{2} ∪ E_{3} ∪ E_{4} we have that f(e) ∈ {0, 1} (due to capacity constraints), we can write
\begin{array}{ll}\hfill w\text{'}\left(f\right)& ={\displaystyle \sum _{e\in E}}f\left(e\right)w\text{'}\left(e\right)={\displaystyle \sum _{\begin{array}{c}e\in {E}_{2}\cup {E}_{3}\cup {E}_{4}\\ \phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}\phantom{\rule{0.3em}{0ex}}f\left(\mathsf{\text{e}}\right)=1\end{array}}}w\text{'}\left(e\right)\phantom{\rule{2em}{0ex}}\\ =w\text{'}\left({M}_{f}\right)+{\displaystyle \sum _{x\phantom{\rule{0.3em}{0ex}}\in \phantom{\rule{0.3em}{0ex}}X}}w\text{'}\left({E}_{x}^{f}\right)+{\displaystyle \sum _{y\phantom{\rule{0.3em}{0ex}}\in \phantom{\rule{0.3em}{0ex}}Y}}w\text{'}\left({E}_{y}^{f}\right).\phantom{\rule{2em}{0ex}}\end{array}
(2)
Given a finite-cost matching M between X and Y, define the flow f_{M} in N as follows:

♦ For every (x, y) ∈ E_{3}, f (x, y) = 1 if (x, y) ∈ M, and otherwise f(x, y) = 0;

♦ For every (c_{i}^{X}, x) ∈ E_{2}, f(c_{i}^{X}, x) = 1 if i ≤ c_{M}(x), and otherwise f(c_{i}^{X}, x) = 0;

♦ For every (y, c_{i}^{Y}) ∈ E_{4}, f(y, c_{i}^{Y}) = 1 if i ≤ c_{M}(y), and otherwise f(y, c_{i}^{Y}) = 0;

♦ For every (s, c_{i}^{X}) ∈ E_{1}, f(s, c_{i}^{X}) = |{x : f(c_{i}^{X}, x) = 1}|;

♦ For every (c_{i}^{Y}, t) ∈ E_{5}, f(c_{i}^{Y}, t) = |{y : f(y, c_{i}^{Y}) = 1}|.
It is easy to verify that f_{M} is a valid flow in N (satisfying all capacity and flow conservation constraints), and that M_{f_{M}} = M.
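The validity of f_M can be illustrated with a short sketch that builds the flow from a matching and checks conservation at every internal node. The node encoding and function names are hypothetical. The sketch saturates the edges between c_i^X and x for 1 ≤ i ≤ c_M(x), which is the edge set that the proof of Claim 1 identifies with the flow of a matching, and stores only edges carrying positive flow.

```python
def flow_from_matching(M, X, Y):
    """Build f_M as a dict mapping edge -> flow (edges with zero flow are omitted)."""
    cov = {z: 0 for z in list(X) + list(Y)}
    f = {}
    for x, y in M:
        f[(("x", x), ("y", y))] = 1
        cov[x] += 1
        cov[y] += 1
    for x in X:
        for i in range(1, cov[x] + 1):  # edges (c_i^X, x) with i <= c_M(x)
            f[(("cX", i), ("x", x))] = 1
            f[("s", ("cX", i))] = f.get(("s", ("cX", i)), 0) + 1
    for y in Y:
        for i in range(1, cov[y] + 1):  # edges (y, c_i^Y) with i <= c_M(y)
            f[(("y", y), ("cY", i))] = 1
            f[(("cY", i), "t")] = f.get((("cY", i), "t"), 0) + 1
    return f

def conserves_flow(f):
    """Check that inflow equals outflow at every node other than s and t."""
    balance = {}
    for (u, v), amount in f.items():
        balance[u] = balance.get(u, 0) - amount
        balance[v] = balance.get(v, 0) + amount
    return all(b == 0 for node, b in balance.items() if node not in ("s", "t"))

X, Y = ["x1", "x2"], ["y1"]
M = {("x1", "y1"), ("x2", "y1")}
f = flow_from_matching(M, X, Y)
valid = conserves_flow(f)
# Recover M_f from the unit flows on E3 edges.
recovered = {(u[1], v[1]) for (u, v), a in f.items()
             if isinstance(u, tuple) and u[0] == "x" and a == 1}
```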
Claim 1. For every flow f in N, w\text{'}\left({f}_{{M}_{f}}\right)\le w\text{'}\left(f\right).
Proof. From flow conservation constraints, |E_{x}^{f}| = |E_{x}^{f_{M_{f}}}| = c_{M_{f}}(x) for every x ∈ X, where in particular, by definition, E_{x}^{f_{M_{f}}} = {(c_{i}^{X}, x) : 1 ≤ i ≤ c_{M_{f}}(x)}. Therefore, it follows from Observation 1 that w'(E_{x}^{f_{M_{f}}}) ≤ w'(E_{x}^{f}) for every x ∈ X, and similarly it may be shown that w'(E_{y}^{f_{M_{f}}}) ≤ w'(E_{y}^{f}) for every y ∈ Y. Hence,
\begin{array}{ll}\hfill w\text{'}\left({f}_{{M}_{f}}\right)& \phantom{\rule{1em}{0ex}}\stackrel{\mathsf{\text{Eq}}.2}{=}\phantom{\rule{0.3em}{0ex}}w\text{'}\left({M}_{{f}_{{M}_{f}}}\right)+{\displaystyle \sum _{x\in X}}w\text{'}\left({E}_{x}^{{f}_{{M}_{f}}}\right)\phantom{\rule{2em}{0ex}}\\ \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}+{\displaystyle \sum _{y\in Y}}w\text{'}\left({E}_{y}^{{f}_{{M}_{f}}}\right)\phantom{\rule{2em}{0ex}}\\ \phantom{\rule{1em}{0ex}}\le \phantom{\rule{2.77695pt}{0ex}}w\text{'}\left({M}_{f}\right)+{\displaystyle \sum _{x\in X}}w\text{'}\left({E}_{x}^{f}\right)\phantom{\rule{2em}{0ex}}\\ \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}+{\displaystyle \sum _{y\in Y}}w\text{'}\left({E}_{y}^{f}\right)\phantom{\rule{2em}{0ex}}\\ \phantom{\rule{1em}{0ex}}\stackrel{\mathsf{\text{Eq}}.2}{=}\phantom{\rule{2.77695pt}{0ex}}w\text{'}\left(f\right).\phantom{\rule{2em}{0ex}}\end{array}
□
Denote \mathrm{\Delta}=\mathrm{\Delta}\left(X,\phantom{\rule{2.77695pt}{0ex}}Y,\phantom{\rule{2.77695pt}{0ex}}w\right)=\sum _{z\in X\cup Y}{w}_{c}\left(z,\phantom{\rule{2.77695pt}{0ex}}0\right), and note that Δ depends only on the instance (X, Y, w) and not on any specific matching.
Claim 2. For every matching M between X and Y, w'(f_{M}) = w(M) − Δ.
Proof. For x ∈ X, we have that \begin{array}{c} w'(E_{x}^{f_{M}}) = w'(c_{1}^{X}, x) + w'(c_{2}^{X}, x) + ... + w'(c_{c_{M}(x)}^{X}, x) \\ = (w_{c}(x, 1) - w_{c}(x, 0)) + (w_{c}(x, 2) - w_{c}(x, 1)) + ... + (w_{c}(x, c_{M}(x)) - w_{c}(x, c_{M}(x) - 1)) \\ = w_{c}(x, c_{M}(x)) - w_{c}(x, 0), \end{array} and similarly w'(E_{y}^{f_{M}}) = w_{c}(y, c_{M}(y)) - w_{c}(y, 0) for y ∈ Y. Therefore,
\begin{array}{ll} w'(f_{M}) & \stackrel{\text{Eq}.2}{=} w'(M) + \sum_{x \in X} w'(E_{x}^{f_{M}}) + \sum_{y \in Y} w'(E_{y}^{f_{M}}) \\ & = w'(M) + \sum_{x \in X} (w_{c}(x, c_{M}(x)) - w_{c}(x, 0)) + \sum_{y \in Y} (w_{c}(y, c_{M}(y)) - w_{c}(y, 0)) \\ & = \sum_{(x,y) \in M} w_{m}(x, y) + \sum_{z \in X \cup Y} w_{c}(z, c_{M}(z)) - \sum_{z \in X \cup Y} w_{c}(z, 0) \\ & \stackrel{\text{Eq}.1}{=} w(M) - \Delta. \end{array}
□
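The telescoping argument of Claim 2 can be verified numerically on a small instance by comparing the cost of the flow of M against w(M) for every matching M. All names and cost values in this sketch are hypothetical; integer costs are used so that the comparison is exact.

```python
from itertools import combinations

def coverage(M, X, Y):
    cov = {z: 0 for z in list(X) + list(Y)}
    for x, y in M:
        cov[x] += 1
        cov[y] += 1
    return cov

def matching_cost(M, X, Y, w_m, w_c):
    """w(M) as in Eq. 1."""
    cov = coverage(M, X, Y)
    return sum(w_m(x, y) for x, y in M) + sum(w_c(z, k) for z, k in cov.items())

def flow_cost(M, X, Y, w_m, w_c):
    """Cost of the flow of M: pair costs plus marginal costs of the first c_M(z) coverage edges."""
    cov = coverage(M, X, Y)
    marginal = sum(w_c(z, i) - w_c(z, i - 1)
                   for z, k in cov.items() for i in range(1, k + 1))
    return sum(w_m(x, y) for x, y in M) + marginal

X, Y = ["x1", "x2"], ["y1", "y2"]
w_m = lambda x, y: {"x1": 1, "x2": 2}[x] + {"y1": 0, "y2": 3}[y]
w_c = lambda z, i: (i - 1) ** 2
delta = sum(w_c(z, 0) for z in X + Y)
pairs = [(x, y) for x in X for y in Y]
# Difference between flow cost and matching cost over all 16 matchings.
gaps = {
    flow_cost(M, X, Y, w_m, w_c) - matching_cost(M, X, Y, w_m, w_c)
    for k in range(len(pairs) + 1)
    for M in combinations(pairs, k)
}
```

The set `gaps` collapses to the single value −Δ, independent of the matching, as the claim states.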
Claim 3. Let f* be a minimum cost flow in N. Then, M_{f*} is a minimum cost matching between X and Y, and CSM(X, Y, w) = w'(f*) + Δ.
Proof. Since f* is a minimum cost flow in N, w'(f^{*}) \le w'(f_{M_{f^{*}}}) \stackrel{\text{Clm}.1}{\le} w'(f^{*}), thus w'(f^{*}) = w'(f_{M_{f^{*}}}). Let M be a matching between X and Y. Again, from the optimality of f*, w'(f*) ≤ w'(f_{M}), and so w(M_{f^{*}}) - \Delta \stackrel{\text{Clm}.2}{=} w'(f_{M_{f^{*}}}) = w'(f^{*}) \le w'(f_{M}) \stackrel{\text{Clm}.2}{=} w(M) - \Delta, and in particular w(M_{f^{*}}) \le w(M). Thus, M_{f^{*}} is a minimum cost matching for (X, Y, w), and so CSM(X, Y, w) = w(M_{f^{*}}) \stackrel{\text{Clm}.2}{=} w'(f^{*}) + \Delta.
□
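Claim 3 suggests an end-to-end procedure: build N, find a minimum cost flow, read off the matching, and add Δ. The sketch below is one possible realization and not necessarily the solver used by the authors. It uses successive shortest paths with Bellman-Ford (a swapped-in textbook technique that tolerates negative edge costs), stops once no augmenting path has negative cost, and assumes finite convex coverage costs, integer-friendly arithmetic, and disjoint labels for X and Y. All identifiers and cost values are hypothetical.

```python
import math

def csm_by_flow(X, Y, w_m, w_c):
    """Return (CSM value, optimal matching) via min-cost flow on the network N."""
    pairs = [(x, y) for x in X for y in Y if w_m(x, y) < math.inf]
    d = {z: 0 for z in list(X) + list(Y)}
    for x, y in pairs:
        d[x] += 1
        d[y] += 1
    big = len(pairs) + 1  # no flow exceeds |E3|, so this acts as infinite capacity
    head, cap, cost = [], [], []
    adj = {"s": [], "t": []}

    def add_edge(u, v, capacity, weight):
        # Forward edge at an even index, its residual twin at the next odd index.
        for a, b, c, w in ((u, v, capacity, weight), (v, u, 0, -weight)):
            adj.setdefault(a, []).append(len(head))
            adj.setdefault(b, [])
            head.append(b)
            cap.append(c)
            cost.append(w)

    for i in range(1, max((d[x] for x in X), default=0) + 1):
        add_edge("s", ("cX", i), big, 0)                                 # E1
    for x in X:
        for i in range(1, d[x] + 1):
            add_edge(("cX", i), ("x", x), 1, w_c(x, i) - w_c(x, i - 1))  # E2
    e3 = {}
    for x, y in pairs:
        e3[(x, y)] = len(head)
        add_edge(("x", x), ("y", y), 1, w_m(x, y))                       # E3
    for y in Y:
        for i in range(1, d[y] + 1):
            add_edge(("y", y), ("cY", i), 1, w_c(y, i) - w_c(y, i - 1))  # E4
    for i in range(1, max((d[y] for y in Y), default=0) + 1):
        add_edge(("cY", i), "t", big, 0)                                 # E5

    flow_cost = 0
    while True:
        # Bellman-Ford shortest paths from s in the residual graph.
        dist = {v: math.inf for v in adj}
        dist["s"] = 0
        prev = {}
        for _ in range(len(adj) - 1):
            changed = False
            for u in adj:
                if dist[u] == math.inf:
                    continue
                for e in adj[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[head[e]]:
                        dist[head[e]] = dist[u] + cost[e]
                        prev[head[e]] = e
                        changed = True
            if not changed:
                break
        if dist["t"] >= 0:  # also true when t is unreachable (distance inf)
            break
        v = "t"
        while v != "s":  # push one unit of flow along the shortest path
            e = prev[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = head[e ^ 1]
        flow_cost += dist["t"]

    delta = sum(w_c(z, 0) for z in list(X) + list(Y))
    matching = {p for p, e in e3.items() if cap[e] == 0}  # saturated E3 edges
    return flow_cost + delta, matching

# Toy instance (hypothetical values): y1 prefers coverage 2, all others coverage 1.
finite = {("x1", "y1"): 1, ("x1", "y2"): 3, ("x2", "y1"): 1, ("x3", "y2"): 1}
target = {"x1": 1, "x2": 1, "x3": 1, "y1": 2, "y2": 1}
w_m = lambda x, y: finite.get((x, y), math.inf)
w_c = lambda z, i: (i - target[z]) ** 2
value, matching = csm_by_flow(["x1", "x2", "x3"], ["y1", "y2"], w_m, w_c)
```

In this instance Δ = 8 and the minimum cost flow has cost −5, giving a CSM value of 3, attained by matching x1 and x2 to y1 and x3 to y2. Stopping at the first nonnegative augmenting path is what allows the flow value to be left unconstrained, in line with the definition of a flow in N given above.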