A graph-based approach for proteoform identification and quantification using top-down homogeneous multiplexed tandem mass spectra

Background: Top-down homogeneous multiplexed tandem mass (HomMTM) spectra are generated from modified proteoforms of the same protein with different post-translational modification patterns. They are frequently observed in the analysis of ultramodified proteins, some proteoforms of which have similar molecular weights and cannot be well separated by liquid chromatography in mass spectrometry analysis.

Results: We formulate the top-down HomMTM spectral identification problem as the minimum error k-splittable flow problem on graphs and propose a graph-based algorithm for the identification and quantification of proteoforms using top-down HomMTM spectra.

Conclusions: Experiments on a top-down mass spectrometry data set of the histone H4 protein showed that the proposed method identified many proteoform pairs that better explain the query spectra than single proteoforms.

Electronic supplementary material: The online version of this article (10.1186/s12859-018-2273-4) contains supplementary material, which is available to authorized users.


Proof of the NP-hardness of the MEkSF problem
In the decision version of the MEkSF problem, we are given a graph $G$ with vertex capacities, a flow value $f$, and a number $k$. The objective is to determine whether there is a $k$-splittable flow $F$ whose flow value is $f$ and whose error is 0.
Theorem 1. The decision version of the MEkSF problem is NP-complete.
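Proof. Membership in NP follows because a candidate set of $k$ paths with their flow values can be checked against the vertex capacities in polynomial time, so it remains to show NP-hardness. We reduce the partition problem to the decision version of the MEkSF problem. Given a multiset $S$ of positive integers, the partition problem is to determine whether $S$ can be partitioned into two subsets $S_1$ and $S_2$ such that the sum of the numbers in $S_1$ equals the sum of the numbers in $S_2$.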
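To make the partition problem concrete, the following is a minimal sketch of a decision procedure for small instances. The function name `has_equal_partition` is hypothetical and not part of the paper. The procedure tracks reachable subset sums, so its running time depends on the magnitude of the numbers rather than only on the input length; this is consistent with the problem being NP-complete.

```python
def has_equal_partition(numbers):
    """Decide whether a multiset of positive integers splits into two equal-sum parts."""
    total = sum(numbers)
    if total % 2:
        # An odd total can never be split into two equal integer sums.
        return False
    reachable = {0}  # subset sums reachable with the numbers seen so far
    for a in numbers:
        reachable |= {r + a for r in reachable}
    return total // 2 in reachable


print(has_equal_partition([3, 1, 1, 2, 2, 1]))
print(has_equal_partition([2, 2, 6]))
```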
For a given instance $S = \{a_1, a_2, \ldots, a_n\}$ of the partition problem, we construct an instance of the MEkSF problem. Let $C = \sum_{i=1}^{n} a_i$. The graph contains four layers. The first layer contains only one source vertex $s$, and the fourth layer contains only one sink vertex $t$. For each number $a_i \in S$, a vertex $u_{2,i}$ is added to the second layer of the graph, and the capacity of $u_{2,i}$ is $a_i$. Two vertices $u_{3,1}$ and $u_{3,2}$ are added to the third layer, and their capacities are $C/2$. Next, we add edges to connect vertices in neighboring layers. For each vertex pair $v_1$ and $v_2$ such that $v_1$ is in layer $i$ and $v_2$ is in layer $i + 1$ (for $1 \le i \le 3$), a directed edge is added from $v_1$ to $v_2$. The total flow value is set to $C$, and the number $k$ of paths is set to $n$.
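The construction can be sketched in a few lines of Python. This is an illustration only: the helper `build_meksf_instance`, the vertex labels such as `u2_1`, and the dictionary layout are assumptions made for this example. The text does not assign capacities to $s$ and $t$, so the sketch leaves them out.

```python
from fractions import Fraction


def build_meksf_instance(numbers):
    """Build the four-layer graph for a partition instance (hypothetical helper)."""
    n = len(numbers)
    total = sum(numbers)                      # C in the text
    layer2 = [f"u2_{i}" for i in range(1, n + 1)]
    layer3 = ["u3_1", "u3_2"]

    # Vertex capacities: a_i for u_{2,i}, and C/2 for both layer-3 vertices.
    capacity = dict(zip(layer2, numbers))
    for w in layer3:
        capacity[w] = Fraction(total, 2)

    # Directed edges between every pair of vertices in neighboring layers.
    edges = [("s", u) for u in layer2]
    edges += [(u, w) for u in layer2 for w in layer3]
    edges += [(w, "t") for w in layer3]

    return {"capacity": capacity, "edges": edges, "flow": total, "k": n}


instance = build_meksf_instance([3, 1, 1, 2, 2, 1])
print(len(instance["edges"]), instance["flow"], instance["k"])
```

The graph has $n + 2$ capacitated vertices and $3n + 2$ edges, so the reduction runs in polynomial time.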
($\Rightarrow$) If there is a solution $S_1$ and $S_2$ to the instance of the partition problem, we can find an $n$-splittable flow with error 0 as follows. For each number $a_i \in S_1$, we add the path $s, u_{2,i}, u_{3,1}, t$ with flow $a_i$ to the solution to the MEkSF problem; for each number $a_j \in S_2$, we add the path $s, u_{2,j}, u_{3,2}, t$ with flow $a_j$. Consequently, the flow that goes through $u_{3,1}$ is $C/2$, and the flow that goes through $u_{3,2}$ is also $C/2$. The total error of the $n$ paths is 0, and the total flow of the paths is $C$.
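The forward direction can be checked numerically on a small instance. The sketch below is not the paper's implementation: the helper names are hypothetical, and the error is assumed to be the sum over vertices of the absolute difference between capacity and flow. Any error measure that is zero exactly when every vertex carries its capacity would give the same outcome here.

```python
from fractions import Fraction


def paths_from_partition(numbers, in_first):
    """One (path, flow) pair per number; in_first[i] marks membership in S1."""
    paths = []
    for i, (a, first) in enumerate(zip(numbers, in_first), start=1):
        third = "u3_1" if first else "u3_2"
        paths.append((("s", f"u2_{i}", third, "t"), a))
    return paths


def total_error(numbers, paths):
    """Assumed error: sum of |capacity - flow| over layer-2 and layer-3 vertices."""
    capacity = {f"u2_{i}": a for i, a in enumerate(numbers, start=1)}
    capacity["u3_1"] = capacity["u3_2"] = Fraction(sum(numbers), 2)
    through = dict.fromkeys(capacity, 0)
    for path, flow in paths:
        for v in path[1:-1]:          # skip the source and the sink
            through[v] += flow
    return sum(abs(capacity[v] - through[v]) for v in capacity)


numbers = [3, 1, 1, 2, 2, 1]
in_first = [True, False, False, True, False, False]   # S1 = {3, 2}
paths = paths_from_partition(numbers, in_first)
print(total_error(numbers, paths), sum(flow for _, flow in paths))
```

Routing every number through `u3_1` instead overloads one layer-3 vertex and starves the other, so the error becomes positive.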
($\Leftarrow$) If the instance of the MEkSF problem has a solution whose total flow value is $C$ and whose error is 0, then the partition problem has a solution. Let $P = \{P_1, P_2, \ldots, P_n\}$, a set of $n$ paths from $s$ to $t$, be the solution to the MEkSF problem. Two observations can be made: (1) No two paths in $P$ go through the same vertex in layer 2. If such a pair existed, then, because $P$ contains only $n$ paths and layer 2 contains $n$ vertices, at least one vertex in layer 2 would not appear in any path in $P$ and its flow would be 0. Since every capacity $a_i$ is positive, the total error of the $n$ paths would be nonzero, a contradiction. Hence each vertex $u_{2,i}$ lies on exactly one path, and that path carries flow $a_i$. (2) The sum of the flows of the paths that go through $u_{3,1}$ is $C/2$, and the sum of the flows of the paths that go through $u_{3,2}$ is also $C/2$, because the error at both vertices is 0. A number $a_i \in S$ is added to $S_1$ if $P$ contains the path $s, u_{2,i}, u_{3,1}, t$, and to $S_2$ otherwise. By observation 1, this assignment is a partition of $S$. By observation 2, the sum of the numbers in $S_1$ equals the sum of the numbers in $S_2$.
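The assignment rule used in this direction can be sketched as follows. The function `partition_from_paths` and the vertex labels are hypothetical, and the example paths are hand-written to represent a zero-error solution for the instance $S = \{3, 1, 1, 2, 2, 1\}$.

```python
def partition_from_paths(numbers, paths):
    """Read S1 and S2 off a zero-error solution with one path per layer-2 vertex."""
    s1, s2 = [], []
    for path, _flow in paths:
        _source, second, third, _sink = path
        index = int(second.split("_")[1]) - 1     # "u2_4" -> position 3
        target = s1 if third == "u3_1" else s2    # layer-3 vertex selects the subset
        target.append(numbers[index])
    return s1, s2


numbers = [3, 1, 1, 2, 2, 1]
paths = [
    (("s", "u2_1", "u3_1", "t"), 3),
    (("s", "u2_2", "u3_2", "t"), 1),
    (("s", "u2_3", "u3_2", "t"), 1),
    (("s", "u2_4", "u3_1", "t"), 2),
    (("s", "u2_5", "u3_2", "t"), 2),
    (("s", "u2_6", "u3_2", "t"), 1),
]
s1, s2 = partition_from_paths(numbers, paths)
print(s1, s2)
```

Both subsets sum to $C/2 = 5$, matching observation 2.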