Volume 13 Supplement 3
ACM Conference on Bioinformatics, Computational Biology and Biomedicine 2011
Combining automated peak tracking in SAR by NMR with structurebased backbone assignment from ^{15}NNOESY
 Richard Jang^{1},
 Xin Gao^{2} and
 Ming Li^{1}Email author
https://doi.org/10.1186/1471210513S3S4
© Jang et al.; licensee BioMed Central Ltd. 2012
Published: 21 March 2012
Abstract
Background
Chemical shift mapping is an important technique in NMRbased drug screening for identifying the atoms of a target protein that potentially bind to a drug molecule upon the molecule's introduction in increasing concentrations. The goal is to obtain a mapping of peaks with known residue assignment from the reference spectrum of the unbound protein to peaks with unknown assignment in the target spectrum of the bound protein. Although a series of perturbed spectra help to trace a path from reference peaks to target peaks, a onetoone mapping generally is not possible, especially for large proteins, due to errors, such as noise peaks, missing peaks, missing but then reappearing, overlapped, and new peaks not associated with any peaks in the reference. Due to these difficulties, the mapping is typically done manually or semiautomatically, which is not efficient for highthroughput drug screening.
Results
We present PeakWalker, a novel peak walking algorithm for fastexchange systems that models the errors explicitly and performs manytoone mapping. On the proteins: hBcl_{XL}, UbcH5B, and histone H1, it achieves an average accuracy of over 95% with less than 1.5 residues predicted per target peak. Given these mappings as input, we present PeakAssigner, a novel combined structurebased backbone resonance and NOE assignment algorithm that uses just ^{15}NNOESY, while avoiding TOCSY experiments and ^{13}Clabeling, to resolve the ambiguities for a onetoone mapping. On the three proteins, it achieves an average accuracy of 94% or better.
Conclusions
Our mathematical programming approach for modeling chemical shift mapping as a graph problem, while modeling the errors directly, is potentially a time and costeffective first step for highthroughput drug screening based on limited NMR data and homologous 3D structures.
Background
Xray crystallography and NMR spectroscopy are the predominant methods for experimental 3D protein structure determination. The advantage of NMR over any other method is that the protein sample can be studied at atomic resolution in solution, and in special cases even in living cells (incell NMR) [1, 2]. In addition to structure determination, NMR has been used successfully in proteinprotein interaction studies [3], studies on protein dynamics [4], and in drug design and screening [5]. Among the more successful NMR methods for drug design and screening, fragmentbased methods, such as SAR by NMR [6, 7], have found their way in pharmaceutical companies and have resulted in discoveries that are currently undergoing clinical trials [8]. In SAR by NMR and other NMR studies, chemical shift mapping is used to identify the atoms in a target protein that experience chemical shift changes upon introduction of a ligand or upon changes in environmental conditions.
The chemical shift, δ, of an atom is its resonance frequency (in units of ppm) measured by NMR experiments. We consider the chemical shifts of three NMRactive isotopes with focus on the latter two: ^{13}C, ^{15}N and ^{1}H. Among the large variety of NMR spectra, only 2D HSQC, 3D NOESY, and 3D TOCSY will be discussed. Each 2D HSQC peak gives the chemical shifts of an N, H^{ N } group, including backbone amides and side chains with amide groups. Our focus is on the backbone amide chemical shifts, which serves as an identifier for an amino acid residue. Each 3D NOESY peak (NOE) consists of three chemical shifts: N, H^{ N } of an amide group, and another proton that is within a distance of about 5Å from the H^{ N }. Therefore, each NOE corresponds to a H^{ N } H contact. Each 3D TOCSY peak consists of the chemical shifts of an amide group, and a proton within the same amino acid as the amide. Therefore, TOCSY gives the side chain protons. In this work, we consider only H^{ α } and H^{ N }.
Typically, chemical shift mapping is done manually or semiautomatically due to errors, noise, peak overlap, and missing data. This manual work can be tedius and time consuming if the protein is large, if there are many spectra, or if there are many ambiguous mappings. Moreover, results derived manually is naturally biased, so the results can be difficult for others to reproduce. To our knowledge, there are only a few automated methods for this problem, and they all produce onetoone mappings rather than allowing for ambiguity. Nevertheless, automated methods are necessary for highthroughput drug screening.
FELIXAutoscreen [12] formulates the assignment of peaks in the reference spectrum to peaks in a perturbed spectrum as a bipartite graph matching problem, such that the sum of the chemical shift and peak shape differences is minimized. Their approach of optimizing the sum of the distances is better than choosing the peak nearest to each reference peak because the local greedy approach disregards the mappings of other peaks nearby, which results in errors. Dummy peaks were used to handle missing data, and peaks were picked on the fly during the execution of their algorithm. To handle more than one set of perturbed spectra, the bipartite matching algorithm was repeated successively, where the current perturbed spectrum becomes the new reference spectrum once it has been mapped. They tested their approach on a 74residue protein domain in 8 different ligand concentrations, and obtained results similar to their manual efforts. The successive approach, however, is a local greedy approach that does not consider all the spectra simultaneously, so information about potential peak movements in later perturbed spectra are ignored.
NvMap [13] also used a greedy algorithm to successively match perturbed spectra. However, unlike FELIXAutoscreen, when matching the reference to a perturbed spectrum, the sum of the distances was not used. Instead, the pair of reference and perturbed peaks with the shortest distance was chosen and removed from consideration, and then the process was repeated for the next shortest. They tested their method on 97 residues of the SUMO protein on 2 different ligands, each at 6 different ligand concentrations. They obtained an average accuracy of 95%. The main source of error was overlapping peaks within a spectrum, where only one of the peaks was picked and added to the peak list. An older method, MUNIN [14], identifies spectra similarity, but not peak paths. By examining a specific subregion in a mixture of different spectra, where only one had binding, it was able to identify the spectrum with binding present.
For large proteins, ambiguous mappings are inevitable. Rather than finding the unique mapping between peaks in the target to peaks in the reference, we find a set of plausible reference peaks for each target peak, where plausibility is determined by a scoring function. If the residue assignment for the reference is known, then the mappings give a set of possible residues for each target peak; e.g., ILE 3, LEU 27, LEU 78. We want this set to be small, but yet contain the correct amino acid. In this paper, we present a novel peak walking model that describes the movements that peaks can make, and an approach that generates high scoring mappings by enumerating high scoring paths based on this model. Unlike previous methods, errors are modeled explicitly without using dummy peaks. We call our method PeakWalker. We tested PeakWalker on 3 proteins with publicly available peak lists: UbcH5B titrated with Not4 [15]; hBcl_{XL} with BH3I1 [11, 16]; and histone H1 at 2 different temperatures [17]. At 218 residues minus a removed flexible loop region R45 to A84, which was removed from the DNA sequence prior to NMR, hBcl_{XL} is much larger than the proteins tested by other automated methods. The average accuracy on the test set was at least 96%, with an average of less than 1.5 amino acids predicted per target peak. We compare PeakWalker to a greedy approach similar to that used by NvMap, but modified to return multiple mappings. We also tested PeakWalker by varying the number of noise peaks.
In the second half of this paper, we describe our structurebased resonance assignment method, PeakAssigner, which takes the output of PeakWalker as input, and then resolves the mapping ambiguities using 3D ^{15}NNOESY and the 3D structure of a homologous protein. In chemical shift perturbation studies, a 3D structure is often available, such as from the Protein Data Bank (PDB) [18]. It is often the case that the bound structure of the protein is similar across different ligands that can bind to it, so that one bound structure can be used for studying different ligands. Therefore, structurebased resonance assignment methods [19–24] are ideal for disambiguating the mappings. Currently, there are no automated backbone resonance assignment methods that use only a series of ^{15}NHSQC spectra and ambiguous NOEs from ^{15}NNOESY spectra. NOEnet [20] requires unambiguous NOEs, such as from 4D NOESY. The Nuclear Vector Replacement (NVR) [21, 23] approach requires a sparse set of unambiguous NOEs from 3D NOESY, residual dipolar couplings (RDC), and amide exchange rates. The contact replacement (CR) [22] method can handle ambiguous NOEs from 3D ^{15}NNOESY, but it also requires 3D ^{15}NTOCSY, and 3D HNHA.
Our previous work on structurebased resonance assignment [19] has requirements similar to the CR method except that instead of HNHA, it requires a known resonance assignment from a protein mutant, which serves as a reference. We were able to perform a fully automatic backbone resonance assignment from automatically picked peaks for a small protein. However, using 3D ^{15}NTOCSY and a similar resonance assignment limited the practicality of the method. In large proteins, the TOCSY can have many overlapped peaks or many missing peaks if the protein is deuterated. In addition, each reference peak can have many corresponding target peaks, so there can be many ambiguous mappings.
In this work, we no longer use TOCSY. The TOCSY was previously used to identify possible amino acid types for each target peak, and this was used to reduce the number of ambiguous mappings. To reduce ambiguity without TOCSY, a series of perturbed spectra could be used. The TOCSY was also used to obtain the chemical shifts of the H^{ α } atoms for matching against NOESY peaks. Such H^{ α } chemical shifts are available in the NOESY spectrum, but in a more noisy form. We have also added a further improvement. The constraint that each NOESY peak is assigned to at most one contact was not enforced in our previous algorithm. In adding this constraint, our new algorithm not only performs resonance assignment, but also backbone NOE assignment and H^{ α } assignment, simultaneously. Although NOE and H^{ α } assignment is not the main output of our algorithm, we show that by performing them, there is an improvement in resonance assignment accuracy, on average. This is demonstrated with simulated NOESY peaks from the protein structures [PDB:1KA5], [PDB:1EGO], [PDB:1G6J], [PDB:1SGO], and [PDB:1YYC]. On hBcl_{XL}, UbcH5B, and histone H1, PeakAssigner achieves an average accuracy of over 94%.
At the end of this paper, we briefly consider the slowexchange case. In slow exchange, the peaks for both the free and bound state may appear in the spectra at the same time, with the intensity of the peak signals proportional to the concentration of each state. If the protein in Figure 1 undergoes slow exchange, only the red and magenta peaks would be present. In the unbound protein, only the red peaks are present. As the ligand concentration increases, for residues undergoing chemical exchange, magenta peaks will appear at increasing peak intensities relative to the corresponding red peaks, which disappear in the fully saturated case.
Results and discussion
This section will describe the mathematical model used by PeakWalker and PeakAssigner, followed by the test results. The test data is described in detail in the Methods.
Peak walking problem
where δ_{ N }(h) is the function that returns the N chemical shift of h, ${\delta}_{{H}^{N}}\left(h\right)$ the H^{ N } chemical shift of h, and the 10 comes from the gyromagnetic ratio of ^{1}H and ^{15}N. Euclidean distance and various types of weightings can also be used to measure chemical shift change [28]. For peaks h ∈ T_{ i } and h' ∈ T_{i+1}, an edge is drawn between them if Δδ_{ N }(h, h') ≤ t_{ N } and $\mathrm{\Delta}{\delta}_{{H}^{N}}\left(h,{h}^{\prime}\right)\le {t}_{{H}^{N}}$, where t_{ N } and ${t}_{{H}^{N}}$ are userspecified thresholds. For UbcH5B and histone H1, 1.0 ppm and 0.2 ppm were used for t_{ N } and ${t}_{{H}^{N}}$, respectively. This is comparable to the thresholds used by FELIXAutoscreen [12]. Smaller thresholds of 0.75 ppm and 0.125 ppm were used for hBcl_{XL} because it has more perturbed spectra, so the chemical shift changes are expected to be more gradual. Edges are not drawn between vertices within the same peak list, so the T_{ i }'s are disjoint.
Definition 1. The maximum weighted kdimensional matching on instance T ⊆ T_{0} × T_{1}× ... × T_{k1}, where the T_{ i }'s are disjoint, is the set of paths M ⊆ T that maximizes some scoring function on M subject to the constraint that for any pair of paths x, y ∈ M, x and y have no vertices in common.
Definition 2. The mappings for peak h_{ i } ∈ T_{k1}is the set of its possible residues R(h_{ i }). If R(h_{ i }) > 1, or if R(h_{ i }) = 1 and R(h_{ i }) ∩ R(h_{ l }) ≠ ∅ for h_{ l } ≠ h_{ i }, then R(h_{ i }) is ambiguous. This set is obtained by first finding M, the maximum weighted kdimensional matching on the graph defined by the above peak walking model that allows for subpaths and vertices to be shared. Let S be the amino acid sequence of the protein, and onetoone function f_{0} : T_{0} → S be the known reference assignment. For paths in M that end in some h_{ j } ∈ T_{0} and h_{ i } ∈ T_{k1}, add f_{0}(h_{ j }) to R(h_{ i }).
The optimal and near optimal sets of paths are generated to obtain different mappings per peak. This is done by modeling the problem as a binary integer linear program (BIP) and using the onetree algorithm [29] to generate multiple solutions that are guaranteed to be within a given percentage of the optimal solution. This percentage, called the gap, is an input to the BIP solver. We used CPLEX^{®} as the solver.
Mathematical model for peak walking
A linear objective function is maximized subject to linear constraints and binary variables.
Binary variables
The variables indicate the transitions and peak states.

X_{hih'}Equals to 1 if peak h ∈ T_{ i }transitions to h' ∈ T_{i+1}. This variable represents a consecutive transition.

X_{ hi }Equals to 1 if h ∈ T_{ i }is a single unambiguous peak. Equals to 0 if it is an ambiguous peak. This variable represents peak state.

D_{ hi }Equals to 1 if h ∈ T_{ i }is missing its peaks in T_{ j }, ∀j >i. This represents a peak that disappears and no longer reappears.

J_{ hih }' Equals to 1 if h ∈ T_{ i }is missing in T_{i+1}, but transitions to h' ∈ T_{i+2}. This represents a jump.

N_{ hi }Equals to 1 if h ∈ T_{ i }has no associated peaks in T_{ j }, ∀_{ j }<i. This represents a new peak.
Objective function coefficients
The objective function coefficients score the transitions and peak states, so the sum of the coefficients multiplied by their corresponding variables gives the score of the paths. Ideally, if a database of peak lists and chemical shift mappings are available, these coefficients could be obtained through training with machine learning techniques, so that the manual mapping process could be modeled. Unfortunately this database does not exist, so we used our best judgement to scale the scores relative to each other.

$C\left({X}_{hi{h}^{\prime}}\right)=\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{N}\left({h}^{\prime},h\right),0,\mathsf{\text{tolN}}\right)+\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{{H}^{N}}\left({h}^{\prime},h\right),0,\mathsf{\text{tolHN}}\right)$. This is the score of a consecutive transition, where Φ(x, m, s) = 2 × (1  cdf(x, m, s)). cdf is the cumulative distribution function of a normally distributed variable with mean m and standard deviation s. tolN and tolHN were set to values, such that t_{ N }and ${t}_{{H}^{N}}$, respectively, correspond to 2 standard deviations from a mean value of 0. The score is a number between 0 and 1, with small chemical shift changes being closer to 1 (because x is positive, so cdf returns a value of at least 0.5).

$C\left({X}_{hi}\right)=2\times \left(ki1\right)\times \left(\mathrm{\Phi}\left(\frac{3{t}_{N}}{4},0,\mathsf{\text{tolN}}\right)+\mathrm{\Phi}\left(\frac{3{t}_{{H}^{N}}}{4},0,\mathsf{\text{tolHN}}\right)\right)$. This score penalizes ambiguous peaks by rewarding unambiguous peaks. We require ambiguous peaks to have at least 2 paths of compensating transitions from i to k  1. The reward decreases with increasing i because there are fewer transitions available. The $\frac{3}{4}$ inside Φ encourages the compensating transitions to have scores better than this.

$C\left({D}_{hi}\right)=\mathrm{\Phi}\left({t}_{N},0,\mathsf{\text{tolN}}\right)+\mathrm{\Phi}\left({t}_{{H}^{N}},0,\mathsf{\text{tolHN}}\right)$. This is the score for disappearing peaks. We give such peaks a positive score similar to a consecutive transition with a chemical shift change of t_{ N }and ${t}_{{H}^{N}}$.

$C\left({J}_{hi{h}^{\prime}}\right)=0.75\times \left(\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{N}\left({h}^{\prime},h\right),0,\mathsf{\text{tolN}}\right)+\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{{H}^{N}}\left({h}^{\prime},h\right),0,\mathsf{\text{tolHN}}\right)\right)$. This is the score for jumps. The 0.75 encourages consecutive transitions over jumps of the same chemical shift change.

$C\left({N}_{hi}\right)=\left(ki1\right)\times \left(\mathrm{\Phi}\left(\frac{3{t}_{N}}{4},0,\mathsf{\text{tolN}}\right)+\mathrm{\Phi}\left(\frac{3{t}_{{H}^{N}}}{4},0,\mathsf{\text{tolHN}}\right)\right)$. This is the score for new peaks. The score is negative to ensure that there must exist compensating transitions from i to k  1.

Peaks corresponding to noise have no transitions, and they get set to unambiguous because we are maximizing and the unambiguous score is nonnegative.
Constraints
 1.
For each peak (ambiguous or unambiguous), the number of inedges is equal to the number of out edges. Even if a peak disappears permanently (an outedge), the peak must have come from a previous transition or be a new peak, which is considered an intransition. From Figure 2, we can see that this constraint is ∀i ∈ [1, k  2], ∀h ∈ T_{ i }, ${\sum}_{{h}^{\prime}}{X}_{{h}^{\prime}\left(i1\right)h}+{\sum}_{{h}^{\prime}}{J}_{{h}^{\prime}\left(i2\right)h}+{N}_{hi}={\sum}_{{h}^{\prime}}{X}_{hi{h}^{\prime}}+{D}_{hi}+{\sum}_{{}_{{h}^{\prime}}}{J}_{hi{h}^{\prime}}$.
 2.
Ambiguous peaks are limited to only consecutive transitions. To get rid of jumps, define the reified constraint ${J}_{hi}=1\leftrightarrow {\sum}_{{h}^{\prime}}{J}_{{h}^{\prime}\left(i2\right)h}\ge 1$, ∀i ∈ [2, k  1], ∀h ∈ T_{ i }, where J_{ hi } is a binary variable. Then jumps are removed with J_{ hi } ≤ X_{ hi } since if X_{ hi } = 0 (ambiguous), then J_{ hi } = 0 and ${\sum}_{{h}^{\prime}}{J}_{{h}^{\prime}\left(i2\right)h}=0$. Disappearing and new peaks are handled similarly. Reified constraints allow one to get the truth value of a logical condition. Such conditions can be combined to form logical constraints, such as AND, OR, NOT, IF THEN, and even the absolute value of a linear expression. Reified constraints and logical constraints can be expressed as linear constraints using auxiliary binary variables and techniques from operations research [30].
 3.
For each unambiguous peak, the number of intransitions is bounded above by 1; similarly for outtransitions. Define the reified constraints ${I}_{hi}=1\leftrightarrow {\sum}_{{h}^{\prime}}{X}_{{h}^{\prime}\left(i1\right)h}+{\sum}_{{h}^{\prime}}{J}_{{h}^{\prime}\left(i2\right)h}+{N}_{hi}\le 1$, and ${O}_{hi}=1\leftrightarrow {\sum}_{{h}^{\prime}}{X}_{hi{h}^{\prime}}+{D}_{hi}+{\sum}_{{h}^{\prime}}{J}_{hi{h}^{\prime}}\le 1$. Then the constraint is expressed as I_{ hi } = X_{ hi } and O_{ hi } = X_{ hi }. This, combined with Constraint 2, also handles, for ambiguous peaks, the constraint that the number of consecutive intransitions is greater than 1 and the number of consecutive outtransitions is greater than 1.
 4.
Consecutive transitions generally do not zigzag. That is, peaks typically do not take a large step in one direction and then take a large step in the reverse direction. To enforce this, let h ∈ T_{ i }, h' ∈ T_{i+1}, h″ ∈ T_{i+2}. If 0.5 ≤ Δδ_{ N }(h, h') ≤ t_{ N }, $0.05\le \mathrm{\Delta}{\delta}_{{H}^{N}}\left(h,{h}^{\prime}\right)\le {t}_{{H}^{N}}$, 0.5 ≤ Δδ_{ N }(h', h″) ≤ t_{ N }, $0.05\le \mathrm{\Delta}{\delta}_{{H}^{N}}\left({h}^{\prime},{h}^{\u2033}\right)\le {t}_{{H}^{N}}$, then consider the following vectors: ${V}_{h{h}^{\prime}}=\left({\delta}_{N}\left({h}^{\prime}\right){\delta}_{N}\left(h\right),10\left({\delta}_{{H}^{N}}\left({h}^{\prime}\right){\delta}_{{H}^{N}}\left(h\right)\right)\right)$ and ${V}_{{h}^{\prime}{h}^{\u2033}}=\left({\delta}_{N}\left({h}^{\u2033}\right){\delta}_{N}\left({h}^{\prime}\right),10\left({\delta}_{{H}^{N}}\left({h}^{\u2033}\right){\delta}_{{H}^{N}}\left({h}^{\prime}\right)\right)\right)$. The consecutive transitions h to h' to h″ zigzag if the angle between V_{hh'}and V_{h'h″}, θ_{hh'h″}, is between 105 and 180 degrees. When h transitions to h', transitions from h' to h″ that result in zigzag are prevented by adding the constraint X_{hih'}≤ Z_{h'(i+1)}, where we have the reified constraint ${Z}_{{h}^{\prime}\left(i+1\right)}=1\leftrightarrow \left({\sum}_{{h}^{\u2033}{\theta}_{h{h}^{\prime}{h}^{\u2033}}\in \left[105,180\right]}{X}_{{h}^{\prime}\left(i+1\right){h}^{\u2033}}=0\right)$. Thus, if X_{hih'}= 1, then all consecutive transitions from h' to h″ that cause zigzag are prevented because the sum is forced to 0.
Number of solutions
The number of solutions generated is dependent on the gap tolerance provided to CPLEX. Unless specified otherwise, a gap of 1% was used. To determine the number of solutions that should be generated, various numbers were tested to determine their effect on the average number of residues predicted per peak. We observed that as the number of solutions increased, the average number of residues plateaus, so we used the value at the start of the plateau as the number of solutions. Likely, no new mappings were generated because paths containing these mappings caused a violation of the gap optimality criteria.
Greedy peak walking
For comparison purposes, we implemented the greedy approach in NvMap, but also added no zigzagging as described above, and jump handling of arbitrary length by allowing unmatched peaks in T_{ i } to be matched to peaks in T_{ j } for any j >i. The same chemical shift thresholds as those used by PeakWalker were used. None of the existing approaches deal directly with ambiguous mappings. To generate these without generating many mappings per peak, we used a greedy approach. For h_{ i } ∈ T_{k1}, where h_{ i } is matched to h_{ j } ∈ T_{0}, in increasing order of Δδ_{ NH }(h_{ j }, h_{ b }) for any h_{ b } ≠ h_{ i } in T_{k1}, add f_{0}(h_{ j }) to R(h_{ b }) until a maximum number of additional mappings have been added. Various values for the maximum were tested.
Resonance assignment
Some definitions are needed before we can formally define this problem and present our algorithm.
Definition 3. A NOESY peak p (δ_{ N }(p), ${\delta}_{{H}^{N}}\left(p\right)$, δ_{ H }(p)) induces an H^{ α } peak for HSQC peak $h\left({\delta}_{N}\left(h\right),{\delta}_{{H}^{N}}\left(h\right)\right)$ if Δδ_{ N }(p, h) ≤ σ_{ N }, $\mathrm{\Delta}{\delta}_{{H}^{N}}\left(p,h\right)\le {\sigma}_{{H}^{N}}$, and δ_{ H }(p) matches within 3 standard deviations of the mean value of ${\delta}_{{H}^{\alpha}}\left(T\left(a\right)\right)$ of at least one amino acid a ∈ R(h), where T(a) is the amino acid type of a. The mean and standard deviations of each amino acid type were obtained from the Biological Magnetic Resonance Data Bank (BMRB) [31]. σ_{ N }, ${\sigma}_{{H}^{N}}$ are match tolerances. We used 0.5, 0.05 ppm. Since the intensity of NOESY peaks is inversely proportional to the distance of the underlying protons in contact, and intraresidue H^{ N }, H^{ α } 's are relatively close, we can expect the intensity of intraresidue H^{ N }H^{ α } NOESY peaks to be large. Among the 8 closest (by Δδ_{ NH }(p, h)) NOESYinduced H^{ α } peaks of HSQC peak h, we took the 4 most intense peaks as a possible induced ${\delta}_{{H}^{\alpha}}\left(h\right)$.
Definition 4. A contact c consists of $c\left[0\right]={H}_{a}^{N}$, which is the amide proton of one amino acid denoted by a, and $c\left[1\right]={H}_{b}^{N}$or ${H}_{b}^{{}^{\alpha}}$, the amide or alpha proton of another amino acid denoted by b. For H^{ α }, it is possible that a = b. Let P(c) be the proton type (H^{ N }or H^{ α }) of c[1].
Definition 5. A NOESY peak match n consists of $n\left[0\right]=\left({\delta}_{N}\left(s\right),{\delta}_{{H}^{N}}\left(s\right)\right)$ of HSQC peak s, $n\left[1\right]={\delta}_{{H}^{N}}\left(t\right)$ or an induced ${\delta}_{{H}^{\alpha}}\left(t\right)$ of HSQC peak t, $n\left[2\right]=\left({\delta}_{N}\left(p\right),{\delta}_{{H}^{N}}\left(p\right),{\delta}_{H}\left(p\right)\right)$ of some NOESY peak p; where, Δδ_{ N }(s, p) ≤ σ_{ N }, $\mathrm{\Delta}{\delta}_{{H}^{N}}\left(s,p\right)\le {\sigma}_{{H}^{N}}$, and Δδ_{ H }(t, p) ≤ σ_{ H }. We used 0.05 ppm for σ_{ H }. For H^{ α }, it is possible that s = t. Let P(n) be the proton type of n[1].
Definition 6. Amino acid a matches HSQC peak h if a ∈ R(h).
Definition 7. Contact c matches NOESY peak match n if i) a ∈ R(s), where amino acid a ∈ c[0] and peak s ∈ n[0], ii) b ∈ R(t), where b ∈ c[1]and t ∈ n[1], and iii) P(c) = P(n).
Definition 8. Let C be the set of all contacts from the 3D structure of a homologous protein, let P be the set of all NOESY peaks, and let S be the amino acid sequence of the protein. The resonance assignment is a onetoone function g_{1} : T_{k1}→ S, where g_{1}(h_{ i }) ∈ R(h_{ i }) for all h_{ i } ∈ T_{k1}, and the NOE assignment is a onetoone function g_{2} : P → C, such that the scoring function $\left({\sum}_{{h}_{i}\in {T}_{k1}}{\sum}_{{g}_{1}\left({h}_{i}\right)\in S}{w}_{1}\left({h}_{i},{g}_{1}\left({h}_{i}\right)\right)\right)+\left({\sum}_{p\in P}{\sum}_{{g}_{2}\left(p\right)\in C}{w}_{2}\left(p,\phantom{\rule{2.77695pt}{0ex}}{g}_{2}\left(p\right)\right)\right)$ is maximized. The functions ${w}_{1}:{T}_{k1}\times S\to \mathbb{R}$and w_{2} : P × C → ℝ weigh each individual resonance and NOE assignment, respectively.
The BIP from our previous work [19] was modified to support the NOE assignment of H^{ N }H^{ N } and H^{ N }H^{ α } contacts without TOCSY.
Binary variables
The variables indicate individual resonance and NOE assignments. Note that each NOESY peak will be assigned to at most one NOESY peak match and vice versa. Therefore, assigning contacts to NOESY peak matches is equivalent to assigning contacts to NOESY peaks. If there is only one possible NOESY peak match for a given NOESY peak, then that peak is unambiguous.

X_{a,h}Equals to 1 if amino acid a is assigned to HSQC peak h, where a matches h.

X_{c,n}Equals to 1 if contact c is assigned to NOESY peak match n, where c matches n.
Objective function coefficients
A linear objective function is maximized. The coefficients are the weights of the assignments, and they are nonnegative.

${w}_{1}\left({X}_{a,h}\right)=3\times \left(1\frac{\mathrm{\Delta}{\delta}_{NH}\left({f}_{0}^{1}\left(a\right),h\right)min\left(h\right)}{max\left(h\right)min\left(h\right)}\right)$. This is the score of assigning amino acid a with reference peak ${f}_{0}^{1}\left(a\right)$ to target peak h. min(h) and max(h) is the smallest and largest, respectively, Δδ_{ NH }among the amino acids in R(h).

${w}_{2}\left({X}_{c,n}\right)=\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{N}\left(p,s\right),0,\frac{{\sigma}_{N}}{2}\right)+\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{{H}^{N}}\left(p,s\right),0,\frac{{\sigma}_{{H}^{N}}}{2}\right)+\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{H}\left(p,t\right),0,\frac{{\sigma}_{H}}{2}\right)+F\left(c\right)$. This is the score of assigning contact c to NOESY peak match n, where HSQC peak s ∈ n[0], HSQC peak t ∈ n[1], and NOESY peak p ∈ n[2]. F(c) is a weight on the type of contact. In the absence of missing NOESY peaks, contacts involving adjacent amino acids should have a NOESY peak match, so it is natural for adjacent amino acid contacts to have higher weight than nonadjacent. Φ is the same as the one defined in the peak walking mathematical model.
Constraints
 1.
Each amino acid a is assigned to at most one HSQC peak. This is ${\sum}_{h}{X}_{a,h}\le 1$.
 2.
Each HSQC peak h is assigned to at most one amino acid. This is ${\sum}_{a}{X}_{a,h}\le 1$.
 3.
Each contact c is assigned to at most one NOESY peak match. This is ${\sum}_{n}{X}_{c,n}\le 1$.
 4.
Each NOESY peak p ∈ n[2] of NOESY peak match n is assigned to at most one contact. This is ${\sum}_{c,n\left[0\right],n\left[1\right]}{X}_{c,n}\le 1$.
 5.
Each pair of HSQC peaks n[0], n[1] of NOESY peak match n has at most one NOESY peak. This is ${\sum}_{c,n\left[2\right]}{X}_{c,n}\le 1$.
 6.Contact c is assigned to NOESY peak match n if and only if amino acid c[0] is assigned to HSQC peak n[0], and c[1] is assigned to n[1]. This constraint is similar to the if and only if constraint in our previous work.
 (a)
$\forall c,\forall h,{\sum}_{nh\in n\left[0\right]}{X}_{c,n}\le {X}_{c\left[0\right],h}$
 (b)
$\forall c,\forall h,{\sum}_{nh\in n\left[1\right]}{X}_{c,n}\le {X}_{c\left[1\right],h}$
 (a)
 7.
Each H^{ α } proton, z_{ a } of amino acid a, is assigned to at most one induced H^{ α } peak, y_{ h } of HSQC peak h. Let ${b}_{{z}_{a,yh}}=1\leftrightarrow {\sum}_{c,n\phantom{\rule{2.77695pt}{0ex}}\phantom{\rule{2.77695pt}{0ex}}{z}_{a}\in c\left[1\right],{y}_{h}\in n\left[1\right]}{X}_{c,n}\ge 1$ be a reified constraint, where ${b}_{{z}_{a,yh}}=1$ if z_{ a } is assigned to y_{ h }. The summation is over all X_{ c,n } that contain z_{ a } and y_{ h }. The constraint is then $\forall {z}_{a},{\sum}_{{y}_{h}}{b}_{{z}_{a},yh}\le 1$.
 8.
Each induced H^{ α } peak, y_{ h } of HSQC peak h, is assigned to at most one H^{ α } proton. This constraint is $\forall {y}_{h},{\sum}_{{z}_{a}}{b}_{{z}_{a},yh}\le 1$.
Multiple assignment possibilities
Similar to PeakWalker, multiple solutions corresponding to different assignment possibilities were generated. From the multiple solutions, a consensus assignment was generated by running the above BIP with w_{1}(X_{ a,h }) equal to the number of times amino acid a was assigned to peak h and w_{2}(X_{ c,n }) equal to the number of times contact c was assigned to NOESY peak match n.
PeakWalker results
Comparison between Greedy and PeakWalker
Protein  Method  Num Correct  Num Correct Range  Acc (%)  Acc Range (%)  Avg Num Res/Peak 

hBcl_{XL}  Greedy  110.9  110111  95.7  9496.5  1 
Greedy  111.1  111112  90.7  89.591.7  1.7  
PeakW  116.3  116117  96.8  95.997.5  1.4  
Ubch5b  Greedy  114.6  113115  94.2  91.596.7  1 
Greedy  116.9  116118  94.4  93.595.2  1.2  
Greedy  120.8  120122  97.2  9698.4  1.5  
PeakW  120.4  119123  98.1  96.099.2  1.2  
Histone H1^{ U }  Greedy  78.1  7683  91.4  89.493.3  1 
Greedy  83.0  8383  95.5  94.396.5  1.5  
PeakW  85.1  8586  99.3  97.7100  1.3  
Histone H1^{ A }  Greedy  72.0  7272  82.8  80.983.7  2.0 
PeakW  76.0  7676  88.8  87.489.4  1.3 
The peak lists of hBcl_{XL} contained the most errors among the proteins. Out of 136 peaks in the reference, only 114 had a complete path without any missing peaks between the reference and target. 12 residues did not have any peak in the reference list, but had peaks in the other lists. There was one residue with a jump of length 2, and 3 residues with a jump of length 3. There were no jumps longer than 3. Despite not explicitly modeling jumps of length 3, on average PeakWalker got 2.4 of those mappings correct. For UbcH5B, all the target peaks had corresponding peaks in the reference. There were 2 jumps of length 2, and 4 jumps of length 3. On average, PeakWalker got 3.2 out of those 4 correct. There were no jumps in histone H1.
We also tested hBcl_{XL} using only 6 peak lists instead of 11 by taking every other list. This corresponds to performing fewer NMR experiments. The accuracy decreased slightly to 95.7% with 114.9 correct predictions. hBcl_{XL} was also tested with no overlapped peaks merged in the input. This corresponds to the result if all overlapped peaks could be predicted. For this test, at a cost of optimality, the gap was set to 4% to keep the run time to less than 5 mins per trial on an Intel Core 2 Duo T9300 laptop with 3 GB RAM. Nevertheless, the accuracy was 98.7% with an average number of correct mappings of 138.0 (an increase of over 21), at an average of 1.7 residues per peak. This indicates that peak overlap can hide many peak mappings, which can be a problem if these residues are involved in binding. However, binding residues tend to have chemical shift changes upon binding, so to completely hide such a residue, every time it moves there must exist at least another peak with similar chemical shift to overlap it. In the case of hBcl_{XL}, peak overlap masked only the target peak of one known binding residue with significant shift changes, but the residue's other peaks were not masked.
Results for PeakWalker on hBcl_{XL} with various noise levels
Noise (%)  Num Correct  Num Correct Range  Acc (%)  Acc Range (%) 

0  116  116116  96.7  96.796.7 
10  116.3  116117  96.8  95.997.5 
20  115.8  115117  95.8  9597.5 
30  115.5  114116  94.9  91.997.5 
40  115.2  114116  95.3  93.496.6 
50  115.2  113117  93.5  91.195.1 
PeakAssigner results
To compare the combined NOE and resonance assignment approach of PeakAssigner with the method in our previous work, we ran both on data simulated from the structures [PDB:1KA5], [PDB:1EGO], [PDB:1G6J], [PDB:1SGO], and [PDB:1YYC], which were part of the test set in our previous work. Rather than using the simulated data provided by the authors of the CR method, which was done in our previous work, we simulated the data ourselves so that we could trace the results back to the data. In this test, the mappings of all peaks contained the correct residue and the H^{ α } assignments were known. NOESY peaks were simulated using chemical shift data from the protein's BMRB entry: [BMRB:2030] for 1KA5, [BMRB:491] for 1EGO, [BMRB:5387] for 1G6J, [BMRB:6052] for 1SGO, and [BMRB:6515] for 1YYC.
Comparison between the old assignment method in [19] and the new method
PDB ID  1KA5  1EGO  1G6J  1SGO  1YYC 

Noise (X)  5.6  5.3  3.8  8.2  7.6 
Acc New (%)  100  95.6  93.5  87.9  95.2 
Range New (%)  100  89.9100  93.194.4  81.899.3  9099.4 
Acc Old (%)  100  92.8  94.4  86.7  89.4 
Range Old (%)  100  89.997.5  94.494.4  76.796.3  75.596.8 
NOE Acc (%)  92.6  89.0  94.2  86.6  89.8 
Range NOE (%)  91.094.0  79.494.7  92.396.2  81.893.5  84.994.2 
Onetoone resonance assignment results from PeakWalker input
Protein  UbcH5B  Histone H1  hBcl_{XL}  hBcl_{XL}* 

Num Correct  119.5  66.2  101.9  99.8 
Num Correct Range  119120  6567  101103  99101 
Acc (%)  98.0  94.7  95.6  94.5 
Acc Range (%)  97.598.4  92.995.7  94.497.2  93.595.2 
Num H^{ N }  H^{ N } Correct  157  114  116.1  118.3 
Acc H^{ N }  H^{ N } (%)  92.7  90.9  90.0  86.3 
Num H^{ N }  H^{ α } Correct  168.2  104.9  128.6  0 
Acc H^{ N }  H^{ α } (%)  75.6  65.4  63.1  0 
Despite using ambiguous induced H^{ α }chemical shift assignments for each HSQC peak, the accuracies of the H^{ N }H^{ α } assignments are over 60%, even with a 5% H^{ α } missing rate. Nevertheless, the results for hBcl_{XL} that used only H^{ N }H^{ N } indicate that resonance assignment accuracy is not necessarily impacted significantly if H^{ α } is not used.
Onetoone resonance assignment results for hBcl_{XL} with different input manytoone mappings
Num Correct Input  111/123  111/123  111/115  111/115 

Avg Num Res/Peak  2.3  3.3  2  3 
Num Correct Output  92.2  86.3  95.1  94.3 
Num Correct Range  9095  8093  9397  9396 
Acc (%)  91.8  86.5  94.6  94.3 
Acc Range (%)  89.195.0  79.293.0  92.196.9  92.196.0 
Even with NOE information, a onetoone mapping for all residues is not always possible. Our approach, however, facilitates an iterative semiautomated approach. Once assignments and paths have been verified, perhaps using additional information, the variables corresponding to those peaks and residues can be removed from the BIPs, and then PeakWalker and PeakAssigner can be rerun. Both programs can return multiple nearoptimal solutions to account for ambiguity.
Conclusions
We also tested our method on the protein calmodulin to test the slow exchange case, and to identify problems for future work. Currently, we are not aware of any automated methods for slow exchange. In general, peak tracking is more difficult here because there are no intermediate peaks to track peak movements in increments, and the number of peaks in the spectra can be almost double the number in fast exchange.
We generated peak lists using chemical shifts in [BMRB:6541] (free form) and [BMRB:15624] (bound form). Four peak lists with saturation levels 0:1, 1:4, 3:4, and 1:1 were generated. Residues with backbone N and H^{ N } chemical shifts in 6541 and 15624 within 0.5 ppm and 0.05 ppm were assumed to have only one peak rather than two. Peak intensities were generated based on the saturation levels. For the case with no noise peaks and no errors, except for 3 residues present in 6541, but not in 15624, the Greedy method performed poorly at less than 100 residues correct out of 143 with 7.6 peaks/residue. PeakWalker performed even worst, which is expected since both methods do not model slow exchange. Cutoffs of 2.0 ppm for the N chemical shifts and 0.4 ppm for H^{ N } were used.
In the Methods section, we describe a mathematical model for slow exchange, which uses the peak intensities. Using an intensity cutoff of a 15% difference from the expected intensity ratio, the method got 132 correct at 5.7 peaks/residue. The 11 incorrect had chemical shift changes outside the 2.0, 0.4 ppm cutoffs. Unfortunately, calmodulin undergoes a large conformational change upon binding its target peptide (hinge motion in a long helix), and those 11 residues are important for binding and conformational change. A 4.0, 0.8 ppm cutoff would be needed to cover the chemical shift changes of all residues, but this will result in a prohibitive number of possible peaks per residue. Preliminary results of using an iterative approach of using both PeakWalker and PeakAssigner was successful only for the case with no errors, no noise, and no missing NOESY peaks (137 correct onetoone mappings). Here, we used contact information from the free form structure [PDB:1EXR], we fixed residuepeak assignments supported by NOESY peaks and paths that occurred frequently, and we increased the cutoffs in increments. For the case with errors, which is the norm, additional NMR data, such as NOESY data for the contacts between the protein and ligand, will likely be needed to reduce ambiguity.
It would be ideal to automate 3D structure determination of the bound protein for proteins that can undergo conformational change upon binding under either fast or slow exchange using limited NMR data, and a 3D structure of the free form or a homologous structure of the bound form. Currently this is a very challenging computational problem that involves protein folding and flexible proteinligand docking, while satisfying constraints derived from limited experimental data.
Drug screening is expensive in terms of both time and money. Although much progress remains to be made, our mathematical modeling approaches for automating chemical shift mapping using limited data are steps towards highthroughput NMR studies.
Availability
The Java source code is available by request to the corresponding author.
Methods
PeakWalker and PeakAssigner were tested on hBcl_{XL}, UbcH5B, and histone H1. This section describes the test data and the errors that were introduced. The mathematical model for the slow exchange case is also given.
Peak lists
The hBcl_{XL} data set consisted of 11 peak lists. The reference peak list contained 148 peaks, while the target contained 142. UbcH5B consisted of 5 peak lists. The reference contained 127 peaks, while the target also contained 127. Histone H1 consisted of 2 peak lists. The reference contained 97 peaks, while the target contained 86. Unlike the other proteins, the assignment for Histone H1 was unknown, so we performed the chemical shift mapping manually to obtain a reference solution. Due to ambiguities inherent with chemical shift mapping, especially using only 2 peak lists, we produced both an ambiguous mapping, and for testing purposes, our best guess unambiguous mapping.
The peak lists of hBcl_{XL}, UbcH5B, and histone H1 were edited to introduce errors. To obtain errors due to overlapped peaks, peaks within the same peak list that have Δδ_{ N } ≤ 0.1 ppm and $\mathrm{\Delta}{\delta}_{{H}^{N}}\le 0.01$ ppm were merged into a single peak. Such peaks would likely appear as a single peak when viewing the spectra. Multiple peaks could be merged into a single peak. Such merges in the target list will result in at most only one of the peaks being mapped. In hBcl_{XL}, 5 residues had identical chemical shifts in the target list. After merging, hBcl_{XL} had 136 peaks in the reference list and 122 in the target. UbcH5B had 127 in the reference and 123 in the target. There were no changes to the Histone H1 lists. To simulate noise peaks, in each peak list, we introduced noise peaks in the range of the N and H^{ N } chemical shifts, 99133 ppm and 6.2510.75 ppm, respectively. Unless stated otherwise, the number of noise peaks added to each peak list is equal to 10% of its size prior to the addition.
NOESY peak simulation
NOESY peaks were simulated using the contacts in the 3D structure (within 4.5Å), N and H^{ N } chemical shift values from the target peak list, and H^{ α } chemical shift values from either ShiftX predictions [32] or from the BMRB. For hBcl_{XL}, we used the protein threading server LOMETS [33], to obtain a 3D structure. The structure chosen among the possibilities returned by LOMETS was the one that used [PDB:1LXL] as the threading template. It consisted of 178 residues after the flexible loop region was removed. ShiftX was used to obtain the H^{ α } chemical shift values. For Ubch5b, we used the structure named "ubch5bnot4_1.pdb" that was provided with the peak lists, and ShiftX for the H^{ α } chemical shifts. The structure consisted of 147 residues. For histone H1, we used [PDB:1UST] for the structure and [BMRB:6161] for the H^{ α }chemical shifts. It consisted of 92 residues.
A global offset to calibrate the N, H^{ N } chemical shifts of the NOESY against the same shifts in the target HSQC is assumed to have already been obtained from a calibration step, so we simulated only local calibration errors. Local calibration noise, randomly distributed between 0 and 0.15 ppm for N, 0 and 0.015 ppm for H^{ N }, were introduced to NOESY peaks. Compared to resonance assignment, global calibration can be performed manually relatively quickly. Similar to our previous work, missing interresidue contacts were introduced with the following probabilities (0, 0.05, 0.21, 0.41, 0.51) for contacts within the following distances (1.0, 2.0, 3.0, 4.0, 4.5)Å, respectively. Missing intraresidue H^{ N }H^{ α }contacts were introduced with probability 0.05. With size 10% of the number of NOESY peaks, NOESY peaks corresponding to noise were added in the range 99133 ppm for N, 6.2510.75 ppm for H^{ N }, and 26 ppm for H^{ α }.
Protein 3D structures
The 3D structure for NOESY peak simulation shall be referred to as the target structure. This structure corresponds to the NMR data and is unknown to the assignment algorithm. The homologous structure used as input to the assignment algorithm shall be referred to as the template structure. The homologymodeling server SWISSMODEL [34–36] was used to obtain the templates. Reduce [37] was used to add the coordinates of hydrogen atoms to the templates. As input to SWISSMODEL, the template used for hBcl_{XL} was [PDB:3FDL]. It consisted of 154 residues. Residues 27 to 82 were not present in the file. The 3D superposition between the target and template is 13.6Å. However, if only residues 85194 are considered, the structure alignment is 2.3Å according to the program CE [38]. The template for Ubch5b was [PDB:2ESK], which consisted of 147 residues. The superposition is 2.4Å, where all residues are aligned. The template for histone H1 was [PDB:1YQA], which consisted of 85 residues. The superposition is 4.9Å, but the structure alignment is 2.0Å, using residues 982.
Mathematical model for slow exchange peak tracking
Similar to the fast exchange case, we model slow exchange as a kdimensional matching problem. The difference is that we allow vertices in the graph to represent two peaks in addition to one; and in the scoring function, we consider for a pair of peaks their intensities relative to the concentration ratio of the protein and ligand.
Similar to the fast exchange case, a linear objective function is maximized subject to linear constraints and binary variables.
Binary variables
The variables represent the transitions/edges between vertices, where each vertex represents a peak or a pair of peaks in some state and from some peak list.

X_{ hish's' }Equals to 1 if peak h ∈ T_{ i }in state s , where s ∈{free, freebound, bound} or a pair of peaks h_{ a }, h_{ b }for s = split, transitions to peak h' ∈ T_{i+1} in state s' or a pair of peaks ${h}_{a}^{\prime},{h}_{b}^{\prime}$ for s' = split, where s' ∈ {free, freebound, bound, missing}. For s' = missing, h' is empty.
Objective function coefficients
The scores of the transitions depends on the states.

$C\left({X}_{hi\left[\mathsf{\text{free}}\right]{h}^{\prime}\left[\mathsf{\text{free}}\right]}\right)=\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{N}\left({h}^{\prime},h\right),0,0.25\right)+\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{{H}^{N}}\left({h}^{\prime},h\right),0,0.025\right)$, where Φ is the same as the one defined in the mathematical model for fast exchange.

$C\left({X}_{hi\left[\mathsf{\text{free}}\right]{{h}^{\prime}}_{a}{{h}^{\prime}}_{b}\left[\mathsf{\text{freebound}}\right]}\right)=\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{N}\left({h}_{a}^{\prime},h\right),0,0.25\right)+\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{{H}^{N}}\left({h}_{a}^{\prime},h\right),\phantom{\rule{2.77695pt}{0ex}}0,0.025\right)+\mathrm{\Phi}\left(\frac{I\left({{h}^{\prime}}_{a}\right)}{I\left({{h}^{\prime}}_{a}\right)+I\left({{h}^{\prime}}_{b}\right)}{R}_{i},0,0.15\right)$, where I(·) gives the intensity of the given peak, R_{ i }is the expected intensity ratio based on the concentration ratio of ligand to protein, and ${h}_{a}^{\prime}$ is closer to h than ${h}_{b}^{\prime}$ is to h based on Δδ_{ NH }.

C(X_{hi[free]h'[freebound]}) = 0.001. Since the chemical shift of h' can be very different from h for a given residue, we set this score to be a small constant.

$C\left({X}_{{h}_{a}{h}_{b}\left[\mathsf{\text{freebound}}\right]{{h}^{\prime}}_{a}{{h}^{\prime}}_{b}\left[\mathsf{\text{freebound}}\right]}\right)=\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{N}\left({h}_{a}^{\prime},{h}_{a}\right),0,0.25\right)+\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{{H}^{N}}\left({h}_{a}^{\prime},{h}_{a}\right),0,0.025\right)+\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{N}\left({h}_{a}^{\prime},{h}_{b}\right),0,0.25\right)+\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{{H}^{N}}\left({h}_{b}^{\prime},{h}_{b}\right),0,0.025\right)+\mathrm{\Phi}\left(\frac{I\left({{h}^{\prime}}_{a}\right)}{I\left({{h}^{\prime}}_{a}\right)+I\left({{h}^{\prime}}_{b}\right)}{R}_{i},0,0.15\right)$, where ${h}_{a}^{\prime}$ is closer to h_{ a }than to h_{ b }.

$C\left({X}_{{h}_{a}{h}_{b}\left[\mathsf{\text{freebound}}\right]{{h}^{\prime}}_{b}\left[\mathsf{\text{bound}}\right]}\right)=\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{N}\left({h}_{b}^{\prime},{h}_{b}\right),0,0.25\right)+\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{{H}^{N}}\left({h}_{b}^{\prime},{h}_{b}\right),0,0.025\right)$, where ${h}_{b}^{\prime}$ is closer to h_{ b }than to h_{ a }.

$C\left({X}_{{h}_{b}\left[\mathsf{\text{bound}}\right]{{h}^{\prime}}_{b}\left[\mathsf{\text{bound}}\right]}\right)=\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{N}\left({h}_{b}^{\prime},{h}_{b}\right),0,0.25\right)+\mathrm{\Phi}\left(\mathrm{\Delta}{\delta}_{{H}^{N}}\left({h}_{b}^{\prime},{h}_{b}\right),0,0.025\right)$
Constraints

Define the following auxiliary variables for each vertex. ${O}_{his}={\sum}_{{h}^{\prime}{s}^{\prime}}{X}_{his{h}^{\prime}{s}^{\prime}}$, which represents the sum of the variables corresponding to the outedges from vertices that contain peak h ∈ T_{ i }in state s. ${I}_{his}={\sum}_{{h}^{\prime}{s}^{\prime}}{X}_{{h}^{\prime}\left[i1\right]{s}^{\prime}hs}$, which represents the sum of the variables corresponding to the inedges into vertices that contain peak h ∈ T_{ i }in state s.

The number of inedges, and the number of outedges is bounded by one to prevent path overlap. This is I_{ his }≤ 1 and O_{ his }≤ 1, respectively.

Analogous to the fastexchange case, we have the number of inedges equal to the number of outedges. This is O_{ his }= I_{ his }.

Define the following auxiliary variables for each peak. ${O}_{hi}={\sum}_{s{h}^{\prime}{s}^{\prime}}{X}_{his{h}^{\prime}{s}^{\prime}}$, which represents the sum of the variables corresponding to the outedges from vertices that contain peak h ∈ T_{ i }in any state. ${I}_{hi}={\sum}_{{h}^{\prime}{s}^{\prime}s}{X}_{{h}^{\prime}\left[i1\right]{s}^{\prime}hs}$, which represents the sum of the variables corresponding to the inedges into vertices that contain peak h ∈ T_{ i }in any state.

Since a vertex can contain more than one peak, to ensure that each peak gets assigned to at most one state and path, we have I_{ hi }≤ 1, O_{ hi }≤ 1, and O_{ hi }= I_{ hi }.
Declarations
Acknowledgements
We would like to thank Babak Alipanahi, Thorsten Dieckmann, Yay Duangkham, Guy Guillemette, and Mike Piazza for thoughtful discussions. This work is partially supported by NSERC Grant OGP0046506, China's MOST 863 Grant 2008AA02Z313, Canada Research Chair program, MITACS, an NSERC Collaborative Grant, Premier's Discovery Award, SHARCNET, the Cheriton Scholarship, and a grant from King Adbullah University of Science and Technology.
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 3, 2012: ACM Conference on Bioinformatics, Computational Biology and Biomedicine 2011. The full contents of the supplement are available online at http://www.biomedcentral.com/14712105/13/S3.
Authors’ Affiliations
References
 Sakakibara D, Sasaki A, Ikeya T, Hamatsu J, Hanashima T, Mishima M, Yoshimasu M, Hayashi N, Mikawa T, Wälchli M, Smith BO, Shirakawa M, Güntert P, Ito Y: Protein structure determination in living cells by incell NMR spectroscopy. Nature. 2009, 458 (7234): 102105. 10.1038/nature07814.View ArticlePubMedGoogle Scholar
 Serber Z, Corsini L, Durst F, Dötsch V: Incell NMR spectroscopy. Methods Enzymol. 2005, 394: 1741.View ArticlePubMedGoogle Scholar
 Zuiderweg ERP: Mapping proteinprotein interactions in solution by NMR spectroscopy. Biochemistry. 2002, 41: 17. 10.1021/bi011870b.View ArticlePubMedGoogle Scholar
 Mittermaier A, Kay LE: New tools provide new insights in NMR studies of protein dynamics. Science. 2006, 312 (5771): 224228. 10.1126/science.1124964.View ArticlePubMedGoogle Scholar
 Pellecchia M, Bertini I, Cowburn D, Dalvit C, Giralt E, Jahnke W, James TL, Homans SW, Kessler H, Luchinat C, Meyer B, Oschkinat H, Peng J, Schwalbe H, Siegal G: Perspectives on NMR in drug discovery: a technique comes of age. Nat Rev Drug Discov. 2008, 7 (9): 738745. 10.1038/nrd2606.PubMed CentralView ArticlePubMedGoogle Scholar
 Hajduk PJ: SAR by NMR: putting the pieces together. Mol Interv. 2006, 6 (5): 266272. 10.1124/mi.6.5.8.View ArticlePubMedGoogle Scholar
 Shuker SB, Hajduk PJ, Meadows RP, Fesik SW: Discovering highaffinity ligands for proteins: SAR by NMR. Science. 1996, 274 (5292): 15311534. 10.1126/science.274.5292.1531.View ArticlePubMedGoogle Scholar
 Hajduk PJ, Greer J: A decade of fragmentbased drug design: strategic advances and lessons learned. Nat Rev Drug Discov. 2007, 6 (3): 211219. 10.1038/nrd2220.View ArticlePubMedGoogle Scholar
 Alipanahi B, Gao X, Karakoc E, Donaldson L, Li M: PICKY: a novel SVDbased NMR spectra peak picking method. Bioinformatics. 2009, 25 (12): 268275. 10.1093/bioinformatics/btp225.View ArticleGoogle Scholar
 Pellecchia M, Sem DS, Wüthrich K: NMR in drug discovery. Nat Rev Drug Discov. 2002, 1 (3): 211219. 10.1038/nrd748.View ArticlePubMedGoogle Scholar
 Krishnamoorthy J, Yu VCK, Mok YK: AutoFACE: an NMR based binding site mapping program for fast chemical exchange proteinligand systems. PLoS One. 2010, 5 (2): e894310.1371/journal.pone.0008943.PubMed CentralView ArticlePubMedGoogle Scholar
 Peng C, Unger SW, Filipp FV, Sattler M, Szalma S: Automated evaluation of chemical shift perturbation spectra: New approaches to quantitative analysis of receptorligand interaction NMR spectra. J Biomol NMR. 2004, 29 (4): 491504.View ArticlePubMedGoogle Scholar
 Fukui L, Chen Y: NvMap: automated analysis of NMR chemical shift perturbation data. Bioinformatics. 2007, 23 (3): 378380. 10.1093/bioinformatics/btl585.PubMed CentralView ArticlePubMedGoogle Scholar
 Damberg CS, Orekhov VY, Billeter M: Automated analysis of large sets of heteronuclear correlation spectra in NMRbased drug discovery. J Med Chem. 2002, 45 (26): 56495654. 10.1021/jm020866a.View ArticlePubMedGoogle Scholar
 Utrecht NMR Research Group: Analysis of NMR titration data and docking results in the study of biomolecular complexes. 2011, [http://www.nmr.chem.uu.nl/~abonvin/tutorials/TitrationData/titration.html]Google Scholar
 Mok YK: AutoFACE download. 2010, [http://www.dbs.nus.edu.sg/staff/henry.htm]Google Scholar
 Stevens T: CcpNmr analysis tutorials. 2011, [http://www.ccpn.ac.uk/ccpn/software/ccpnmranalysis/tutorials/threedaycourse]Google Scholar
 Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res. 2000, 28: 235242. 10.1093/nar/28.1.235.PubMed CentralView ArticlePubMedGoogle Scholar
 Jang R, Gao X, Li M: Towards fully automated structurebased NMR resonance assignment of 15Nlabeled proteins from automatically picked peaks. J Comput Biol. 2011, 18 (3): 347363. 10.1089/cmb.2010.0251.View ArticlePubMedGoogle Scholar
 Stratmann D, Guittet E, van Heijenoort C: Robust structurebased resonance assignment for functional protein studies by NMR. J Biomol NMR. 2010, 46 (2): 157173. 10.1007/s1085800993903.PubMed CentralView ArticlePubMedGoogle Scholar
 Apaydin M, Catay B, Patrick N, Donald B: NVRBIP: nuclear vector replacement using binary integer programming for NMR structurebased assignments. The Computer Journal. 2010, bxp120Google Scholar
 Xiong F, Pandurangan G, BaileyKellogg C: Contact replacement for NMR resonance assignment. Bioinformatics. 2008, 24 (13): i205i213. 10.1093/bioinformatics/btn167.PubMed CentralView ArticlePubMedGoogle Scholar
 Langmead C, Yan A, Lilien R, Wang L, Donald B: A polynomialtime nuclear vector replacement algorithm for automated NMR resonance assignments. J Comput Biol. 2004, 11 (23): 277298. 10.1089/1066527041410436.View ArticlePubMedGoogle Scholar
 BaileyKellogg C, Widge A, Kelley JJ, Berardi MJ, Brushweller JH, Donald BR: The NOESY jigsaw: automated protein secondary structure and mainchain assignment from sparse, unassigned NMR data. J Comput Biol. 2000, 7: 537558. 10.1089/106652700750050934.View ArticlePubMedGoogle Scholar
 Kann V: Maximum bounded 3dimensional matching is MAX SNPcomplete. Inf Process Lett. 1991, 37: 2735. 10.1016/00200190(91)90246E.View ArticleGoogle Scholar
 Zuckerman D: On unapproximable versions of NPcomplete problems. SIAM J Comput. 1996, 25 (6): 12931304. 10.1137/S0097539794266407.View ArticleGoogle Scholar
 Kuhn HW: The Hungarian method for the assignment problem. 2010View ArticleGoogle Scholar
 Schumann FH, Riepl H, Maurer T, Gronwald W, Neidig KP, Kalbitzer HR: Combined chemical shift changes and amino acid specific chemical shift mapping of proteinprotein interactions. J Biomol NMR. 2007, 39 (4): 275289. 10.1007/s108580079197z.View ArticlePubMedGoogle Scholar
 Danna E, Fenelon M, Gu Z, Wunderling R: Generating multiple solutions for mixed integer programming problems. Integer Programming and Combinatorial Optimization. 2007, 280294.View ArticleGoogle Scholar
 Williams HP: Model Building in Mathematical Prog. 1999, WileyGoogle Scholar
 Ulrich EL, Akutsu H, Doreleijers JF, Harano Y, Ioannidis YE, Lin J, Livny M, Mading S, Maziuk D, Miller Z, Nakatani E, Schulte CF, Tolmie DE, Kent Wenger R, Yao H, Markley JL: BioMagResBank. Nucleic Acids Res. 2008, 36 (Database issue): D402D408.PubMed CentralPubMedGoogle Scholar
 Neal S, Nip AM, Zhang H, Wishart DS: Rapid and accurate calculation of protein 1H, 13C and 15N chemical shifts. J Biol NMR. 2003, 26 (3): 215240. 10.1023/A:1023812930288.View ArticleGoogle Scholar
 Wu S, Zhang Y: LOMETS: a local metathreadingserver for protein structure prediction. Nucleic Acids Res. 2007, 35 (10): 33753382. 10.1093/nar/gkm251.PubMed CentralView ArticlePubMedGoogle Scholar
 Arnold K, Bordoli L, Kopp J, Schwede T: The SWISSMODEL workspace: a webbased environment for protein structure homology modelling. Bioinformatics. 2006, 22 (2): 195201. 10.1093/bioinformatics/bti770.View ArticlePubMedGoogle Scholar
 Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T: The SWISSMODEL Repository and associated resources. Nucleic Acids Res. 2009, 37 (Database issue): D387D392.PubMed CentralView ArticlePubMedGoogle Scholar
 Peitsch MC: Protein modeling by email. Nat Biotechnol. 1995, 13 (7): 658660. 10.1038/nbt0795658.View ArticleGoogle Scholar
 Word JM, Lovell SC, Richardson JS, Richardson DC: Asparagine and glutamine: using hydrogen atom contacts in the choice of sidechain amide orientation. J Mol Biol. 1999, 285: 17351747. 10.1006/jmbi.1998.2401.View ArticlePubMedGoogle Scholar
 Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 1998, 11 (9): 739747. 10.1093/protein/11.9.739.View ArticlePubMedGoogle Scholar
 Kozakov D, Hall DR, Beglov D, Brenke R, Comeau SR, Shen Y, Li K, Zheng J, Vakili P, Paschalidis IC, Vajda S: Achieving reliability and high accuracy in automated protein docking: ClusPro, PIPER, SDU, and stability analysis in CAPRI rounds 1319. Proteins. 2010, 78 (15): 31243130. 10.1002/prot.22835.PubMed CentralView ArticlePubMedGoogle Scholar
 Kozakov D, Brenke R, Comeau SR, Vajda S: PIPER: an FFTbased protein docking program with pairwise potentials. Proteins. 2006, 65 (2): 392406. 10.1002/prot.21117.View ArticlePubMedGoogle Scholar
 Mukherjee S, Zhang Y: MMalign: a quick algorithm for aligning multiplechain protein complex structures using iterative dynamic programming. Nucleic Acids Res. 2009, 37 (11): e8310.1093/nar/gkp318.PubMed CentralView ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.