  • Methodology article
  • Open access

Optimization algorithms for functional deimmunization of therapeutic proteins

Abstract

Background

To develop protein therapeutics from exogenous sources, it is necessary to mitigate the risks of eliciting an anti-biotherapeutic immune response. A key aspect of the response is the recognition and surface display by antigen-presenting cells of epitopes, short peptide fragments derived from the foreign protein. Thus, developing minimal-epitope variants represents a powerful approach to deimmunizing protein therapeutics. Critically, mutations selected to reduce immunogenicity must not interfere with the protein's therapeutic activity.

Results

This paper develops methods to improve the likelihood of simultaneously reducing the anti-biotherapeutic immune response while maintaining therapeutic activity. A dynamic programming approach identifies optimal and near-optimal sets of conservative point mutations to minimize the occurrence of predicted T-cell epitopes in a target protein. In contrast with existing methods, those described here integrate analysis of immunogenicity and stability/activity, are broadly applicable to any protein class, guarantee global optimality, and provide sufficient flexibility for users to limit the total number of mutations and target MHC alleles of interest. The input is simply the primary amino acid sequence of the therapeutic candidate, although crystal structures and protein family sequence alignments may also be input when available. The output is a scored list of sets of point mutations predicted to reduce the protein's immunogenicity while maintaining structure and function. We demonstrate the effectiveness of our approach in a number of case study applications, showing that, in general, our best variants are predicted to be better than those produced by previous deimmunization efforts in terms of either immunogenicity or stability, or both factors.

Conclusions

By developing global optimization algorithms leveraging well-established immunogenicity and stability prediction techniques, we provide the protein engineer with a mechanism for exploring the favorable sequence space near a targeted protein therapeutic. Our mechanism not only helps identify designs more likely to be effective, but also provides insights into the interrelated implications of design choices.

Background

The majority of all therapeutic proteins elicit an anti-biotherapeutic immune response (aBIR) in human patients receiving treatment [1]. The clinical effects of such a response may include various rapidly manifested anaphylactic responses, a reduction of therapeutic efficacy, and in rare cases cross-reactivity of anti-drug antibodies with endogenous patient proteins resulting in a form of induced autoimmunity [2]. Wide concern over these issues has focused biopharmaceutical researchers on the immunogenicity of protein therapeutics, and has driven the search for strategies to detect, assess, and ameliorate potentially deleterious immune responses [3–5].

While there exists a variety of factors that influence a protein therapeutic's immunogenicity [6, 7], we focus here on the effect of a protein's origins. Specifically, non-human proteins exhibit a disproportionately high frequency of immunogenicity in humans as a result of the classical immune response [8]. In contrast, proteins of human origin are more likely to be recognized as "self," or to meet the "criteria of continuity" [9]. The goal is thus to engineer variants of the foreign protein that also are recognized as "self." For therapeutic antibodies, whose structure and function are well understood, immunogenicity reduction may be realized by rational grafting of key functional residues from an exogenous therapeutic antibody onto a human antibody framework [10–14]. The resulting chimeric antibody maintains the binding specificity and affinity of the exogenous therapeutic candidate, but the majority of the protein is comprised of human-derived amino acid sequences, thereby reducing the propensity for aBIR. The prevalence of chimeric and humanized antibodies among FDA approved therapeutics [15] as well as a detailed meta-analysis [16] provide overwhelming evidence for the efficacy of this approach as a whole. However, there remains a considerable empirical, trial-and-error component, even in "rational" approaches [17]. Rational grafting techniques require a precise knowledge of structure-function relationships, as well as a modular structure common to the exogenous therapeutic candidate and a homologous human protein. With the advanced state of knowledge for immunoglobulin proteins, therapeutic antibodies inherently satisfy these prerequisites. However, exogenous enzymes, signaling peptides, and other classes of non-human proteins represent a potentially massive pool of biotherapeutic agents. 
To effectively tap this reservoir of next generation drugs, more advanced deimmunization strategies are required to address the fact that many of these candidates do not possess common modular structures and frequently have no homologous human counterpart.

One alternative to humanization by rational grafting is the identification and modification of immunogenic peptide fragments of a protein, or T-cell epitopes, that drive the aBIR. These peptides are derived from proteolytic processing of protein that has been internalized by antigen presenting cells (Trombetta and Mellman [18] provide a detailed review). The peptide fragments are bound within the groove of type II major histocompatibility complex proteins (MHC II), which are then transported to the surface of the immune cell where the peptide-MHC II complex is displayed to the extracellular environment. Should the displayed peptides constitute immunogenic sequences, they will form ternary peptide-MHC II-T-cell receptor complexes with surface receptors of cognate white blood cells. The resulting signaling cascade leads to a coordinated immune response against the offending protein. To avoid such a response, it is sometimes possible to identify the most immunogenic peptide fragments of a candidate protein, and to subsequently mutagenize one or more of the corresponding residues so as to disrupt the peptide fragment's capacity to complex with the MHC II and/or T-cell receptors. This process has been successfully applied to numerous therapeutic candidates including staphylokinase [19], factor VIII [20], and a β-lactamase [21]. Deimmunization by epitope deletion suffers from the limitation of being exceptionally time and resource intensive. Traditionally, the approach entails synthesizing and testing the immunogenicity of large panels of peptides from the native protein, performing alanine scanning mutagenesis on the most immunogenic fragments to pinpoint critical MHC II binding residues, incorporating deimmunizing mutations into the full length protein, and finally testing the functionality and immunogenicity of the engineered protein variants, only a small fraction of which are likely to retain high activity and/or constitute globally deimmunized candidates. 
More advanced implementations of this strategy exchange functionally relevant mutations for alanine mutations, but only late in the experimental cycle.

Computational methods have been employed to aid the identification of mutations that can effectively eliminate MHC II binding. Often computational analyses are performed on only a small subpopulation of peptides that have been preselected from a much larger pool of possibilities [22, 23]. These approaches also typically focus on a minimal set of only the most immunogenic peptides (typically 1-3 peptides), and therefore cannot be guaranteed to provide globally optimal sequences. Alternatively, numerous computational tools have been developed for immunogenicity prediction for an entire protein, based on its amino acid sequence [24], and the efficacies of several alternative methods have been evaluated in head-to-head comparisons [25, 26]. Some such algorithms have been used to identify immunogenic peptides in practical biotherapeutics [27, 28]; our goal is to integrate such immunogenicity analyses within optimization algorithms that reduce predicted immunogenicity while accounting for structural and functional consequences.

In order to address the shortcomings of earlier approaches, this paper presents a novel protein design method in which protein sequences are computationally optimized to produce variants that are more likely to exhibit both low inherent immunogenicity and high level functionality. These are two competing concerns - mutations introduced to reduce immunogenicity may produce unstable or inactive proteins. We establish as our primary optimization objective reduction of immunogenicity, according to predicted T-cell epitopes within the sequence [25]. In order to also address the concern of stability/activity, we identify for each residue position those mutations that are deemed acceptable according to sequence and/or structure-based analyses. A dynamic programming approach then finds globally optimal and near-optimal sets of these acceptable mutations that minimize the occurrence of predicted epitopes.

Our methods provide a number of significant extensions to the state of the art. They are not limited to deimmunization of antibodies (as are simple rational grafting techniques), but can also be applied in engineering immunotolerant versions of more complex proteins, such as therapeutic enzymes. Our approach seamlessly integrates immunogenic peptide identification, mutagenic deimmunization, and functional/structural analysis of potential mutations, employing well-established and effective tools for prediction of epitopes and for evaluation of stability changes. Our dynamic programming-based algorithms are guaranteed to find globally optimal sets of mutations, avoiding the pitfall of making a mutation to mitigate one epitope but inadvertently introducing a new overlapping epitope. We provide the protein engineer with flexibility in setting a desired threshold for immunogenicity, limiting the number of mutations to consider, and in targeting specific MHC alleles. Finally, in contrast to traditional experimental and computational techniques, our methods preferentially guide mutations to the most promiscuous immunogenic amino acids, i.e., those that are elements of two or more overlapping immunogenic peptides (Fig. 1).

Figure 1

Deimmunization overview. We employ T-cell epitope predictors to score each 9-mer peptide for potential immunogenicity. In this example (staphylokinase residues 71-87; see the Results section), four peptides are deemed immunogenic, as they are predicted to be recognized by several of the 8 most representative MHC II alleles. We employ sequence and structure analysis to identify for each position which residues are acceptable; only a few examples are shown. Our algorithms select a specified number of mutations (here two mutations, underlined in the variant) from the acceptable ones, so as to minimize the resulting epitope score. Note that a single substitution at a "promiscuous" amino acid can reduce recognition of multiple overlapping epitopes, and need not be at the so-called "anchor" position.

We apply our methods to optimize variants of several different protein therapeutics that have previously been targeted for deimmunization by other approaches. We characterize the space of sequences near these targets, identifying variants that are predicted to be less immunogenic than wild-type but still stable, i.e., deleting some predicted epitopes while using only conservative substitutions. We find a number of variants that, in comparison to earlier designs, contain fewer predicted epitopes for a given number of substitutions, or, viewed the other way, use fewer substitutions to delete a similar number of epitopes. Our approach targets many of the same immunogenic regions as identified by experimental studies, even when not specifically focused. Furthermore, by restricting substitutions to be relatively conservative (as assessed under several different models), our variants are likely to maintain greater thermodynamic stability.

Methods

Our overall goal is to select, from the mutations deemed acceptable, a set that efficiently reduces the occurrence of predicted T-cell epitopes. We now formalize this problem; Fig. 1 illustrates.

Problem 1 (Deimmunization) Given a protein sequence S of length n, determine a variant S' minimizing $\sum_{i=1}^{n-8} e(S'_{i..i+8})$, such that ∀i: S'[i] ∈ M(i), where

  • e : A^9 → [0, 1] returns the epitope score for a peptide (we assume a 9-mer; see below) in the range of 0 to 1, where lower is better

  • M : {1, 2, ..., n} → 2^A provides the allowed residues, indicating which amino acids (including at least the wild-type) may be considered at each residue position

Here and throughout, we use A = {A, C, ..., Y} for the set of amino acids; sequences are 1-indexed; and the notation X_{i..j} indicates the substring of X from position i to j, inclusive.
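As a minimal sketch of the objective and constraint in Problem 1, the following Python evaluates the total epitope score of a candidate variant and checks it against the allowed residues. The function names and the toy predictor `toy_e` are hypothetical and serve only as illustration; a real epitope predictor would take the place of `toy_e`.

```python
from typing import Callable, Dict, Set


def total_epitope_score(seq: str, e: Callable[[str], float], k: int = 9) -> float:
    """Sum the epitope score e over every k-mer window of seq."""
    return sum(e(seq[i:i + k]) for i in range(len(seq) - k + 1))


def is_feasible(variant: str, allowed: Dict[int, Set[str]]) -> bool:
    """Check that S'[i] is in M(i) for every 1-indexed position i."""
    return all(aa in allowed[i] for i, aa in enumerate(variant, start=1))


def toy_e(peptide: str) -> float:
    """Hypothetical stand-in predictor (NOT a real one): a peptide starting
    with a hydrophobic residue is 'recognized' by half of the alleles."""
    return 0.5 if peptide[0] in "FILMVWY" else 0.0
```

For example, `total_epitope_score("VAAAAAAAAA", toy_e)` sums over the two 9-mer windows of the 10-residue sequence.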

A number of experimentally-validated bioinformatics tools exist to predict immunogenicity (as encoded in e) and changes in stability (M). Our current implementation supports several state-of-the-art tools [29, 30], but is modular and can readily support others [31–33].

Immunogenicity evaluation

T-cell epitope predictors encapsulate the underlying specific recognition of an epitope by an MHC II protein. We focus here on the human leukocyte antigen group DR (HLA-DR) of MHC II proteins, since they are the predominant isotype. HLA-DR proteins have a recognition groove whose pockets form energetically favorable interactions with specific side-chains of peptides approximately 9 residues in length. Numerous methods are available for epitope prediction, and they have been shown to be predictive of immunogenicity [25]. For the results, we employ two quite different and complementary methods.

ProPred

Sturniolo et al. [34] experimentally measured the binding affinity between individual residues and individual pockets of the MHC II binding groove on a limited set of alleles. They then created binding profiles for untested alleles through sequence and structure alignment with tested alleles. In this "pocket profile" method, TEPITOPE, the sum of position-specific weights for each residue in a 9-mer provides a score that is compared against a threshold to determine whether or not the peptide is in a given percentile of the best-recognized peptides. The approach was experimentally validated by comparing its predictions against HLA-DR selected and nonselected peptide repertoires; up to 80% of the selected peptides were correctly predicted at a threshold that yielded < 5% false positives. Singh and Raghava then built a tool, ProPred, to expand the scope of TEPITOPE and make it more easily accessible and applicable [29]. In a recent independent evaluation [25], ProPred did quite well in epitope prediction, achieving an average 0.73 area under the curve (AUC) across 14 different alleles. ProPred has also been successfully employed in a number of different studies; e.g., it has recently helped identify antigenic sites on a mosquito midgut glycoprotein, immunoreactive peptides in prostatic acid phosphatase, and promiscuous T-cell epitopes of three major secreted antigens of Mycobacterium tuberculosis [35–37]. In all three of these examples, ProPred facilitated the rapid identification of potential vaccine targets that were then experimentally characterized in detail. In our case study of Erythropoietin (see Results), we found a quite striking match between ProPred predictions and published ELISPOT assay immunogenicity results.

SMM-align

Nielsen et al. [30] pursued a different approach to epitope prediction, developing the SMM-align method by applying machine learning techniques to large curated databases of experimentally validated epitopes: the Immune Epitope Database IEDB [38] and SYFPEITHI [39]. While ProPred uses data from single residues binding to single MHC II pockets, SMM-align uses data from whole peptides. Furthermore, while ProPred is based on sequence and structure alignment, SMM-align uses Gibbs sampling and a regulated least squares regression to develop position specific scoring matrices that predict the binding affinity between an epitope and MHC II allele. In the independent evaluation mentioned above [25], SMM-align also achieved a mean 0.73 AUC (SMM-align and ProPred were the top two methods).

While there are over 50 different HLA-DR alleles, we have focused on 8 common alleles (DRB1*0101, DRB1*0301, DRB1*0401, DRB1*0701, DRB1*0801, DRB1*1101, DRB1*1301, and DRB1*1501) that represent the majority of human populations world-wide [40]. Thus our epitope score is the fraction of these 8 alleles predicted to recognize a peptide. In order to evaluate the potential for finding an epitope, we scored each of the 20^9 possible 9-mer peptides under ProPred at a 10% threshold. We found that 1.4 · 10^11 (26.63%) are predicted to be recognized by one or more alleles, including 5.7 · 10^9 (1.12%) by all 8 alleles; see Fig. 2 for a complete histogram.
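The epitope score described above (the fraction of alleles predicted to recognize a peptide) can be sketched with a pocket-profile style scorer: sum position-specific weights over the 9-mer and compare against a per-allele threshold. The profiles, thresholds, and allele names below are invented toy values, not ProPred or TEPITOPE matrices.

```python
from typing import Dict, List

# One {amino acid: weight} dict per peptide position (9 positions per profile).
Profile = List[Dict[str, float]]


def profile_score(peptide: str, profile: Profile) -> float:
    """Sum the position-specific weights of the peptide's residues."""
    return sum(weights.get(aa, 0.0) for aa, weights in zip(peptide, profile))


def epitope_score(peptide: str, profiles: Dict[str, Profile],
                  thresholds: Dict[str, float]) -> float:
    """Fraction of alleles whose profile score reaches that allele's threshold."""
    hits = sum(1 for allele, profile in profiles.items()
               if profile_score(peptide, profile) >= thresholds[allele])
    return hits / len(profiles)


# Hypothetical toy data: two made-up alleles that only weight position 1.
TOY_PROFILES = {
    "allele_1": [{"F": 2.0}] + [{} for _ in range(8)],
    "allele_2": [{"F": 0.5}] + [{} for _ in range(8)],
}
TOY_THRESHOLDS = {"allele_1": 1.0, "allele_2": 1.0}

print(epitope_score("FAAAAAAAA", TOY_PROFILES, TOY_THRESHOLDS))
```

As an arithmetic check on the reported counts, 20^9 = 5.12 · 10^11, so 26.63% of all 9-mers is roughly 1.36 · 10^11 and 1.12% is roughly 5.7 · 10^9, in line with the rounded figures quoted in the text.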

Figure 2

Possible epitopes. Number of 9-mer peptides (out of 20^9 possible) recognized by exactly the number of the eight common alleles we use for epitope scoring, relative to a 10% threshold (see text).

Stability evaluation

Evaluating the effects of mutations on a protein's stability and activity is at the heart of all rational protein engineering techniques. For the results, we consider three different methods using different sources of information to determine acceptable residues likely to maintain wild-type qualities.

BLOSUM

Given sequence alone, standard substitution tables such as BLOSUM [41] can evaluate the overall acceptability of a mutation, according to substitutions in sets of natural sequences. We compute a "relative" BLOSUM-62 score - the difference between the wild-type/wild-type score (diagonal) and the wild-type/mutant score. We obtain a reasonably conservative set of acceptable residues by only taking those with score differences of at most 4.
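To illustrate the relative BLOSUM-62 rule, the sketch below applies the "difference of at most 4" cutoff to a single wild-type residue. Only the valine row of BLOSUM-62 is hard-coded here for brevity; the function name is hypothetical, and a full implementation would load the complete matrix.

```python
from typing import Dict, Set

# BLOSUM-62 row for wild-type valine (V), hard-coded as an excerpt.
BLOSUM62_V: Dict[str, int] = {
    "A": 0, "R": -3, "N": -3, "D": -3, "C": -1, "Q": -2, "E": -2,
    "G": -3, "H": -3, "I": 3, "L": 1, "K": -2, "M": 1, "F": -1,
    "P": -2, "S": -2, "T": 0, "W": -3, "Y": -1, "V": 4,
}


def allowed_by_blosum(wild_type: str, row: Dict[str, int],
                      max_diff: int = 4) -> Set[str]:
    """Keep residues whose score is within max_diff of the diagonal entry."""
    diagonal = row[wild_type]
    return {aa for aa, score in row.items() if diagonal - score <= max_diff}


print(sorted(allowed_by_blosum("V", BLOSUM62_V)))
```

Under this cutoff a valine position admits A, I, L, M, T, and the wild-type V itself.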

Conservation

A set of sequences related to the target protein reveals which positions are highly conserved, and to which amino acid(s), vs. which are more variable. In turn, this is indicative of which residues are riskier to mutate and which ones are safer. The utility of sequence alignments in engineering thermostabilized and functional protein variants has been proven in numerous experimental studies [42–46]. We use a multiple sequence alignment and phylogenetic tree to compute position-specific amino acid frequencies in a family. To avoid over-counting highly-related sequences, we weight sequences using a bottom-up tree-based algorithm [47]. The weighted position-specific score for amino acid a at position i, according to a multiple sequence alignment F of sequences s with (non-normalized) weights w_s is then:

$$\phi_{i,a} = -\log \frac{\sum_{s \in F} w_s \, I\{s[i] = a\}}{\sum_{s \in F} w_s} \qquad (1)$$

where I{} is 1 if its argument holds and 0 otherwise. We permit residues such that ϕ_{i,a} is at most a user-specified threshold, defaulting to -log 0.05 (i.e., at least 5% weighted frequency).
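The conservation filter can be sketched as follows, assuming the score is the negative natural log of the weighted frequency (consistent with the "negative-log conservation score" discussed in the Results). The input is one alignment column plus per-sequence weights; the function names are hypothetical, and gap handling and the tree-based weighting itself are deliberately left out.

```python
import math
from typing import Dict, List, Set


def conservation_scores(column: str, weights: List[float]) -> Dict[str, float]:
    """Negative-log weighted frequency of each residue seen in the column."""
    total = sum(weights)
    weighted_count: Dict[str, float] = {}
    for aa, w in zip(column, weights):
        weighted_count[aa] = weighted_count.get(aa, 0.0) + w
    return {aa: -math.log(c / total) for aa, c in weighted_count.items()}


def allowed_by_conservation(column: str, weights: List[float],
                            min_freq: float = 0.05) -> Set[str]:
    """Residues whose weighted frequency is at least min_freq."""
    threshold = -math.log(min_freq)
    scores = conservation_scores(column, weights)
    return {aa for aa, phi in scores.items() if phi <= threshold}
```

With equal weights, the column `"VVVD"` admits both V and D; down-weighting the last sequence to 0.1 drops D below the 5% cutoff.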

FoldX

When a structure is available, we employ the FoldX ΔΔG° predictor [48] to evaluate the change in free energy for each possible substitution. FoldX was demonstrated to achieve a 0.83 correlation between predicted and experimental ΔΔG° over 95% of a database after outlier removal. FoldX has since been successfully used to aid protein design, e.g., for custom DNA kinases and potential anticancer drugs [49, 50]. It is important to note that our method does not need precise ΔΔG° prediction, but only an indication of whether a possible substitution is relatively "safe" (destabilizing by at most a little bit). We allow those residues whose predicted ΔΔG° values are at most a user-specified threshold, defaulting to 0.25 kcal/mol, more than the wild-type value (i.e., the mutant is nearly as good as, or even slightly better than, the wild-type).

Our problem specification treats substitutions independently of each other. While this is certainly a simplification, as residue interactions do affect stability and activity, it enables us to more quickly generate a number of solutions that are optimal (or near-optimal) with respect to epitope score. These solutions can then be subjected to more expensive analyses for non-additive effects.

Dynamic programming algorithms

Given immunogenicity and stability predictions, represented in an epitope score e and set of allowed residues M, our goal (Problem 1) is to choose a set of mutations to minimize the total epitope score. In order to solve this problem by dynamic programming, let us define T [i, X] as the best possible total epitope score for the prefix of S ending at position i, such that the last 8 amino acids form the string X. T can be defined recursively:

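$$T[8, X] = 0 \qquad (2)$$
$$T[i, X] = \min_{a \in M(i-8)} \left( T[i-1,\; a \cdot X_{1..7}] + e(a \cdot X) \right), \quad i > 8 \qquad (3)$$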

where · represents concatenation.

Optimal substructure holds: the best score ending at some position with some string must extend the best score ending at the previous position with a compatible string. Thus we can solve the recurrence by dynamic programming. Ultimately we want to find the minimum value in the last column (i.e., $\min_X T[|S|, X]$), and trace back to reconstruct the sequence. One small note of practical importance: when there is a tie for the minimum in Eq. 3, we should of course keep the wild-type amino acid.

The calculation for each cell requires constant time, and in the worst case there are n · 20^8 cells. However, in practice we only need to fill in the entries that use allowed substitutions; if these are reasonably conservative, the table is much smaller. In the BLOSUM-based approach described above, there are an average of 3.2 amino acids to consider for each position. The Results section provides position-by-position details for a specific protein, using BLOSUM, conservation, and FoldX.
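The dynamic program over T can be sketched in Python as below. This is an illustrative forward formulation under stated assumptions, not the authors' Java implementation: states are the last k-1 residues, only allowed residues are expanded, ties prefer the wild-type residue leaving the window, and `allowed` is a 0-indexed list of sets (so `allowed[i]` plays the role of M(i+1)). The function name and toy predictor in the usage note are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple


def deimmunize(seq: str, allowed: List[Set[str]],
               e: Callable[[str], float], k: int = 9) -> Tuple[float, str]:
    """Return (minimum total epitope score, one variant achieving it)."""
    n, w = len(seq), k - 1
    if n < k:
        raise ValueError("sequence must contain at least one k-mer")
    # Base case: every allowed (k-1)-mer prefix starts with score 0.
    table: Dict[str, float] = {
        "".join(p): 0.0 for p in product(*(sorted(allowed[j]) for j in range(w)))
    }
    back: List[Dict[str, str]] = []  # per step: state -> residue that left the window
    for i in range(w, n):
        wt_dropped = seq[i - w]
        nxt: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for state, score in table.items():
            for b in allowed[i]:
                peptide = state + b
                new_state = peptide[1:]
                cand = score + e(peptide)
                best = nxt.get(new_state)
                # Strict improvement, or a tie that keeps the wild-type residue.
                if best is None or cand < best or (cand == best and state[0] == wt_dropped):
                    nxt[new_state], ptr[new_state] = cand, state[0]
        table = nxt
        back.append(ptr)
    # Best final state; ties go to the state closest to the wild-type C-terminus.
    tail_wt = seq[n - w:]
    state = min(table, key=lambda y: (table[y], sum(a != b for a, b in zip(y, tail_wt))))
    best_score = table[state]
    # Trace back: peel off the last residue and restore the dropped one.
    tail: List[str] = []
    for ptr in reversed(back):
        tail.append(state[-1])
        state = ptr[state] + state[:-1]
    return best_score, state + "".join(reversed(tail))
```

A small usage example with 3-mers and a toy predictor that flags any window containing V: `deimmunize("VAVAA", [{"V"}, {"A"}, {"V", "T"}, {"A"}, {"A"}], lambda p: 1.0 if "V" in p else 0.0, k=3)` selects the single permitted substitution V3T, since it clears two of the three flagged windows.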

In order to restrict the total number of substitutions made, an additional column can be added to the dynamic programming table. Now define R[i, X, s] as the best possible total epitope score for the prefix of S ending at position i, such that the last 8 amino acids form the string X, and that exactly s substitutions have been made from S. R can be defined recursively:

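$$R[8, X, s] = \begin{cases} 0 & \text{if } s = \sum_{j=1}^{8} I\{X[j] \neq S[j]\} \\ \infty & \text{otherwise} \end{cases} \qquad (4)$$
$$R[i, X, s] = \min_{a \in M(i-8)} \left( R[i-1,\; a \cdot X_{1..7},\; s - I\{X[8] \neq S[i]\}] + e(a \cdot X) \right), \quad i > 8 \qquad (5)$$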

where I{} is the indicator function, returning 1 if the predicate is true and 0 if it is false. Here we ensure that the s index of R counts the total number of substitutions, starting in the base case with the number in the N-terminal 8-mer, and then in the recursive case adding 1 iff the most C-terminal residue of X is different from the corresponding wild-type residue. The extension only affects the size of the table (scaled by a factor of n, unless s is restricted a priori); the cost for computing each cell remains constant. We can readily extend this approach to calculate an (integer) substitution score for each mutation, using s to track the total substitution score rather than the number of mutations.
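The substitution-limited table R can be sketched the same way, by adding the substitution count to the state. Again this is a hedged illustration with hypothetical names, not the published code: it returns, for each exact count s up to `max_subs`, the best score and one variant achieving it, with `allowed` given as a 0-indexed list of sets.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

State = Tuple[str, int]  # (last k-1 residues, substitutions used so far)


def deimmunize_limited(seq: str, allowed: List[Set[str]],
                       e: Callable[[str], float], max_subs: int,
                       k: int = 9) -> Dict[int, Tuple[float, str]]:
    """Map each exact substitution count s to (best score, variant)."""
    n, w = len(seq), k - 1
    table: Dict[State, float] = {}
    # Base case: count the substitutions already present in the first k-1 residues.
    for p in product(*(sorted(allowed[j]) for j in range(w))):
        x = "".join(p)
        s = sum(a != b for a, b in zip(x, seq))
        if s <= max_subs:
            table[(x, s)] = 0.0
    back: List[Dict[State, Tuple[str, int]]] = []
    for i in range(w, n):
        nxt: Dict[State, float] = {}
        ptr: Dict[State, Tuple[str, int]] = {}
        for (x, s), score in table.items():
            for b in allowed[i]:
                s2 = s + (b != seq[i])  # add 1 iff the new residue is a substitution
                if s2 > max_subs:
                    continue
                peptide = x + b
                key = (peptide[1:], s2)
                cand = score + e(peptide)
                if key not in nxt or cand < nxt[key]:
                    nxt[key], ptr[key] = cand, (x[0], s)
        table = nxt
        back.append(ptr)
    results: Dict[int, Tuple[float, str]] = {}
    for s in range(max_subs + 1):
        finals = [key for key in table if key[1] == s]
        if not finals:
            continue
        key = min(finals, key=lambda key_: table[key_])
        score = table[key]
        y, count = key
        tail: List[str] = []
        for ptr in reversed(back):
            tail.append(y[-1])
            dropped, prev_count = ptr[(y, count)]
            y, count = dropped + y[:-1], prev_count
        results[s] = (score, y + "".join(reversed(tail)))
    return results
```

Reading the result across s gives the trade-off curve between the number of substitutions and the best achievable epitope score, which is the kind of summary reported per mutation count in the Results.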

While a standard dynamic programming backtrace returns a single optimal solution, there may in fact be multiple variants with the same score. Near-optimal variants are also worth considering, since neither our epitope score nor our evaluation of mutations is likely to be perfect. Upon finding the set of optimal and near-optimal solutions, we can subject them to further analysis, e.g., to model the effects of multiple substitutions, or to consider the ease of construction. Furthermore, by comparing and contrasting the good variants, we can better assess the robustness of a variant (do similar substitution patterns show up among the good ones?), as well as the general utility of a substitution (does it show up in many good variants?).

The problem of extracting multiple optimal and near-optimal solutions in dynamic programming has been extensively studied, from the early days of the field [51]. It has also received attention specifically in the bioinformatics community, as dynamic programming is at the heart of sequence alignment (among other significant problems). For example, Waterman and Byers [52] modified the standard dynamic programming backtracing procedure to produce near-optimal solutions, Naor and Brutlag [53] presented an alternative approach for representing (rather than enumerating) all alignments whose score is within a factor of optimal, and Gusfield [54] explicitly accounted for the objective function parameters that yield different optimal solutions.

Our current implementation employs the approach described by Waterman and Byers [52] in order to generate multiple possible variants.

Implementation

We have implemented our method in platform-independent Java code. The program takes as input a target protein sequence, along with specifications of how to evaluate stability and immunogenicity. As discussed above, the program can evaluate stability with BLOSUM, conservation (given the family multiple sequence alignment and phylogenetic tree), or FoldX (given the position-specific ΔΔG° values output from that program), and immunogenicity with ProPred (at a user-specified 1-10% threshold) or SMM-align (at a user-specified IC50 from 50-5000). The user must indicate which methods to employ, along with any necessary inputs (MSA and tree, or FoldX output) and can adjust the thresholds for acceptable stability scores (defaults are provided as described above). The program outputs all tied-for-optimal and near-optimal variants up to a user-specified limit, along with stability and immunogenicity evaluations of each variant according to the various predictors.

The software can be freely obtained for academic use by request from the authors. A demonstration web-based version is available at http://www.cs.dartmouth.edu/~cbk/deimm/.

Results and Discussion

We demonstrate our approach by applying it to a number of proteins that have been the object of previous deimmunization efforts. We explore the favorable sequence space of these proteins by evaluating epitope score under the ProPred method at a 10% threshold, and considering allowed residues under one of BLOSUM, conservation, or FoldX. We then independently assess each variant under SMM-align for epitope score and each of the other measures for stability.

In presenting stability predictions, we separately sum the value of each metric (BLOSUM, conservation, FoldX) over all the chosen substitutions. This enables assessment of a plan under different and potentially complementary measures; developing a consensus method in the future might yield even better results. The BLOSUM score for each substitution is either 0 (allowed) or 1 (disallowed). The negative-log conservation score for a substitution ranges from roughly 0.01 to 4.61 (99% to 1% weighted frequency), with a maximum of roughly 3 (5% weighted frequency) for allowed substitutions. For FoldX, the score for a substitution ranges from roughly -3 to 3 (negative implies stabilizing), with a maximum of 0.25 for allowed substitutions.

Staphylokinase (SakSTAR)

Warmerdam et al. [19] sought to deimmunize the fibrin-selective thrombolytic agent staphylokinase, specifically the SakSTAR wild-type variant derived from a lysogenic S. aureus strain. They targeted the C3 region, spanning residues 71-87, which was recognized by 90% of the T-cells cloned from a set of donors. Based on results from alanine scanning mutagenesis, sets of 2-4 alanine substitutions were selected to produce new variants designed to reduce immunogenicity.

We applied our approach to the original wild-type 71-87 peptide, using the Staphylokinase/Streptokinase family (Pfam accession PF02821) for conservation statistics and SakSTAR crystal structure (pdb id 2SAK) for FoldX calculations. Fig. 3 shows the amount of freedom in planning, in terms of the number of allowable residues at each position under our three evaluation methods. BLOSUM is typically more conservative and is overall more uniform; conservation depends on the position-specific diversity in the family; and FoldX allows more mutations when analysis of the structure at hand indicates that they would not be too destabilizing. On average, BLOSUM permits 4.2 residues per position, conservation 6.4, and FoldX 6.9. Table 1 summarizes some of our optimized variants, one per allowed residue predicate (BLOSUM, conservation, and FoldX). Our objective function is the number of ProPred-predicted epitopes, so this number naturally decreases with the number of substitutions, though it is worth noting that each substitution actually deletes several predicted epitopes. Furthermore, the independent predictor SMM-align (not part of the objective function) likewise trends downward with an increasing number of substitutions. Since ProPred was derived from pocket profiles and sequence alignments, while SMM-align was trained on specific experimentally identified epitopes, they provide complementary assessments of immunogenicity, and their agreement suggests that we are indeed likely to be deleting actual epitopes. By comparing results for the different allowed residue predicates, we can gain insights into how best to delete these epitopes, from a stability-preservation viewpoint. For example, we see that V79 was chosen for the first substitution under all three approaches. With BLOSUM, the conservative V79T was chosen; with conservation, D79 was recognized as sufficiently common in the sequence record; and with FoldX, K79 was predicted to maintain stability. 
On the other hand, the three-substitution conservation-based variant eliminates all epitopes (and of course looks good from a conservation analysis), but incurs a large ΔΔG° penalty relative to the solutions from the other metrics. It is worth noting that currently only the epitope score is the objective function (though we could readily employ a linear combination with a substitution score), and the goal is to delete as many epitopes as possible using substitutions allowed by a particular predicate. Thus, for example, in order to delete more epitopes, a conservation-based design may actually end up with a larger conservation penalty than a BLOSUM-based design, by using less common substitutions (but ones still meeting the 5% weighted frequency threshold) that are not allowed by BLOSUM. Further insights can be gained by considering all tied-for-optimal variants (Additional file 1, Table S1). For example, we can identify commonly selected mutations, e.g., V79T and V79K, and might consider variants incorporating them to be of higher quality.

Figure 3

Position-specific allowed residues in SakSTAR peptide. Number of allowed residues for each position of SakSTAR 71-87 by BLOSUM, conservation, and FoldX.

Table 1 SakSTAR 71-87 Peptide.

Our method identifies the favorable region of the sequence space, but a natural question is what portion of the space is favorable. In other words, are many or most variants likely to be good anyway? Fig. 4 shows the distribution of epitope scores for all 2-mutation variants of SakSTAR, using all acceptable mutations according to the BLOSUM evaluation. (Of course, with larger numbers of mutations and longer sequence, the exhaustive approach would not be feasible.) The figure makes clear that most variants have scores much worse than the optimal ones designed by our approach: the median score is 16 and only 5 of the 1338 sequences (0.37%) achieve the optimal score of 5. Thus experiment planning techniques are required, as stochastic methods are unlikely to produce high-quality variants.
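The exhaustive comparison described above can be sketched as a brute-force enumeration of every 2-mutation variant followed by a histogram of scores. The names are hypothetical, `allowed` is a 0-indexed list of sets that include the wild-type residue, and `score` stands in for any total epitope score function.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Iterator, List, Set


def two_mutation_variants(seq: str, allowed: List[Set[str]]) -> Iterator[str]:
    """Yield every variant with exactly two allowed substitutions."""
    for i, j in combinations(range(len(seq)), 2):
        for a in sorted(allowed[i] - {seq[i]}):
            for b in sorted(allowed[j] - {seq[j]}):
                yield seq[:i] + a + seq[i + 1:j] + b + seq[j + 1:]


def score_histogram(seq: str, allowed: List[Set[str]],
                    score: Callable[[str], float]) -> Counter:
    """Count how many 2-mutation variants achieve each score."""
    return Counter(score(v) for v in two_mutation_variants(seq, allowed))
```

The number of variants grows combinatorially with sequence length and mutation count, which is why this approach is only practical for short peptides and few mutations.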

Figure 4

Exhaustive 2-mutation search scores for SakSTAR peptide. Histogram of predicted epitope scores for all 2-mutation variants of SakSTAR 71-87 under BLOSUM.

Our designs show dramatic reduction in predicted T-cell epitope content (under both ProPred and SMM-align) compared to the variants chosen by Warmerdam et al. Their variants minimally decrease, or even introduce new predicted T-cell epitopes, due in part to limitations in their selection of amino acids (using only alanine for the 2- and 3-substitution variants).

While Warmerdam et al. focused their effort on the C3 region, our method is able to globally optimize an entire protein and thereby address a weakness identified in the earlier method: the "vast majority of humans recognize additional immunogenic SakSTAR regions" [19]. Fig. 5 profiles a 6-mutation full-protein variant identified by our method. Notice that even though it was not specifically targeted, the C3 immunogenic region was addressed with substitution V79D. In addition, mutations were selected in five other regions of high predicted immunogenicity. Each mutation deletes an average of 6.5 epitopes, which overlap the substituted position and/or are recognized by different MHC-II alleles. Furthermore (Table 2), all substitutions are to amino acids with weighted frequency greater than 0.05 at those positions in the staphylokinase family. Table 2 and Additional file 1, Table S2 detail a number of the full-protein variants for different numbers of mutations. Again the SMM-align epitope evaluation correlates very well with the optimized ProPred score, trending downward with increasing numbers of substitutions. The different allowed residue predicates all hit the C3 region (71-87) within the first few substitutions (again often picking V79), but also delete epitopes in a number of other predicted immunogenic regions (see again Fig. 5). The designs compare favorably with the Warmerdam designs in terms of both epitope predictors. The conservation-based variants tend to be particularly aggressive in deleting epitopes by choosing other residues represented in the family, but sacrifice more in predicted stability under FoldX.
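A per-9-mer epitope profile of the kind plotted for wild type and variant, together with the average number of epitopes deleted per mutation, can be computed along the following lines. This is a sketch only: the binding rule `toy_binds`, the allele names `toy1` and `toy2`, and the example sequences are hypothetical stand-ins for a real predictor such as ProPred.

```python
def epitope_profile(seq, alleles, binds, k=9):
    """Number of alleles predicted to bind each k-mer, indexed by start position."""
    return [sum(1 for allele in alleles if binds(seq[i:i + k], allele))
            for i in range(len(seq) - k + 1)]

def epitopes_deleted_per_mutation(wt_profile, var_profile, n_mutations):
    """Average drop in total epitope count per substitution."""
    return (sum(wt_profile) - sum(var_profile)) / n_mutations

def introduces_new_epitopes(wt_profile, var_profile):
    """True if any 9-mer is recognized by more alleles in the variant."""
    return any(v > w for w, v in zip(wt_profile, var_profile))

# Toy binding rule (assumption): an allele "binds" a 9-mer whose first
# residue is in the allele's anchor set.
TOY_ANCHORS = {"toy1": "VILFMWY", "toy2": "VIL"}

def toy_binds(peptide, allele):
    return peptide[0] in TOY_ANCHORS[allele]

wild_type = "FVKL" + "A" * 9   # hypothetical 13-residue sequence
variant = "FTKQ" + "A" * 9     # two substitutions: V->T and L->Q
wt_profile = epitope_profile(wild_type, TOY_ANCHORS, toy_binds)
var_profile = epitope_profile(variant, TOY_ANCHORS, toy_binds)
per_mutation = epitopes_deleted_per_mutation(wt_profile, var_profile, 2)
```

Plotting `wt_profile` and `var_profile` against start position gives the paired-bar view used in the variant profile figures, and `introduces_new_epitopes` checks the property that no variant bar exceeds its wild-type counterpart.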

Figure 5

Full-length SakSTAR variant profile. Optimized 6-substitution full-length SakSTAR variant with ProPred epitope scoring and conservation-based substitutions. x-axis: starting position of each 9-mer; y-axis: predicted number of alleles recognizing the 9-mer. Thin black bars indicate wild-type scores and thick orange bars indicate variant scores. Note: wild-type epitope scores are always greater than or equal to corresponding variant ones; i.e., we never introduce new epitopes. Blue ellipses indicate mutated positions (refer to Table 2).

Table 2 Full-length SakSTAR.

ProPred Threshold

Epitope predictors employ thresholds in deciding to label peptides as MHC-II binders or non-binders. To illustrate our algorithm, we have employed the "loosest" ProPred threshold of 10%, erring on the side of predicting spurious epitopes instead of on the side of missing epitopes. We also evaluated plans for SakSTAR based on a tighter threshold of 5%. As expected, with the 5% threshold, ProPred predicts fewer epitopes than with the 10% threshold: SakSTAR 71-87 has 16 predicted epitopes at 10% but only 8 at 5%. At 5% our algorithm finds completely deimmunized variants for the peptide within 4 substitutions (Additional file 1, Table S3). The substitution V79T eliminates 75% of the epitopes predicted in the 71-87 peptide at the 5% threshold and 50% of those predicted at the 10% threshold (Fig. 6). For full-length SakSTAR, both thresholds predict the same regions to be immunodominant (Fig. 5 and Additional file 1, Fig. S1). Changing the threshold from 10% to 5% seems to evenly attenuate the epitope signal across the protein. Of particular significance, we note that our optimization algorithm selects exactly the same full-length 6-substitution conservation-based variant with the 5% threshold (Additional file 1, Table S4) as it did for 10% (Table 2). The plan eliminates a strikingly high proportion of epitopes, 66% at the 10% threshold and 88% at 5%.
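The effect of the threshold on epitope counts can be illustrated with a small sketch. It assumes each threshold has already been translated into a per-allele score cutoff; the allele names, scores, and cutoffs below are invented for illustration and are not ProPred values.

```python
def count_epitopes(scores, cutoffs):
    """Count (9-mer, allele) pairs whose score meets the allele's cutoff.

    scores:  {allele: [score for each 9-mer]}
    cutoffs: {allele: minimum score labeled as a binder}
    """
    return sum(s >= cutoffs[allele]
               for allele, row in scores.items()
               for s in row)

def percent_eliminated(n_wild_type, n_variant):
    """Percentage of wild-type epitopes no longer predicted in the variant."""
    return 100.0 * (n_wild_type - n_variant) / n_wild_type

# Hypothetical scores and cutoffs; a looser threshold means lower cutoffs.
scores = {"toyA": [3.1, 1.2, 0.4], "toyB": [2.0, 2.6, 0.1]}
loose_cutoffs = {"toyA": 1.0, "toyB": 2.0}
tight_cutoffs = {"toyA": 3.0, "toyB": 2.5}
n_loose = count_epitopes(scores, loose_cutoffs)
n_tight = count_epitopes(scores, tight_cutoffs)
```

Applied to the peptide figures quoted above, `percent_eliminated(8, 2)` gives 75.0 and `percent_eliminated(16, 8)` gives 50.0, i.e., V79T leaves 2 of 8 predicted epitopes at the 5% threshold and 8 of 16 at the 10% threshold.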

Figure 6

SakSTAR peptide with ProPred 5% and 10% thresholds. Optimized SakSTAR peptide variant with ProPred epitope scoring at 5% (top) and 10% (bottom) thresholds. x-axis: starting position of each 9-mer; y-axis: predicted number of alleles recognizing the 9-mer. Thin black bars indicate wild-type scores and thick orange bars indicate variant scores. Note: wild-type epitope scores are always greater than or equal to corresponding variant ones; i.e., we never introduce new epitopes. The blue ellipse indicates BLOSUM-based substitution V79T.

Allele Analysis

A detailed analysis of predicted SakSTAR epitopes by binding allele shows that our 6-substitution conservation-based variant eliminates some of the epitopes predicted for each different allele (Figs. 7 and 8). At the ProPred 5% threshold, our design eliminates all epitopes predicted to bind to alleles HLA-DRB1*0101 and HLA-DRB1*1501. Total allele elimination does not occur at the 10% threshold, although in the variant, alleles 0101 and 1501 are predicted to bind only 1 and 2 epitopes, respectively. The plots further underscore the observation that the 5% and 10% thresholds yield similar epitope profiles across the whole protein both by sequence and by allele. As mentioned above, the optimal deimmunizing mutations are identical for plans under both thresholds, but a greater percentage of predicted epitopes are eliminated at the 5% threshold. In general, it is easier to eliminate an epitope that lies between the 5% and 10% threshold than one that exceeds the 10% threshold. For example, in the 5% plan, the V79T mutation eliminates 3 of 4 epitopes beginning at residue 76, but none of these four epitopes are eliminated at the 10% threshold.
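The by-allele breakdown can be sketched by representing each predicted epitope as a (start position, allele) pair and comparing counts before and after mutation. The positions and the epitope sets below are made up for illustration; only the allele labels echo those named in the text.

```python
from collections import Counter

def per_allele_counts(epitopes):
    """Count epitopes per allele from (start_position, allele) pairs."""
    return Counter(allele for _start, allele in epitopes)

def allele_summary(wt_epitopes, var_epitopes):
    """Map each allele to (wild-type count, variant count)."""
    wt = per_allele_counts(wt_epitopes)
    var = per_allele_counts(var_epitopes)
    return {allele: (wt[allele], var[allele]) for allele in sorted(wt)}

def fully_cleared(summary):
    """Alleles with wild-type epitopes but none left in the variant."""
    return [allele for allele, (before, after) in summary.items()
            if before > 0 and after == 0]

# Hypothetical epitope sets (positions are invented).
wt_epitopes = {(10, "0101"), (30, "0101"), (12, "1501"), (76, "0401"), (80, "0401")}
var_epitopes = {(76, "0401")}
summary = allele_summary(wt_epitopes, var_epitopes)
cleared = fully_cleared(summary)
```

Running the same summary at two thresholds makes it easy to see which alleles are completely cleared at the tighter threshold but retain a few epitopes at the looser one.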

Figure 7

Full-length SakSTAR epitope analysis by allele with ProPred 5% threshold. Optimized 6-substitution SakSTAR variant with ProPred epitope scoring (5% threshold) and conservation-based substitutions. x-axis: sequence position; y-axis: HLA-DRB1* allele recognizing the 9-mer. Lines: 9-residue extent of epitopes in the wild-type; cross-hatched lines: epitopes remaining in the variant. Blue ellipses indicate mutated positions (refer to Additional File 1 Table S4).

Figure 8

Full-length SakSTAR epitope analysis by allele with ProPred 10% threshold. Optimized 6-substitution SakSTAR variant with ProPred epitope scoring (10% threshold) and conservation-based substitutions. x-axis: sequence position; y-axis: HLA-DRB1* allele recognizing the 9-mer. Lines: 9-residue extent of epitopes in the wild-type; cross-hatched lines: epitopes remaining in the variant. Blue ellipses indicate mutated positions (refer to Table 2).

Erythropoietin (Epo)

Tangri et al. [22] focused on two regions in the protein therapeutic erythropoietin (Epo), residues 101-115 and 136-150, which they experimentally determined to be immunogenic during an intensive analysis of peptide fragments spanning the entire length of the protein. They engineered four variants targeting the anchor residues of identified T-cell epitopes in these regions: L102P/S146D (named G2), T107D/S146D (G3), L102G/T107D/S146D (G4), and L102S/T107D/S146D (G5).

We applied our methods to explore the favorable sequence space of Epo, using the Erythropoietin/thrombopoietin family (Pfam accession PF00758) for the conservation statistics and the crystal structure of human Epo (pdb id 1EER) for the FoldX analysis. As demonstrated above for SakSTAR, our method is not restricted to optimizing only targeted regions, but can instead seek to delete epitopes throughout the protein. Since both the ProPred and SMM-align epitope predictors and Tangri et al.'s in vitro assays showed that there are many immunogenic regions in Epo, we performed full-protein optimization, rather than restricting the allowed substitutions to the 101-115 and 136-150 regions. Fig. 9 illustrates a 10-mutation BLOSUM-based variant. The black line is an experimentally determined immunogenicity plot from Tangri et al. [22] and trends well with the ProPred model of immunogenicity. Some deviations may be explained by the difference in alleles tested (we share 6 of their alleles), and by the fact that they analyze 15-mers at every 5 positions while we analyze 9-mers at every position. Nonetheless, the correlation is quite striking, as is the ability of our design to target most of the highly immunogenic regions with only a small number of substitutions. Each substitution is quite effective, deleting an average of 6.3 epitopes.
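One way to put per-position 9-mer predictions on the same footing as 15-mers sampled every 5 positions is to aggregate the 9-mer scores that fall inside each 15-residue window. The sketch below is an assumption about how such a comparison could be set up, not a description of how the overlay in the figure was produced; the function name, the 0-based indexing, and the default of taking the maximum are all hypothetical choices.

```python
def window_scores(nine_mer_profile, window=15, step=5, k=9, aggregate=max):
    """Aggregate per-9-mer scores over the 9-mers contained in each window.

    nine_mer_profile[i] is the score of the 9-mer starting at 0-based
    position i. Returns (window_start, aggregated_score) pairs.
    """
    n_residues = len(nine_mer_profile) + k - 1
    results = []
    for start in range(0, n_residues - window + 1, step):
        # A 15-residue window fully contains the 9-mers starting at
        # start .. start + 6.
        contained = nine_mer_profile[start:start + window - k + 1]
        results.append((start, aggregate(contained)))
    return results

# Hypothetical profile for a 25-residue sequence (17 overlapping 9-mers).
profile = list(range(17))
windows = window_scores(profile)
```

Passing `aggregate=sum` instead counts every contained 9-mer, which weights windows with several overlapping epitopes more heavily than windows with a single strong one.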

Figure 9

Full-length Epo variant profile. Optimized 10-substitution Epo variant with ProPred epitope scoring and BLOSUM-based substitutions. x-axis: starting position of each 9-mer; y-axis: predicted number of alleles recognizing the 9-mer. Thin black bars indicate wild-type scores and thick orange bars indicate variant scores. Note: wild-type epitope scores are always greater than or equal to corresponding variant ones; i.e., we never introduce new epitopes. Blue ellipses indicate mutated positions (refer to Table 3). The line plot, from Tangri et al. [22], displays wild-type Epo antigenicity using ELISPOT assays, with black squares giving the number of alleles bound to overlapping 15-mers.

Table 3 summarizes a number of our optimized variants, as with SakSTAR listing just one for each allowed residue predicate (see Additional file 1, Table S5 for a full list). The first substitution made under BLOSUM and conservation is to R110, in the 101-115 region, while that under FoldX is to N147, in the 136-150 region, though neither of these regions was specifically targeted. As more substitutions are added, other predicted epitopes are deleted, including more in those regions. Thus our objective function, the ProPred score, continues to decrease; the trend is roughly the same for both the Tangri variants and ours. In some cases the independent SMM-align score fluctuates more than in others, e.g., BLOSUM alternates between using S146D or not. This observation highlights the fact that some substitutions may be particularly good for the SMM-align score but not as important for the ProPred objective function.

Table 3 Full-length Epo.

As with SakSTAR, comparison of the different predicates yields insights into positions and substitutions that appear to be good in general; e.g., V82T under BLOSUM, V82A under conservation, and V82E under FoldX, deleting 7, 6, and 7 epitopes respectively. Notably, none of the V82 mutations eliminates the epitope anchored at L80 on allele HLA-DRB1*0401. Otherwise V82T and V82A eliminate all of the epitopes in the region overlapping position 82. Our global optimization recognizes diminishing returns in this area of the protein. While additional mutations in the region may eliminate the final regional epitope at L80, it is only one epitope, and mutations elsewhere eliminate more epitopes.

Therapeutic Antibodies

Lazar et al. [23] introduced the concept of "human string content," or the percent identity between peptides derived from a test antibody sequence and corresponding peptides taken from a multiple sequence alignment of homologous human antibodies. We applied our methodology to anti-Her2/neu antibody 4D5, the anti-EGFR antibody 225, and the anti-EpCAM antibody 17-1A. At the 16-substitution level, we are able to reduce epitope score by about 70-90%; this compares favorably to the previous work, which required more than four times that many substitutions. See Additional file 1 for a more detailed description of the case study and our results.

Conclusions

We have shown that dynamic programming can address the problem of designing protein variants predicted to have reduced immunogenicity while maintaining stability. Our method found a number of variants that compare favorably to those developed in previous efforts. In many cases, our designs delete more epitopes than previous efforts, as measured by the ProPred pocket profile method and independently assessed with the SMM-align method. At the same time, the capacity of our algorithm to integrate stability analysis with deimmunization resulted in variants predicted to maintain greater thermodynamic stability. We further showed our optimization methods to be highly efficient, eliminating on average over 6 epitopes per mutation. Finally, one of the most powerful features of our methods is that we achieve global deimmunization as opposed to targeted deletion of a single epitope regardless of other immunogenic or functional consequences.

The algorithm guarantees that our variants are provably optimal with respect to the epitope and stability predictors, but this does not guarantee optimal properties in vivo. Instead, our algorithm should be viewed as a way to suggest variants worth studying experimentally. It provides a tool for the protein designer to explore the space of designs and focus on what appears to be a beneficial region, according to the best available predictions.

Future experimental work will focus on selection of one or more therapeutic targets that will be subjected to an exhaustive optimization under several mutational loads. Based on the resulting plans, small libraries of candidate variants will be constructed, expressed and purified, tested for functionality, and experimentally evaluated for immunogenic potential. Further computational work will develop other classes of optimization algorithms for incorporating properties not strictly local in terms of the primary sequence, such as residues that covary in the sequence record or form strong interactions in the three-dimensional structure.

Availability and requirements

Project name: DP2: dynamic programming for deimmunizing proteins

Project home page: http://www.cs.dartmouth.edu/~cbk/deimm/

Operating system(s): Platform independent

Programming language: Java

Other requirements: Java 1.6 or higher

License: GNU GPL

Any restrictions to use by non-academics: Please contact the authors before non-academic use.

References

  1. Koren E, Zuckerman LA, Mire-Sluis AR: Immune responses to therapeutic proteins in humans - clinical significance, assessment and prediction. Current Pharmaceutical Biotechnology 2002, 3: 349–360. 10.2174/1389201023378175

  2. Schellekens H: Immunogenicity of therapeutic proteins: Clinical implications and future prospects. Clinical Therapeutics 2002, 24: 1720–1740. 10.1016/S0149-2918(02)80075-3

  3. Chirino AJ, Ary ML, Marshall SA: Minimizing the immunogenicity of protein therapeutics. Drug Discovery Today 2004, 9: 82–90. 10.1016/S1359-6446(03)02953-2

  4. Kessler M, Goldsmith D, Schellekens H: Immunogenicity of biopharmaceuticals. Nephrology, Dialysis, Transplantation 2006, 21: v9–12. 10.1093/ndt/gfl476

  5. Shankar G, Pendley C, Stein KE: A risk-based bioanalytical strategy for the assessment of antibody immune responses against biological drugs. Nature Biotech 2007, 25: 555–561. 10.1038/nbt1303

  6. Schellekens H: Bioequivalence and the immunogenicity of biopharmaceuticals. Nature Reviews Drug Discovery 2002, 1: 457–462. 10.1038/nrd818

  7. De Groot AS, Scott DW: Immunogenicity of protein therapeutics. Trends in Immunology 2007, 28: 482–490. 10.1016/j.it.2007.07.011

  8. Schellekens H: Factors influencing the immunogenicity of therapeutic proteins. Nephrology, Dialysis, Transplantation 2005, 20: vi3–9. 10.1093/ndt/gfh1092

  9. Pradeu T, Carosella ED: On the definition of a criterion of immunogenicity. PNAS 2006, 103: 17858–17861. 10.1073/pnas.0608683103

  10. Morrison SL, Johnson MJ, Herzenberg LA, Oi VT: Chimeric human antibody molecules: Mouse antigen-binding domains with human constant region domains. PNAS 1984, 81: 6851–5. 10.1073/pnas.81.21.6851

  11. Jones PT, Dear PH, Foote J, Neuberger MS, Winter G: Replacing the complementarity-determining regions in a human antibody with those from a mouse. Nature 1986, 321: 522–525. 10.1038/321522a0

  12. Winter G, Harris WJ: Humanized antibodies. Trends in Pharmacological Sciences 1993, 14: 139–143. 10.1016/0165-6147(93)90197-R

  13. Lo BKC: Antibody humanization by CDR grafting. Methods Mol Biol 2003, 248: 135–160.

  14. Kashmiri SVS, De Pascalis R, Gonzales NR, Schlom J: SDR grafting - a new approach to antibody humanization. Methods 2005, 36: 25–34. 10.1016/j.ymeth.2005.01.003

  15. Presta LG: Selection, design, and engineering of therapeutic antibodies. J Allergy and Clinical Immunology 2005, 116: 731–736. 10.1016/j.jaci.2005.08.003

  16. Hwang WYK, Foote J: Immunogenicity of engineered antibodies. Methods 2005, 36: 3–10. 10.1016/j.ymeth.2005.01.001

  17. Almagro JC, Fransson J: Humanization of antibodies. Front Biosci 2008, 13: 1619–1633.

  18. Trombetta ES, Mellman I: Cell biology of antigen processing in vitro and in vivo. Annual Review of Immunology 2005, 23: 975–1028. 10.1146/annurev.immunol.22.012703.104538

  19. Warmerdam PAM, Plaisance S, Vanderlick K, Vandervoort P, Brepoels K, Collen D, Maeyer MD: Elimination of a human T-cell region in staphylokinase by T-cell screening and computer modeling. Thromb Haemost 2002, 87: 666–673.

  20. Jones TD, Phillips WJ, Smith BJ, Bamford CA, Nayee PD, Baglin TP, Gaston JSH, Baker MP: Identification and removal of a promiscuous CD4+ T cell epitope from the C1 domain of factor VIII. J Thromb Haemost 2005, 3: 991–1000. 10.1111/j.1538-7836.2005.01309.x

  21. Harding FA, Liu AD, Stickler M, Razo OJ, Chin R, Faravashi N, Viola W, Graycar T, Yeung VP, Aehle W, Meijer D, Wong S, Rashid MH, Valdes AM, Schellenberger V: A beta-lactamase with reduced immunogenicity for the targeted delivery of chemotherapeutics using antibody-directed enzyme prodrug therapy. Mol Cancer Ther 2005, 4: 1791–1800. 10.1158/1535-7163.MCT-05-0189

  22. Tangri S, Mothe BR, Eisenbraun J, Sidney J, Southwood S, Briggs K, Zinckgraf J, Bilsel P, Newman M, Chesnut R, LiCalsi C, Sette A: Rationally engineered therapeutic proteins with reduced immunogenicity. J Immunol 2005, 174: 3187–3196.

  23. Lazar GA, Desjarlais JR, Jacinto J, Karki S, Hammond PW: A molecular immunology approach to antibody humanization and functional optimization. Mol Immunol 2007, 44: 1986–1998. 10.1016/j.molimm.2006.09.029

  24. De Groot AS, Moise L: Prediction of immunogenicity for therapeutic proteins: State of the art. Curr Opin Drug Discov Devel 2007, 10: 332–340.

  25. Wang P, Sidney J, Dow C, Mothe B, Sette A, Peters B: A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach. PLoS Comp Biol 2008, 4: e1000048. 10.1371/journal.pcbi.1000048

  26. De Groot AS, Martin W: Reducing risk, improving outcomes: Bioengineering less immunogenic protein therapeutics. Clinical Immunology 2009, 131: 189–201. 10.1016/j.clim.2009.01.009

  27. De Groot AS, Knopp PM, Martin W: De-immunization of therapeutic proteins by T-cell epitope modification. Dev Biol (Basel) 2005, 122: 171–94.

  28. Koren E, De Groot AS, Jawa V, Beck KD, Boone T, Rivera D, Li L, Mytych D, Koscec M, Weeraratne D, Swanson S, Martin W: Clinical validation of the "in silico" prediction of immunogenicity of a human recombinant therapeutic protein. Clinical Immunology 2007, 124: 26–32. 10.1016/j.clim.2007.03.544

  29. Singh H, Raghava G: ProPred: prediction of HLA-DR binding sites. Bioinformatics 2001, 17: 1236–1237. 10.1093/bioinformatics/17.12.1236

  30. Nielsen M, Lundegaard C, Lund O: Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinformatics 2007, 8: 238. 10.1186/1471-2105-8-238

  31. Guang LZ, Khan AM, Srinivasan KN, August JT, Brusic V: MULTIPRED: a computational system for prediction of promiscuous HLA binding peptides. Nucl Acids Res 2005, 33: W172-W179. 10.1093/nar/gki506

  32. Bui HH, Sidney J, Peters B, Sathiamurthy M, Sinichi A, Purton KA, Mothe BR, Chisari FV, Watkins DI, Sette A: Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix applications. Immunogenetics 2005, 57: 304–314. 10.1007/s00251-005-0798-y

  33. Schirle M, Weinschenk T, Stevanovic S: Combining computer algorithms with experimental approaches permits the rapid and accurate identification of T cell epitopes from defined antigens. J Immunological Methods 2001, 257: 1–16. 10.1016/S0022-1759(01)00459-8

  34. Sturniolo T, Bono E, Ding J, Raddrizzani L, Tuereci O, Sahin U, Braxenthaler M, Gallazzi F, Protti MP, Sinigaglia F, Hammer J: Generation of tissue-specific and promiscuous HLA ligand database using DNA microarrays and virtual HLA class II matrices. Nature Biotechnol 1999, 17: 555–561. 10.1038/9858

  35. Dinglasan RR, Kalume DE, Kanzok SM, Ghosh AK, Muratova O, Pandey A, Jacobs-Lorena M: Disruption of Plasmodium falciparum development by antibodies against a conserved mosquito midgut antigen. PNAS 2007, 104: 13461–13466. 10.1073/pnas.0702239104

  36. Klyushnenkova EN, Kouiavskaia DV, Kodak JA, Vandenbark AA, Alexander RB: Identification of HLA-DRB1*1501-restricted T-cell epitopes from human prostatic acid phosphatase. Prostate 2007, 67: 1019–1028. 10.1002/pros.20575

  37. Mustafa AS, Shaban FA: ProPred analysis and experimental evaluation of promiscuous T-cell epitopes of three major secreted antigens of Mycobacterium tuberculosis. Tuberculosis 2006, 86: 115–124. 10.1016/j.tube.2005.05.001

  38. Peters B, Sidney J, Bourne P, Bui HH, Buus S, Doh G, Fleri W, Kronenberg M, Kubo R, Lund O, Nemazee D, Ponomarenko JV, Sathiamurthy M, Schoenberger S, Stewart S, Surko P, Way S, Wilson S, Sette A: The immune epitope database and analysis resource: from vision to blueprint. PLoS Biol 2005, 3: e91. 10.1371/journal.pbio.0030091

  39. Rammensee H, Bachmann J, Emmerich NP, Bachor OA, Stevanovic S: SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics 1999, 50: 213–219. 10.1007/s002510050595

  40. Southwood S, Sidney J, Kondo A, del Guercio MF, Appella E, Hoffman S, Kubo RT, Chesnut RW, Grey HM, Sette A: Several common HLA-DR types share largely overlapping peptide binding repertoires. J Immunol 1998, 160: 3363–3373.

  41. Henikoff S, Henikoff JG: Amino acid substitutions from protein blocks. PNAS 1992, 89: 10915–10919. 10.1073/pnas.89.22.10915

  42. Ohage E, Steipe B: Intrabody construction and expression. I. The critical role of VL domain stability. J Mol Biol 1999, 291: 1119–1128. 10.1006/jmbi.1999.3019

  43. Nikolova PV, Henckel J, Lane DP, Fersht AR: Semirational design of active tumor suppressor p53 DNA binding domain with enhanced stability. PNAS 1998, 95: 14675–14680. 10.1073/pnas.95.25.14675

  44. Wang Q, Buckle AM, Fersht AR: Stabilization of GroEL minichaperones by core and surface mutations. J Mol Biol 2000, 298: 917–926. 10.1006/jmbi.2000.3716

  45. Lehmann M, Pasamontes L, Lassen SF, Wyss M: The consensus concept for thermostability engineering of proteins. Biochim Biophys Acta 2000, 1543: 408–415.

  46. Lehmann M, Loch C, Middendort A, Studer D, Lassen SF, Pasamontes L, van Loon APGM, Wyss M: The consensus concept for thermostability engineering of proteins: further proof of concept. Protein Eng 2002, 15: 403–411. 10.1093/protein/15.5.403

  47. Gerstein M, Sonnhammer ELL, Chothia C: Volume changes in protein evolution. J Mol Biol 1994, 236: 1067–1078. 10.1016/0022-2836(94)90012-4

  48. Guerois R, Nielsen JE, Serrano L: Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol 2002, 320: 369–387. 10.1016/S0022-2836(02)00442-4

  49. Fajardo-Sanchez E, Stricher F, Paques F, Isalan M, Serrano L: Computer design of obligate heterodimer meganucleases allows efficient cutting of custom DNA sequences. Nucl Acids Res 2008, 36: 2163–2173. 10.1093/nar/gkn059

  50. Sloot AM, Tur V, Szegezdi E, Mullally MM, Cool RH, Samali A, Serrano L, Quax WJ: Designed tumor necrosis factor-related apoptosis-inducing ligand variants initiating apoptosis exclusively via the DR5 receptor. PNAS 2006, 103: 8634–8639. 10.1073/pnas.0510187103

  51. Bellman R, Kalaba R: On the K th best policies. J SIAM 1960, 8: 582–588.

  52. Waterman MS, Byers TH: A dynamic programming algorithm to find all solutions in a neighborhood of the optimum. Math Biosci 1985, 77: 179–188. 10.1016/0025-5564(85)90096-3

  53. Naor D, Brutlag D: On near-optimal alignments in biological sequences. J Comp Biol 1994, 1: 349–366. 10.1089/cmb.1994.1.349

  54. Gusfield D, Balasubramanian K, Naor D: Parametric optimization of sequence alignment. Algorithmica 1994, 12: 312–326. 10.1007/BF01185430


Acknowledgements

This work is supported in part by US NSF grant IIS-0444544 and an Alfred P. Sloan Foundation Fellowship to CBK, and a Neukom Institute CompX grant to CBK and KEG.

Author information

Corresponding authors

Correspondence to Karl E Griswold or Chris Bailey-Kellogg.

Additional information

Authors' contributions

ASP, KEG, and CBK developed the approach; ASP, WZ, and CBK designed the algorithms, ASP implemented the algorithms and collected the results; ASP, KEG, and CBK analyzed the results and wrote the paper. All authors read and approved the final manuscript.

Electronic supplementary material

12859_2009_3637_MOESM1_ESM.PDF

Additional file 1: Additional variants. The file includes additional variants for SakSTAR 71-87, full-length SakSTAR, and full-length Epo, as well as an additional case study for Abs 4D5, 225, and 17-1A. (PDF 185 KB)

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Parker, A.S., Zheng, W., Griswold, K.E. et al. Optimization algorithms for functional deimmunization of therapeutic proteins. BMC Bioinformatics 11, 180 (2010). https://doi.org/10.1186/1471-2105-11-180

  • DOI: https://doi.org/10.1186/1471-2105-11-180

Keywords