- Methodology article
- Open Access
HTJoinSolver: Human immunoglobulin VDJ partitioning using approximate dynamic programming constrained by conserved motifs
© Russ et al.; licensee BioMed Central. 2015
Received: 20 January 2015
Accepted: 22 April 2015
Published: 23 May 2015
Partitioning the human immunoglobulin variable region into variable (V), diversity (D), and joining (J) segments is a common sequence analysis step. We introduce a novel approximate dynamic programming method that uses conserved immunoglobulin gene motifs to improve performance of aligning V-segments of rearranged immunoglobulin (Ig) genes. Our new algorithm enhances the former JOINSOLVER algorithm by processing sequences with insertions and/or deletions (indels) and improves the efficiency for large datasets provided by high throughput sequencing.
In our simulations, which include rearrangements with indels, the V-matching success rate improved from 61% for partial alignments of sequences with indels in the original algorithm to over 99% in the approximate algorithm. An improvement in the alignment of human VDJ rearrangements over the initial JOINSOLVER algorithm was also seen when compared to the Stanford.S22 human Ig dataset with an online VDJ partitioning software evaluation tool.
HTJoinSolver can rapidly identify V- and J-segments with indels to high accuracy for mutated sequences when the mutation probability is around 30% and 20% respectively. The D-segment is much harder to fit even at 20% mutation probability. For all segments, the probability of correctly matching V, D, and J increases with our alignment score.
Immunoglobulins (Ig) are a family of proteins that identify and bind foreign pathogens, e.g., bacteria and viruses. Diversity in the antigen-binding region of Ig provides an appropriate immune response to the wide array of pathogens confronting individuals. This diversity is generated by VDJ recombination, which joins a Variable (V) gene segment, a Diversity (D) gene segment, and a Joining (J) gene segment from distant regions of DNA to potentially create about 10 billion different antibodies, each of which binds to a distinct epitope.
During recombination, nucleotide excision of the germline gene termini and the addition of nontemplated N nucleotides by terminal deoxynucleotidyl transferase (TdT) at the V to D and D to J junctions provide additional diversification. Furthermore, during germinal center reactions B cell receptors undergo further changes including somatic hypermutation as well as nucleotide insertions and/or deletions (indels) of various length which creates a larger potential repertoire of antibodies.
Previously the JOINSOLVER  algorithm was used successfully to compare an unknown VDJ rearrangement against a set of V-, D-, and J-germline sequences to provide information about gene utilization in the Ig repertoire in a wide variety of conditions including: S. aureus immune evasion , CDR3 characteristics and VH mutations in systemic lupus erythematosus , immunological memory in chronic granulomatous disease , rheumatoid arthritis, , CDR3H characterization of the fetus and neonates ; X-linked HyperIgM , and the analysis of the neutralizing HIV antibodies . Unfortunately, JOINSOLVER was not designed to handle indels.
This paper addresses the challenge of both accurately aligning heavily mutated Ig rearrangements, potentially with indels, to the nearest matching V, D, and J germline gene and identifying junctional N nucleotides. We introduce a sequence alignment algorithm that approximates the results of a dynamic programming (DP) algorithm, which can save up to 98% of the computational time.
Dynamic programming algorithms have been used to align sequences since 1970 [9,10]. Typically, DP alignment algorithms align sequences by creating a matrix with the rows corresponding to the bases of one sequence, and columns corresponding to the bases of second sequence. Matrix element (i, j) is the best alignment up to the ith base of the first sequence and the jth base of the second sequence. Dynamic programming algorithms have rules that define how to initialize the matrix, how to fill matrix elements after initialization, and where to find the highest score. A traceback is kept to mark the path of the best alignment through the matrix starting from the highest score matrix element. Most DP algorithms rules allow for insertions, deletions, mismatches, and matches. A match or mutated base is a one base step forward in both sequences, corresponding to a diagonal step in the DP matrix. Insertions and deletions increment one sequence, but not the other, corresponding to a right or downward step, respectively. Typically DP algorithms have poor performance, of order O(NM) where N and M are the lengths of the two sequences being matched . Durban  provides an outstanding, in depth explanation of the use of DP algorithms for sequence matching.
Previous work suggests that banding, or working along a diagonal band in the DP matrix, improves the performance of DP algorithms . In the same spirit, our method uses prior biological knowledge to lock down the alignment at highly conserved motifs in V- and J-germline genes and traverses along the diagonal of the DP matrix to significantly improve the speed and accuracy of alignments. When the motifs are not found, the algorithm falls back to a more traditional DP algorithm.
The alignment of the V-segment accounts for most of the computational work. The amount of work is related to the length of the segments and the number of sequences being compared against the segment. In germline database used by JOINSOLVER, there are many more V-germline genes (289) than D- or J-germline genes (84 and 12 respectively), and the V-germline genes have the longest germline sequences (~285 nucleotides) used in the analysis. The algorithm balances the need for aligning these irregular V-segments with the need to analyze large numbers of sequences provided by next generation sequencing. A new desktop application, HTJoinSolver, is provided as an implementation of the new partitioning method.
Partitioning sequences using conserved motifs
Next the partitioning method identifies the JH segment using a highly conserved J motif in a process similar to the VH alignment (Figure 1c). The remaining unaligned sequence is split just before the J-motif. A DP algorithm is used to align this fragment to J germline genes and to identify the 5’ start of the J-segment. The V-end algorithm is used to align the 3’ end of the J-segment. The two J fragment alignments are merged to produce a fully aligned J-segment. Finally, the D-segment is matched using a specialized local DP algorithm (Figure 1d) to produce a fully partitioned and aligned rearrangement (Figure 1e). Those nucleotides that are not partitioned with the VH, D, or JH segment are considered N addition nucleotides.
The approximate backwards algorithm
Unlike the DP algorithms commonly used for sequence alignment, the approximate backwards algorithm starts from a known point and works backwards to the start of the matrix formed by the germline gene and the unknown or query sequence. In this study, the scoring rules are +5 for a match and −4 for a mismatch; gap opening (indel formation) is penalized −30, and gap extension (continuation of indel) is penalized −1. The numbers in the matrix elements are the scores of the best alignment up to the bases corresponding to the column and row. The algorithm starts at the location of the 5’ T of the conserved V-motif in both the germline sequence and the query Ig sequence. The score is initialized to zero. The algorithm steps one base backwards, toward the 5’ end in both sequences. A traceback, which points to the previous matrix element, is kept to mark the alignment. As long as the bases match, the algorithm continues stepping in the 5’ direction.
If the bases don’t match, the algorithm considers the possibility of a mutation or an indel. The algorithm steps forward 10 bases (toward the 3’ end) and calculates a small 40 × 40 DP matrix using an algorithm similar to the Needleman-Wunsch algorithm [9,10] to find the best alignment in this small section of the sequences. The small matrix is initialized by assuming inserts, which fill the elements in last column, and deletions, which fill the bottom row. Every other element in the small matrix is the maximum of a match, a mismatch, an insertion, or a deletion using the scoring rules. The tracebacks maintain the alignment through the small matrix. By calculating all elements of the small matrix, the algorithm determines the best alignment through the matrix taking insertions, deletions, mutations, and matches into account. The algorithm continues aligning from the highest scoring entry in first row or column of the matrix.
After completing the alignment, sequences with indels are re-aligned from 5’ end of the alignment to the 3’ end of the V-motif. The two alignments are compared for consistency. This quality control step catches suboptimal alignments caused by local maxima.
3’ V-end alignment algorithm
After aligning the 5’ end of the V-segment, we align the 3’ of the V-segment starting at the conserved VH-motif. The V-end alignment algorithm, which is a type of overlap DP algorithm , begins at the first “T” in the VH-motif for both the query and germline sequences. A full matrix calculation is required, however the matrix is fairly small because only a few nucleotides occur after the V-motif in the V-germline genes. The 3’ end of the V-segment is defined by the maximum score, which can occur anywhere in the matrix. The 5’ J-end alignment uses the same algorithm, starting at the “C” from the conserved JH-motif.
J-start alignment algorithm
The start of the J-segment is aligned using a DP algorithm similar to a Needleman-Wunsch algorithm; however, the maximum score can be anywhere in the DP matrix, and the algorithm is run backwards from the J-motif. The maximum score corresponds to the start of the J-segment. The alignment is determined by tracing back from the highest scoring matrix entry.
D matching alignment algorithm
The D alignment uses a local alignment algorithm similar to a Smith-Waterman algorithm. A match is given a score of +1, however a mismatch sets the score back to 0. When the score = 3, the algorithm looks back 4 steps to check for a potential mutation, if the score before the match start is greater than 3, the current score is increased by the previous score. This allows a mismatch to be recognized as a mutation in the D-segment if and only if there are at least 3 consecutively matching bases on both sides of the mutation. The probability of randomly matching 3 bases is around 1.5% or 3% on either side of the D-segment, and 3 was chose to keep the probability of random matches below 5%. The termination condition is the maximum score anywhere in the matrix. These modifications to the Smith-Waterman algorithm maintain the basis of the original JOINSOLVER, matching D-segments based on consecutive matching bases.
Falling back to other algorithms
In the event, the V-motif is not found in the sequence, the algorithm looks for the motif with one mutation. If the motif with a single mutation is found, the approximate backwards algorithm is run using the mutated motif location. Because the J-motif is shorter, mutations are not allowed in the J motif.
When the location of the V-motif is wrong, the alignment is very poor resulting in a low score. Setting a score threshold of 3.1/base effectively catches an incorrect motif location. Before relinquishing to a slower algorithm, one last attempt is made to identify the location of the motif. The sequence is aligned to the first germline sequence in our database (IGHV1-18*01). The motif location from this alignment is used to align the query sequence to all other germline genes to determine the best match or alignment. If the highest scoring alignment is still below 3.1/base, we run an overlapping sequence DP algorithm for the V or a Smith-Waterman algorithm for the J alignments. Our only modification is that the end of the V- or J-segment occurs at the maximum score anywhere in the matrix, not just in the last row or column.
Default parameter used in simulations, unless otherwise mentioned
Random (usually 1)
Two sets of simulations were performed. In the first simulation, 10,000 rearrangements were generated with mutation probabilities ranging from 0% to 95% in increments of 5% throughout the VDJ sequence. Using these rearrangements, the approximate backwards algorithm was compared against a complete DP algorithm, to calculate the success rate. The second set of simulations with 1,000 rearrangements engineered with a mutation probability of 3.5% was performed to compare HTJoinSolver with the original JOINSOLVER using the original JOINSOLVER germline database. The results of an alignment are considered a successful match to a germline if the algorithm returns the correct gene; the exact allele is not required.
Comparison with a standard dataset of biological VDJ rearrangements
Additionally 13,153 human sequences from the Stanford.S22 dataset  were analyzed to compare HTJoinSolver with several other frequently employed algorithms. Briefly, the Stanford_S22 data were produced by 454 sequencing of peripheral blood mononuclear cells from a single donor. The S22 individual genotype was determined using an individual analysis of iHMMune-align  results. The Stanford_S22 dataset and the S22 genotype form a standard dataset to evaluate the performance of VDJ partitioning algorithms. An online evaluation tool, Evaluation of IGH partitioning tools (, http://www.emi.unsw.edu.au/~ihmmune/IGHUtilityEval/evalForm.html) compares the performance of VDJ partitioning tools on the Stanford_S22 dataset. As a performance metric, the tool reports the percentage of V, D, or J assignments to germlines that are not present in the predetermined S22 genotype.
The germline genes used in HTJoinSolver are IMGT reference sequences [18,19], www.imgt.org/download/GENE-DB/IMGTGENEDB-ReferenceSequences.fasta-nt-WithGaps-F+ORF+inframeP, downloaded July 30, 2014). HTJoinSolver provides methods to download and reformat germline genes from IMGT. These methods allow the user to maintain an up-to-date library of germline genes usable by HTJoinSolver. When comparing HTJoinSolver with JOINSOLVER, we used the JOINSOLVER germline database, which was derived from IMGT and included pseudogenes.
Access to HTJoinSolver
HTJoinSolver is available for public use, and can be downloaded from https://dcb.cit.nih.gov/HTJoinSolver.
The results of the HTJoinSolver approximate algorithm were then compared to the original JOINSOLVER V-alignment algorithm. Although the JOINSOLVER algorithm was not designed to identify indels, large portions of the rearrangement can align correctly up to the position of the indel, but the remaining V-segment was mismatched. Unfortunately, the results were not usable for mutation analysis because the offset alignment appeared as a region of high mutation instead of a single indel. Adding V-segment indels to our simulations hurt the overall performance of the original JOINSOLVER. With a 3.5% mutation probability and simulated indels, the original algorithm selected the correct V-germline 61% of the time, whereas the new algorithm selected the correct V-germline over 99% of the time. The percent of success for the old vs. new algorithms were 73% vs. 91% and 98% vs. 99% for the D- and J-germlines, respectively. These simulations show that the approximate backwards algorithm performs as well or better than the original JOINSOLVER algorithm.
The D-segment, which is relatively short, is the most difficult segment to identify and align. The D-segment nucleotides must be found within the CDR3, which is the most complex region of the VDJ rearrangement. In order to locate the CDR3 5’ and 3’ boundaries the algorithm must first correctly identify the 3’ end of the V-segment and the 5’ end of the J-segment whose nucleotide sequences may differ from the germline gene if nucleotide excision and/or mutation has occurred. N addition may be present or absent and varies in the number of nucleotides flanking the D-segment. Furthermore, D-segment mutations decrease the number of consecutively matching D-nucleotides. Thus, the D alignment success rate drops off almost immediately as the mutation probability increases. The curve in Figure 4 labeled “D Success Rate” levels off at around 4%, which is around what is expected by randomly matching D-germlines. The success rate drops from 93% at a mutation probability of zero to 89% at a mutation probability of 5%. The solid line with ‘X’ marks labeled “All Success Rate” corresponds to matching the full VDJ rearrangement. This curve strongly resembles the D success curve, reinforcing the fact that the D alignment is the most difficult and performance-limiting step.
The second column in Figure 5 is the success rate for the D-segment. The solid vertical line is at alignment score equals nine, marking the shortest acceptable D-score suggested in the original JOINSOLVER paper . The effect of mutation is pronounced. As the mutation probability increases, fewer sequences have scores that cross the 9-base threshold needed to avoid random matches. When the mutation probability gets high, even though random matches of length 9 are rare, they occur as likely as real matches. Therefore, the success rate drops even at scores as high as 9. The J-segment success rates are shown in the right column of Figure 5. For low mutation probabilities, the J-segment is well identified. As the mutation probability increases to 20%, the success rate drops significantly for low scoring alignments.
Evaluation of IGH partitioning using inferred genotypes tool
D ≥ 9
D ≥ 9
High-throughput sequencing of B-cell receptor VDJ rearrangements produces vast numbers of sequences and requires extremely fast algorithms for VDJ alignment. The original JOINSOLVER algorithm was fast and accurate, but was not designed to handle indels. One of the main reasons for creating HTJoinSolver was to handle alignments with indels, while maintaining or improving the speed. Aligning a sequence to approximately 300 VH-germlines takes about 1.7 second on a 2.93 GHz Intel Core i7 iMac. Our approximate algorithm takes around 207 ms. Falling back to standard methods can take over 7x longer. One million lightly mutated sequences (simulation parameters in Table 1) took approximately 6 hours to run on 40 threads on a 4 × 12 core Opteron 6172 2.1 GHz computer.
The increase in speed occurs because we use the available biological information to avoid unnecessary calculations. We know that the highly conserved motif “TAT TAC TGT” begins at codon 102, and we know that insertions or deletions occur in less than 5% of the expressed repertoire, and are usually less than 10 bases . Given these pieces of information, the algorithm calculates only a small fraction of the total DP matrix, even at high mutation probabilities. Our approximate solution was successful at replicating the score of a complete DP matrix.
When the motif is not found, the algorithm makes several attempts to align the sequence before falling back to the complete overlapping DP algorithm. However, without knowing the position of the motif, we lose the increased efficiency that comes with the use of this prior information.
A potential problem can occur if a sequence has a long stretch of mismatches that is longer than the rectangular block matrix used to determine if a mismatch is a mutation or an indel. The algorithm may incorrectly fall into a local maximum, (i.e. not seeing enough of the sequence to find the actual best alignment) and may be incorrectly aligned. Usually, the score threshold will catch the incorrect alignment, and fall back to the overlap DP algorithm. If it is known beforehand that long stretches of indels or mutation will occur, increasing the block size may improve the performance. The larger the block size, the less likely the algorithm will fall into a local maximum requiring a fall back algorithm. However, the price for a large block size is increased computation time.
We were not surprised by the comparison between the original JOINSOLVER and HTJoinSolver. We wanted to show that the HTJoinSolver can perform complete alignments of VDJ rearrangements containing indels. Therefore, all of our simulations included indels, which hindered JOINSOLVER’s performance. The increased success rate of the improved algorithm is due to the ability to identify the size and position of indels. The D and J results were very similar because the DP algorithms used to align the D and J regions resemble the original JOINSOLVER algorithms, which were designed to prevent random D matching at the 95% confidence level .
The ability to successfully match a sequence to a germline is heavily affected by the length of the V-, D-, or J-segment and the mutation probability. Matching V-segments is easiest because longer germline sequences are available for comparison. As the mutation probability increases, the success rate drops. Both of these cases can be explained by the loss of information from the original germlines by truncation and mutation. Additional studies on whether the success rate can be explained by a noisy-channel model could give additional insights into the theoretical performance of any alignment algorithm.
Germline sequence similarity presents an additional challenge for HTJoinSolver, especially for aligning heavily processed sequences. The problem is compounded by the fact that the D-germline sequences can be similar to the 5’ end of the J-germline sequences. A small fraction of our simulations show a VH-JH distance of zero, even though we know a D-germline is present along with VD and DJ junctions. This ambiguity cannot be resolved without additional information.
When compared against the original JOINSOLVER algorithm using the Evaluation of IGH partitioning using inferred genotypes tool, we see an improvement in the V, D, J, and overall results. However, the improvement is small because the sequences are minimally mutated. The median mutation frequency for the S22 dataset is 0% and the mean is 0.9%.
The results of the evaluation tool are highly dependent on the set of germlines used in the comparison. When compared against other partitioning tools using the original JOINSOLVER database, the results show that HTJoinSolver makes fewer V, D, and J assignment to germlines and alleles not present in the S22 genotype than the tools in the study by . When the germline database was updated, the resulting evaluation changed. More V-alleles not in the S22 genotype where matched, hurting V and total performance.
The development of an evaluation tool with a standard dataset is an important first step, but without a better method of assessing the quality of results, and a standard library of germlines, comparison between partitioning tools remains difficult. We have shown that for the original JOINSOLVER® database, always including the highest scoring D alignment, even when we would not recommend using the alignment, improves the partitioning performance according to the evaluation tool. Furthermore, providing a D-segment that occurs in the S22 genome will always improve the performance as measured by the evaluation tool, even if the D-segment is very likely to be wrong. Relaxing the criteria for including a match (e.g., allowing low quality D-matches) should not have improved results of the evaluation. After updating the germline database, the performance for the V-segment dropped. With more allelic variants to choose from, the system is more likely to match previously unknown allelic variants causing the evaluated performance to decrease. Improvements in the D alignment can be explained by the removal of the reverse germlines from the database when we switched from the JOINSOLVER germline database to the IMGT reference database for HTJoinSolver. Interestingly, always including the highest scoring D-germline, regardless of whether the score is high enough, does not improve the D-segment performance using the updated database. Evaluation results for the same algorithm changed because of differences in the germline database and not on algorithmic differences.
These results suggest that evaluating a partitioning tool based solely on whether or not a germline occurs in an assumed genotype is not sufficient. Consider a partitioning algorithm that only assigned IGHV1-3*01 to all query sequences. Since IGHV1-3*01 is in the S22 genome, 0% of the sequences would come from outside the S22 genome. The partitioning algorithm would be evaluated as perfect. However, this algorithm would not make a useful tool. Whereas the evaluation tool is a good idea, currently the tool is too sensitive to similarity between the S22 genome and the germlines present in the partitioning tool’s database.
All of the simulations and the comparison with the partitioning tool evaluation software show that HTJoinSolver can identify V-, D-, and J-segment to very high accuracy for reasonably mutated sequences. As the mutation probability increases, alignment becomes more difficult but problematic only for unnaturally highly mutated or truncated sequences. The V-segment is identified very accurately for most biologically relevant cases. The J-segment is fairly well identified from 0-20% mutation probability but above 20% the success rate falls below 90%. At the 20% mutation probability, the D-segment has about a 55% chance of being correct. Regardless of the partitioning software, biologists should be skeptical of short D-matches to highly mutated sequences, even if mutations are not seen in the D.
The authors wish to thank Dr. John Mascola for the opportunity to develop HTJoinSolver. This work was supported by the Intramural Research Programs of the Center for Information Technology [DER, K-YH], National Institutes of Health and the Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health [NSL].
- Souto-Carneiro MM, Longo NS, Russ DE, Sun HW, Lipsky PE. Characterization of the human Ig heavy chain antigen binding complementarity determining region 3 using a newly developed software algorithm, JOINSOLVER. J Immunol. 2004;172(11):6790–802.View ArticlePubMedGoogle Scholar
- Viau M, Longo NS, Lipsky PE, Zouali M. Staphylococcal protein A deletes B-1a and marginal zone B lymphocytes expressing human immunoglobulins: An immune evasion mechanism. J Immunol. 2005;175(11):7719–27.View ArticlePubMedGoogle Scholar
- Sims GP, Ettinger R, Shirota Y, Yarboro CH, Illei GG, Lipsky PE. Identification and characterization of circulating human transitional B cells. Blood. 2005;105(11):4390–8.View ArticlePubMed CentralPubMedGoogle Scholar
- Moir S, De Ravin SS, Santich BH, Kim JY, Posada JG, Ho J, et al. Humans with chronic granulomatous disease maintain humoral immunologic memory despite low frequencies of circulating memory B cells. Blood. 2012;120(24):4850–8.View ArticlePubMed CentralPubMedGoogle Scholar
- Souto-Carneiro MM, Mahadevan V, Takada K, Fritsch-Stork R, Nanki T, Brown M, et al. Alterations in peripheral blood memory B cells in patients with active rheumatoid arthritis are dependent on the action of tumour necrosis factor. Arthritis Res Ther. 2009;11(3):R84.View ArticlePubMed CentralPubMedGoogle Scholar
- Souto-Carneiro MM, Sims GP, Girschik H, Lee JS, Lipsky PE. Developmental changes in the human heavy chain CDR3. J Immunol. 2005;175(11):7425–36.View ArticlePubMedGoogle Scholar
- Longo N, Lugar P, Yavuz S, Zhang W, Krijger P, Russ D, et al. Analysis of somatic hypermutation in X-linked hyper-IgM syndrome shows specific deficiencies in mutational targeting. Blood. 2009;113(16):3706–15.View ArticlePubMed CentralPubMedGoogle Scholar
- Huang J, Ofek G, Laub L, Louder MK, Doria-Rose NA, Longo NS, et al. Broad and potent neutralization of HIV-1 by a gp41-specific human antibody. Nature. 2012;491(7424):406–12.View ArticlePubMedGoogle Scholar
- Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48(3):443–53.View ArticlePubMedGoogle Scholar
- Gotoh O. An improved algorithm for matching biological sequences. J Mol Biol. 1982;162(3):705–8.View ArticlePubMedGoogle Scholar
- Durbin R. Biological sequence analysis : probabalistic models of proteins and nucleic acids. Cambridge, UK New York: Cambridge University Press; 1998.View ArticleGoogle Scholar
- Chao KM, Pearson WR, Miller W. Aligning two sequences within a specified diagonal band. Comput Appl Biosci. 1992;8(5):481–7.PubMedGoogle Scholar
- Lefranc MP. Unique database numbering system for immunogenetic analysis. Immunol Today. 1997;18(11):509.View ArticlePubMedGoogle Scholar
- Peled JU, Kuang FL, Iglesias-Ussel MD, Roa S, Kalis SL, Goodman MF, et al. The biochemistry of somatic hypermutation. Annu Rev Immunol. 2008;26:481–511.View ArticlePubMedGoogle Scholar
- Boyd S, Gaeta B, Jackson K, Fire A, Marshall E, Merker J, et al. Individual Variation in the Germline Ig Gene Repertoire Inferred from Variable Region Gene Rearrangements. J Immunol. 2010;184(12):6986–92.View ArticlePubMed CentralPubMedGoogle Scholar
- Gaëta BA, Malming HR, Jackson KJL, Bain ME, Wilson P, Collins AM. iHMMune-align: Hidden Markov model-based alignment and identification of germline genes in rearranged immunoglobulin gene sequences. Bioinformatics. 2007;23(13):1580–7.View ArticlePubMedGoogle Scholar
- Jackson KJ, Boyd S, Gaëta BA, Collins AM. Benchmarking the performance of human antibody gene alignment utilities using a 454 sequence dataset. Bioinformatics. 2010;26(24):3129–30.View ArticlePubMedGoogle Scholar
- Lefranc MP, Lefranc G. The immunoglobulin factsbook: Academic Press; San Diego, 2001.Google Scholar
- Giudicelli V, Duroux P, Ginestoux C, Folch G, Jabado-Michaloud J, Chaume D, et al. IMGT/LIGM-DB, the IMGT comprehensive database of immunoglobulin and T cell receptor nucleotide sequences. Nucleic Acids Res. 2006;34(Database issue):D781–4.View ArticlePubMed CentralPubMedGoogle Scholar
- Mascola JR, Haynes BF. HIV-1 neutralizing antibodies: understanding nature’s pathways. Immunol Rev. 2013;254(1):225–44.View ArticlePubMed CentralPubMedGoogle Scholar
- Briney BS, Willis JR, Crowe JE. Location and length distribution of somatic hypermutation-associated DNA insertions and deletions reveals regions of antibody structural plasticity. Genes Immun. 2012;13(7):523–9.View ArticlePubMed CentralPubMedGoogle Scholar
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.