ImmunExplorer (IMEX): a software framework for diversity and clonality analyses of immunoglobulins and T cell receptors on the basis of IMGT/HighV-QUEST preprocessed NGS data
© Schaller et al. 2015
Received: 10 March 2015
Accepted: 28 July 2015
Published: 12 August 2015
Modern research on B and T cell antigen receptors (the immunoglobulins (IG) or antibodies and T cell receptors (TR)) forms the basis for detailed analyses of the human adaptive immune system. For instance, insights into the state of the adaptive immune system provide information that is essential for monitoring transplantation processes and regulating immunosuppressive drugs. In this context, algorithms and tools are necessary for analyzing IG and TR diversity at the nucleotide as well as the amino acid sequence level, identifying highly proliferated clonotypes, determining the diversity of the cell repertoire found in a sample, comparing different states of the human immune system, and visualizing all relevant information.
We present IMEX, a software framework for the detailed characterization and visualization of the state of human IG and TR repertoires. IMEX offers a broad range of algorithms for statistical analysis of IG and TR data, CDR and V-(D)-J analysis, diversity analysis by calculating the distribution of IG and TR, calculating primer efficiency, and comparing multiple data sets. We use a mathematical model that describes the number of unique clonotypes in a sample, taking into account the true number of unique sequences and read errors; we heuristically optimize the parameters of this model. IMEX uses IMGT/HighV-QUEST analysis outputs and includes methods for splitting and merging files, which enable the submission to this portal and the combination of its output results, respectively. All calculation results can be visualized and exported.
IMEX is a user-friendly and flexible framework for performing clonality experiments based on CDR and V-(D)-J rearranged regions, diversity analysis, primer efficiency, and various visualization experiments. Using IMEX, various immunological reactions and alterations can be investigated in detail. IMEX is freely available for Windows and Unix platforms at http://bioinformatics.fh-hagenberg.at/immunexplorer/.
Immune repertoire is a term that is commonly used in immunology to describe the level of diversity and clonality of B and T cell antigen receptors, the immunoglobulins (IG) or antibodies and T cell receptors (TR). These cells encode an enormous variety of receptors that are capable of recognizing any organic macromolecule of biological relevance. The main process for the generation of the antigen receptors is called receptor rearrangement and is very similar for B and T cells. Every antigen receptor consists of two different chains that are responsible for antigen recognition: the α (TRA) and β (TRB) chains or the γ (TRG) and δ (TRD) chains for αβ and γδ TR, respectively, and the immunoglobulin heavy chain (IGH) together with one of two different immunoglobulin light chains (IGK, IGL) for the immunoglobulins or antibodies. IGH and TRB V domains are encoded by three different gene segments: variable (V), diversity (D) and joining (J); IGK, IGL and TRA V domains are encoded by two gene types, V and J. A human genome in germline configuration comprises alleles for every gene. During B and T cell development the cells rearrange the genes so that each functional exon contains only one V gene and one J gene (and usually one D gene for IGH and TRB, but several for TRD). An important principle called allelic exclusion ensures that only one receptor specificity is expressed per B or T cell.
The human adaptive immune system has a strong impact on human health. Its efficiency is fundamentally reliant upon antigen receptor diversity; a restricted repertoire is in many cases unable to recognize the full variety of pathogens. In addition, an immune response as well as certain diseases lead to clonal expansions of B and T cells depending on their receptor specificity. Therefore, analyzing and understanding the repertoire is highly beneficial for research as well as for optimizing the medical treatment of patients.
Today’s most advanced techniques in immune repertoire analysis are based on next-generation sequencing (NGS), which produces huge amounts of data. Currently, various analysis and visualization tools for systems immunology exist with different focuses, for example MiTCR, Decombinator, IMGT/HighV-QUEST, IgBLAST, ImmunTraCkeR, immunoSEQ, IgAT, and IgTree.
Some of those tools are focused on calculating a wide range of statistics (e.g., IgAT), performing alignments to facilitate analysis of the immunoglobulin variable domain sequences (e.g., IgBLAST) or generating lineage trees from immunoglobulin variable region gene sequences (e.g., IgTree). All of those tools are based on analyzing the B cell repertoire, while others enable detailed research on the T cell repertoire: For example, ImmunTraCkeR determines V-J rearrangements and sets the main focus on the cell immune repertoire diversity. MiTCR offers a fast CDR3 algorithm and a PCR two-stage approach for correcting sequencing errors. ImmunoSEQ mainly places emphasis on statistical analysis and visualization of IG and TR data.
Whereas most of these tools and frameworks are focused on one cell type or on one specific type of analysis, the framework presented here, IMEX, has been designed for comprehensive, in-depth analysis of human IG and TR repertoires based on NGS data. IMEX contains algorithms for gaining more knowledge about the diversity on different sequence levels based on IMGT/HighV-QUEST analysis outputs [7, 13]. In the context of the calculation of clonality, IMEX users are able to define how to calculate sequence clonality and to compare diversity and clonality of various samples. A primer efficiency analysis enables the investigation of primer matching frequencies in PCR experiments. IMEX also includes V-(D)-J gene combination algorithms and additionally offers a wide range of visualization methods for gaining essential insights into the human adaptive immune system.
IMEX includes algorithms and statistical analyses for determining descriptive statistics about sequence functionality and V-(D)-J rearranged region frequency, calculating clonality of cells, estimating diversity of the cell spectrum, and visual representation of various gene/allele combinations. IMEX has been designed for analyzing and summarizing NGS-based IG and TR data derived from IMGT®. IMGT/HighV-QUEST is an NGS high-throughput analysis portal for IG and TR, and so far the only one available online [7, 13]. IMGT/HighV-QUEST uses the same algorithms as IMGT/V-QUEST with integrated IMGT/JunctionAnalysis and provides 11 compressed output files that contain information about variable (V), diversity (D), and joining (J) gene arrangements (V-(D)-J), identification and characterization of new alleles, detailed analysis of the junction (IMGT/JunctionAnalysis results), and additional information on mutations. IMEX uses these processed files as input for statistical analyses. Sample comparisons, clonotype tracking, and variety analysis are also included in IMEX. IMEX is written in C# and is freely available at http://bioinformatics.fh-hagenberg.at/immunexplorer/. In the following paragraphs we give detailed descriptions of the analysis methods implemented in IMEX.
Preprocessing methods for the IMGT/HighV-QUEST submission
The IMGT/HighV-QUEST online portal enables uploading and processing of up to 500,000 sequences; therefore, preprocessing methods have been developed in IMEX. FASTA files can be split into several files (using a user-defined threshold for the size of these files) to prepare the upload to the IMGT® information system. After the files have been uploaded to and analyzed by IMGT/HighV-QUEST at IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org), the compressed output files can be merged into one compressed data file. This file includes all information that is needed for determining overall statistics of the IG and TR clonotypes, frequencies, diversity and V-(D)-J rearranged region frequencies using IMEX.
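The splitting step can be illustrated with a minimal Python sketch. It is not the IMEX implementation (IMEX is written in C#); the function names `read_fasta`, `split_records` and `to_fasta` are hypothetical, and the threshold is assumed here to be a maximum number of sequences per output file.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records: Iterable[Record], max_per_file: int) -> List[List[Record]]:
    """Group records into batches of at most max_per_file sequences (assumed threshold)."""
    if max_per_file < 1:
        raise ValueError("max_per_file must be at least 1")
    batches, current = [], []
    for record in records:
        current.append(record)
        if len(current) == max_per_file:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


def to_fasta(batch: List[Record]) -> str:
    """Serialize one batch back to FASTA text, one sequence per line."""
    return "".join(f">{header}\n{seq}\n" for header, seq in batch)


# Toy input with three reads, split into files of at most two reads each.
example = ">r1\nACGT\n>r2\nGG\nCC\n>r3\nTTTT\n".splitlines()
batches = split_records(read_fasta(example), max_per_file=2)
```

For a real submission, `max_per_file` would be chosen so that each file stays within the portal limit mentioned above, and each batch would be written to its own file with `to_fasta`.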
Descriptive statistic analyses
The clonality of the IG and TR based on the V-(D)-J rearranged regions, the CDR3 sequences, and/or the nucleotide sequence of the whole amplicon provides additional information. Clonal expansion is related to the level of somatic proliferation of single B or T cell clonotypes triggered by various immunological reactions. In IMEX, the user defines how clonality is calculated by choosing the amino acid sequence, the nucleotide sequence, or the V-(D)-J rearranged regions. IMEX enables the calculation of the clonality based on the three complementarity determining regions (CDR), namely CDR1, CDR2, and CDR3. CDR3, the most variable CDR, can be found in the junction of the rearranged V-(D)-J regions. The number of clonotypes can also be determined using the nucleotide sequence of the whole read of the V-(D)-J rearranged region. Total numbers and relative frequencies of the clonotypes are given in tabular view; these lists can be exported and used for further analyses.
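The tabulation of total numbers and relative frequencies can be sketched as follows. This is an illustrative Python example, not IMEX code; the record fields (`cdr3_aa`, `v`, `j`) and the function name `clonotype_table` are assumptions made for the example, and the clonotype definition is passed in as a function so that either a CDR3 sequence or a V-J combination can serve as the key.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Tuple


def clonotype_table(
    records: List[Dict[str, str]],
    key_fn: Callable[[Dict[str, str]], Hashable],
) -> List[Tuple[Hashable, int, float]]:
    """Return (clonotype, count, relative frequency) rows, most abundant first."""
    counts = Counter(key_fn(r) for r in records)
    total = sum(counts.values())
    return [(clonotype, n, n / total) for clonotype, n in counts.most_common()]


# Toy records; field names are hypothetical.
records = [
    {"cdr3_aa": "CASSLGQETQYF", "v": "TRBV5-1", "j": "TRBJ2-5"},
    {"cdr3_aa": "CASSLGQETQYF", "v": "TRBV5-1", "j": "TRBJ2-5"},
    {"cdr3_aa": "CASSLGQETQYF", "v": "TRBV5-1", "j": "TRBJ2-5"},
    {"cdr3_aa": "CASSPDRGYEQYF", "v": "TRBV7-2", "j": "TRBJ2-7"},
]

# Clonality defined over the CDR3 amino acid sequence.
by_cdr3 = clonotype_table(records, lambda r: r["cdr3_aa"])

# Clonality defined over the V-J rearranged region.
by_vj = clonotype_table(records, lambda r: (r["v"], r["j"]))
```

Passing the key as a function mirrors the user-selectable definition of clonality described above without committing to any particular file layout.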
The diversity of an antigen receptor repertoire is calculated by analyzing the unique clonotypes of IG and TR in all sequences.
In the literature, several different ways to define the term diversity can be found; IgAT, for example, calculates the clonotypic diversity as clonotypes per productive sequences and the sequence diversity as unique sequences per productive sequences. IMEX calculates sequence diversity using a more elaborated data mining approach based on the most variable region, the CDR3:
where a is the true number of unique clonotypes and k is the fraction of unique sequences caused by read errors.
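The two IgAT-style ratios quoted above (clonotypes per productive sequences and unique sequences per productive sequences) can be written down directly. The following Python sketch covers only these simple ratios, not the model-based IMEX approach; the input layout, a list of (nucleotide sequence, clonotype) pairs for productive reads, and the function name are assumptions.

```python
from typing import List, Tuple


def igat_style_diversity(productive: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (clonotypic diversity, sequence diversity) for productive reads.

    Each item is a (nucleotide_sequence, clonotype) pair; this layout is an
    assumption made for the example.
    """
    n = len(productive)
    if n == 0:
        raise ValueError("at least one productive sequence is required")
    unique_sequences = {seq for seq, _ in productive}
    clonotypes = {clonotype for _, clonotype in productive}
    return len(clonotypes) / n, len(unique_sequences) / n


# Four productive reads: three distinct sequences, two distinct clonotypes.
reads = [
    ("ACGTACGT", "CASSL"),
    ("ACGTACGT", "CASSL"),
    ("ACGTACGA", "CASSL"),
    ("TTGGCCAA", "CASSP"),
]
clonotypic_diversity, sequence_diversity = igat_style_diversity(reads)
```

Both ratios depend on the number of reads in the sample, which is one reason a model that separates true unique clonotypes from read errors can be more informative.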
IMEX provides an algorithm for visualizing various V-(D)-J rearranged region combinations. All V-J, V-D, J-D and V-(D)-J gene and/or allele combinations are determined in the data sample. The framework contains several different graphical representation possibilities to visualize the total gene and allele frequencies; frequency histograms, heat maps, and bubble charts can be created and enable detailed visualizations of the state of the investigated receptor repertoire. Gene and allele frequencies can be sorted by gene names so that results for different samples can be compared easily. A frequency threshold can be used to filter specific genes and alleles.
IMEX also offers the download of all B and T cell genes and alleles listed in the IMGT® information system for the species Homo sapiens. For the visualization of the V-(D)-J rearranged region distributions we first calculate a list of all possible V-(D)-J combinations; all V-(D)-J combinations of a sample are then determined and mapped onto the full spectrum of all known V-(D)-J rearranged regions. This enables an accurate comparison of various samples on gene or allele level.
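Mapping the combinations observed in a sample onto the full set of known combinations can be sketched as below. The example is restricted to V-J pairs for brevity, uses a tiny illustrative gene list rather than the complete IMGT gene tables, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def vj_matrix(pairs: Iterable[Pair], all_v: List[str], all_j: List[str]) -> Dict[Pair, int]:
    """Count observed V-J pairs and fill every unobserved known pair with 0."""
    counts = Counter(pairs)
    return {(v, j): counts.get((v, j), 0) for v, j in product(all_v, all_j)}


def filter_by_threshold(matrix: Dict[Pair, int], min_count: int) -> Dict[Pair, int]:
    """Keep only combinations observed at least min_count times."""
    return {pair: n for pair, n in matrix.items() if n >= min_count}


# Illustrative subset of gene names; a real run would use the full lists.
all_v = ["TRBV5-1", "TRBV7-2"]
all_j = ["TRBJ2-5", "TRBJ2-7"]
observed = [
    ("TRBV5-1", "TRBJ2-5"),
    ("TRBV5-1", "TRBJ2-5"),
    ("TRBV7-2", "TRBJ2-7"),
]

matrix = vj_matrix(observed, all_v, all_j)
frequent = filter_by_threshold(matrix, min_count=2)
```

Because every sample is projected onto the same grid of known pairs, two matrices can be compared cell by cell, which is what makes heat maps of different samples directly comparable.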
PCR primer matching
IMEX includes a feature for analyzing primer efficiency. Primer sets used for multiplex PCR amplification of rearranged V-(D)-J regions can be imported (see Additional file 1: Primer lists for TRB and IGH). The primer matching algorithm searches for the exact primer sequences in the IMGT aligned sequences and returns the relative frequency of each primer in the imported primer sets. This enables the optimization of the efficiency in multiplex PCR.
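Exact primer matching of this kind can be sketched in a few lines of Python. The sketch assumes that the relative frequency of a primer is the fraction of sequences containing it as an exact substring, and it ignores reverse complements; both are simplifying assumptions, and the primer names and sequences are toy values rather than entries from the BIOMED-2 panels.

```python
from typing import Dict, List


def primer_frequencies(primers: Dict[str, str], sequences: List[str]) -> Dict[str, float]:
    """Return, per primer, the fraction of sequences containing it exactly."""
    total = len(sequences)
    upper_sequences = [s.upper() for s in sequences]
    result = {}
    for name, primer in primers.items():
        target = primer.upper()
        hits = sum(1 for s in upper_sequences if target in s)
        result[name] = hits / total if total else 0.0
    return result


# Toy primer set and reads (hypothetical values).
primers = {"P1": "ACGT", "P2": "GGCC", "P3": "TTTT"}
sequences = ["AAACGTAA", "ccacgtgg", "AAGGCCAA", "CACACACA"]

frequencies = primer_frequencies(primers, sequences)
```

A primer with a frequency near zero, such as `P3` in the toy data, would be a candidate for redesign or removal from the multiplex set.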
Pairwise CDR3 Clone Comparer: IMEX is capable of generating a list of unique CDR3 clonotypes of each data sample and searching the top c unique clonotypes from one sample in the other sample. Each clonotype is assigned a randomly chosen color and matched clonotypes are shown in the same color.
Multiple CDR3 Clone Comparer: The multiple comparison algorithm determines the top c unique clonotypes in each given data sample and searches for all clonotypes collected in this way in every data sample. IMEX also contains a visualization and a tabular view for comparing the overlap of multiple data samples according to CDR3.
Multiple V-(D)-J Clone Comparer: As clonality can not only be defined over the CDRs but also over the V-(D)-J rearranged regions, IMEX also offers a multiple V-(D)-J Clone Comparer. The functionality is implemented in analogy to the Multiple CDR3 Clone Comparer.
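The pairwise and multiple comparisons described above can be sketched as follows. This Python example illustrates the idea only and is not the IMEX implementation; samples are assumed to be plain lists of CDR3 strings, and all function names are hypothetical. The same functions would work for V-(D)-J keys if tuples of gene names were used instead of CDR3 strings.

```python
from collections import Counter
from typing import Dict, Hashable, List


def top_clonotypes(sample: List[Hashable], c: int) -> List[Hashable]:
    """Return the c most abundant unique clonotypes of a sample."""
    return [clonotype for clonotype, _ in Counter(sample).most_common(c)]


def compare_pairwise(sample_a: List[Hashable], sample_b: List[Hashable], c: int) -> Dict[Hashable, int]:
    """Look up the top c clonotypes of sample_a in sample_b and return their counts there."""
    counts_b = Counter(sample_b)
    return {clonotype: counts_b.get(clonotype, 0) for clonotype in top_clonotypes(sample_a, c)}


def compare_multiple(samples: Dict[str, List[Hashable]], c: int) -> Dict[Hashable, Dict[str, int]]:
    """Collect the top c clonotypes of every sample and count each one in all samples."""
    collected = []
    for sample in samples.values():
        for clonotype in top_clonotypes(sample, c):
            if clonotype not in collected:
                collected.append(clonotype)
    counters = {name: Counter(sample) for name, sample in samples.items()}
    return {
        clonotype: {name: counters[name].get(clonotype, 0) for name in samples}
        for clonotype in collected
    }


# Toy samples from two time points (hypothetical CDR3 strings).
t1 = ["CASSL", "CASSL", "CASSL", "CASSP", "CASSP", "CASRG"]
t2 = ["CASSL", "CASSL", "CASTT", "CASTT", "CASTT"]

pairwise = compare_pairwise(t1, t2, c=2)
overlap = compare_multiple({"T1": t1, "T2": t2}, c=1)
```

A clonotype whose count stays high in every column of `overlap` corresponds to a clone that persists across samples, such as the expanded clonotypes discussed in the results.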
Approval of ethics committee and consent
Informed written consent was obtained from all participating individuals according to the Declaration of Helsinki. Ethical approval for the sample collection used here was obtained from the Ethical Committee of Upper Austria (no. E-9-12, Jan 21st, 2013).
Results and discussion
Here we demonstrate the analysis of NGS data of a proband whose immune spectrum showed highly abundant clonal expansion over a longer time period. Using analysis methods provided by IMEX, we found two cytotoxic T cell clonotypes (CD8+) that are highly abundant and can be constantly observed over several months. The data sets have been obtained using PCR (BIOMED-2 primer panels for gDNA amplification) of the IGH and TRB loci followed by next-generation sequencing (Illumina MiSeq sequencer).
Table 1 Basic analysis in IMEX of the IMGT/HighV-QUEST sequence alignments for the TR using primer sets 1 and 2 of proband p78690 (columns: TRB primer set 1, TRB primer set 2; first row: total number of sequences; values not preserved)
Table 2 Basic analysis of the IMGT/HighV-QUEST sequence alignment for the IGH. The analysis was done as described in Table 1 (first row: total number of sequences; values not preserved)
Table 3 Clonality comparison of the most abundant clonotypes based on CDR3 amino acid sequences in IMEX (columns: TRB primer set 2, CD4-/CD8+, T3; TRB primer set 2, CD4+/CD8-, T3; MNC, TRB primer set 2, T3; values not preserved)
IMEX, a user-friendly tool for analyzing and visualizing IG and TR repertoires based on NGS data, has been presented in this paper. IMEX offers several algorithms for analyzing the clonality and diversity on multiple levels such as V-(D)-J arrangement, CDR, and nucleotide sequences of the whole reads. Moreover, it also provides features for analyzing primer efficiency. IMEX includes various visualization possibilities such as pie charts, histograms, line charts, bubble charts, and heat maps. We have shown that IMEX can be used for visualizing and comparing various aspects of the state of human adaptive immune repertoires.
The software framework IMEX was initially planned for analyzing and further processing IMGT/HighV-QUEST output files for gDNA-based sample preparation. During the development and implementation of IMEX, the community forged ahead in the field of immune repertoire sequencing; we are therefore currently extending the functionalities of IMEX. Algorithms and features for new cDNA sample preparation technologies, i.e., single molecule barcoding, which is able to reduce PCR bias, will be implemented and extended in the near future.
In addition, we plan to extend our analyses to other IG (IGK, IGL) and TR loci (TRA, TRG and TRD). In the medium term, we are aiming to integrate a machine learning approach (based on algorithms implemented in HeuristicLab (http://dev.heuristiclab.com/)) that can classify the immune status of patients with distinct diseases (e.g., bone marrow stem cell transplantation and minimal residual disease).
IMEX is freely available as a GUI for Windows platforms and also as a command line version for Windows/Linux and Unix systems and can be downloaded at http://bioinformatics.fh-hagenberg.at/immunexplorer/.
Availability and requirements
Project Name: ImmunExplorer (IMEX)
Project Web-page: http://bioinformatics.fh-hagenberg.at/immunexplorer/
Operating System: Windows, Linux and Unix
Programming Language: C#
Other requirements: Microsoft .NET Framework 4.0
License: see License Agreement on IMEX website http://bioinformatics.fh-hagenberg.at/immunexplorer/
The work described in this paper was done within the project “Transplant - Early marker for humoral and cellular rejection of transplanted kidneys” funded by EFRE Regio 13.
- Lefranc MP, Lefranc G. The immunoglobulin FactsBook. Waltham, Massachusetts: Elsevier; 2001.
- Lefranc MP. Immunoglobulin and T cell receptor genes: IMGT® and the birth and rise of immunoinformatics. Front Immunol. 2014; 5:22.
- Abbas AK, Lichtman AHH, Pillai S. Elsevier Health Sci; 1994.
- Mardis ER. Next-generation sequencing platforms. Annu Rev Anal Chem. 2013; 6(1):287–303.
- Bolotin DA, Shugay M, Mamedov IZ, Putintseva EV, Turchaninova MA, Zvyagin IV, et al. MiTCR: software for T-cell receptor sequencing data analysis. Nat Methods. 2013; 10(9):813–4.
- Thomas N, Heather J, Ndifon W, Shawe-Taylor J, Chain B. Decombinator: a tool for fast, efficient gene assignment in T-cell receptor sequences using a finite state machine. Bioinformatics. 2013; 29(5):542–50.
- Li S, Lefranc MP, Miles JJ, Alamyar E, Giudicelli V, Duroux P, et al. IMGT/HighV QUEST paradigm for T cell receptor IMGT clonotype diversity and next generation repertoire immunoprofiling. Nat Commun. 2013; 4:2333.
- Ye J, Ma N, Madden TL, Ostell JM. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 2013; 41:34–40. doi:http://dx.doi.org/10.1093/nar/gkt382.
- Tanneau I, Nondé A, Courtier A, Parmentier G, Noël M, Grivès A, et al. ImmunTraCkeR as a reliable TCR repertoire profiling tool to understand immune response and to explore immunotherapy biomarkers. J Immunother Cancer. 2013; 1:112.
- ImmunoSEQ. http://www.immunoseq.com/.
- Rogosch T, Kerzel S, Hoi KH, Zhang Z, Maier RF, Ippolito GC, et al. Front Immunol; 2012.
- Daelemans W, Van Den Bosch A, Weijters T. IGTree: using trees for compression and classification in lazy learning algorithms. Artif Intell Rev. 1997; 11(1):407–23.
- Alamyar E, Duroux P, Lefranc MP, Giudicelli V. IMGT® tools for the nucleotide analysis of immunoglobulin (IG) and T cell receptor (TR) V-(D)-J repertoires, polymorphisms, and IG mutations: IMGT/V-QUEST and IMGT/HighV-QUEST for NGS. Methods Mol Biol (Clifton, NJ). 2012; 882:569–604.
- Brochet X, Lefranc MP, Giudicelli V. IMGT/V-QUEST: the highly customized and integrated system for IG and TR standardized V-J and V-D-J sequence analysis. Nucleic Acids Res. 2008; 36:503–8.
- Yousfi Monod M, Giudicelli V, Chaume D, Lefranc MP. IMGT/JunctionAnalysis: the first tool for the analysis of the immunoglobulin and T cell receptor complex V-J and V-D-J JUNCTIONs. Bioinformatics (Oxford, England). 2004; 20 Suppl 1:379–85.
- Alamyar E, Giudicelli V, Li S, Duroux P, Lefranc MP. IMGT/HighV-QUEST: the IMGT web portal for immunoglobulin (IG) or antibody and T cell receptor (TR) analysis from NGS high throughput and deep sequencing. Immunome Res. 2012; 8(1):26.
- Lefranc MP, Giudicelli V, Duroux P, Jabado-Michaloud J, Folch G, Aouinti S, et al. IMGT®, the international ImMunoGeneTics information system® 25 years on. Nucleic Acids Res. 2015; 43:413–22.
- Arnaout R, Lee W, Cahill P, Honan T, Sparrow T, Weiand M, et al. High-resolution description of antibody heavy-chain repertoires in humans. PLoS One. 2011; 6(8):e22365.
- Schaller S, Weinberger J, Danzer M, Gabriel C, Oberbauer R, Winkler SM. Mathematical modeling of the diversity in human B- and T-cell receptors using machine learning. Proc 26th Eur Model Simul Symp. 2014.
- Rechenberg I. Evolution strategy: nature’s way of optimization. In: Optimization: methods and applications, possibilities and limitations. Volume 47. Springer, Lecture Notes in Engineering: 1989. p. 106–26. http://link.springer.com/chapter/10.1007%2F978-3-642-83814-9_6.
- van Dongen JJM, Langerak AW, Brüggemann M, Evans PAS, Hummel M, Lavender FL, et al. Design and standardization of PCR primers and protocols for detection of clonal immunoglobulin and T-cell receptor gene recombinations in suspect lymphoproliferations: Report of the BIOMED-2 concerted action BMH4-CT98-3936. Leukemia. 2003; 17(12):2257–317.
- Wagner S, Kronberger G, Beham A, Kommenda M, Scheibenpflug A, Pitzer E, et al. Architecture and Design of the HeuristicLab Optimization Environment. In: Advanced Methods and Applications in Computational Intelligence, Topics in Intelligent Engineering and Informatics Series. Springer: 2014. p. 197–261. http://dev.heuristiclab.com.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.