Immunologists working on repertoires face, on a daily basis, a huge amount of scattered data that are usually hard to collect and to check. To work on TR, scientists must first select the TR chain of interest, which may include thousands of different forms. In IMGT/GeneInfo, we gather, connect and present all the required parameters, otherwise dispersed, in a single place. Data are checked by a dedicated quality control. By adding the TRG and TRD genes, we make all TR information available for mouse and human. This information is presented in a convenient way for users (biological researchers, bioinformaticians, etc.) through a simple and intuitive, enhanced two-step interface.
IMGT/GeneInfo is an information system that relies on an integrated relational database. It has been specifically designed to give users all information about V(D)J recombinations within two mouse-clicks. With the first click, the user selects the species, the locus and the type of gene combination (V-V, V-J, V-D-J) from a drop-down box [7]. With the second click, the user chooses, within each type (V, D, J), the specific genes for which information is required. Unlike other information systems, IMGT/GeneInfo does not take any user sequence for analysis. The list of genes offered to the user is determined from existing sequences in the major available databases (IMGT, EMBL, GenBank and DDBJ). Genes can be chosen either by gene name or by the relative position of the gene within the locus. After the second click, the user is directed to the results page. This page was enhanced in order to make the rearrangement analysis faster and easier (Figure 1). Each specific term is now associated with its definition via a hypertext link. The results page is divided into seven parts (instead of five previously):
user query,
information sources,
schematic drawing of the locus,
synthetic view of all TR gene parameters,
DNA sequences,
spliced V-(D)-J rearranged sequences, and
link to the constant gene and allele tables in IMGT Repertoire.
Among these seven parts, the two new ones are:
User query, which is a summary of the request according to species, locus and gene. This facilitates the archiving of printed data;
Spliced V-(D)-J rearranged sequence, which gives the transcribed sequence after rearrangement and splicing. Spliced V-(D)-J rearranged sequences are only provided for the TRA and TRG V-J rearrangements and for the TRB and TRD V-(D)-J rearrangements; other, atypical combinations are excluded. These sequences display blunt ends of the rearranged V, D and J regions and are therefore useful for any comparison with in vivo rearranged sequences, whose complex junctions result from nucleotide deletions and N-diversity nucleotide insertions [2]. A more detailed analysis of the junctions can, if needed, be performed with IMGT/V-QUEST [22] and IMGT/JunctionAnalysis [23].
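The blunt-end joining described above can be illustrated with a minimal Python sketch. It is not the IMGT/GeneInfo implementation: the function name, the placeholder sequences and the choice to require a D region for TRB and TRD are assumptions made for illustration, and the leader parts and splicing step are left out.

```python
# Loci for which a spliced rearranged sequence is provided (from the text).
LOCI_V_J = {"TRA", "TRG"}      # V-J rearrangements, no D gene involved
LOCI_V_D_J = {"TRB", "TRD"}    # V-(D)-J rearrangements


def blunt_end_rearrangement(locus, v_region, j_region, d_region=None):
    """Join germline regions end to end, without deletions or N insertions.

    Hypothetical helper: in this sketch a D region is refused for TRA/TRG
    and required for TRB/TRD (an assumption, not a documented rule).
    """
    if locus in LOCI_V_J:
        if d_region is not None:
            raise ValueError(f"{locus} rearrangements do not include a D region")
        return v_region + j_region
    if locus in LOCI_V_D_J:
        if d_region is None:
            raise ValueError(f"{locus} rearrangements need a D region in this sketch")
        return v_region + d_region + j_region
    raise ValueError(f"no spliced sequence is provided for locus {locus!r}")


# Placeholder sequences, not real germline genes.
tra = blunt_end_rearrangement("TRA", "GCTGTGAGA", "TGGTACTTC")
trb = blunt_end_rearrangement("TRB", "GCCAGCAGT", "TTCGGG", d_region="GGGACAGGG")
```

Because the regions are joined without trimming or N nucleotides, the result is a germline reference against which an in vivo junction can be aligned, not a prediction of any observed rearrangement.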
Besides the two new parts described above, we enhanced the five existing parts of the results page.
Information sources. Accession numbers are now linked to the IMGT/LIGM-DB [8] results page, which provides the IMGT annotations, the IMGT flat file, the FASTA format, the EMBL flat file, the coding regions with protein translation, the sequence with three reading frames, the dump format and the IMGT/V-QUEST analysis.
Schematic drawing of the locus. We created a different picture for each type of rearrangement, so that the user query can be checked visually. Two links have been added: the first one to the IMGT Repertoire Locus representation, which is more complete than our schematic drawing, and the second one to the IMGT/LocusView tool, which gives a graphical representation of the gene location inside the locus [24].
Synthetic view of all TR gene parameters. Each label in the table follows the IMGT standardized labels (defined in the IMGT Scientific chart rules and based on the DESCRIPTION concept of IMGT-ONTOLOGY) [21]. Sequences of the labels of the inverted genes (human TRBV30 and TRDV3, mouse TRBV31, TRDV5, TRGV2 and TRGJ2) are presented in the "sense" DNA strand orientation, 5'-3' (IMGT Index, DNA strand orientation, http://imgt.cines.fr). RSS labels are presented in the 5'-3' orientation. For each label, users can obtain the relevant definition. The gene name links to the IMGT/GENE-DB entry [18], which gives additional information such as the chromosomal localization, the reference alleles and the known cDNAs for a given gene.
DNA sequences. They are now headed by a title to differentiate them from the spliced V-(D)-J rearranged sequences. In addition, we provide the size of all sequences in base pairs. Besides the complete DNA sequence, as usually given by other web sites, we provide the individual parts of the gene sequence (L-PART1, V-INTRON and V-EXON), for convenient handling by biologists.
Link. A link from the constant gene leads to the gene and allele table in IMGT Repertoire, which describes the various alleles available for a given constant gene, and to IMGT/GENE-DB [18].
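Two of the presentation choices listed above, the "sense" 5'-3' orientation of inverted genes and the size in base pairs of each gene part, can be illustrated with a minimal Python sketch. The helper names and the short sequences are hypothetical and are not taken from IMGT/GeneInfo.

```python
# Translation table for the four standard bases; any other character
# (for example N) is left unchanged by str.translate.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the opposite strand, read 5'-3'.

    This is the operation needed to show a gene that lies in inverted
    orientation in the locus on its "sense" strand.
    """
    return sequence.translate(COMPLEMENT)[::-1]


def part_sizes(parts):
    """Map each labelled part of a gene to its length in base pairs."""
    return {label: len(seq) for label, seq in parts.items()}


# Made-up fragments standing in for the parts of a V gene.
v_gene_parts = {
    "L-PART1": "ATGGCC",
    "V-INTRON": "GTAAGT",
    "V-EXON": "GGTCAGAAA",
}

sense = reverse_complement("ATGCC")
sizes = part_sizes(v_gene_parts)
```

Applying `reverse_complement` twice returns the input, which is a quick sanity check when converting sequences between strand orientations.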
The advantage of our system over existing ones is that all information concerning the TR is synthesized in one results page, which can be accessed in only two mouse-clicks. It is also designed with an educational purpose: a drawing explains how the different parts are organized in the selected V(D)J rearrangement.
In existing applications, sequences are given in flat files (e.g. with line numbers and intergenic sequences). Thus, the user must manually remove the line numbers and then locate the desired sequence before being able to extract it. In our system, the user can directly copy and paste the sequence, without the risk of a mistake during manual selection and extraction.
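To make the manual step concrete, the following minimal Python sketch shows what extracting a region from a numbered flat-file sequence involves. The line layout (blocks of ten bases with a running count at the end of each line), the coordinates and the function names are assumptions for illustration; actual flat files also carry header and feature lines that are not handled here.

```python
import re


def clean_flat_file_sequence(lines):
    """Join sequence lines, dropping position numbers and spacing."""
    return "".join(re.sub(r"[^A-Za-z]", "", line) for line in lines).upper()


def extract_region(sequence, start, end):
    """Return the region between 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


# Made-up sequence lines in an EMBL-like layout (hypothetical data).
example_lines = [
    "     atggcctggg ctctgctcct cctcaccctc        30",
    "     ctcactcagg gcacagggtc                   50",
]

cleaned = clean_flat_file_sequence(example_lines)
region = extract_region(cleaned, 4, 12)
```

Each of these steps (stripping the numbering, converting coordinates, slicing) is a place where a manual selection can go wrong by one base, which is the risk that a ready-to-copy sequence avoids.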
Despite advances in genomic and postgenomic technologies, gene annotation and bioinformatics tools are still needed to select and analyze specific genes. The extension of IMGT/GeneInfo to the Homo sapiens and Mus musculus TRD and TRG loci will help users to gather the pertinent information about antigen receptor genes that is required to accurately describe αβ or γδ T cells. This integrated bioinformatics information system will allow a rapid and reliable selection of all TR DNA sequences, in order to study rearrangement frequency by molecular approaches such as Southern blot, real-time PCR, multiplex PCR or microarray assays.
Future directions
Future work on IMGT/GeneInfo will proceed in two main directions. On the application side, we will provide a wider range of analysis tools aimed at evaluating and characterizing TR in terms of amino acid sequences. On the infrastructure side, we will increase the amount of data offered by IMGT/GeneInfo by integrating IG data. In addition, we plan to extend IMGT/GeneInfo to include information about genes in organisms other than human and mouse.