CIG-DB: the database for human or mouse immunoglobulin and T cell receptor genes available for cancer studies
© Nakamura et al; licensee BioMed Central Ltd. 2010
Received: 22 April 2010
Accepted: 27 July 2010
Published: 27 July 2010
Immunoglobulin (IG or antibody) and the T-cell receptor (TR) are pivotal proteins in the immune system of higher organisms. In cancer immunotherapy, the immune responses mediated by tumor-epitope-binding IG or TR play important roles in anticancer effects. Although there are public databases specific for immunological genes, their contents have not been associated with clinical studies. Therefore, we developed an integrated database of IG/TR data reported in cancer studies (the Cancer-related Immunological Gene Database [CIG-DB]).
This database is designed as a platform to explore public human and murine IG/TR genes sequenced in cancer studies. A total of 38,308 annotation entries for IG/TR proteins were collected from GenBank/DDBJ/EMBL and the Protein Data Bank, and 2,740 non-redundant corresponding MEDLINE references were appended. Next, we filtered the MEDLINE texts for MeSH terms, titles, and abstracts containing keywords related to cancer. After a manual check, we classified the protein entries into two groups: 611 on cancer therapy (Group I) and 1,470 on hematological tumors (Group II). Thus, a total of 2,081 cancer-related IG and TR entries were tabulated. To classify future entries effectively, we developed a computational method based on text mining and canonical discriminant analysis of MeSH/title/abstract words. Leave-one-out cross validation of the method showed high accuracy rates: 94.6% for IG references and 94.7% for TR references. We also collected 920 epitope sequences bound by IG/TR. The CIG-DB is equipped with search engines for amino acid sequences and MEDLINE references, sequence analysis tools, and a 3D viewer. This database is accessible without charge or registration at http://www.scchr-cigdb.jp/, and the search results are freely downloadable.
The CIG-DB serves as a bridge between immunological gene data and cancer studies, presenting annotation on IG, TR, and their epitopes. This database contains IG and TR data classified into two cancer-related groups and is able to automatically classify accumulating entries into these groups. The entries in Group I are particularly crucial for cancer immunotherapy, providing supportive information for genetic engineering of novel antibody medicines, tumor-specific TR, and peptide vaccines.
Background
The immune system is inherent in vertebrates and provides protection against toxic substances or infectious diseases. Two antigen receptor proteins, immunoglobulin (IG), expressed on B lymphocytes or secreted by plasma cells, and the T-cell receptor (TR), expressed on T lymphocytes, are key molecules for humoral immunity and cell-mediated immunity, respectively. Each of these proteins consists of two chain types, called light and heavy chains for IG (there are two identical light chains and two identical heavy chains in an IG), and alpha and beta chains, or gamma and delta chains, for TR. Each chain contains, at its N-terminal end, a variable (V) domain which participates in antigen recognition. The V domain is encoded by two or three genes, a V gene, a diversity (D) gene (for heavy, beta and delta chains) and a joining (J) gene, which rearrange through somatic recombination. In the V domain, three complementarity determining regions (CDRs), which are especially sequence-diversified, contact antigenic epitopes. In particular, the third CDR (CDR3), located at the junction of V(D)J recombination, is the most diversified among the CDRs and is considered crucial for the recognition of epitopes [3–5].
Cancer cells proliferate abnormally compared to normal cells, often expressing proteins (tumor-associated antigens) that are not seen in normal developmental stages. In cancer studies, monitoring the immune status of patients is thus very important for diagnosis, as expression of autoantibodies and activation of cytotoxic T lymphocytes (CTLs) specific to tumor-associated antigens are observed. In hematological tumors, such as leukemia or lymphoma, IG and TR themselves are the subject of investigation, because the encoding genes are often mutated by translocation in tumor B or T cells. Moreover, in recent years, these antigen receptor proteins have attracted attention in the field of cancer immunotherapy as a means to elevate the patient's immune response against tumor cells with few side effects [10, 11]. In cellular immunotherapy, T cells recognizing tumor-associated antigens can be administered back to patients after ex vivo culture and processing to enhance the immune response.
During the last decade, monoclonal antibodies have been sought and engineered as candidates for molecular target drugs. These molecules can recognize cancer cells expressing tumor-associated antigens with high affinity and selectivity, triggering anticancer effects [12, 13] such as complement-dependent cytotoxicity, antibody-dependent cellular cytotoxicity, inhibition of angiogenesis, and induction of apoptosis. In general, antibody medicines are derived from the human or the mouse, and three types have been developed: (i) fully murine antibodies, (ii) chimeric antibodies with V domains from the mouse and constant regions from the human, and (iii) humanized or human antibodies. For instance, trastuzumab (trade name Herceptin), a humanized antibody that targets the human epidermal growth factor receptor type 2 protein, has shown success in the treatment of breast cancer. Around 10 antibody medicines for cancer therapy are now approved, and another 30 are being assessed in clinical trials in the USA.
Cancer vaccines are another type of immunotherapy, in which partial epitope peptides of tumor-associated antigens are administered to patients to potentiate CTL activity. TR on CD8+ T cells recognize the peptide vaccines bound to human leukocyte antigens (HLA), which are displayed by antigen-presenting cells, and enhance cytotoxic activity against cancer cells carrying the peptides. Each peptide vaccine is, in general, 9-10 amino acids long and is selected according to the patient's HLA allele type.
These immunotherapeutic studies have emphasized the importance of genetic engineering of IG and TR proteins and peptide vaccines at the sequence level [17–21]. Currently, there are several immunological gene databases available online, such as IMGT®, the international ImMunoGeneTics information system®, and the Immune Epitope Database and Analysis Resource (IEDB). Although the contents of these databases are well annotated and specific to genetic information on IG, TR, and their epitopes, there are still gaps between this information and its clinical application in cancer research. In particular, the information supplied is too broad for clinicians and pharmacologists to easily obtain what they need for patient-specific treatment: such databases store data from a large variety of organisms, and the majority of these data are unrelated to cancer. To address these issues, we have developed a freely accessible database, the Cancer-related Immunological Gene Database (CIG-DB). The database integrates information on IG, TR, and epitopes reported in cancer studies, and provides sequence analysis tools and structural data.
Construction and content
The CIG-DB is a semi-automated database consisting of four tables. Two of these are for IG and TR proteins, respectively, and all of the included proteins are derived from either the human or the mouse. The other two tables are for epitopes of IG or TR, for which we collected amino acid sequences from a variety of organisms regardless of whether they were cancer-related. Since the available amount of cancer-related epitope data is still small, a large number of sequences is considered useful for further comparative analysis.
Table 1: CIG-DB statistics as of 1 October 2009 (entries screened from NCBI and PDB)
Table 2: Classification of cancer-related references in CIG-DB (references collected from PubMed and screened by keywords)
Regarding epitopes, the interaction between IG/TR and an epitope is the specific focus of this database. The epitope sequences were taken from public databases: the IEDB, Bcipep, and the HIV sequence database, and, where possible, the PDB, if epitopes were crystallized as bound to IG or TR. We then extracted the protein or peptide epitopes and selected those whose antigen receptor sequences are known. The matching criteria were as follows: (i) the complex structure of the antigen receptor and epitope is already in the PDB, (ii) for the epitopes from the IEDB, the GI numbers of the receptors are found in the IG/TR tables of the CIG-DB, and (iii) the match is confirmed by a manual check of references. In total, we obtained 920 epitope sequences, 772 for IG and 148 for TR (Table 1). Antigen-presenting cells display peptides of around nine residues as T cell epitopes, so TR epitopes are always linear sequences. IG epitopes, in contrast, can also be conformational, formed by residues brought into proximity through antigen folding. To show the key residues involved in the interaction with the antigen receptor, residues were highlighted based on their distance to the receptor's CDR3 (< 4 angstroms).
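The distance-based highlighting described above can be illustrated with a short Python sketch. It is not the CIG-DB code: the text does not say how the distance is measured, so the sketch assumes an atom-to-atom distance, and the function names (`contact_residues`, `highlight`), the input layout, and the toy coordinates are all hypothetical.

```python
import math

CONTACT_CUTOFF = 4.0  # angstroms; the threshold quoted in the text


def contact_residues(epitope_atoms, cdr3_atoms, cutoff=CONTACT_CUTOFF):
    """Return sorted epitope residue numbers that have at least one atom
    closer than `cutoff` to any CDR3 atom.

    epitope_atoms: iterable of (residue_number, (x, y, z))
    cdr3_atoms:    iterable of (x, y, z)
    Assumption: atom-to-atom distances; the paper does not specify this.
    """
    cdr3 = list(cdr3_atoms)
    hits = set()
    for resnum, coord in epitope_atoms:
        if resnum in hits:
            continue  # residue already flagged by another of its atoms
        if any(math.dist(coord, other) < cutoff for other in cdr3):
            hits.add(resnum)
    return sorted(hits)


def highlight(sequence, first_resnum, hits):
    """Render contact residues in upper case and all others in lower case."""
    hits = set(hits)
    return "".join(
        aa.upper() if first_resnum + i in hits else aa.lower()
        for i, aa in enumerate(sequence)
    )


# Toy coordinates (invented): one atom per epitope residue, one CDR3 atom.
epitope = [(1, (0.0, 0.0, 0.0)), (2, (3.0, 0.0, 0.0)), (3, (10.0, 0.0, 0.0))]
cdr3 = [(5.0, 0.0, 0.0)]

hits = contact_residues(epitope, cdr3)
print(hits, highlight("GIL", 1, hits))  # [2] gIl
```

In practice the coordinates would come from the ATOM records of a PDB complex, and the CDR3 atoms would be selected by the receptor's residue numbering.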
Reference clustering and classification
Table 3: Validation of reference classification by canonical discriminant analysis (results by protein and group)
Reference and sequence search
The CIG-DB provides two search engines: (i) sequence search and (ii) reference search, which users can select on the home page.
Users can perform keyword searches by GI numbers, amino acid sequences, or GenBank accessions of IG/TR/epitope entries. Users can choose among eight search conditions: "Contains/Does not contain," "Equals/Does not equal," "Starts with/Does not start with," and "Ends with/Does not end with." The search result is shown as a table that can be sorted in ascending or descending order. Most importantly, the sequence search for IG or TR entries can be restricted to cancer-related sequences by checking the filter option box. In the resulting table, GI numbers and references are linked to GenBank and PubMed online, respectively, and the reference IDs (i.e., PubMed IDs) shown are labeled "1" (cancer therapy) or "2" (hematological tumors). The same search engine is also available for epitope entries.
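As an illustration of the eight search conditions and the cancer-related filter, the following Python sketch applies them to a small in-memory list. It is only a sketch: the actual CIG-DB search runs on Apache Lucene and MySQL, and the record layout, the `search` function, and the GI numbers and sequences below are invented for the example.

```python
# Hypothetical records: "group" is 1 (cancer therapy), 2 (hematological
# tumors), or None (not cancer-related). All values are invented.
ENTRIES = [
    {"gi": "1001", "sequence": "CARDYW", "group": 1},
    {"gi": "1002", "sequence": "CASSLG", "group": 2},
    {"gi": "2003", "sequence": "CARGGW", "group": None},
]

# The four positive conditions named in the text.
_POSITIVE = {
    "Contains": lambda value, term: term in value,
    "Equals": lambda value, term: value == term,
    "Starts with": lambda value, term: value.startswith(term),
    "Ends with": lambda value, term: value.endswith(term),
}

_NEGATED_LABEL = {
    "Contains": "Does not contain",
    "Equals": "Does not equal",
    "Starts with": "Does not start with",
    "Ends with": "Does not end with",
}


def _negate(predicate):
    """Build the logical complement of a (value, term) predicate."""
    return lambda value, term: not predicate(value, term)


# All eight conditions: each positive predicate plus its negation.
CONDITIONS = dict(_POSITIVE)
for label, predicate in _POSITIVE.items():
    CONDITIONS[_NEGATED_LABEL[label]] = _negate(predicate)


def search(entries, field, condition, term, cancer_only=False):
    """Return entries whose `field` satisfies `condition` for `term`.

    With cancer_only=True, keep only entries classified into Group 1 or 2,
    mimicking the filter option box described in the text.
    """
    matches = CONDITIONS[condition]
    return [
        entry
        for entry in entries
        if matches(entry[field], term)
        and (not cancer_only or entry["group"] in (1, 2))
    ]


hits = search(ENTRIES, "sequence", "Starts with", "CAR", cancer_only=True)
print([entry["gi"] for entry in hits])  # ['1001']
```

Deriving each negative condition from its positive counterpart keeps the two halves of every pair consistent by construction.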
Sequence analysis tools
3D structure visualization and modeling
Database implementation and update
The CIG-DB is a Java web application developed with ZK, an open-source Ajax framework, and uses MySQL as its database backend. Apache Lucene serves as a high-performance search engine for the database. The application runs on a servlet container, Apache Tomcat. Updates of the basic contents are semi-automatic and are carried out by Perl and shell scripts. In the future, the reference classification scripts, written in R, will be integrated into the automation.
Discussion
Immunotherapy is now an effective treatment for cancer, and information on cancer-antigen-specific IG or TR, as well as their epitopes, is essential for its use. In particular, genetic engineering, such as the improvement of antibody sequences as molecular target drugs or the design of epitope peptides for cancer vaccines, is of great interest to clinicians and pharmacologists. For such studies, a sequence-based database for cancer immunological genes has potential utility. In addition, sequence comparison tools and structural data are useful for genetic engineering when combined with library screening methods in a laboratory [33, 34]. Our database, the CIG-DB, may thus meet the needs of researchers involved in such cancer studies. Moreover, this database is also designed as a reference-based platform, equipped with a search engine for MeSH terms, titles, and abstract words (Figure 2a). In previous public databases, one could find the immunological gene sequences, but the references were not always related to cancer; alternatively, one could search for cancer-related papers, but the sequences frequently were undetermined. Our database avoids such inconveniences, and users can easily obtain the cancer-related IG/TR sequences that are available. For comparison, IMGT also provides data sheets on therapeutic monoclonal antibodies related to oncology (http://www.imgt.org/mAb-DB/query), but does not yet encompass information about cancer vaccines and TRs specific to tumor-associated antigens. The IEDB is an exhaustive epitope resource and covers a wide range of experimental details, but its search options are not optimized for finding cancer-related protein/peptide sequences.
Considering their application in cancer studies, the IG and TR proteins in the database are derived from two mammals, the human and the mouse. In this study, we found that such cancer-related IG/TR proteins can be classified into two groups, namely, cancer therapy (Group I) and hematological tumors (Group II). One more group (Group III) consists of unrelated references, mainly papers on hybridoma experiments that use "myeloma" cells or papers about irrelevant "tumors," which are wrongly selected by keyword matching. The classification of cancer-related IG and TR is thus very important for ensuring the high quality of the CIG-DB. Maintenance of the current version of the CIG-DB is semi-automated, from initially obtaining the sequence data to presenting the final tables on the graphical user interface. Manual operation is necessary only for classifying the IG and TR annotation entries into the two groups by checking their references, but this would be a burdensome task in future updates. To reduce this burden, we prepared a computational classification program for the database. As a basis for this step, PCA results suggest that the two groups can be discriminated from each other by the term frequencies in MEDLINE texts (Figure 1). Although Group III overlaps somewhat with Group II in the case of IG references, the distributions can be statistically discriminated from each other (P < 0.001, multivariate analysis of variance). In this study, we calculated a canonical discriminant function using, as a training dataset, the references that were classified manually in the current version of the database. We evaluated the function by leave-one-out cross validation and obtained high accuracy rates (Table 3), strongly suggesting that our method is applicable to automated updates. For the next update, this procedure may be integrated into the maintenance programs of the CIG-DB. Alternatively, we will perform a manual classification and recalculate the discriminant function over a few more updates, further improving the accuracy of the database.
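The leave-one-out evaluation of a discriminant function can be sketched in a few lines of Python. This is an illustration, not the authors' R implementation. It uses Fisher's linear discriminant for two groups (to which canonical discriminant analysis reduces in the two-group case), assumes equal priors, and works on two invented features per reference in place of the real MeSH/title/abstract term frequencies. All function names and data points are hypothetical.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Entries (sxx, sxy, syy) of the scatter matrix around mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fit(points, labels):
    """Fisher's linear discriminant for labels 0/1: w = Sw^-1 (m1 - m0)."""
    g0 = [p for p, y in zip(points, labels) if y == 0]
    g1 = [p for p, y in zip(points, labels) if y == 1]
    m0, m1 = mean(g0), mean(g1)
    s0, s1 = scatter(g0, m0), scatter(g1, m1)
    sxx, sxy, syy = (s0[i] + s1[i] for i in range(3))  # pooled within-group scatter
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-group scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    w = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    # Equal-prior decision threshold: projection of the midpoint of the means.
    threshold = w[0] * (m0[0] + m1[0]) / 2 + w[1] * (m0[1] + m1[1]) / 2
    return w, threshold


def predict(model, point):
    """Assign label 1 if the projected point lies above the threshold."""
    w, threshold = model
    return 1 if w[0] * point[0] + w[1] * point[1] > threshold else 0


def loo_accuracy(points, labels):
    """Leave-one-out cross validation: refit without each point, then test on it."""
    correct = 0
    for i in range(len(points)):
        model = fit(points[:i] + points[i + 1:], labels[:i] + labels[i + 1:])
        correct += predict(model, points[i]) == labels[i]
    return correct / len(points)


# Invented 2-D features; label 0 and 1 stand in for the two reference groups.
POINTS = [(1, 1), (2, 1), (1, 2), (2, 2.5), (6, 5), (7, 5), (6, 6), (7, 6.5)]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]
print(loo_accuracy(POINTS, LABELS))
```

Because each reference is classified by a function fitted without it, the resulting accuracy estimates how the method would perform on references added in future updates.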
This database provides structural data on antigen receptors. In particular, for all non-PDB IG/TR entries, five predicted structural models are available in PDB format, which allows users to study the interaction between IG/TR and the epitope using molecular dynamics or docking simulation. This is an advantage of our database over other public databases, such as IMGT and the IEDB, which provide only known structures. It should be noted that combined studies using these tools and the IG/TR data in Group I are useful for cancer immunotherapy, in which genetic design and engineering are performed to develop novel antibody medicines, tumor-specific TR, and peptide vaccines with potent anticancer effects. Recently, new technologies have allowed rapid cloning of effective antibodies to specific antigens [35, 36]. It is thus likely that a large number of antibodies and TR for tumor-associated antigens will become available for genetic improvement in the near future. Since the CIG-DB is a semi-automated database, the latest information will be quickly reflected there. We believe that the accumulation of IG/TR/epitope data will enhance the usefulness of this database in clinical cancer studies.
Conclusions
The CIG-DB is designed to serve as a bridge between immunological gene data and cancer studies, presenting annotations of IG, TR, and their epitopes. In its current version, the database has 2,081 cancer-related human and murine receptor entries (1,605 for IG and 476 for TR), and 920 entries for epitopes bound by receptors from a variety of organisms. For IG and TR proteins, the database further divides the entries into two groups: one for cancer therapy and the other for hematological tumors. For the next update, we have developed a method to automatically classify cancer-related entries into these two groups. The high accuracy (~95%) shown in validation assessments is promising for efficient maintenance of the database. Moreover, the CIG-DB is equipped with sequence- and reference-search engines and analysis tools, the results of which can be utilized for advanced studies. In particular, the database will play an important role in cancer immunotherapy by integrating the accumulating patient-specific IG/TR/epitope sequence data obtained with novel cloning technologies.
Availability and requirements
The CIG-DB is available without charge or registration at http://www.scchr-cigdb.jp/. We have confirmed that the web site can be viewed with Internet Explorer 7 or later, Safari 3, and Firefox 3. The Java Runtime Environment (JRE 1.5 or higher) is required for displaying 3D structures with the Jmol applet.
Acknowledgements
We thank A. U. Umagiliya of Bioinformatics Institute of Global Good Inc. (BiGG) for technical support in database construction, and A. Iizuka, K. Ozawa and M. Komiyama for database testing and helpful comments. We are also grateful to T. Makino for preparing prototype scripts. This work was supported in part by a grant in Cooperation of Innovative Technology and Advanced Research in an Evolutional Area (CITY AREA) from the Ministry of Education, Culture, Sports, Science and Technology, Japan (MEXT).
References
1. Abbas AK, Lichtman AH, Pillai S: Cellular and Molecular Immunology. 6th edition. Philadelphia: Saunders; 2007.
2. Tonegawa S: Somatic generation of antibody diversity. Nature 1983, 302(5909):575–581. doi:10.1038/302575a0
3. Chen C, Stenzel-Poore MP, Rittenberg MB: Natural auto- and polyreactive antibodies differing from antigen-induced antibodies in the H chain CDR3. J Immunol 1991, 147(7):2359–2367.
4. Davies DR, Cohen GH: Interactions of protein antigens with antibodies. Proc Natl Acad Sci USA 1996, 93(1):7–12. doi:10.1073/pnas.93.1.7
5. Jorgensen JL, Esser U, Fazekas de St Groth B, Reay PA, Davis MM: Mapping T-cell receptor-peptide contacts by variant peptide immunization of single-chain transgenics. Nature 1992, 355(6357):224–230. doi:10.1038/355224a0
6. Forgber M, Trefzer U, Sterry W, Walden P: Proteome serological determination of tumor-associated antigens in melanoma. PLoS One 2009, 4(4):e5199. doi:10.1371/journal.pone.0005199
7. Tan EM, Zhang J: Autoantibodies to tumor-associated antigens: reporters from the immune system. Immunol Rev 2008, 222:328–340. doi:10.1111/j.1600-065X.2008.00611.x
8. Nagorsen D, Scheibenbogen C, Marincola FM, Letsch A, Keilholz U: Natural T cell immunity against cancer. Clin Cancer Res 2003, 9(12):4296–4303.
9. Nowell PC: Chromosomal approaches to hematopoietic oncogenesis. Stem Cells 1993, 11(1):9–19. doi:10.1002/stem.5530110104
10. Waldmann TA: Immunotherapy: past, present and future. Nat Med 2003, 9(3):269–277. doi:10.1038/nm0303-269
11. Finn OJ: Tumor immunology top 10 list. Immunol Rev 2008, 222:5–8. doi:10.1111/j.1600-065X.2008.00623.x
12. Liu XY, Pop LM, Vitetta ES: Engineering therapeutic monoclonal antibodies. Immunol Rev 2008, 222:9–27. doi:10.1111/j.1600-065X.2008.00601.x
13. Kubota T, Niwa R, Satoh M, Akinaga S, Shitara K, Hanai N: Engineered therapeutic antibodies with improved effector functions. Cancer Sci 2009, 100(9):1566–1572. doi:10.1111/j.1349-7006.2009.01222.x
14. Ross JS, Fletcher JA: The HER-2/neu oncogene in breast cancer: prognostic factor, predictive factor, and target for therapy. Stem Cells 1998, 16(6):413–428. doi:10.1002/stem.160413
15. Itoh K, Yamada A, Mine T, Noguchi M: Recent advances in cancer vaccines: an overview. Jpn J Clin Oncol 2009, 39(2):73–80. doi:10.1093/jjco/hyn132
16. Varela-Rohena A, Carpenito C, Perez EE, Richardson M, Parry RV, Milone M, Scholler J, Hao X, Mexas A, Carroll RG, et al.: Genetic engineering of T cells for adoptive immunotherapy. Immunol Res 2008, 42(1–3):166–181. doi:10.1007/s12026-008-8057-6
17. Stewart-Jones G, Wadle A, Hombach A, Shenderov E, Held G, Fischer E, Kleber S, Nuber N, Stenner-Liewen F, Bauer S, et al.: Rational development of high-affinity T-cell receptor-like antibodies. Proc Natl Acad Sci USA 2009, 106(14):5784–5788. doi:10.1073/pnas.0901425106
18. Robbins PF, Li YF, El-Gamil M, Zhao Y, Wargo JA, Zheng Z, Xu H, Morgan RA, Feldman SA, Johnson LA, et al.: Single and dual amino acid substitutions in TCR CDRs can enhance antigen-specific T cell functions. J Immunol 2008, 180(9):6116–6131.
19. Parkhurst MR, Joo J, Riley JP, Yu Z, Li Y, Robbins PF, Rosenberg SA: Characterization of genetically modified T-cell receptors that recognize the CEA:691–699 peptide in the context of HLA-A2.1 on human colorectal cancer cells. Clin Cancer Res 2009, 15(1):169–180. doi:10.1158/1078-0432.CCR-08-1638
20. Schoonbroodt S, Steukers M, Viswanathan M, Frans N, Timmermans M, Wehnert A, Nguyen M, Ladner RC, Hoet RM: Engineering antibody heavy chain CDR3 to create a phage display Fab library rich in antibodies that bind charged carbohydrates. J Immunol 2008, 181(9):6213–6221.
21. Yoon SO, Lee TS, Kim SJ, Jang MH, Kang YJ, Park JH, Kim KS, Lee HS, Ryu CJ, Gonzales NR, et al.: Construction, affinity maturation, and biological characterization of an anti-tumor-associated glycoprotein-72 humanized antibody. J Biol Chem 2006, 281(11):6985–6992. doi:10.1074/jbc.M511165200
22. Lefranc MP, Giudicelli V, Kaas Q, Duprat E, Jabado-Michaloud J, Scaviner D, Ginestoux C, Clement O, Chaume D, Lefranc G: IMGT, the international ImMunoGeneTics information system. Nucleic Acids Res 2005, (33 Database):D593–597.
23. Vita R, Zarebski L, Greenbaum JA, Emami H, Hoof I, Salimi N, Damle R, Sette A, Peters B: The Immune Epitope Database 2.0. Nucleic Acids Res 2009.
24. Lefranc M-P, Lefranc G: The Immunoglobulin FactsBook. Academic Press, London, UK; 2001.
25. Giudicelli V, Chaume D, Lefranc MP: IMGT/GENE-DB: a comprehensive database for human and mouse immunoglobulin and T cell receptor genes. Nucleic Acids Res 2005, (33 Database):D256–261.
26. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389–3402. doi:10.1093/nar/25.17.3389
27. Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, et al.: The Protein Data Bank. Acta Crystallogr D Biol Crystallogr 2002, 58(Pt 6 No 1):899–907. doi:10.1107/S0907444902003451
28. Saha S, Bhasin M, Raghava GP: Bcipep: a database of B-cell epitopes. BMC Genomics 2005, 6(1):79. doi:10.1186/1471-2164-6-79
29. Korber BTM, Brander C, Haynes BF, Koup R, Moore JP, Walker BD, Watkins DI, (eds): HIV Molecular Immunology. Los Alamos National Laboratory; 2006.
30. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al.: Clustal W and Clustal X version 2.0. Bioinformatics 2007, 23(21):2947–2948. doi:10.1093/bioinformatics/btm404
31. Jmol: an open-source Java viewer for chemical structures in 3D [http://jmol.sourceforge.net/]
32. Sali A, Blundell TL: Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 1993, 234(3):779–815. doi:10.1006/jmbi.1993.1626
33. Dona MG, Giorgi C, Accardi L: Characterization of antibodies in single-chain format against the E7 oncoprotein of the human papillomavirus type 16 and their improvement by mutagenesis. BMC Cancer 2007, 7:25. doi:10.1186/1471-2407-7-25
34. Ni M, Yu B, Huang Y, Tang Z, Lei P, Shen X, Xin W, Zhu H, Shen G: Homology modelling and bivalent single-chain Fv construction of anti-HepG2 single-chain immunoglobulin Fv fragments from a phage display library. J Biosci 2008, 33(5):691–697. doi:10.1007/s12038-008-0089-5
35. Jin A, Ozawa T, Tajiri K, Obata T, Kondo S, Kinoshita K, Kadowaki S, Takahashi K, Sugiyama T, Kishi H, et al.: A rapid and efficient single-cell manipulation method for screening antigen-specific antibody-secreting cells from human peripheral blood. Nat Med 2009, 15(9):1088–1092. doi:10.1038/nm.1966
36. Wrammert J, Smith K, Miller J, Langley WA, Kokko K, Larsen C, Zheng NY, Mays I, Garman L, Helms C, et al.: Rapid cloning of high-affinity human monoclonal antibodies against influenza virus. Nature 2008, 453(7195):667–671. doi:10.1038/nature06890
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.