Allermatch™, a webtool for the prediction of potential allergenicity according to current FAO/WHO Codex alimentarius guidelines
© Fiers et al; licensee BioMed Central Ltd. 2004
Received: 18 May 2004
Accepted: 16 September 2004
Published: 16 September 2004
Novel proteins entering the food chain, for example by genetic modification of plants, have to be tested for allergenicity. Allermatch™ (http://allermatch.org) is a webtool for the efficient and standardized prediction of potential allergenicity of proteins and peptides according to the current recommendations of the FAO/WHO Expert Consultation, as outlined in the Codex alimentarius.
A query amino acid sequence is compared with all known allergenic proteins retrieved from the protein databases using a sliding window approach. This identifies stretches of 80 amino acids with more than 35% similarity or small identical stretches of at least six amino acids. The outcome of the analysis is presented in a concise format. The predictive performance of the FAO/WHO criteria is evaluated by screening sets of allergens and non-allergens against the Allermatch databases. Besides correct predictions, both methods are shown to generate false positive and false negative hits and the outcomes should therefore be combined with other methods of allergenicity assessment, as advised by the FAO/WHO.
Allermatch™ provides an accessible, efficient, and useful webtool for analysis of potential allergenicity of proteins introduced in genetically modified food prior to market release that complies with current FAO/WHO guidelines.
1. Obtain the amino acid sequences of known allergens in protein databases in FASTA format (using the amino acids from the mature proteins only, disregarding the leader sequences, if any).
2. Prepare the complete set of 80-amino acid length sequences derived from the query protein (again disregarding the leader sequence, if any).
3. Compare each of the sequences of (2) with all sequences of (1), using the program FASTA with default settings for gap penalty and extension.
Potential allergenicity is indicated when either of the following is found:
- More than 35% similarity over a window of 80 amino acids of the query protein with a known allergen.
- A stretch of identity of 6 to 8 contiguous amino acids.
This procedure is described in more detail by the expert consultation and the Codex Alimentarius. Potential allergenicity requires further testing of the protein with panels of patient sera and possibly animal exposure tests [1, 2].
Construction and content
Mode 1: Sliding window approach
The query protein sequence is divided into 80 amino acid (aa) windows using a sliding window with steps of a single residue. Each of these windows is compared with all sequences in the allergen database of choice. All database entries showing a similarity higher than a configurable threshold percentage (default is 35%) to any of the 80 aa query sequence windows are flagged. Upon completion of the analysis, a table is shown with all flagged database entries. Per entry, the highest similarity score is given, as well as the number of windows having a similarity above the cut-off percentage. For each allergen database entry identified, more detailed information on the similarity between the allergen and query sequence can be retrieved, such as those areas of both proteins within all 80 aa windows scoring above the cut-off percentage. Because the similarity score calculated by FASTA can apply to stretches smaller than 80 aa, Allermatch™ converts such a score to its equivalent on an 80 aa window. For example, 40% similarity on a stretch of 40 aa converts to 20% similarity on an 80 aa window.
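The window generation and score conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Allermatch™ source code: the function names `sliding_windows` and `convert_to_window` are hypothetical, the handling of queries shorter than 80 aa is an assumption, and the FASTA comparison itself is left out.

```python
def sliding_windows(sequence, size=80, step=1):
    """Return all windows of `size` residues, moving `step` residues at a time.

    Assumption: a query shorter than `size` is returned as one window.
    """
    if len(sequence) <= size:
        return [sequence]
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, step)]


def convert_to_window(similarity_pct, aligned_length, window=80):
    """Rescale a similarity percentage found over `aligned_length` residues
    to the equivalent percentage over a full window.

    Assumption: alignments of `window` residues or more are left unchanged.
    """
    if aligned_length >= window:
        return similarity_pct
    return similarity_pct * aligned_length / window


# Hypothetical 100-residue query, used only to show the window count.
query = "ACDEFGHIKLMNPQRSTVWY" * 5
windows = sliding_windows(query)
print(len(windows))               # 21 windows of 80 aa each
print(convert_to_window(40, 40))  # 20.0, as in the example in the text
```

A query of length n (n ≥ 80) yields n − 80 + 1 windows, so the number of comparisons grows linearly with the length of the query.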
Mode 2: Wordmatch
This method looks for short sub-sequences (words) that are perfectly identical to part of a database entry. The word size is configurable (default is 6 aa). The output is similar to that of Mode 1: all database entries with at least one hit are listed and, for each of these, more detailed information can be retrieved upon request.
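The exact word search can be illustrated with a small sketch. It assumes a plain dictionary of sequences as the database, and the names `word_hits` and `wordmatch` and the example sequences are hypothetical; the actual Allermatch™ implementation may differ.

```python
def word_hits(query, allergen, word_size=6):
    """Return (query_pos, allergen_pos, word) for every word of `word_size`
    residues that occurs identically in both sequences."""
    # Index every word of the allergen by its start positions.
    index = {}
    for j in range(len(allergen) - word_size + 1):
        index.setdefault(allergen[j:j + word_size], []).append(j)

    hits = []
    for i in range(len(query) - word_size + 1):
        word = query[i:i + word_size]
        for j in index.get(word, []):
            hits.append((i, j, word))
    return hits


def wordmatch(query, database, word_size=6):
    """Return {entry_name: hits} for database entries with at least one hit."""
    result = {}
    for name, sequence in database.items():
        hits = word_hits(query, sequence, word_size)
        if hits:
            result[name] = hits
    return result


# Made-up sequences, for illustration only.
database = {"allergen_A": "MKTLLVAGGSWQERTY", "allergen_B": "PPPPPPPPPPPP"}
query = "AAAGGSWQEAAA"
print(wordmatch(query, database))               # two overlapping 6 aa hits in allergen_A
print(wordmatch(query, database, word_size=8))  # no entry shares an 8 aa word
```

In this example the shared stretch is 7 aa long, so it is reported at word sizes 6 and 7 but not at 8, which shows how a larger word size reduces the number of hits.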
Mode 3: full FASTA alignment with an Allermatch™ allergen database
The Allermatch™ webtool also offers a full alignment of the query sequence with either of the allergen databases using FASTA. Although this full alignment is currently not required by the FAO/WHO guidelines, it helps to position regions of potential allergenicity within the whole primary structure of the protein. The FASTA output is parsed, and information from the allergen database is added and presented.
Utility and discussion
To examine the predictive performance of the FAO/WHO criteria for potential allergenicity, we have performed two tests. The first test determines the percentage of false negatives and the second test assesses the number of false positives. Both tests are performed with standard settings: for the sliding window approach an 80 amino acid window with a 35% similarity cutoff is used, and for the wordmatch approach word sizes of 6, 7 and 8 aa are tested.
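Table 1. Prediction quality of the FAO/WHO methods.
Database | 80 aa window: false negatives | 80 aa window: false negatives (corrected) | 80 aa window: false positives | 6 aa word: false negatives | 6 aa word: false positives | 7 aa word: false negatives | 7 aa word: false positives | 8 aa word: false negatives | 8 aa word: false positives
SwissProt | 71 / 334 | 57 / 320 | 3 / 12 | 54 / 334 | 7 / 12 | 69 / 334 | 6 / 12 | 78 / 334 | 3 / 12
WHO-IUIS | 99 / 632 | 78 / 611 | 4 / 12 | 58 / 632 | 9 / 12 | 98 / 632 | 8 / 12 | 117 / 632 | 3 / 12
SwissProt & WHO-IUIS | 101 / 730 | 77 / 706 | 5 / 12 | 55 / 730 | 9 / 12 | 95 / 730 | 8 / 12 | 115 / 730 | 3 / 12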
Table 2. Sequences used for the negative control.
Protein | Evidence for non-allergenicity
Amaranth seed albumin | IgG-response, but no raised IgE-levels, after administration (intranasal and intraperitoneal) of amaranth seed albumin to mice
T1 (periwinkle protein) | No reaction of recombinant T1 in IgE-sera binding, basophil histamine release, and skin prick testing using patients allergic to the related birch pollen allergen Bet v 1
Mite ferritin heavy chain | Reaction of mite ferritin with IgG, but not with IgE, of sera from patients allergic to house dust mite
Maltose binding protein | No reaction with IgE-sera from patients allergic to natural rubber latex (maltose binding protein used as part of fusion proteins with latex allergens)
Human serum albumin | No reaction of human serum albumin with IgE-sera of patients allergic to cat- and porcine-serum albumin
Human heat shock protein 70 | No reaction of human heat shock protein 70 with IgE-sera of patients allergic to heat shock protein 70 from Echinococcus granulosus
Human beta-2-glycoprotein I | Presence of IgM antibodies, but not of IgE antibodies, directed against human beta-2-glycoprotein I in sera from atopic eczema/dermatitis patients
Guayule rubber particle protein | No cross-reactivity between proteins from guayule and latex using IgE-sera from patients allergic to latex
Purple acid phosphatase 1, 2 and 3 | Stimulation of IgG-, but no or only low stimulation of IgE-antibodies following administration of potato acid phosphatase to mice (oral and intraperitoneal)
Potato lectin | Stimulation of IgG-, but no or only low stimulation of IgE-antibodies following administration of potato lectin to mice (intraperitoneal)
The imperfect results shown here agree with the literature, where the FAO/WHO methods for sequence comparisons are also shown to lack full predictive capability [7–9]. Interestingly, the results show a trade-off between false positives and false negatives when the word size for short exact matches is increased from 6 to 8 amino acids, with the number of false positives sharply decreasing at 8 amino acids (Table 1). The outcomes of these tests therefore need to be further refined by checking for the presence of potential IgE-epitopes, as recommended by Kleter and Peijnenburg, and combined with the results of other assays, as recommended by the Codex. Other methods to decrease false hit rates may also be considered [8, 9]. We plan to implement such supplementary methods in the future to support the Codex-based predictions of potential allergenicity.
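The trade-off between word size and error rates can be made explicit by converting the counts of Table 1 into percentages. The sketch below uses the wordmatch counts of the first database row and assumes that each pair of values is ordered as false negatives followed by false positives for word sizes 6, 7 and 8; the helper `rate` is hypothetical.

```python
# Wordmatch counts from the first database row of Table 1.
# Assumption: each entry is (false negatives, allergens tested),
# (false positives, non-allergens tested) for word sizes 6, 7 and 8.
counts = {
    6: ((54, 334), (7, 12)),
    7: ((69, 334), (6, 12)),
    8: ((78, 334), (3, 12)),
}


def rate(errors, total):
    """Return the error rate as a percentage."""
    return 100.0 * errors / total


rates = {size: (rate(*fn), rate(*fp)) for size, (fn, fp) in counts.items()}

for size, (fn_pct, fp_pct) in rates.items():
    print(f"word size {size}: {fn_pct:.1f}% false negatives, "
          f"{fp_pct:.1f}% false positives")
```

With only 12 sequences in the negative control, each false positive shifts the false positive rate by more than 8 percentage points, so these percentages are indicative rather than precise.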
The prediction of potential allergenicity by primary sequence comparison depends on the quality of the data used for comparison. Addition of a non-allergenic or poorly annotated protein to any of the Allermatch™ allergen databases would result in undesired false positives and should be prevented. A workable strategy could be to use multiple databases, i.e. a database based on SwissProt's list of allergens, which contains well-annotated sequences from SwissProt, together with a larger database based on the WHO-IUIS list, which contains possibly less well-annotated sequences from other protein databases, such as GenPept. For example, a number of protein accessions in the WHO-IUIS database do not mention the presence of signal- and/or pro-peptides, although removal of such peptides is essential to prevent false positives. Users of Allermatch™ should, at all times, take into account the possibility of a false positive or negative, for example by checking original data (accessions, clinical literature) and confirming results before arriving at conclusions. To prevent false positives as much as possible, one should choose the well-annotated SwissProt database. To prevent false negatives, the combination of the larger WHO-IUIS database with that of SwissProt is more appropriate. Updates to the SwissProt and WHO-IUIS allergen lists will be incorporated in the Allermatch™ databases on a regular basis.
Several other websites in the public domain offer sequence alignment facilities that support the prediction of potential allergenicity, such as SDAP [10, 11], AllerPredict and Farrp. These websites offer search algorithms that find contiguous identical amino acids between a query sequence and database sequences (SDAP, AllerPredict) and more than 35% identity in alignments (SDAP, AllerPredict), as well as a general FASTA search of a query protein sequence against the database (SDAP, Farrp).
Allermatch™ is an efficient and comprehensive webtool that combines all bioinformatics approaches required to assess the allergenicity of protein sequences according to the current guidelines in the Codex. The application will be kept up to date with the FAO/WHO criteria and the SwissProt and WHO-IUIS allergen lists. It will be extended with other, supplementary methods to support and refine the prediction of allergenicity.
Availability and requirements
Allermatch™ is platform independent and accessible using any Netscape 4+ compatible webbrowser at http://allermatch.org.
- FAO/WHO: Allergenicity of Genetically Modified Foods. Rome, Italy, FAO/WHO 2001. [http://www.who.int/foodsafety/publications/biotech/en/ec_jan2001.pdf]
- FAO/WHO: Codex Principles and Guidelines on Foods Derived from Biotechnology. Rome, Italy, Joint FAO/WHO Food Standards Programme 2003. [ftp://ftp.fao.org/codex/standard/en/CodexTextsBiotechFoods.pdf]
- Pearson WR, Lipman DJ: Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 1988, 85: 2444–2448.
- Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003, 31: 365–370. 10.1093/nar/gkg095
- King TP, Hoffman D, Lowenstein H, Marsh DG, Platts-Mills TA, Thomas W: Allergen nomenclature. WHO/IUIS Allergen Nomenclature Subcommittee. Int Arch Allergy Immunol 1994, 105: 224–233.
- Wu CH, Yeh LS, Huang H, Arminski L, Castro-Alvear J, Chen Y, Hu Z, Kourtesis P, Ledley RS, Suzek BE, Vinayaka CR, Zhang J, Barker WC: The Protein Information Resource. Nucleic Acids Res 2003, 31: 345–347. 10.1093/nar/gkg040
- Kleter GA, Peijnenburg AA: Screening of transgenic proteins expressed in transgenic food crops for the presence of short amino acid sequences identical to potential, IgE-binding linear epitopes of allergens. BMC Struct Biol 2002, 2: 8. 10.1186/1472-6807-2-8
- Zorzet A, Gustafsson M, Hammerling U: Prediction of food protein allergenicity: a bioinformatic learning systems approach. In Silico Biol 2002, 2: 525–534.
- Soeria-Atmadja D, Zorzet A, Gustafsson MG, Hammerling U: Statistical evaluation of local alignment features predicting allergenicity using supervised classification algorithms. Int Arch Allergy Immunol 2004, 133: 101–112. 10.1159/000076382
- Ivanciuc O, Schein CH, Braun W: SDAP: database and computational tools for allergenic proteins. Nucleic Acids Res 2003, 31: 359–362. 10.1093/nar/gkg010
- Chakraborty S, Chakraborty N, Datta A: Increased nutritive value of transgenic potato by expressing a nonallergenic seed albumin gene from Amaranthus hypochondriacus. Proc Natl Acad Sci U S A 2000, 97: 3724–3729. 10.1073/pnas.050012697
- Laffer S, Hamdi S, Lupinek C, Sperr WR, Valent P, Verdino P, Keller W, Grote M, Hoffmann-Sommergruber K, Scheiner O, Kraft D, Rideau M, Valenta R: Molecular characterization of recombinant T1, a non-allergenic periwinkle (Catharanthus roseus) protein, with sequence similarity to the Bet v 1 plant allergen family. Biochem J 2003, 373: 261–269. 10.1042/BJ20030331
- Epton MJ, Smith W, Hales BJ, Hazell L, Thompson PJ, Thomas WR: Non-allergenic antigen in allergic sensitization: responses to the mite ferritin heavy chain antigen by allergic and non-allergic subjects. Clin Exp Allergy 2002, 32: 1341–1347. 10.1046/j.1365-2222.2002.01473.x
- Rihs HP, Dumont B, Rozynek P, Lundberg M, Cremer R, Bruning T, Raulf-Heimsoth M: Molecular cloning, purification, and IgE-binding of a recombinant class I chitinase from Hevea brasiliensis leaves (rHev b 11.0102). Allergy 2003, 58: 246–251.
- Hilger C, Kohnen M, Grigioni F, Lehners C, Hentges F: Allergic cross-reactions between cat and pig serum albumin. Study at the protein and DNA levels. Allergy 1997, 52: 179–187.
- Ortona E, Margutti P, Delunardo F, Vaccari S, Rigano R, Profumo E, Buttari B, Teggi A, Siracusano A: Molecular and immunological characterization of the C-terminal region of a new Echinococcus granulosus Heat Shock Protein 70. Parasite Immunol 2003, 25: 119–126.
- Szakos E, Lakos G, Aleksza M, Gyimesi E, Pall G, Fodor B, Hunyadi J, Solyom E, Sipka S: Association between the occurrence of the anticardiolipin IgM and mite allergen-specific IgE antibodies in children with extrinsic type of atopic eczema/dermatitis syndrome. Allergy 2004, 59: 164–167. 10.1046/j.1398-9995.2003.00367.x
- Siler DJ, Cornish K, Hamilton RG: Absence of cross-reactivity of IgE antibodies from subjects allergic to Hevea brasiliensis latex with a new source of natural rubber latex from guayule (Parthenium argentatum). J Allergy Clin Immunol 1996, 98: 895–902.
- Dearman RJ, Kimber I: Determination of protein allergenicity: studies in mice. Toxicol Lett 2001, 120: 181–186. 10.1016/S0378-4274(01)00276-4
- Dearman RJ, Stone S, Caddick HT, Basketter DA, Kimber I: Evaluation of protein allergenic potential in mice: dose-response analyses. Clin Exp Allergy 2003, 33: 1586–1594. 10.1046/j.1365-2222.2003.01793.x
This article is published under license to BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.