GuiTope: an application for mapping random-sequence peptides to protein sequences
© Halperin et al; licensee BioMed Central Ltd. 2012
Received: 19 September 2011
Accepted: 3 January 2012
Published: 3 January 2012
Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task.
GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance.
GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.
Random-sequence peptide library screening approaches represent an increasingly popular and powerful tool for identifying binding partners for antibodies and other proteins as well as carbohydrates, pharmaceuticals, and other small molecules. Peptide library methods generally fall into two categories: molecular display approaches such as phage display, and immobilized arrays such as SPOT. Display approaches can typically accommodate much larger libraries, but information is typically obtained only on the clones that survive several rounds of panning, resulting in a population that is heavily biased in favor of clones whose sequences facilitate growth. In contrast, array-based approaches may be used to screen smaller libraries with higher throughput than display approaches, and semi-quantitative binding information is obtained on all of the peptides in the library. New technologies on both the display side and the array side promise to overcome these limitations [2–4]. The decreasing cost of both sequencing and peptide synthesis, together with applications such as profiling the humoral immune response, promises to increase interest in connecting random-sequence peptide mimotopes to protein sequences occurring in nature. An increase in the demand for appropriate algorithms and software to facilitate the data analysis would therefore also be expected.
While the peptides discovered in these library screening experiments serve as useful ligands in and of themselves, comparison of these sequences to natural protein sequences can reveal novel biological insight. Peptides selected by panning phage display libraries against monoclonal antibodies often closely match the antibody epitope, making the sequence comparison rather straightforward. If a strong enough motif is uncovered among the peptide sequences, it may even be used to search a database to predict an antibody target. Though current array technology does not allow sufficient coverage of sequence space to contain sequences closely resembling natural protein sequences by chance, we have shown that experiments of this type still have utility for predicting monoclonal epitopes. Other groups have shown that peptides selected to bind to other types of proteins have utility in understanding and predicting binding to natural binding partners [9–11]. Even small-molecule-binding peptides provide insight into their binding to natural proteins [3, 12].
Analysis of the peptide sequences obtained from any selection experiment poses two key challenges. First, a set of peptides needs to be compared against a protein database. Second, an appropriate scoring scheme is needed to search for structural similarity rather than evolutionary relationships. At first glance, the FASTS/FASTF programs appear to address the first challenge, as they are designed to take peptide sequences generated from protein sequencing techniques and identify homologous proteins. However, the FASTS/FASTF programs search for cases where peptides align to non-overlapping regions of the protein sequence, while we would like to identify regions where the peptides align to the same region of the protein sequence. Another approach is to identify a motif among the selected peptide sequences and use the consensus sequence or a probabilistic representation of the motif to compare to the protein sequence(s) of interest. We previously demonstrated that the glam2 motif finding program is suitable for analyzing random-sequence peptide data [8, 15]. While the motif approach may be powerful in many cases, the peptides of interest may not always have a common pattern, because different amino acids may match in the same region of the sequence, or peptides may align to different parts of the protein sequence(s). A third approach is to align each discovered peptide sequence to the protein sequence targets and sum the alignment scores at each position. The RELIC MATCH program (not currently available or supported) used this approach with some success [3, 9, 10, 12, 16]. That program, however, had several limitations with regard to transparency, flexibility, statistical analysis, and the ability to search multiple sequences. Here we present an open source application that gives the user access to all parameters, can empirically estimate the statistical significance of the results, and enables the analysis of many sequences at once.
The user inputs protein sequence(s) to search, a set of selected peptides, and (optionally) a representative or complete list of peptides from the library. A scoring matrix may be generated by the program as described below or entered by the user. The maximal local alignment between each selected peptide and protein sequence is found. If the alignment score is greater than the user defined score threshold, the score at each protein residue position is added to the protein residue scores. If the moving average window size is set to greater than one, after all peptides have been aligned to a given protein, the moving average across the protein residue positions is calculated and the residue scores provided correspond to the score at the start of the window. The same number of peptides as in the selected list are randomly selected from the library if a library set was entered, and these are aligned to the protein(s) in the same manner as for the selected peptides; this process is repeated for the specified number of sampling iterations. If the subtract library scores box is checked, the average scores at each residue position from the randomly selected peptides from the library are subtracted from the residue scores. The selected peptide scores across each protein sequence are graphed, as well as the maximum and average scores from the random sampling iterations. The user may use the sort button to order the proteins by their maximal residue scores. The text output tab may be used to view a summary table of the maximum alignment scores for each protein or a table of all of the alignments identified for the number of proteins specified.
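The accumulation and smoothing steps described above can be sketched in a few lines of Python. This is a minimal illustration, not GuiTope's Visual Basic source: the function names and the `(start, per_position_scores)` alignment format are hypothetical, and the handling of the protein's C-terminal end (windows that do not fit are simply dropped) is an assumption, since the text only states that each value is reported at the start of its window.

```python
def accumulate_residue_scores(protein_len, alignments, score_threshold):
    """Sum per-residue scores over all peptide alignments that pass the threshold.

    alignments: list of (start, per_position_scores) tuples, one per peptide
    (hypothetical format); start is the protein index where the alignment begins.
    """
    residue_scores = [0.0] * protein_len
    for start, pos_scores in alignments:
        # Only alignments whose total score exceeds the threshold contribute.
        if sum(pos_scores) > score_threshold:
            for offset, s in enumerate(pos_scores):
                residue_scores[start + offset] += s
    return residue_scores


def moving_average(scores, window):
    """Average each full window; element i is the mean of scores[i:i + window]."""
    if window <= 1:
        return list(scores)
    # Assumption: trailing positions without a full window are dropped.
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]
```

For example, with two alignments of total score 4.0 and 1.0 and a threshold of 2.0, only the first contributes to the residue scores, and a window of 2 then smooths the resulting profile.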
GuiTope generates a log-odds-like scoring matrix based on a given measure of amino acid distances and amino acid frequencies. The distance matrix is taken to be inversely proportional to the frequencies of an amino acid pair appearing in a true alignment after a pseudocount of 10% of the average distance is added to the distance matrix to avoid dividing by zero. The rows and columns are iteratively scaled to sum to the expected amino acid frequencies. This matrix is then divided by the product of protein and peptide amino acid frequencies at each position and log10 transformed.
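A rough Python sketch of this construction follows. It is an illustration under stated assumptions rather than the program's actual code: the function name `log_odds_matrix` is hypothetical, rows are taken to index protein residues and columns peptide residues, the average distance is taken over all matrix entries, and the iterative scaling is run for a fixed number of passes rather than to a convergence criterion (none of which the text specifies).

```python
import math


def log_odds_matrix(dist, protein_freq, peptide_freq, n_iter=100):
    """Build a log-odds-like score matrix from a distance matrix and frequencies.

    dist: square list-of-lists of non-negative amino acid distances.
    protein_freq / peptide_freq: background frequencies, each summing to 1.
    """
    n = len(dist)
    mean_dist = sum(sum(row) for row in dist) / (n * n)
    pseudo = 0.1 * mean_dist  # pseudocount: 10% of the average distance
    # Pair weights inversely proportional to the padded distances.
    pair = [[1.0 / (dist[i][j] + pseudo) for j in range(n)] for i in range(n)]
    for _ in range(n_iter):
        # Scale rows to sum to the protein frequencies.
        for i in range(n):
            row_sum = sum(pair[i])
            pair[i] = [v * protein_freq[i] / row_sum for v in pair[i]]
        # Scale columns to sum to the peptide (library) frequencies.
        for j in range(n):
            col_sum = sum(pair[i][j] for i in range(n))
            for i in range(n):
                pair[i][j] *= peptide_freq[j] / col_sum
    # Divide by the product of background frequencies and take log10.
    return [[math.log10(pair[i][j] / (protein_freq[i] * peptide_freq[j]))
             for j in range(n)] for i in range(n)]
```

With a toy two-letter alphabet, a distance of 0 on the diagonal and 1 off it, and uniform frequencies, identities receive a positive score and mismatches a negative one, as expected of a log-odds matrix.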
Alignment algorithm and inversion scoring
The maximal gapless local alignment of each peptide with each protein is calculated using the Smith-Waterman algorithm. If the inversion weight is set to greater than 0, the program will identify sequence positions where the protein residue at position i is the same as the peptide residue at position j+1 and the protein residue at position i+1 is the same as the peptide residue at position j. The residue scores for these inversions will be the product of the inversion weight and the average of the identity scores for the amino acids at the protein positions i and i+1.
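One plausible reading of this procedure is sketched below in Python. Without gaps, Smith-Waterman reduces to finding the best-scoring contiguous run along each diagonal of the comparison matrix, which is what the sketch does. The function names, the `score` callable, and the choice to apply the inversion score only where it exceeds the ordinary substitution score are assumptions made for illustration; the text does not say how inversion scores are combined with the ordinary scores.

```python
def diagonal_scores(protein, peptide, offset, score, inv_weight):
    """Per-position scores on one diagonal; protein index i pairs with peptide index i - offset."""
    pairs = [(i, i - offset) for i in range(len(protein))
             if 0 <= i - offset < len(peptide)]
    vals = [score(protein[i], peptide[j]) for i, j in pairs]
    if inv_weight > 0:
        for k in range(len(pairs) - 1):
            i, j = pairs[k]
            # Di-peptide inversion: protein[i], protein[i+1] appear swapped in the peptide.
            if protein[i] == peptide[j + 1] and protein[i + 1] == peptide[j]:
                identity_avg = (score(protein[i], protein[i])
                                + score(protein[i + 1], protein[i + 1])) / 2
                inv = inv_weight * identity_avg
                # Assumption: keep whichever of the two scores is larger.
                vals[k] = max(vals[k], inv)
                vals[k + 1] = max(vals[k + 1], inv)
    return pairs, vals


def best_gapless_local(protein, peptide, score, inv_weight=0.0):
    """Return (total score, protein start index, per-residue scores) of the best gapless local alignment."""
    best = (0.0, 0, [])
    for offset in range(-(len(peptide) - 1), len(protein)):
        pairs, vals = diagonal_scores(protein, peptide, offset, score, inv_weight)
        run_sum, run_start = 0.0, 0
        for k, v in enumerate(vals):
            if run_sum <= 0:  # restart the run, as in gapless Smith-Waterman
                run_sum, run_start = 0.0, k
            run_sum += v
            if run_sum > best[0]:
                best = (run_sum, pairs[run_start][0], vals[run_start:k + 1])
    return best
```

With a toy scorer of +1 for identity and -1 otherwise, aligning the peptide KEDF to the protein GGKDEFGG scores at most 1.0 without inversions; with an inversion weight of 0.5 the swapped D/E pair is bridged and the full K-D-E-F region is recovered.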
For each sampling iteration and each protein sequence, a set of peptides, with the same number of peptides as the selected peptide list, is randomly selected from the library and the residue scores are calculated. From these, the maximum and average residue scores are calculated for each position. If the 'subtract library scores' option is selected, the average library scores are subtracted from the residue scores from each iteration. The maximum scores from each protein iteration are ranked. For each protein, the maximum residue score from the selected peptides is compared to the ranked scores. The percentage of library scores that are higher than the selected peptide score is reported as the significance.
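The sampling procedure amounts to an empirical null distribution, which can be sketched as follows. The function name and the `max_residue_score` callable (standing in for the full align-and-sum step) are hypothetical, and the optional subtraction of average library scores is omitted for brevity.

```python
import random


def empirical_significance(selected, library, protein, max_residue_score,
                           n_iter=100, seed=0):
    """Percentage of random library draws whose maximum residue score beats the selected set.

    max_residue_score: callable (peptides, protein) -> float; a hypothetical
    stand-in for aligning the peptides and taking the maximum residue score.
    """
    rng = random.Random(seed)
    observed = max_residue_score(selected, protein)
    null_scores = []
    for _ in range(n_iter):
        # Draw as many peptides from the library as there are selected peptides.
        sample = rng.sample(library, len(selected))
        null_scores.append(max_residue_score(sample, protein))
    higher = sum(1 for s in null_scores if s > observed)
    return 100.0 * higher / n_iter
```

Note that the resolution of such an estimate is limited by the number of iterations: if no random draw exceeds the observed score, the sketch returns 0, which can only be interpreted as a significance below 1 in `n_iter`.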
A dataset was previously described containing lists of peptide sequences identified from random-sequence peptide microarray experiments as binding to monoclonal antibodies with known epitopes. This dataset was used to optimize GuiTope's alignment parameters. A polyclonal anti-peptide dataset from the same publication was used to evaluate the algorithm. Additionally, another set of monoclonal antibodies with known epitopes was used to probe a completely different set of 10,000 random-sequence peptides on a microarray. The two anti-P53 antibodies from the first monoclonal antibody dataset were repeated on both the first and second versions of the 10,000 peptide microarrays. Additionally, anti-cMyc clone 9E10 (AbD SeroTec, Raleigh), anti-Leu-Enkephalin clone 1193/220 (AbD SeroTec, Raleigh), anti-PBEF clone E10 (Santa Cruz Biotechnology), and anti-V5 (AbD SeroTec, Raleigh) were used to probe the array and generate lists of peptides to which the antibodies bound. Anti-cMyc, anti-Leu-Enkephalin, and anti-V5 recognize epitope tags, while the anti-PBEF was epitope mapped using tiling peptides (current authors, manuscript in preparation). Phage display datasets that identified the greatest number of unique peptides were selected from those listed in the "several binding sites" category in Derda et al., and these were downloaded from MimoDB (http://immunet.cn/mimodb/). These phage display datasets include peptides selected against a diverse set of targets, including two human extracellular proteins, one bacterial protein, and immune sera to a virus and a bacterium.
GuiTope was implemented in Visual Basic, using the Microsoft .NET framework. It may be installed on any computer running Microsoft Windows XP or a newer Windows operating system. It has a memory footprint of 400 MB and takes from seconds to several minutes to run a set of hundreds of peptides against a single protein with 100 sampling iterations on a single 3.2 GHz Pentium 4 core with 2 GB of RAM running Windows XP. On the same hardware, searching a protein database of ~20,000 proteins with a set of several hundred peptides and a single sampling iteration will utilize < 3 GB of memory and approximately 20 hours of direct CPU time.
Results and discussion
Phage display database search
Database (number of Proteins)
Rank, p-value (Inv/No Inv)
Endothelial protein C receptor
Human Extracellular and Cell Surface Proteins (5074)
Polyclonal Anti-Nipah Virus
Nipah Proteome (9)
2, < 0.1 / 2, < 0.1
Human Extracellular and Cell Surface Proteins (5074)
TGF beta 1
TGF beta 3
2, < 0.0002 / 1, < 0.0002
3, < 0.0002 / 2, < 0.0002
Anti-M. hyopneumoniae polyclonal antibody
Mycoplasma hyopneumoniae Proteome (691)
Lipoproteins and p97
None matched correct region
Escherichia coli FtsA
Escherichia coli (4311)
The peptides that bind to a given target do not always have sequences that are similar to biologically relevant proteins. This problem is compounded when peptide array approaches are used, because peptides that are highly similar to a given protein are unlikely to be present in the library. GuiTope was able to take these loosely similar sequences and predict antibody epitopes with modest accuracy (AUROC 0.75-0.9), in line with previously tested methods. Random-sequence peptide microarrays have shown great promise in profiling the humoral immune response [5, 23], and it would be of great utility to be able to use the peptide sequences to trace back to the antigen that elicited the immune response. However, the current prediction accuracy would not be sufficient for this task. In contrast to the peptide array datasets, the phage display selected peptides can sometimes be used to predict interaction partners from a database very accurately. As less biased molecular display methods are developed and higher density peptide arrays become available, we expect that the information content of the peptide sequences will improve, making the type of analysis facilitated by GuiTope even more useful.
Availability and requirements
The executable is available at http://www.immunosignature.com/software and will install and run on any PC with Windows XP or later. The source code is written in Visual Basic and is available on sourceforge.net. The Microsoft .NET framework is required.
The authors gratefully acknowledge Dr. J. Bart Legutki for software testing and valuable discussion, and Kevin Brown for critical suggestions on implementation. This work was supported by grants from the DoD Breast Cancer Research Program and the Defense Threat Reduction Agency to SAJ.
- Derda R, Tang SK, Li SC, Ng S, Matochko W, Jafari MR: Diversity of phage-displayed libraries of peptides during panning and amplification. Molecules 16(2):1776–1803.
- Breitling F, Nesterov A, Stadler V, Felgenhauer T, Bischoff FR: High-density peptide arrays. Molecular BioSystems 2009, 5(3):224–234. 10.1039/b819850k
- Takakusagi Y, Kuramochi K, Takagi M, Kusayanagi T, Manita D, Ozawa H, Iwakiri K, Takakusagi K, Miyano Y, Nakazaki A, et al.: Efficient one-cycle affinity selection of binding proteins or peptides specific for a small-molecule using a T7 phage display pool. Bioorganic & Medicinal Chemistry 2008, 16(22):9837–9846. 10.1016/j.bmc.2008.09.061
- Ullman CG, Frigotto L, Cooley RN: In vitro methods for peptide display and their applications. Briefings in Functional Genomics 10(3):125–134.
- Legutki JB, Magee DM, Stafford P, Johnston SA: A general method for characterization of humoral immunity induced by a vaccine or infection. Vaccine 28(28):4529–4537.
- Stephen CW, Lane DP: Mutant conformation of p53. Precise epitope mapping using a filamentous phage epitope library. Journal of Molecular Biology 1992, 225(3):577–583. 10.1016/0022-2836(92)90386-X
- Bastas G, Sompuram SR, Pierce B, Vani K, Bogen SA: Bioinformatic requirements for protein database searching using predicted epitopes from disease-associated antibodies. Molecular & Cellular Proteomics 2008, 7(2):247–256.
- Halperin RF, Stafford P, Johnston SA: Exploring Antibody Recognition of Sequence Space through Random-Sequence Peptide Microarrays. Molecular & Cellular Proteomics 10(3):
- Cao B, Mao C: Identification of microtubule-binding domains on microtubule-associated proteins by major coat phage display technique. Biomacromolecules 2009, 10(3):555–564. 10.1021/bm801224q
- Carter DM, Gagnon JN, Damlaj M, Mandava S, Makowski L, Rodi DJ, Pawelek PD, Coulton JW: Phage display reveals multiple contact sites between FhuA, an outer membrane receptor of Escherichia coli, and TonB. Journal of Molecular Biology 2006, 357(1):236–251. 10.1016/j.jmb.2005.12.039
- Nie J, Chang B, Traktuev DO, Sun J, March K, Chan L, Sage EH, Pasqualini R, Arap W, Kolonin MG: IFATS collection: Combinatorial peptides identify alpha5beta1 integrin as a receptor for the matricellular protein SPARC on adipose stromal cells. Stem Cells 2008, 26(10):2735–2745. 10.1634/stemcells.2008-0212
- Rodi DJ, Janes RW, Sanganee HJ, Holton RA, Wallace BA, Makowski L: Screening of a library of phage-displayed peptides identifies human bcl-2 as a taxol-binding protein. Journal of Molecular Biology 1999, 285(1):197–203. 10.1006/jmbi.1998.2303
- Mackey AJ, Haystead TA, Pearson WR: Getting more from less: algorithms for rapid protein identification with multiple short peptide sequences. Molecular & Cellular Proteomics 2002, 1(2):139–147. 10.1074/mcp.M100004-MCP200
- Zhao S, Lee EY: A protein phosphatase-1-binding motif identified by the panning of a random peptide display library. Journal of Biological Chemistry 1997, 272(45):28368–28372. 10.1074/jbc.272.45.28368
- Frith MC, Saunders NF, Kobe B, Bailey TL: Discovering sequence motifs with arbitrary insertions and deletions. PLoS Computational Biology 2008, 4(4):e1000071.
- Mandava S, Makowski L, Devarapalli S, Uzubell J, Rodi DJ: RELIC--a bioinformatics server for combinatorial peptide analysis and identification of protein-ligand interaction sites. Proteomics 2004, 4(5):1439–1460. 10.1002/pmic.200300680
- Haste Andersen P, Nielsen M, Lund O: Prediction of residues in discontinuous B-cell epitopes using protein 3D structures. Protein Science 2006, 15(11):2558–2567. 10.1110/ps.062405906
- Yang WJ, Lai JF, Peng KC, Chiang HJ, Weng CN, Shiuan D: Epitope mapping of Mycoplasma hyopneumoniae using phage displayed peptide libraries and the immune responses of the selected phagotopes. Journal of Immunological Methods 2005, 304(1–2):15–29. 10.1016/j.jim.2005.05.009
- Vita R, Zarebski L, Greenbaum JA, Emami H, Hoof I, Salimi N, Damle R, Sette A, Peters B: The immune epitope database 2.0. Nucleic Acids Research 38(Database):D854–862.
- Meens J, Bolotin V, Frank R, Bohmer J, Gerlach GF: Characterization of a highly immunogenic Mycoplasma hyopneumoniae lipoprotein Mhp366 identified by peptide-spot array. Veterinary Microbiology 142(3–4):293–302.
- Carettoni D, Gomez-Puertas P, Yim L, Mingorance J, Massidda O, Vicente M, Valencia A, Domenici E, Anderluzzi D: Phage-display and correlated mutations identify an essential region of subdomain 1C involved in homodimerization of Escherichia coli FtsA. Proteins 2003, 50(2):192–206.
- Adams DW, Errington J: Bacterial cell division: assembly, maintenance and disassembly of the Z ring. Nature Reviews Microbiology 2009, 7(9):642–653. 10.1038/nrmicro2198
- Restrepo L, Stafford P, Magee DM, Johnston SA: Application of immunosignatures to the assessment of Alzheimer's disease. Annals of Neurology 70(2):286–295.
- White SJ, Simmonds RE, Lane DA, Baker AH: Efficient isolation of peptide ligands for the endothelial cell protein C receptor (EPCR) using candidate receptor phage display biopanning. Peptides 2005, 26(7):1264–1269. 10.1016/j.peptides.2005.01.015
- Eshaghi M, Tan WS, Yusoff K: Identification of epitopes in the nucleocapsid protein of Nipah virus using a linear phage-displayed random peptide library. Journal of Medical Virology 2005, 75(1):147–152. 10.1002/jmv.20249
- Kraft S, Diefenbach B, Mehta R, Jonczyk A, Luckenbach GA, Goodman SL: Definition of an unexpected ligand recognition motif for alphav beta6 integrin. Journal of Biological Chemistry 1999, 274(4):1979–1985. 10.1074/jbc.274.4.1979
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.