SUPERFICIAL – Surface mapping of proteins via structure-based peptide library design
© Goede et al. 2005
Received: 17 March 2005
Accepted: 09 September 2005
Published: 09 September 2005
Skip to main content
© Goede et al. 2005
Received: 17 March 2005
Accepted: 09 September 2005
Published: 09 September 2005
The determination of protein surfaces and the detection of binding sites are essential to our understanding of protein-protein interactions. Such binding sites can be characterised as linear and non-linear, the non-linear sites being prevailant. Conventional mapping techniques with arrays of synthetic peptides have limitations with regard to the location of discontinuous or non-linear binding sites of proteins.
We present a structure-based approach to the design of peptide libraries that mimic the whole surface or a particular region of a protein. Neighbouring sequence segments are linked by short spacers to conserve local conformation. To this end, we have developed SUPERFICIAL, a program that uses protein structures as input and generates library proposals consisting of linear and non-linear peptides. This process can be influenced by a graphical user interface at different stages, from the surface computation up to the definition of spatial regions.
Based on 3D structures, SUPERFICIAL may help to negotiate some of the existing limitations, since binding sites consisting of several linear pieces can now be detected.
In order to perform their functions, protein surfaces usually have to interact with each other. However, only accessible parts of a protein can act as binding sites . Since proteins consist of polypeptide chains that fold into complex three-dimensional patterns, binding sites can be divided into two different types: 1. sites that follow the primary amino acid sequence as a continuous or linear interaction site. 2. discontinuous or non-linear binding sites, which are made up of short peptide fragments that are not adjacent in the sequence but are in spatial proximity as a result of folding. Non-linear binding sites predominate in both protein-protein interactions, and in protein binding of small compounds . Their detection is challenging because conventional mapping techniques have limited capabilities [3, 4]. The increasing number of structurally-determined proteins often permits a structure-based automated approach to the design of peptide libraries that can mimick particular surface regions. As Atassi et al.  and Lee et al  proposed, spatially neighbouring sequence segments have to be linked by short (peptidic) linkers to conserve local conformation. To facilitate this process, we have integrated the LIP database containing all peptidic fragments derived from the Brookhaven Protein Data Bank (PDB) up to a length of 15 residues . SUPERFICIAL makes it possible to scan a specific part of the protein or the whole protein. Determination of the peptides and selection of the linkers are automated, and substantial peptide libraries can be generated.
The program was implemented in Delphi and is designed for versions of Windows 98 upwards.
Determination of those parts of the protein surface that provide the basis of the peptide library.
Localisation of those peptides that are neighboured in space (but not in sequence) and form a potential non-linear binding site.
Detection of linkers to connect the spatially neighbouring peptides in consideration of the local conformation.
At first, the library should contain only peptides that mimic the surface of the protein, or of the selected protein chain. Therefore, the peptides themselves should consist mainly of amino acids that are solvent-accessible. In general, there are several possibilities of defining an amino acid as surface-exposed. One can estimate the proportion of the surface area of an amino acid that is accessible to water  and set a threshold for this value. The threshold, however, can be varied for each type of amino acid. Since the packing of protein structures differs depending on the size, degree of polymerisation, and origin of the structure (NMR, crystal or a model), there is no threshold matching all kinds of structures.
If only linear peptides are of interest, their length can be defined (Fig. 2, section D). The solvent-accessible sequence segments are then tailored accordingly. The procedure to identify and assemble the non-linear peptides is more sophisticated. Starting from one linear peptide-fragment, the surrounding space is scanned in a user-defined diameter (Fig. 2, section D). Peptide-fragments within this diameter are combined to form a single entity.
To preserve their conformation, the gaps between the peptide-fragments are filled with linkers, short amino acid sequences derived from the LIP (Loops in Proteins) database . The LIP database contains all peptidic fragments from the PDB up to a length of 15 residues. The peptidic fragments obtained from LIP and the peptide-fragments generated by SUPERFICIAL are combined to form the complete non-linear peptides.
The linkers are integrated depending on the distances and angles of the stem atoms, as described in . All possible arrangements of the peptide-fragments of the protein are examined. For each combination the shortest linkers are determined, and the one with the shortest total length is accepted. This procedure may change the order of the peptide-fragments, in case it shortens the linker. Additionally, it minimises the insertion of foreign amino acids.
The current size of the LIP database is approximately 8 Gigabytes, and it contains about 100 million entries. To connect to this database, it is necessary to install this large amount of data. Instead of the whole database, the downloadable version of SUPERFICIAL implements a table that is derived from the LIP database. This table contains a grid of parameters (distances and angles) along with the corresponding number of amino acids necessary to bridge a gap between two peptides. Applying the table instead of the LIP database allows rapid identification of appropriate peptide linkers, though their sequence is arbitrary. Amino acids are represented by the character "X" that can be replaced in praxis by poly-alanine and/or glycine.
SUPERFICIAL has been tested on Windows 98, NT, 2000 and XP. Additional visualization tools are not required. It can read files in PDB format, which are either derived from the PDB or from modelling. We have successfully tested proteins up to 50,000 atoms, though the maximum size accepted is dependent on computer memory.
To avoid problems during peptide synthesis, amino acids can be automatically replaced, e.g. cysteine versus serine. All generated peptides are listed within a saveable table. Such a structure-based peptide library provides the source for chemically-prepared peptide arrays to identify and characterise binding sites, respectively [9, 10].
Atassi et al. [5, 11] and Lee et al.  proposed the idea of linking several peptides forming a non-linear binding site with short peptidic linkers. They first identified the amino acids of a non-linear antigenic site in native lysozyme and then linked them into a single peptide by inserting glycine residues. A different approach was used by Casset et al. , Franke et al.  and Eichler . They used circular scaffolds to present the peptides of a non-linear binding site, and these structures maintained the conformation of the peptides found in the original protein. For all these methods, detailed structural information of the binding site or the interacting amino acids has to be available. These problems are overcome with SUPERFICIAL, since only the structure of the protein is required. Determination of the surface and selection of the peptides can be influenced by the user, while the selection of the linker and the generation of the peptide library are automatic. The whole library provides the basis for a high-throughput synthesis (e.g. the SPOT-synthesis ) and the identification of binding peptides.
Methods to connect peptides with linkers are mostly used during homology modelling. Generally, two approaches are applied: ab initio or knowledge-based methods. Ab initio methods usually scan the whole conformational space, while knowledge-based methods search for protein segments with a known three-dimensional structure that fits into a gap. Both methods assess the possible linkers according to potential or scoring functions. For ab initio methods, the complexity, and therefore the time and effort increase with the length of the linker. As shown in , detection of suitable linkers by means of LIP is usually performed faster and more accurately than by other methods.
Non-peptidic linkers between peptides can also be applied, but in contrast to the 100 million linkers contained in LIP, their number and availability are limited. Therefore, not all possible conformations of peptide-fragments can be conserved with non-peptidic linkers. Currently, there is no public database of non-peptidic structures that can serve as linkers. Although the combination of peptide fragments and non-peptidic linkers or scaffolds can be advantageous if only a small number of structures is to be synthesised, such a method is not applicable for a high-throughput synthesis.
Predictions concerning the nature of antigenicity and binding sites have a large literature. Determining the antigenicity of different proteins implies that such areas share common properties . Mostly, these involve the hydrophilicity, flexibility and accessibility of a protein. The program BEPITOPE, for example, uses such properties to predict linear protein epitopes and rank them according to their hydrophobicity . SUPERFICIAL follows a different approach: the 3D structure of the whole surface, or parts of it, are considered and transformed into a peptide library representing this surface. Currently, it is the only program that identifies potential non-linear binding sites. Even though information on probable binding sites is not given, SUPERFICIAL includes all potential binding sites by examining the entire protein surface.
SUPERFICIAL is a unique tool for surface mapping, which considers the 3D structure of a protein and translates it into a peptide library. The most novel aspect of this program is its ability to propose peptides that can mimic non-linear binding sites, making it interesting, for instance, in vaccine development.
A free version of SUPERFICIAL is available for academic use at http://bioinformatics.charite.de/superficial:
Project name: SUPERFICIAL
Project home page: http://bioinformatics.charite.de/superficial
Operating system(s): Windows 98 upwards
Programming language: Delphi
Other requirements: none
Restrictions to use by academics: registration needed
Restrictions to use by non-academics: licence needed
This work was supported by the BMBF-funded Berlin Center for Genome Based Bioinformatics (BCB).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.