Computational survey of peptides derived from disulphide-bonded protein loops that may serve as mediators of protein-protein interactions
© Duffy et al.; licensee BioMed Central Ltd. 2014
Received: 3 January 2014
Accepted: 17 July 2014
Published: 17 September 2014
Bioactive cyclic peptides derived from natural sources are well studied, particularly those derived from non-ribosomal synthetases in fungi or bacteria. Ribosomally synthesised bioactive disulphide-bonded loops represent a large, naturally enriched library of potential bioactive compounds, worthy of systematic investigation.
We examined the distribution of short cyclic loops on the surface of a large number of proteins, especially membrane or extracellular proteins. Available three-dimensional structures highlighted a number of disulphide-bonded loops responsible for the majority of the likely binding interactions in a variety of protein complexes, due to their location at protein-protein interfaces. We find that disulphide-bonded loops at protein-protein interfaces may, but do not necessarily, show biological activity independent of their parent protein. Examining the conservation of short disulphide bonded loops in proteins, we find a small but significant increase in conservation inside these loops compared to surrounding residues. We identify a subset of these loops that exhibit a high relative conservation, particularly among peptide hormones.
We conclude that short disulphide-bonded loops are found in a wide variety of biological interactions. They may retain biological activity outside their parent proteins. Such structurally independent peptides may be useful as biologically active templates for the development of novel modulators of protein-protein interactions.
Cyclic peptides are macrocyclic peptides in which side chains or termini of a linear peptide are covalently bonded to shape the peptide into a ring. Macrocyclic compounds such as cyclic peptides have been a renewed focus of drug discovery in recent years, and identifying a biologically pre-designed set of cyclic peptides in protein sequences would be of great potential interest in pharmaceutical development. Many cyclic peptides are disulphide-cyclic, where two cysteine residues form an S-S bond between their thiol side chain groups to cyclise the peptide. Many other types of cyclic peptide are possible, including head-tail cyclised peptides, where the amino N-terminus and carboxy C-terminus are joined by an amide bond to cyclise the peptide backbone, and other side-chain crosslinked cyclic peptides, such as those possessing an amide bond between a lysine side chain and an aspartic acid side chain. Side-chain crosslinked cyclic peptides include disulphide-cyclic peptides, such as those described in this work; cyclic peptides where the side-chain crosslink mimics a peptide bond, such as lysine-aspartic acid side-chain cyclised peptides; and depsipeptides, where the side-chain crosslinking bond is an ester bond. An example of a cyclic depsipeptide would be a serine-aspartic acid side-chain cyclised peptide. There also exist backbone head-tail cyclic peptides with additional disulphide bonding, such as the cyclotides and theta-defensins, where the disulphide bonding provides additional conformational constraint to the already cyclic peptide. Cyclic peptides are interesting from a drug development point of view because they generally show better specificity, proteolytic resistance, and stability than linear peptides. While the term "cyclic peptide" can refer to a peptide cyclised by any of the above strategies, it is often used to refer specifically to head-tail cyclised peptides. Therefore, this work uses the term "small macrocyclic peptides" to refer to cyclic peptides in general, including the disulphide-cyclised peptides described in this work.
Many examples of bioactive and therapeutic natural peptides are known, including antibiotics, natural hormone mimics and immunosuppressants. The sources of these peptides are mostly non-ribosomal plant and fungal secondary metabolites, produced by specialised non-ribosomal peptide-synthetase enzymes. These enzymes can create peptides with a wide range of unusual amino acids that can be of varying chirality, carry modified side chains and backbones, and be cyclised. Reviews by Conlon et al. and Cascales et al. provide a more detailed overview of the diversity of small macrocyclic peptides in nature. The Cybase [11, 12] database has been developed as a publicly available resource describing the sequence, structural and functional properties of naturally derived cyclic proteins and peptides.
The biological activity of naturally occurring ribosomally synthesised peptides has not been subject to any systematic survey. Disulphide bonds are known to play a key structural role in proteins, stabilising the protein tertiary structure on a large scale, and can also influence quaternary structure via interchain disulphide bonds. However, disulphide bonds can also have a local effect in constraining a much smaller component of the structure. A protein loop is a general term for a protein secondary structural element which is not a helix or sheet; loops generally exhibit a lack of hydrogen bonding and high flexibility, and often serve to join other secondary structural elements. Protein turns are specific types of loops where the polypeptide chain reverses its overall direction and can be between 1 and 5 residues long (α, β, γ, δ and π turns). A special case of this is the β-hairpin turn, which connects two antiparallel β-strands. These regions are known to be important in protein-protein contacts, and short (2-8 amino-acid) protein loops or turns can be "pinned" in place by a disulphide bond, forming a surface structural motif held in a relatively fixed position and thus having a certain amount of independence from the larger protein tertiary structure. This approach has previously been explored in phage-display studies.
The idea of finding "self-inhibitory" peptides, where a peptide derived from a protein-protein interface inhibits the formation of that interface, has been previously explored, and it has been found that many globular protein interactions are dominated by linear peptide segments. Among other applications, this approach has recently been used to identify peptides that inhibit viral membrane fusion. Therefore, in principle, short disulphide-bonded loops derived from protein sequences and located at protein interfaces could be synthesised separately and show a similar or related biological activity to the parent protein.
The use of small macrocyclic peptides to mimic protein loops has also been exploited for the RGD peptides. The RGD tripeptide motif is a cell attachment β-turn motif found in numerous proteins, and small macrocyclic peptides containing this motif have been shown to inhibit the activity of integrin αVβ3, which plays an important role in tumour metastasis.
Traditionally, protein-protein interaction inhibitors are discovered by screening compounds against a particular known "target" interaction of interest in a biochemical pathway. Our motivation in this study is to harness the vast amount of protein sequence and structural data available to develop a bioinformatic approach to identifying candidate bioactive small macrocyclic peptides from disulphide-bonded protein loops. In contrast to screening compounds against a single target, this method of analysis allows simultaneous identification of modulators against a variety of protein-protein interactions that have been evolutionarily selected for. This type of bioinformatics approach has been previously successfully used to identify peptides from signalling rich juxtamembrane regions that have the ability to modulate platelet function . In this study, we surveyed the sequence, structural and conservation properties of disulphide bonded protein loops, in order to infer a set of small macrocyclic peptides capable of bioactivity outside the context of their parent protein.
Results and discussion
Finding disulphide-bonded loops at protein-protein interfaces
To identify short disulphide-bonded loops that play a crucial role at protein-protein interfaces, we set out to find known three-dimensional structures of protein complexes mediated by disulphide-bonded loops. We defined short disulphide-bonded loops as those of eleven residues or fewer in length (two flanking cysteines plus 2-9 internal residues), excluding cysteine-knot-like regions of overlapping disulphide bridges.
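The loop definition above can be sketched as a small filter over annotated disulphide bridges. This is only an illustrative sketch, not the authors' code: the function name `short_loops`, the input format (pairs of 1-based cysteine positions) and the overlap rule (a loop is rejected if a cysteine from any other bridge falls inside it) are assumptions made for the example.

```python
from typing import List, Tuple

Bond = Tuple[int, int]  # 1-based positions of the two bonded cysteines


def short_loops(bonds: List[Bond], min_internal: int = 2,
                max_internal: int = 9) -> List[Bond]:
    """Return bridges that close a short loop and do not overlap another bridge."""
    normalised = [tuple(sorted(bond)) for bond in bonds]
    selected = []
    for i, (start, end) in enumerate(normalised):
        internal = end - start - 1  # residues strictly between the cysteines
        if not (min_internal <= internal <= max_internal):
            continue
        # Assumed overlap rule: reject the loop if a cysteine belonging to any
        # other bridge lies between (or on) the two flanking cysteines.
        overlaps = any(
            start <= position <= end
            for j, other in enumerate(normalised) if j != i
            for position in other
        )
        if not overlaps:
            selected.append((start, end))
    return selected


# Hypothetical bridges: the first two interlock, the third is an isolated loop.
example = [(5, 12), (8, 20), (30, 36)]
print(short_loops(example))
```

In this hypothetical input only the bridge between positions 30 and 36 survives: the first bridge contains a cysteine from the second, and the second encloses more than nine internal residues.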
Table 1 UniProt proteins containing a disulphide-bonded loop comprising over 50% of a PDB protein-protein interface
Protein, species | Partner protein at the interface (where listed)
Collagen alpha-1(IV) chain [Cleaved into: Arresten], Homo sapiens | -
Natrin-1 (Cysteine-rich venom protein 1) (NA-CRVP1) (Protein G2a), Naja atra | -
Pro-epidermal growth factor (EGF) [Cleaved into: Epidermal growth factor (Urogastrone)], Homo sapiens | EGF Receptor (ErbB1), Homo sapiens
Bowman-Birk type proteinase inhibitor, Phaseolus angularis | -
Bowman-Birk type trypsin inhibitor, Vigna radiata | Trypsin, Bos taurus
Bowman-Birk type seed trypsin and chymotrypsin inhibitor (BTCI), Vigna unguiculata | Trypsin, Bos taurus
Inhibitor of vertebrate lysozyme, Escherichia coli | Lysozyme C, Gallus gallus
Lymphocyte antigen 96 (Ly-96) (ESOP-1) (Protein MD-2), Mus musculus | Toll-like receptor 4 (TLR-4), Mus musculus
Irditoxin subunit B (IrTxB), Boiga irregularis | Irditoxin subunit A
Palmitoyl-protein thioesterase 1 (PPT-1) (EC 3.1.2.22) (Palmitoyl-protein hydrolase 1), Homo sapiens | -
Somatomedin-B subunit of Vitronectin (VN), Homo sapiens | Plasminogen activator inhibitor-1 (PAI-1), Homo sapiens
Somatomedin-B subunit of Vitronectin (VN), Homo sapiens | Urokinase plasminogen activator surface receptor (uPAR), Homo sapiens
Urokinase plasminogen activator surface receptor (uPAR), Homo sapiens | Urokinase-type plasminogen activator (uPA), Homo sapiens
Beta-1,3-xylanase (txyA), Alcaligenes sp. | -
Table 1 identifies the 13 proteins containing a short disulphide-bonded loop comprising over 50% of the surface area of a PDB protein-protein interface, along with the partner protein at the interface. These interfaces come from a variety of species, including bacteria (Alcaligenes sp., E. coli), plants (P. angularis, V. radiata, V. unguiculata), and animals (H. sapiens, M. musculus, N. atra, B. irregularis). They also cover a variety of protein types, including snake venoms, proteinase inhibitors, collagen, and extracellular proteins involved in growth and cell adhesion.
Structural independence of short disulphide loops
Table 2 Protein families containing preferentially conserved disulphide-bonded loops
Guanylate cyclase activator/Guanylin
The mean RMSD of the lowest energy PEP-FOLD model to each corresponding crystal structure was 2.468 ± 0.767 Å. As a point of reference, the ongoing Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiment describes the generation of a homology structure with an accuracy of better than 6.5 Å as "not trivial", and models with an accuracy of 1.5 Å as "high-resolution". Models with an RMSD of 4 Å can be considered as having a broadly correct fold. Using these values as a guideline, it seems that disulphide-bonded loop structure prediction using the loop sequence alone is sufficient to predict a moderately accurate structure, lending support to the idea that these loops have a large degree of structural independence from their parent protein.
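As a reminder of what the reported figure summarises, the following minimal sketch computes an RMSD over matched atoms and then a mean and standard deviation over several models. It assumes the model and crystal coordinates have already been optimally superimposed by some upstream tool, and that the "±" value is a sample standard deviation; neither detail is stated in the text, and the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD in the input units between two pre-superimposed coordinate sets."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for atom_m, atom_r in zip(model, reference)
        for a, b in zip(atom_m, atom_r)
    )
    return math.sqrt(squared / len(model))


def summarise(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-loop RMSD values."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical two-atom example: one atom matches, the other is 2 units away.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]))
print(summarise([1.0, 2.0, 3.0]))
```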
Short disulphide-bonded loop mediated interfaces
For many of the heterodimeric interfaces listed in Table 1, there is additional experimental evidence to suggest that the short disulphide-bonded loops play a key role in binding. In the case of the interaction of egg lysozyme and Inhibitor of Vertebrate Lysozyme (Ivy), a cyclic loop on the surface of Ivy (CKPHDC) has been shown by mutagenesis studies to be essential for its inhibitory effect. This loop is strictly conserved across 30 members of the Ivy family, and 5 other members contain a related CExxDxC motif.
For the Lymphocyte antigen 96 and Toll-like receptor 4 interaction, a disulphide-bonded loop CHGHDDDYSFC sits on the conserved "A" patch of Lymphocyte antigen 96. Mutations of five separate amino acids in this peptide have been shown to disrupt binding, including both cysteine residues, implying that the cyclic nature of the peptide is important for complex formation .
A CSYYQSC disulphide-bonded loop contained within Vitronectin is located at the interface of two protein-protein complexes, those of Vitronectin with Plasminogen activator inhibitor-1 and Vitronectin with Urokinase plasminogen activator receptor. The relevance of the short disulphide-bonded loop to the interaction of Vitronectin and Plasminogen activator inhibitor-1 has been experimentally verified. Alanine scanning shows that mutation of Asp22, Glu23, Leu24, Tyr27 and Tyr28 significantly reduces binding affinity. The cyclic region of the interface covers the Cys25 to Cys31 region, including 2 of the 5 critical residues. The same Vitronectin disulphide-bonded loop is critical in the binding of Vitronectin to Urokinase plasminogen activator receptor. The four serine (Ser26, Ser30) and tyrosine (Tyr27, Tyr28) residues in the disulphide-bonded loop fit into a cavity on Urokinase plasminogen activator receptor that shows a high degree of shape and charge complementarity. Notably, the same disulphide-bonded loop tyrosine residues are important in Vitronectin binding to both Urokinase plasminogen activator receptor and Plasminogen activator inhibitor-1. Also, a Urokinase plasminogen activator receptor disulphide-bonded loop, CKTNGDC, is involved in the interaction of Urokinase plasminogen activator receptor with Urokinase plasminogen activator, in direct contact with the Urokinase plasminogen activator Kringle domain. However, this Urokinase plasminogen activator receptor disulphide-bonded loop is not located close to the Vitronectin binding region. The three proteins Urokinase plasminogen activator receptor, Urokinase plasminogen activator and Vitronectin play important roles in regulating the proteolytic degradation of the extracellular matrix and blood clots. This network can also play a role in cancer progression, including degrading the extracellular matrix to facilitate cancer metastasis. Thus, a small macrocyclic peptide that can interfere with this network would be of clinical interest.
The Epidermal growth factor disulphide-bonded loop CVVGYIGERC interacts with domains I and III of the Epidermal growth factor receptor. The position of the disulphide-bonded loop region at the interface is evident in Additional file 1: Figure S2, but there is no prior evidence as to whether this disulphide-bonded loop has independent activity. The Epidermal growth factor receptor is part of the ErbB family of receptor tyrosine kinases, which act as receptors for a variety of different Epidermal growth factor-domain containing growth factor ligands. All of these Epidermal growth factor domains contain homologous C-terminal disulphide-bonded loops.
Along with their role at the binding interfaces between different proteins, it can be seen from Table 1 that disulphide-bonded loops are also involved in interactions between protein subunits (Natrin-1, Irditoxin) and at homodimeric interfaces (Collagen α-1(IV) homodimer, Palmitoyl-protein thioesterase homodimer).
For the cases discussed above (Inhibitor of Vertebrate Lysozyme, Lymphocyte antigen 96, Vitronectin, Epidermal growth factor), the disulphide-bonded loop contains all or part of the protein region responsible for the interaction, and it would be interesting to determine whether the disulphide-bonded loop portion of the interface alone is sufficient to modulate the protein-protein interaction.
A disulphide-bonded loop region in Bowman-Birk inhibitors possesses independent activity
For one of the interfaces in Table 1 it has been previously shown that the short disulphide-bonded loop portion of the interface is not only a critical part of the interaction, but can act independently of the parent protein. This is the Bowman-Birk family of serine proteinase inhibitors, where a single disulphide-bonded loop in the protein can act as an inhibitor of trypsin at nanomolar concentrations. The active disulphide loop of the Bowman-Birk inhibitor is nine residues long, and Luckett et al. have identified a natural sunflower cyclic Bowman-Birk inhibitor 14 residues long (SFTI-1).
The inhibitory ability of the disulphide-bonded loop alone has been demonstrated by Domingo et al., who designed a set of eleven-residue small macrocyclic peptide loops based on this loop from a variety of Bowman-Birk serine protease inhibitors, and showed that the resulting small macrocyclic peptides inhibit a similar set of serine proteases to the parent protein. The native Bowman-Birk proteins inhibit at picomolar concentrations, and the disulphide-bonded loops at nanomolar concentrations, which is the range that would be expected from a druglike molecule. This result is promising from the point of view of using these loops as lead peptides for drug discovery efforts.
EGF domain small macrocyclic peptides do not show independent activity
Table 3 Cyclic peptides derived from EGF-domain containing proteins and tested for EGF activation/inhibition
Cyclic peptide sequence
Human parent protein
EGF, Epidermal Growth Factor*
TGFA, Transforming Growth
HBEGF, Heparin-binding EGF-like
PEAR1, Platelet endothelial aggregation receptor 1
PTGS1, Prostaglandin G/H
F7, Coagulation factor VII
ITGB3, Integrin Beta 3
These loop-derived small macrocyclic peptides were synthesised separately and tested for their ability to 1) activate the EGF receptor and 2) competitively inhibit the EGF receptor in the presence of native EGF. Western blotting was used to assess the amount of phosphorylated EGFR after treatment with EGF, small macrocyclic peptide, or EGF following incubation with small macrocyclic peptide, as a proxy for activation. However, none of the selected peptides demonstrated any ability to either activate or inhibit the EGF receptor (Additional file 1: Figure S3). We conclude that these peptides do not show significant biological activity independent of their parent protein, in contrast with the Bowman-Birk protease inhibitor peptides.
It is possible that the disulphide loop takes a significantly different shape when removed from the context of the wider EGF protein, which would explain the lack of biological activity observed. However, Additional file 1: Table S1 shows that the lowest energy de novo model of this loop has an RMSD of 2.374 Å based on the Cα alignment. This suggests that the free peptide retains a structure reasonably close to that seen in the crystal structure.
To explain why these EGF peptides do not have activity, we examined the structure of the EGF-EGFR complex (PDB ID: 1IVO). The EGFR protein comprises three structural domains (I, II, and III). EGF activates EGFR by binding to a cavity between EGFR domains I and III, with binding sites existing on both domain I and domain III. The CVVGYIGERC loop (Cys33 - Cys42 of EGF) tested here comprises a large portion of the total EGF-Domain I interface contacts in the crystal structure, but only a small proportion of the EGF-Domain III contacts (Additional file 1: Figure S2). Residues in the C-terminal end of EGF, such as Leu47, are known to make important contacts with Domain III. Thus, despite comprising a large portion of the interface, the disulphide loop is not able to fill the EGFR cavity on both sides, which would likely explain why the disulphide-bonded loop is not able to conformationally shift EGFR to its active position. It is possible that the disulphide-bonded loop binds to Domain I of EGFR, but clearly any potential binding is not strong enough to compete with EGF binding to its native receptor.
Conservation of disulphide-bonded loops
The cyclic-peptide mediated interfaces above represent an interesting set of compounds, but it is also of interest to see if disulphide-bonded loops represent a widely used natural strategy to influence protein-protein interactions, by examining evolutionary conservation of short disulphide-bonded loops in proteins.
Homologs of SwissProt proteins containing annotated short disulphide-bonded loops were identified using the Gopher  webserver (bioware.ucd.ie), searching the default set of model organisms. All short disulphide-bonded loop containing proteins with at least one Gopher-identified ortholog were then aligned using MUSCLE . Per-residue conservation scores were then calculated for each alignment using the Jensen-Shannon divergence method of Capra and Singh . Aligned short disulphide regions between the original protein and homolog were identified by examining alignments of the annotated disulphide regions of the original protein. If the loop terminal cysteine residues in the original protein exactly aligned with cysteine residues in the homolog protein, this region was considered a conserved disulphide loop. It is well known that cysteines involved in disulphide bonding are well conserved across a variety of protein families . For this reason, only the conservation of the interior loop residues was considered in this study.
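The per-column scoring step can be illustrated with a simplified Jensen-Shannon divergence calculation. This sketch is not the Capra and Singh implementation: it assumes a uniform background distribution, ignores gaps and sequence weighting, and omits any window smoothing, so its values will differ from those of the published method. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _kl(p: List[float], q: List[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(column: str, pseudocount: float = 1e-6) -> float:
    """Jensen-Shannon divergence between a column and a uniform background.

    Assumption: uniform background and unweighted counts; gap characters and
    any non-standard symbols are simply skipped.
    """
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    p = [(counts[a] + pseudocount) / total for a in AMINO_ACIDS]
    q = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def column_scores(alignment: List[str]) -> List[float]:
    """Score every column of an alignment given as equal-length strings."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [js_divergence("".join(col)) for col in zip(*alignment)]


def interior_scores(scores: List[float], first_cys: int, last_cys: int) -> List[float]:
    """Scores for the columns strictly between the two cysteines (0-based)."""
    return scores[first_cys + 1:last_cys]


# Hypothetical three-sequence alignment of a six-residue loop.
scores = column_scores(["CKPHDC", "CKPHDC", "CRPHEC"])
print([round(s, 3) for s in interior_scores(scores, 0, 5)])
```

Excluding the flanking cysteine columns, as `interior_scores` does, mirrors the decision in the text to consider only interior loop residues, since the cysteines themselves are expected to be conserved.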
The surface and interface disulphide-bonded loops show a similar distribution to other cyclic regions. Thus, simply existing at a protein surface or interface does not mean that a disulphide-bonded loop will be preferentially conserved.
However, the graph does have a smaller second peak at about 0.30, indicating a set of disulphide-bonded loops that are significantly more conserved than their adjacent residues. Table 2 lists the names of the protein families that contain these highly conserved disulphide-bonded loops. Over half are secreted short proteins and peptide hormones such as somatotropin, prolactin, guanylin and urotensin, as well as the larger polygalacturonase from plants. Many of these peptides and proteins are very short, such as guanylin (15 amino acids) and urotensin (11 amino acids), with the disulphide bonded section making up over half of the peptide length. Given the lower conservation outside the cyclic region, such proteins are good candidates for investigating the role of the cyclic regions alone.
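One plausible way to express "more conserved than their adjacent residues" is the difference between the mean conservation of the loop interior and that of the residues flanking the loop. The sketch below is an assumption for illustration only: the text does not state the exact measure or the number of flanking residues used, so `relative_conservation` and its `flank` parameter are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_conservation(scores: Sequence[float], first_cys: int,
                          last_cys: int, flank: int = 5) -> float:
    """Mean interior conservation minus mean flanking conservation.

    Indices are 0-based positions of the two cysteines in `scores`.
    The flank size of 5 residues per side is an assumed default.
    """
    interior = list(scores[first_cys + 1:last_cys])
    left = list(scores[max(0, first_cys - flank):first_cys])
    right = list(scores[last_cys + 1:last_cys + 1 + flank])
    flanking = left + right
    if not interior or not flanking:
        raise ValueError("need at least one interior and one flanking residue")
    return mean(interior) - mean(flanking)


# Hypothetical scores: a well conserved loop embedded in a variable region.
scores = [0.2] * 5 + [0.9] + [0.6] * 4 + [0.9] + [0.2] * 5
print(round(relative_conservation(scores, 5, 10), 3))
```

A positive value indicates a loop interior that is more conserved than its surroundings, and a value near zero indicates no preferential conservation.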
Viral proteins as sources of bioactive disulphide-bonded loops
Table 4 Similar disulphide-bonded loops between human and virus
Human protein | Viral protein
ADAM20: Disintegrin and metalloproteinase domain-containing protein 20 | Uncharacterized protein BNLF2b: Human herpesvirus 4
ENPP1 and ENPP2: Ectonucleotide pyrophosphatase/phosphodiesterase family member 1 and 2 | Immediate-early protein, 73: Murid herpesvirus 4
ERBB2: Receptor tyrosine-protein kinase erbB-2 | Single stranded DNA-binding protein: Human herpesvirus 4
F8: Coagulation factor VIII | Single stranded DNA-binding protein: Human herpesvirus 4
ITGB6: Integrin beta-6 | Early E1A 32 kDa protein: Human adenovirus 2
PGLYRP3: Peptidoglycan recognition protein 3 | Protein U90: Human herpesvirus 6B
PRG4: Proteoglycan 4 and VN: Vitronectin | Immediate-early protein, 73: Murid herpesvirus 4
The match of an adenoviral peptide CNSSTDSC to a human integrin β-6 CTTSTDSC disulphide-bonded loop was of most interest, since viral proteins exploit extracellular matrix and integrin interactions to facilitate cell adhesion and entry. The integrin CTTSTDSC loop is located in the third EGF-like repeat domain region of the extracellular portion of Integrin β-6. This region is known to be "masked" in the integrin's inactive conformation, and exposed in the active conformation. The human herpesvirus ssDNA binding protein binds the secreted FBLN5 protein (Additional file 1: Table S2), and the viral peptide CRRPC somewhat resembles the human ERBB2 CSKPC peptide. This similarity is relatively weak, but it is intriguing that the FBLN5 protein contains 9 EGF domains, since ERBB2 belongs to a receptor family known to bind EGF domains.
Many of the matches described in Table 4 consist of quite short disulphide loops, 5 or 6 residues long, which points to the possibility of these matches being a chance occurrence. However, as described above, there is some biological support for considering viral proteomes as representing potential sources of cyclic bioactive peptides worthy of further investigation.
Perspectives on disulphide-bonded loops in protein-protein interactions
We would have expected to uncover more disulphide-bonded loops that play key roles in protein-protein interactions, given the importance of cyclic compounds in modulating both enzymes and protein-protein interactions. It is possible that the selection of small loops within larger proteins is, in general, slower to evolve to the required degree of specificity of interaction than the evolution of larger complementarities between interacting proteins. Thus, in general, proteins do not often seem to evolve structured loops as independent determinants of protein interaction. This is in contrast to eukaryotic Short Linear Motifs (SLiMs), which have specifically evolved to match particular motif binding domains. Why evolution has favoured linear motifs over cyclic motifs in eukaryotes is not clear. However, it is worth noting that many linear motifs, although they bind independently, typically bind with modest affinity, in a manner that suggests that modest affinity is optimal. Thus, the higher affinity binding that cyclic loops might confer may not be advantageous in the context of SLiM signalling, perhaps explaining why our survey has revealed relatively few such interactors. While there are examples in the literature of such loops having independent effects, overall, mining existing cyclic loop sequences from protein interactions may not be a very fruitful strategy for discovering novel modulators of protein-protein interactions, although we must point out that our own experimental validation only considered one class of such loops.
Considering a small disulphide-bonded loop as a surface-exposed loop pinned in place by a pair of cysteines, the structure of this loop could be relatively independent of the sequence of the larger protein, and mostly determined by the amino acids between the cysteines, due to the conformational constraint imposed by the disulphide bond and the solvent-exposed nature of a surface loop. Thus, a disulphide-bonded loop known to be at a protein-protein interface could, on its own, potentially maintain the same binding activity as the parent protein. Since cyclic peptides are known to be more "drug-like" than linear peptides, this approach could provide a promising set of biologically-optimised lead-like molecules to attack the difficult problem of modulating protein-protein interactions. We have contrasted two cases in this study: the Bowman-Birk protease inhibitor-derived small macrocyclic peptides, which are known to have independent biological activity, and a range of EGF-derived small macrocyclic peptides, which do not. This shows that while ribosomally derived disulphide-bonded loops can be a promising source of bioactive macrocycles, they may not contain the necessary binding features, even when they make up a majority of a protein-protein interface. Bowman-Birk inhibitors are currently being investigated for their applications in fighting colorectal cancer, due to promising mouse results.
A disadvantage of re-purposing disulphide-bonded loops as small macrocyclic peptides is that disulphide bonds are more easily broken than the amide bonds present in traditional head-tail cyclic peptides. Nguyen et al. have compared the serum stability of short (6 residue) antimicrobial linear peptides to the disulphide and head-tail bonded equivalents, and observed that while linear peptides will be 80% degraded in serum over the course of an hour, disulphide-bonded peptides retain over 50% of their original concentration after 2 hours, and head-tail bonded peptides retain over 70% of their original concentration after 6.5 hours. The concentration of the short depsipeptide arenastatin A in mouse serum has been observed to decline 50% in an hour, leading to the conclusion that head-tail bonding is the most desirable from a serum stability point of view. There are a number of well-described strategies which can be used to improve the stability and bioavailability of peptide drugs in the bloodstream, and these strategies, along with replacing disulphide bonds with another cyclisation method, may be necessary to develop a truly "drug-like" molecule.
Despite being mainly restricted to extracellular or vesicular compartments, short disulphide-bonded loops are relatively widespread: 720 out of a total of 20,252 human proteins (about 3.6%) in the SwissProt database of manually curated proteins contain annotated disulphide-bonded loops. Disulphide-bonded loop residues are more conserved than non-cyclic residues in the same set of proteins, and are also more conserved than residues located directly beside short disulphide-bonded loops. While the conservation is modest, there is a subset of peptides which show marked conservation (see Table 2), mainly short secreted peptide hormones or chemical messengers.
This is possibly due to cyclisation by disulphide bonding being the simplest way to impose some structure on a short peptide, which may not have the size to form hydrogen-bonded secondary structural features like helices or beta-sheets. Potential disulphide-bonded loops are also seen in viral proteins, and a subset of these may play potential roles in viral adhesion and entry, or other aspects of viral biology.
This study has used the approach of mining sequence databases for putative disulphide-bonded loops that are conserved relative to their adjacent residues, thus generating a library of compounds of interest. Table 1 contains 13 disulphide-bonded loops from well characterised interaction crystal structures, of which only the Bowman-Birk type loops had been tested for their independent activity prior to this study. The disulphide-bonded loops of interest include not only the loops described in Table 1: as the PDB currently only contains structures for a tiny fraction of all possible protein-protein interactions, the 1,231 short disulphide-bonded loops at protein surfaces (Additional file 1: Table S3) are also worthy of investigation, along with the 287 highly conserved disulphide-bonded loops mentioned in Table 2, which are further described in Additional file 1: Table S4.
It is well known that small macrocyclic peptides represent a class of molecules with a wide range of biological activities, and therefore merely showing that there exist bioactive small macrocyclic peptides derived from larger proteins would not represent any furthering of scientific knowledge. Thus, the key novelty of this work is in exploring how to systematically harness sequence, structural, and evolutionary data over all well characterised proteins in order to identify bioactive peptides. This work both develops a method for identifying potentially bioactive compounds and provides a list of disulphide-bonded loop-protein interaction pairs, readily synthesisable and open to medium or high throughput functional screening for binding or activity.
Assessing short disulphide-bonded loop structures
Protein Data Bank (PDB) structures containing short disulphide-bonded loops were identified by searching UniProt for manually curated proteins containing a non-overlapping disulphide loop with 2-9 internal residues and a listed entry in the PDB. The biological assembly format (.pdb1) of each PDB structure was downloaded. Choosing the biological assembly format ensures that the downloaded structure is the biologically relevant form, as opposed to the crystallographic asymmetric unit; the two are not always the same. A PyMOL script was used to iterate over each structure file and test each possible pair of cysteine residues spaced up to 11 residues apart in the same protein chain for a disulphide bond, by checking whether an S-S bond of approximately 2.05 Å existed. For each short disulphide-bonded loop found, the number of surface residues was found by counting residues with a solvent accessible surface of over 2.5 Å². The number of residues at a protein-protein interface was found by counting as an interface residue all amino acids with heavy atoms within 3 Å of another protein chain.
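The geometric tests described above can be sketched without PyMOL, working on a plain list of atoms. This is an illustrative reimplementation under stated assumptions, not the authors' script: the `Atom` record, the 0.25 Å tolerance around 2.05 Å, the reading of "up to 11 residues apart" as a loop of at most 11 residues including both cysteines, and the restriction to heavy atoms on both chains are all choices made for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Atom:
    chain: str
    resnum: int
    resname: str
    name: str      # atom name, e.g. "SG"
    element: str   # element symbol, e.g. "S", "C", "H"
    xyz: Tuple[float, float, float]


def find_short_disulphides(atoms: List[Atom], max_len: int = 11,
                           min_internal: int = 2, bond_length: float = 2.05,
                           tolerance: float = 0.25) -> List[Tuple[str, int, int]]:
    """Return (chain, first_cys, last_cys) for short intra-chain S-S bonds."""
    sulphurs = sorted(
        (a for a in atoms if a.resname == "CYS" and a.name == "SG"),
        key=lambda a: (a.chain, a.resnum),
    )
    loops = []
    for i, first in enumerate(sulphurs):
        for second in sulphurs[i + 1:]:
            if first.chain != second.chain:
                continue
            length = second.resnum - first.resnum + 1  # includes both cysteines
            if length > max_len or length - 2 < min_internal:
                continue
            # Assumed tolerance around the ~2.05 A S-S bond length.
            if abs(math.dist(first.xyz, second.xyz) - bond_length) <= tolerance:
                loops.append((first.chain, first.resnum, second.resnum))
    return loops


def interface_residues(atoms: List[Atom], chain: str,
                       cutoff: float = 3.0) -> List[int]:
    """Residue numbers in `chain` with a heavy atom within `cutoff` of another chain."""
    heavy = [a for a in atoms if a.element.upper() != "H"]
    own = [a for a in heavy if a.chain == chain]
    others = [a for a in heavy if a.chain != chain]
    hits = {
        a.resnum for a in own
        if any(math.dist(a.xyz, b.xyz) <= cutoff for b in others)
    }
    return sorted(hits)
```

A loop could then be judged against the 50% interface criterion by comparing the interface residues that fall inside the loop with all interface residues on that chain. Note that the paper measures this share by surface area, whereas a residue count is only a rough proxy.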
EGF receptor activation and competition assays
MCF10A immortalised breast epithelial cells were serum starved for 3 hours before all experiments. For activation experiments, cells were incubated with either 100 μM small macrocyclic peptide or 10 nM EGF for 5 minutes. For EGF/small macrocyclic peptide competition assays, the cells were incubated with 100 μM small macrocyclic peptide for 5 minutes before incubating for another 5 minutes with 1 nM EGF. EGF receptor activation was then measured by western blotting for phosphorylated tyrosine 1173 on EGFR, along with blotting for total EGFR and total actin.
Predicting short disulphide-bonded loops in viruses using protein structural information
Likely short disulphide-bonded loop locations were predicted based on sequence, secondary structure and solvent accessibility information. Secondary structure and solvent accessibility were predicted by the Porter and PaleAle servers, respectively, from the Distill suite of protein structural prediction servers. Sequence segments of under 11 residues were included if they 1) started and ended with a cysteine, 2) contained no internal cysteine residues, 3) lay in a region of the protein that was not predicted to be a β sheet or an α helix, and 4) had a predicted average solvent accessibility that was more exposed than buried.
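The four inclusion criteria can be illustrated with a short Python filter. This is a sketch under stated assumptions rather than the pipeline used in the study: the input encodings are hypothetical (a three-state secondary structure string using `H`, `E` and `C`, and a per-residue exposure value between 0 and 1 where values above 0.5 mean "more exposed than buried"), "under 11 residues" is read as a total length of at most 10 including both cysteines, criterion 3 is read as "no residue in the segment is predicted helix or strand", and the minimum of 2 internal residues is borrowed from the structure search criteria.

```python
def predict_loops(seq, ss, exposure, max_length=10, min_internal=2):
    """Return (start, end, peptide) for candidate loops, using 1-based positions.

    seq      -- protein sequence, one-letter codes
    ss       -- predicted secondary structure per residue (assumed H/E/C alphabet)
    exposure -- predicted relative exposure per residue (assumed 0..1 scale)
    """
    if not (len(seq) == len(ss) == len(exposure)):
        raise ValueError("sequence and predictions must have equal length")
    loops = []
    cysteines = [i for i, aa in enumerate(seq) if aa == "C"]
    # Pairing only consecutive cysteines enforces "no internal cysteine residues".
    for start, end in zip(cysteines, cysteines[1:]):
        length = end - start + 1
        if length > max_length or length - 2 < min_internal:
            continue
        region = range(start, end + 1)
        # Reject segments overlapping a predicted helix (H) or strand (E).
        if any(ss[i] in "HE" for i in region):
            continue
        mean_exposure = sum(exposure[i] for i in region) / length
        if mean_exposure > 0.5:
            loops.append((start + 1, end + 1, seq[start:end + 1]))
    return loops
```

The actual Porter and PaleAle output formats would need to be converted to these encodings before the filter could be applied.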
Scoring similarity of short disulphide-bonded loops
Identifying viral disulphide-bonded loops similar to human proteins
Human and virus proteins containing short disulphide-bonded loop regions, either annotated in SwissProt or predicted using Distill as described above, were identified. To identify similar viral proteins, we set out to find 1) disulphide-bonded loops with similar sequences in human and viral proteins, 2) where the viral proteins had known human interactor proteins, and 3) where the human proteins interacting with the loop-containing viral proteins also interacted with the original human proteins containing the similar short disulphide-bonded loop.
UniProt accession numbers were used to uniquely identify proteins, and pairs of UniProt accessions parsed from the interaction files were used to identify interactions. Similarity between human and virus short disulphide-bonded loop sequences was calculated by aligning the peptides and scoring the alignment with the BLOSUM62 substitution matrix. Virus and human short disulphide-bonded loops with a similarity greater than 0.50 were identified and checked for shared interactors.
To find which human and virus proteins shared interactors, the sets of known human protein interactions (including human-virus interactions) were downloaded as PSI-MITAB format text files. A text file of human binary protein interactions was downloaded from the MINT database (download date: 08/Apr/2013) and then parsed to extract human-virus interactions only. A text file of human-virus binary protein interactions was downloaded from the VirHostNet database (download date: 24/Apr/2013), and text files containing human-virus binary interaction data for each of the virus species available in the BioGRID database were also downloaded (download date: 24/Apr/2012). All three interaction sources were combined into a single dataset and compared with the list of virus and human proteins with a similarity greater than 0.50, to identify shared interactors.
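The shared-interactor check can be sketched in Python as follows. This is an illustration under assumptions, not the code used in the study: it relies on the standard PSI-MI TAB column layout (interactor identifiers in columns 1 and 2, taxonomy identifiers in columns 10 and 11), keeps only interactors that carry a `uniprotkb:` identifier, and uses hypothetical function names.

```python
import csv
from collections import defaultdict

HUMAN_TAXID = "9606"


def _accession(field):
    """Extract the accession from a field such as 'uniprotkb:ACCESSION|...'."""
    for identifier in field.split("|"):
        if identifier.startswith("uniprotkb:"):
            return identifier.split(":", 1)[1]
    return None


def _taxid(field):
    """Extract '9606' from a field such as 'taxid:9606(Homo sapiens)'."""
    first = field.split("|")[0]
    if not first.startswith("taxid:"):
        return None
    return first[len("taxid:"):].split("(")[0]


def load_interactions(sources):
    """Merge several PSI-MI TAB sources (iterables of lines) into one network."""
    partners = defaultdict(set)
    taxids = {}
    for source in sources:
        reader = csv.reader(source, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Skip blank lines, header lines and truncated records.
            if not row or row[0].startswith("#") or len(row) < 11:
                continue
            a, b = _accession(row[0]), _accession(row[1])
            if a is None or b is None:
                continue
            taxids[a], taxids[b] = _taxid(row[9]), _taxid(row[10])
            partners[a].add(b)
            partners[b].add(a)
    return partners, taxids


def shared_human_interactors(partners, taxids, human_acc, virus_acc):
    """Human proteins that interact with both the viral and the human protein."""
    virus_partners = partners.get(virus_acc, set())
    human_partners_of_virus = {
        p for p in virus_partners if taxids.get(p) == HUMAN_TAXID
    }
    return sorted(human_partners_of_virus & partners.get(human_acc, set()))
```

Open file handles for the MINT, VirHostNet and BioGRID downloads could be passed together as `sources`, and `shared_human_interactors` would then be called for each human-virus loop pair that passed the similarity threshold.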
The authors would like to thank Science Foundation Ireland, grant number 08 IN.1 B1864 for funding this work.
- Kotz J: Bringing macrocycles full circle. Science-Business eXchange. 2012, 5 (45). doi:10.1038/scibx.2012.1176
- Gould A, Ji Y, Aboye LT, Camarero AJ: Cyclotides, a novel ultrastable polypeptide scaffold for drug discovery. Curr Pharm Des. 2011, 17 (38): 4294-4307. doi:10.2174/138161211798999438
- Lehrer RI, Cole AM, Selsted ME: θ-Defensins: cyclic peptides with endless potential. J Biol Chem. 2012, 287 (32): 27014-9. doi:10.1074/jbc.R112.346098
- Hamman JH, Enslin GM, Kotzé AF: Oral delivery of peptide drugs: barriers and developments. BioDrugs : Clin Immunotherapeutics Biopharmaceuticals Gene Ther. 2005, 19 (3): 165-77. doi:10.2165/00063030-200519030-00003
- Xiao Q, Pei D: High-throughput synthesis and screening of cyclic peptide antibiotics. J Med Chem. 2007, 50 (13): 3132-7. doi:10.1021/jm070282e
- Lamberts SW, van der Lely AJ, de Herder WW, Hofland LJ: Octreotide. N Engl J Med. 1996, 334 (4): 246-254. doi:10.1056/NEJM199601253340408
- Schreiber SL, Crabtree GR: The mechanism of action of cyclosporin A and FK506. Immunol Today. 1992, 13 (4): 136-142. doi:10.1016/0167-5699(92)90111-J
- Schwarzer D, Finking R, Marahiel MA: Nonribosomal peptides: from genes to products. Nat Prod Rep. 2003, 20 (3): 275-287. doi:10.1039/b111145k
- Conlan BF, Gillon AD, Craik DJ, Anderson MA: Circular proteins and mechanisms of cyclization. Biopolymers. 2010, 94 (5): 573-583. doi:10.1002/bip.21422
- Cascales L, Craik DJ: Naturally occurring circular proteins: distribution, biosynthesis and evolution. Org Biomol Chem. 2010, 8 (22): 5035-47. doi:10.1039/c0ob00139b
- Mulvenna JP, Wang C, Craik DJ: CyBase: a database of cyclic protein sequence and structure. Nucleic Acids Res. 2006, 34 (Database issue): 192-4. doi:10.1093/nar/gkj005
- Wang CKL, Kaas Q, Chiche L, Craik DJ: CyBase: a database of cyclic protein sequences and structures, with applications in protein discovery and engineering. Nucleic Acids Res. 2008, 36 (Database issue): 206-10. doi:10.1093/nar/gkm953
- Toniolo C: Intramolecularly hydrogen-bonded peptide conformations. CRC Crit Rev Biochem. 1980, 9 (1): 1-44.
- Bisang C, Weber C, Robinson JA: Protein-loop mimetics: a diketopiperazine-based template to stabilize loop conformations in cyclic peptides containing the NPNA and RGD motifs. Helvetica Chimica Acta. 1996, 79 (7): 1825-1842. doi:10.1002/hlca.19960790708
- Gunasekaran K, Ramakrishnan C, Balaram P: Beta-hairpins in proteins revisited: lessons for de novo design. Protein Eng Des Select. 1997, 10 (10): 1131-1141. doi:10.1093/protein/10.10.1131
- Cochran AG, Tong RT, Starovasnik MA, Park EJ, McDowell RS, Theaker JE, Skelton NJ: A minimal peptide scaffold for β-turn display: optimizing a strand position in disulfide-cyclized β-hairpins. J Am Chem Soc. 2001, 123 (4): 625-632. doi:10.1021/ja003369x
- London N, Raveh B, Movshovitz-Attias D, Schueler-Furman O: Can self-inhibitory peptides be derived from the interfaces of globular protein-protein interactions?. Proteins. 2010, 78 (15): 3140-9. doi:10.1002/prot.22785
- Xu Y, Rahman NaBD, Othman R, Hu P, Huang M: Computational identification of self-inhibitory peptides from envelope proteins. Proteins. 2012, 80 (9): 2154-68. doi:10.1002/prot.24105
- Dechantsreiter MA, Planker E, Mathä B, Lohof E, Hölzemann G, Jonczyk A, Goodman SL, Kessler H: N-Methylated cyclic RGD peptides as highly active and selective alpha(V)beta3 integrin antagonists. J Med Chem. 1999, 42 (16): 3033-40. doi:10.1021/jm970832g
- Edwards RJ, Moran N, Devocelle M, Kiernan A, Meade G, Signac W, Foy M, Park SDE, Dunne E, Kenny D, Shields DC: Bioinformatic discovery of novel bioactive peptides. Nat Chem Biol. 2007, 3 (2): 108-112. doi:10.1038/nchembio854
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The protein data bank. Nucleic Acids Res. 2000, 28 (1): 235-242. doi:10.1093/nar/28.1.235
- Schrodinger LLC: The PyMOL Molecular Graphics System, Version 1.3r1. 2010, [http://www.pymol.org/citing]
- Cukuroglu E, Gursoy A, Keskin O: HotRegion: a database of predicted hot spot clusters. Nucleic Acids Res. 2012, 40 (Database issue): 829-33. doi:10.1093/nar/gkr929
- Nguyen VD, Hatahet F, Salo KEH, Enlund E, Zhang C, Ruddock LW: Pre-expression of a sulfhydryl oxidase significantly increases the yields of eukaryotic disulfide bond containing proteins expressed in the cytoplasm of E.coli. Microbial Cell Fact. 2011, 10: 1. doi:10.1186/1475-2859-10-1
- Maupetit J, Derreumaux P, Tufféry P: A fast method for large-scale de novo peptide and miniprotein structure prediction. J Comput Chem. 2010, 31 (4): 726-38. doi:10.1002/jcc.21365
- Kihara D, Chen H, Yang YD: Quality assessment of protein structure models. Cur Prot Pept Sci. 2009, 10 (3): 216-28. doi:10.2174/138920309788452173
- Abergel C, Monchois V, Byrne D, Chenivesse S, Lembo F, Lazzaroni J-C, Claverie J-M: Structure and evolution of the Ivy protein family, unexpected lysozyme inhibitors in Gram-negative bacteria. Proc Natl Acad Sci U S A. 2007, 104 (15): 6394-9. doi:10.1073/pnas.0611019104
- Kim HM, Park BS, Kim J-I, Kim SE, Lee J, Oh SC, Enkhbayar P, Matsushima N, Lee H, Yoo OJ, Lee J-O: Crystal structure of the TLR4-MD-2 complex with bound endotoxin antagonist Eritoran. Cell. 2007, 130 (5): 906-17. doi:10.1016/j.cell.2007.08.002
- Zhou A, Huntington JA, Pannu NS, Carrell RW, Read RJ: How vitronectin binds PAI-1 to modulate fibrinolysis and cell migration. Nat Struct Biol. 2003, 10 (7): 541-4. doi:10.1038/nsb943
- Huai Q, Zhou A, Lin L, Mazar AP, Parry GC, Callahan J, Shaw DE, Furie B, Furie BC, Huang M: Crystal structures of two human vitronectin, urokinase and urokinase receptor complexes. Nat Struct Mol Biol. 2008, 15 (4): 422-3. doi:10.1038/nsmb.1404
- Alfano D, Franco P, Vocca I, Gambi N, Pisa V, Mancini A, Caputi M, Carriero MV, Iaccarino I, Stoppelli MP: The urokinase plasminogen activator and its receptor: role in cell growth and apoptosis. Thromb Haemost. 2005, 93 (2): 205-11. doi:10.1267/THRO05020205
- Sidenius N, Blasi F: The urokinase plasminogen activator system in cancer: recent advances and implication for prognosis and therapy. Cancer Metastasis Rev. 2003, 22 (2-3): 205-22. doi:10.1023/A:1023099415940
- Ogiso H, Ishitani R, Nureki O, Fukai S, Yamanaka M, Kim J-H, Saito K, Sakamoto A, Inoue M, Shirouzu M, Yokoyama S: Crystal structure of the complex of human epidermal growth factor and receptor extracellular domains. Cell. 2002, 110 (6): 775-787. doi:10.1016/S0092-8674(02)00963-7
- Luckett S, Garcia RS, Barker JJ, Konarev AV, Shewry PR, Clarke AR, Brady RL: High-resolution structure of a potent, cyclic proteinase inhibitor from sunflower seeds. J Mol Biol. 1999, 290 (2): 525-33. doi:10.1006/jmbi.1999.2891
- Domingo GJ, Leatherbarrow RJ, Freeman N, Patel S, Weir M: Synthesis of a mixture of cyclic peptides based on the Bowman-Birk reactive site loop to screen for serine protease inhibitors. Int J Pept Protein Res. 1995, 46 (1): 79-87. doi:10.1111/j.1399-3011.1995.tb00585.x
- Linggi B, Carpenter G: ErbB receptors: new insights on mechanisms and biology. Trends Cell Biol. 2006, 16 (12): 649-56. doi:10.1016/j.tcb.2006.10.008
- Sun YQ: Glycine residues provide flexibility for enzyme active sites. J Biol Chem. 1997, 272 (6): 3190-3194. doi:10.1074/jbc.272.6.3190
- Jacob J, Duclohier H, Cafiso DS: The role of proline and glycine in determining the backbone flexibility of a channel-forming peptide. Biophys J. 1999, 76 (3): 1367-76. doi:10.1016/S0006-3495(99)77298-X
- Edwards RJ: GOPHER: Generation of Orthologous Proteins from High-throughput Evolutionary Relationships. 2006, [http://bioware.ucd.ie/~compass/biowareweb/]
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5): 1792-7. doi:10.1093/nar/gkh340
- Capra JA, Singh M: Predicting functionally important residues from sequence conservation. Bioinformatics (Oxford, England). 2007, 23 (15): 1875-82. doi:10.1093/bioinformatics/btm270
- Cygler M, Schrag JD, Sussman JL, Harel M, Silman I, Gentry MK, Doctor BP: Relationship between sequence conservation and three-dimensional structure in a large family of esterases, lipases, and related proteins. Prot Sci : Publication Prot Soc. 1993, 2 (3): 366-82. doi:10.1002/pro.5560020309
- Bustamante CD, Townsend JP, Hartl DL: Solvent accessibility and purifying selection within proteins of Escherichia coli and Salmonella enterica. Mol Biol Evol. 2000, 17 (2): 301-8. doi:10.1093/oxfordjournals.molbev.a026310
- Yeh S-W, Liu J-W, Yu S-H, Shih C-H, Hwang J-K, Echave J: Site-specific structural constraints on protein sequence evolutionary divergence: local packing density versus solvent exposure. Mol Biol Evol. 2014, 31 (1): 135-9. doi:10.1093/molbev/mst178
- Woolhouse MEJ, Webster JP, Domingo E, Charlesworth B, Levin BR: Biological and biomedical implications of the co-evolution of pathogens and their hosts. Nat Genet. 2002, 32 (4): 569-77. doi:10.1038/ng1202-569
- Shimaoka M, Takagi J, Springer Ta: Conformational regulation of integrin structure and function. Annu Rev Biophys Biomol Struct. 2002, 31: 485-516. doi:10.1146/annurev.biophys.31.101101.140922
- Gould CM, Diella F, Via A, Puntervoll P, Gemund C, Chabanis-Davidson S, Michael S, Sayadi A, Bryne JC, Chica C, Seiler M, Davey NE, Haslam N, Weatheritt RJ, Budd A, Hughes T, Pas J, Rychlewski L, Trave G, Aasland R, Helmer-Citterich M, Linding R, Gibson TJ: ELM: the status of the 2010 eukaryotic linear motif resource. Nucleic Acids Res. 2010, 38 (Database issue): 167-80. doi:10.1093/nar/gkp1016
- Haslam NJ, Shields DC: Peptide-binding domains: are limp handshakes safest?. Sci Signal. 2012, 5 (243): 40. doi:10.1126/scisignal.2003372
- Metz A, Pfleger C, Kopitz H, Pfeiffer-Marek S, Baringhaus K-H, Gohlke H: Hot spots and transient pockets: predicting the determinants of small-molecule binding to a protein-protein interface. J Chem Inf Model. 2012, 52 (1): 120-33. doi:10.1021/ci200322s
- Kennedy AR, Davis JG, Carlton W, Ware JH: Effects of dietary antioxidant supplementation on the development of malignant lymphoma and other neoplastic lesions in mice exposed to proton or iron-ion radiation. Radiat Res. 2008, 169 (6): 615-25. doi:10.1667/RR1296.1
- Nguyen LT, Chau JK, Perry Na, de Boer L, Zaat SaJ, Vogel HJ: Serum stabilities of short tryptophan- and arginine-rich antimicrobial peptide analogs. PloS one. 2010, 5 (9): 11-18. doi:10.1371/journal.pone.0012684
- Murakami N, Tamura S, Wang W, Takagi T, Kobayashi M: Synthesis of stable analogs in blood and conformational analysis of arenastatin A, a potent cytotoxic spongean depsipeptide. Tetrahedron. 2001, 57 (20): 4323-4336. doi:10.1016/S0040-4020(01)00339-8
- Werle M, Bernkop-Schnürch A: Strategies to improve plasma half life time of peptide and protein drugs. Amino Acids. 2006, 30 (4): 351-67. doi:10.1007/s00726-005-0289-3
- Baú D, Martin AJM, Mooney C, Vullo A, Walsh I, Pollastri G: Distill: a suite of web servers for the prediction of one-, two- and three-dimensional structural features of proteins. BMC Bioinformatics. 2006, 7: 402. doi:10.1186/1471-2105-7-402
- Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJ: Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009, 25 (11): 1422-1423. doi:10.1093/bioinformatics/btp163
- Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L, Cesareni G: MINT: the Molecular INTeraction database. Nucleic Acids Res. 2007, 35 (Database issue): 572-4. doi:10.1093/nar/gkl950
- Navratil V, de Chassey B, Meyniel L, Delmotte S, Gautier C, André P, Lotteau V, Rabourdin-Combe C: VirHostNet: a knowledge base for the management and the analysis of proteome-wide virus-host interaction networks. Nucleic Acids Res. 2009, 37 (Database issue): 661-8. doi:10.1093/nar/gkn794
- Chatr-Aryamontri A, Breitkreutz B-J, Heinicke S, Boucher L, Winter A, Stark C, Nixon J, Ramage L, Kolas N, O’Donnell L, Reguly T, Breitkreutz A, Sellam A, Chen D, Chang C, Rust J, Livstone M, Oughtred R, Dolinski K, Tyers M: The BioGRID interaction database 2013 update. Nucleic Acids Res. 2013, 41 (Database issue): 816-23. doi:10.1093/nar/gks1158
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.