ProtSA: a web application for calculating sequence specific protein solvent accessibilities in the unfolded ensemble
© Estrada et al; licensee BioMed Central Ltd. 2009
Received: 04 December 2008
Accepted: 08 April 2009
Published: 08 April 2009
The stability of proteins is governed by the heat capacity, enthalpy and entropy changes of folding, which are strongly correlated with the change in solvent accessible surface area experienced by the polypeptide. While the surface exposed in the folded state can be easily determined, accessibilities for the unfolded state at the atomic level cannot be obtained experimentally and are typically estimated using simplistic models of the unfolded ensemble. A web application providing realistic accessibilities of the unfolded ensemble of a given protein at the atomic level will prove useful.
ProtSA, a web application that calculates sequence-specific solvent accessibilities of the unfolded state ensembles of proteins, has been developed and made freely available to the scientific community. The input is the amino acid sequence of the protein of interest. ProtSA follows a previously published calculation protocol which uses the Flexible-Meccano algorithm to generate unfolded conformations representative of the unfolded ensemble of the protein, and uses the exact analytical software ALPHASURF to calculate atom solvent accessibilities, which are averaged over the ensemble.
ProtSA is a novel tool for the researcher investigating protein folding energetics. The sequence-specific atom accessibilities provided by ProtSA will allow better estimates of the contribution of the hydrophobic effect to the free energy of folding, will help to refine existing parameterizations of protein folding energetics, and will be useful for understanding the influence of point mutations on protein stability.
A detailed understanding of protein folding energetics is fundamental for ab initio prediction of protein 3-D structures from sequences, for the rational engineering of new proteins, and for understanding diseases related to protein misfolding or aggregation [1, 2]. The unfolded state of proteins is central in developing the theoretical framework of folding processes because it represents the starting point from which proteins evolve to the native state. The hydrophobic effect operating on apolar side chains is an important factor driving protein folding, and the change in solvent accessible surface area (SASA) of a protein upon folding can be used to estimate the contribution of the hydrophobic effect to the free energy of folding [1, 3]. Empirical models relate changes in SASA (total, polar or apolar) upon folding to the heat capacity, enthalpy, or entropy of folding, and to equilibrium m-values in chemical unfolding [4, 5].
The SASA of a protein was defined as the surface described around the protein by the centre of a solvent sphere in contact with the van der Waals surface of the molecule. Experimental determination of accurate SASAs of folded proteins at the atom level is not yet possible. Fortunately, computation of SASA values in the native state is straightforward when 3-D structures are available. Accurate SASA values for the unfolded state are difficult not only to determine experimentally but also to calculate. In attempts to calculate the changes in SASA associated with the protein folding reaction, a variety of models of the unfolded state have been proposed. They include tripeptides [7–9], peptide-fragment collections in both native and extended conformations, extracted from a set of native structures [10, 11], ensembles of Ac-(Ala)3-X-(Ala)3-Nme peptides, and ensembles of polypeptide conformations of a specific selected protein. A common characteristic of all these models except the last is that they provide mean solvent accessibilities for the 20 residue types but do not take into account the possibility that these accessibilities are modulated by the specific sequence context of the residue of interest.
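The probe-sphere definition above can be illustrated with a short numerical sketch. Note that this is not the exact analytical method used by ALPHASURF: it is a point-sampling approximation in the spirit of Shrake and Rupley [8], and the function names, the 1.4 Å probe radius and the number of test points are illustrative assumptions.

```python
import math

def sphere_points(n):
    """Return n roughly uniform unit vectors (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return points

def sasa_per_atom(coords, radii, probe=1.4, n_points=960):
    """Approximate the SASA (Å^2) of each atom by counting exposed test points.

    coords: list of (x, y, z) atom centres in Å; radii: van der Waals radii in Å.
    probe and n_points are assumed values, not taken from the article.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, radii)):
        expanded = ri + probe  # surface traced by the centre of the probe
        exposed = 0
        for u in unit:
            p = (ci[0] + expanded * u[0], ci[1] + expanded * u[1], ci[2] + expanded * u[2])
            buried = False
            for j, (cj, rj) in enumerate(zip(coords, radii)):
                if j != i and math.dist(p, cj) < rj + probe:
                    buried = True  # the probe centre would overlap atom j
                    break
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * expanded ** 2 * exposed / n_points)
    return areas
```

For an isolated atom every test point is exposed, so the result equals the full area of the expanded sphere; a second atom placed nearby reduces the accessible area of both.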
We have recently developed a way to estimate SASA at atomic resolution in the unfolded ensemble. The method provides individual SASAs for each atom of each residue in a given protein sequence. The structural model chosen to describe the unfolded state consists of hundreds to thousands of unfolded conformations generated by Flexible-Meccano, an algorithm that performs conformational sampling using a coil library and a simple volume exclusion term. The ensembles generated in this way successfully describe backbone fluctuations of several intrinsically unfolded proteins probed by Nuclear Magnetic Resonance (NMR) and small-angle X-ray scattering (SAXS) [15–19]. Our analysis of solvent exposures in unfolded ensembles of proteins generated with this method clearly indicates that the SASA of any residue is strongly influenced by its sequence neighbours and, therefore, using generic residue-type values is not justified. A detailed benchmarking of the method has been described.
Here, we present a web application that calculates the SASA of protein unfolded-state ensembles, detailed per residue and per atom, using the methodology described previously. As far as we know, only two related servers exist. BPPred calculates, from the number of residues, an overall change of SASA upon folding for the protein. Unfolded implements a previously published approach that is based on generic residue-type values. Neither of these two servers calculates SASA values on a sequence-specific representation of the unfolded-state ensemble of the protein of interest. In this sense, ProtSA is an innovative web application that will provide researchers with more accurate accessibility data for the parameterization and interpretation of protein folding thermodynamics.
The ProtSA architecture consists of three parts: the user's web browser, a middle-tier Common Gateway Interface (CGI) application running on a web server, and the server part that calculates the SASA of the protein unfolded-state ensemble. The server part uses three external programs to perform the calculations: Flexible-Meccano for backbone-conformation generation, SCCOMP for side-chain building, and ALPHASURF for SASA calculations on each conformation of the unfolded ensemble of the requested protein. The parts interact as follows. The user fills in the input form in the web browser, and the browser sends the input data to the CGI application, which checks the data for completeness and validity and forwards complete and valid requests to the server part. The server part is a multithreaded program with one network thread for receiving requests and several worker threads for processing them (one request per thread). The network thread receives the request from the CGI application, checks for resource availability, and replies to the CGI application with an acceptance or refusal message. The CGI application then informs the user whether the request has been accepted and, if not, gives the reason for refusal. If resources are available, the network thread queues the request; when a worker thread becomes available and no earlier requests are queued, that worker thread processes the request, calculating the SASA of the protein unfolded-state ensemble. Finally, the worker thread emails the results to the user. Both the CGI application and the server part were written in C++.
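The accept-or-refuse queueing described above can be sketched in a few lines. The sketch below is in Python rather than the C++ of the actual server, and everything in it is an assumption made for illustration: the class name `Server`, the bounded queue as the resource check, and the `compute` callback that stands in for the SASA calculation and the results email.

```python
import queue
import threading

class Server:
    """Toy model: one submit() entry point plus a pool of worker threads."""

    def __init__(self, n_workers=2, max_pending=4, compute=None):
        # A bounded FIFO queue plays the role of the resource check (assumption).
        self.pending = queue.Queue(maxsize=max_pending)
        self.results = {}
        # Stand-in for the real calculation; here it just returns the length.
        self.compute = compute or (lambda sequence: len(sequence))
        for _ in range(n_workers):
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, request_id, sequence):
        """Network-thread role: queue the request or refuse it with a reason."""
        try:
            self.pending.put_nowait((request_id, sequence))
        except queue.Full:
            return (False, "server busy: request queue is full")
        return (True, "accepted")

    def _work(self):
        """Worker-thread role: take the oldest request and process it."""
        while True:
            request_id, sequence = self.pending.get()
            # In the real service the result would be emailed to the user.
            self.results[request_id] = self.compute(sequence)
            self.pending.task_done()

    def wait(self):
        """Block until every accepted request has been processed."""
        self.pending.join()
```

Because the queue is first-in, first-out, a free worker always picks up the earliest pending request, which matches the ordering described in the text.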
Check that all residues in the protein sequence belong to the set of 20 standard types. If the user provides a 3-D structure, check it for gaps or missing atoms.
Generate, from the protein sequence, a set of unfolded-state backbone-only conformations using Flexible-Meccano.
Add side chains to each conformation using SCCOMP.
Calculate SASA of each conformation using ALPHASURF. Obtain mean values per residue and per atom.
(Only if the user provides a 3-D structure of the protein) Calculate SASA for the 3-D structure (assumed to represent the folded state) using ALPHASURF. Calculate differences between folded and unfolded SASA, per atom and per residue.
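The bookkeeping in the steps above (sequence validation, ensemble averaging, and the folded versus unfolded comparison) can be sketched as follows. This is a minimal illustration, not the ProtSA code: the data layout (one dictionary per conformation, keyed by residue index and atom name), the function names, and the sign convention of the difference (unfolded minus folded) are assumptions.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # one-letter codes of the 20 standard types

def invalid_residues(sequence):
    """Return (position, letter) pairs, 1-based, for residues outside the standard set."""
    return [(i, aa) for i, aa in enumerate(sequence.upper(), start=1)
            if aa not in STANDARD_RESIDUES]

def mean_atom_sasa(ensemble):
    """Average per-atom SASA over all conformations.

    ensemble: list of dicts mapping (residue_index, atom_name) -> SASA in Å^2,
    one dict per conformation, all sharing the same keys (assumed layout).
    """
    if not ensemble:
        raise ValueError("ensemble is empty")
    n = len(ensemble)
    return {key: sum(conf[key] for conf in ensemble) / n for key in ensemble[0]}

def per_residue(atom_sasa):
    """Sum atom values into one total per residue index."""
    totals = {}
    for (residue, _atom), value in atom_sasa.items():
        totals[residue] = totals.get(residue, 0.0) + value
    return totals

def unfolded_minus_folded(unfolded, folded):
    """Per-key difference; positive values mean area buried on folding (assumed sign)."""
    return {key: unfolded[key] - folded[key] for key in unfolded}
```

The same difference function works for per-atom and per-residue dictionaries, since both are keyed consistently between the folded structure and the ensemble mean.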
Flexible-Meccano's Monte Carlo algorithm for generating the backbone of the unfolded-state conformations uses a subset of a previously described database of amino-acid-specific Φ and Ψ torsion angles; the subset is obtained by excluding all residues in α-helices and β-sheets. The database includes symmetric values for glycine Φ and Ψ torsion angles, and has special cases for residues preceding a proline. For each protein unfolded-state conformation the algorithm constructs the backbone starting at the C-terminus, although it has been shown that building directionality does not influence SASA results. Residue i is connected to residue i+1 by selecting a random pair of Φ and Ψ angles, for the type of residue i, from the torsional subset database. If residue i clashes with other residues (where residues are represented as spheres centred at the Cβ atom, or the Cα atom for glycine residues, using radii derived from Levitt's force field), the Φ and Ψ torsion-angle pair is rejected and another one is randomly selected. If, after 500 tries, the algorithm does not find a non-clashing Φ and Ψ torsion-angle pair, the partially built conformation is rejected and the algorithm starts again at the C-terminal residue.
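The reject-and-restart control flow of this sampling scheme can be sketched as below. The sketch deliberately uses toy geometry and is not Flexible-Meccano: instead of building a real backbone from torsion angles, it reads each angle pair drawn from the library as a direction in space and places one sphere per residue at a fixed spacing. The spacing, the sphere radius, the restart limit and all function names are assumptions; only the 500-try limit and the restart at the C-terminal residue come from the text.

```python
import math
import random

MAX_TRIES = 500  # retries per residue, as stated in the text
STEP = 3.8       # Å between consecutive sphere centres (toy assumption)

def place(previous, angles):
    """Toy geometry: read the angle pair (degrees) as a direction in space."""
    a, b = math.radians(angles[0]), math.radians(angles[1])
    return (previous[0] + STEP * math.sin(a) * math.cos(b),
            previous[1] + STEP * math.sin(a) * math.sin(b),
            previous[2] + STEP * math.cos(a))

def clashes(candidate, centres, radius):
    """True if the candidate overlaps any sphere except its bonded neighbour."""
    return any(math.dist(candidate, c) < 2.0 * radius for c in centres[:-1])

def build_chain(sequence, library, radius=1.9, rng=random, max_restarts=1000):
    """Grow one conformation from the C-terminal residue.

    library maps a residue letter to a list of angle pairs for that type.
    Returns sphere centres in sequence order (N- to C-terminus).
    """
    for _ in range(max_restarts):
        centres = [(0.0, 0.0, 0.0)]          # the C-terminal residue
        complete = True
        for residue in reversed(sequence[:-1]):
            for _attempt in range(MAX_TRIES):
                candidate = place(centres[-1], rng.choice(library[residue]))
                if not clashes(candidate, centres, radius):
                    centres.append(candidate)
                    break
            else:
                complete = False             # 500 failures: discard the partial chain
                break
        if complete:
            return centres[::-1]
    raise RuntimeError("no clash-free conformation found")
```

Calling `build_chain` repeatedly with the same library yields an ensemble of independent conformations, each accepted only if every residue could be placed without a clash.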
A key factor in the sequence specificity of SASAs calculated for unfolded ensembles is the decoration of each polypeptide backbone with energetically realistic conformers of the sequence residues. This is performed using the iterative method implemented in SCCOMP. Using the rotamers of a backbone-dependent library, and a backbone-independent one for special locations in the protein chain (such as the first and last residues), SCCOMP assigns rotamers, residue by residue, optimizing a scoring function with terms accounting for atom-atom contacts, steric overlaps, torsion energy, and the hydrophobic effect. SCCOMP repeats the complete assignment of rotamers to the protein residues until either there is no change in structure in two consecutive iterations or the limit of allowed iterations is reached.
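The stopping rule described above (repeat full passes until nothing changes or an iteration limit is hit) can be sketched with a generic residue-by-residue optimizer. This is not SCCOMP's implementation: the function name, the iteration limit and the user-supplied `score` callback are hypothetical, and the real scoring terms (contacts, overlaps, torsion energy, hydrophobic effect) are not modelled.

```python
def assign_rotamers(candidates, score, max_iter=50):
    """Iteratively pick, for each residue, the rotamer with the lowest score.

    candidates: one list of rotamer options per residue.
    score(i, rotamer, assignment): cost of placing `rotamer` at residue i given
    the current choices for all residues (hypothetical callback; lower is better).
    Returns (assignment, passes_performed).
    """
    assignment = [options[0] for options in candidates]
    for iteration in range(1, max_iter + 1):
        changed = False
        for i, options in enumerate(candidates):
            best = min(options, key=lambda rot: score(i, rot, assignment))
            if best != assignment[i]:
                assignment[i] = best
                changed = True
        if not changed:  # a full pass left the structure unchanged: converged
            return assignment, iteration
    return assignment, max_iter  # iteration limit reached
```

With a toy score that penalizes neighbouring residues sharing the same rotamer label and mildly prefers low labels, three residues with options 0, 1 and 2 settle on an alternating assignment after a few passes.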
The original method described to calculate solvent accessibilities in unfolded ensembles used NACCESS, while the ProtSA application relies on ALPHASURF. We have compared the performance of these two methods by recalculating unfolded solvent accessibilities for the set of 19 proteins used in the original implementation. Another popular program to calculate exposures, DSSP, yields values 5% higher than those of NACCESS and ALPHASURF (not shown).
[Table: ProtSA solvent accessibilities of unfolded ensembles of test proteins. Columns: protein PDB code; number of residues; accessibilities by ProtSA; accessibilities by NACCESS.]
[Table: Solvent accessibilities (Å²) of amino acid residues in protein unfolded ensembles calculated with ProtSA. Columns include the number of residues.]
From the execution times of the 19 test proteins on a computer running CentOS 5 with 2 GB of RAM and a 2.4 GHz Core 2 Duo CPU (data not shown), we can deduce a linear dependence on the length of the input sequence. For the more demanding requests, corresponding to ensembles of 2000 protein molecules, there is a fixed cost of about 80 minutes and a variable cost of about 0.5 minutes per residue in the sequence.
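As a quick worked example of this linear cost model, the helper below turns the two approximate figures quoted above into a run-time estimate. The function name is illustrative, and the numbers apply only to 2000-conformation ensembles on the hardware described.

```python
def estimated_minutes(n_residues, fixed_cost=80.0, per_residue=0.5):
    """Rough run time in minutes: fixed cost plus a per-residue cost (values from the text)."""
    return fixed_cost + per_residue * n_residues

# A 100-residue protein: 80 + 0.5 * 100 = 130 minutes, a little over two hours.
print(estimated_minutes(100))
```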
ProtSA, the freely available web application presented in this work, represents a novel tool for the researcher interested in protein folding energetics. The sequence-specific protein solvent accessibilities in the unfolded-state ensemble calculated by ProtSA will provide researchers with a more precise view of unfolded-state ensembles, and will help to understand the influence of mutations on protein stability.
Availability and requirements
Project name: ProtSA
Project home page: http://webapps.bifi.es/protsa/
Operating system(s): Platform independent
Programming language: C/C++
Any restrictions to use by non-academics: None
We thank Sara Ayuso (Univ. Zaragoza, Spain), for help in testing; Patrice Koehl (Univ. California Davis, USA), for help with ALPHASURF; Guillermo Losilla (BIFI, Univ. Zaragoza, Spain), for technical help; and José Ramón Peregrina (Univ. Zaragoza, Spain), for comments and discussions. The molecular graphics image in Fig. 5 was produced using POV-Ray and the UCSF Chimera package from the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco (supported by NIH P41 RR-01081). We acknowledge financial support from MEC (Spain): grants BFU2007-61476/BMC and BIO2007-63458, and from DGA (Spain): grant PI078/08. PB holds a Ramon y Cajal contract financed by MEC (Spain) and the Institute for Research in Biomedicine. JE was a recipient of an FPU doctoral fellowship from MEC (Spain).
- Baldwin RL: Energetics of protein folding. J Mol Biol 2007, 371: 283–301. 10.1016/j.jmb.2007.05.078
- Chen Y, Ding F, Nie H, Serohijos AW, Sharma S, Wilcox KC, Yin S, Dokholyan NV: Protein folding: then and now. Arch Biochem Biophys 2008, 469: 4–19. 10.1016/j.abb.2007.05.014
- Wesson L, Eisenberg D: Atomic solvation parameters applied to molecular dynamics of proteins in solution. Protein Sci 1992, 1: 227–235.
- Robertson AD, Murphy KP: Protein structure and the energetics of protein stability. Chem Rev 1997, 97: 1251–1268. 10.1021/cr960383c
- Myers JK, Pace CN, Scholtz JM: Denaturant m values and heat capacity changes: relation to changes in accessible surface areas of protein unfolding. Protein Sci 1995, 4: 2138–2148. 10.1002/pro.5560041020
- Lee B, Richards FM: The interpretation of protein structures: estimation of static accessibility. J Mol Biol 1971, 55: 379–400. 10.1016/0022-2836(71)90324-X
- Miller S, Janin J, Lesk AM, Chothia C: Interior and surface of monomeric proteins. J Mol Biol 1987, 196: 641–656. 10.1016/0022-2836(87)90038-6
- Shrake A, Rupley JA: Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J Mol Biol 1973, 79: 351–371. 10.1016/0022-2836(73)90011-9
- Rose GD, Geselowitz AR, Lesser GJ, Lee RH, Zehfus MH: Hydrophobicity of amino acid residues in globular proteins. Science 1985, 229: 834–838. 10.1126/science.4023714
- Creamer TP, Srinivasan R, Rose GD: Modeling unfolded states of peptides and proteins. Biochemistry 1995, 34: 16245–16250. 10.1021/bi00050a003
- Creamer TP, Srinivasan R, Rose GD: Modeling unfolded states of proteins and peptides. II. Backbone solvent accessibility. Biochemistry 1997, 36: 2832–2835. 10.1021/bi962819o
- Gong H, Rose GD: Assessing the solvent-dependent surface area of unfolded proteins using an ensemble model. Proc Natl Acad Sci USA 2008, 105: 3321–3326. 10.1073/pnas.0712240105
- Goldenberg DP: Computational simulation of the statistical properties of unfolded proteins. J Mol Biol 2003, 326: 1615–1633. 10.1016/S0022-2836(03)00033-0
- Bernadó P, Blackledge M, Sancho J: Sequence-specific solvent accessibilities of protein residues in unfolded protein ensembles. Biophys J 2006, 91: 4536–4543. 10.1529/biophysj.106.087528
- Bernadó P, Blanchard L, Timmins P, Marion D, Ruigrok RW, Blackledge M: A structural model for unfolded proteins from residual dipolar couplings and small-angle x-ray scattering. Proc Natl Acad Sci USA 2005, 102: 17002–17007. 10.1073/pnas.0506202102
- Bernadó P, Bertoncini CW, Griesinger C, Zweckstetter M, Blackledge M: Defining long-range order and local disorder in native α-synuclein using residual dipolar couplings. J Am Chem Soc 2005, 127: 17968–17969. 10.1021/ja055538p
- Mukrasch MD, Markwick P, Biernat J, Bergen M, Bernadó P, Griesinger C, Mandelkow E, Zweckstetter M, Blackledge M: Highly populated turn conformations in natively unfolded tau protein identified from residual dipolar couplings and molecular simulation. J Am Chem Soc 2007, 129: 5235–5243. 10.1021/ja0690159
- Wells M, Tidow H, Rutherford TJ, Markwick P, Jensen MR, Mylonas E, Svergun DI, Blackledge M, Fersht AR: Structure of tumor suppressor p53 and its intrinsically disordered N-terminal transactivation domain. Proc Natl Acad Sci USA 2008, 105: 5762–5767. 10.1073/pnas.0801353105
- Jensen MR, Houben K, Lescop E, Blanchard L, Ruigrok RW, Blackledge M: Quantitative conformational analysis of partially folded proteins from residual dipolar couplings: application to the molecular recognition element of Sendai virus nucleoprotein. J Am Chem Soc 2008, 130: 8055–8061. 10.1021/ja801332d
- Geierhaas CD, Nickson AA, Lindorff-Larsen K, Clarke J, Vendruscolo M: BPPred: a Web-based computational tool for predicting biophysical parameters of proteins. Protein Sci 2007, 16: 125–134. 10.1110/ps.062383807
- Eyal E, Najmanovich R, McConkey BJ, Edelman M, Sobolev V: Importance of solvent accessibility and contact surfaces in modeling side-chain conformations in proteins. J Comput Chem 2004, 25: 712–724. 10.1002/jcc.10420
- Edelsbrunner H, Koehl P: The weighted-volume derivative of a space-filling diagram. Proc Natl Acad Sci USA 2003, 100: 2203–2208. 10.1073/pnas.0537830100
- Hubbard SJ, Thornton JM: NACCESS Computer Program. Department of Biochemistry and Molecular Biology, University College London, London, UK; 1993.
- Lovell SC, Davis IW, Arendall WB 3rd, de Bakker PI, Word JM, Prisant MG, Richardson JS, Richardson DC: Structure validation by Cα geometry: Φ,Ψ and Cβ deviation. Proteins 2003, 50(3):437–450. 10.1002/prot.10286
- Levitt M: A simplified representation of protein conformations for rapid simulation of protein folding. J Mol Biol 1976, 104: 59–107. 10.1016/0022-2836(76)90004-8
- Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22: 2577–2637. 10.1002/bip.360221211
- ProtSA: A web application for calculating sequence specific protein solvent accessibilities in the unfolded ensemble [http://webapps.bifi.es/protsa/]
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235–242. 10.1093/nar/28.1.235
- Pakula AA, Sauer RT: Reverse hydrophobic effects relieved by amino-acid substitutions at a protein surface. Nature 1990, 344: 363–364. 10.1038/344363a0
- Ohlendorf DH, Tronrud DE, Matthews BW: Refined structure of Cro repressor protein from bacteriophage λ suggests both flexibility and plasticity. J Mol Biol 1998, 280: 129–136. 10.1006/jmbi.1998.1849
- Takano K, Scholtz JM, Sacchettini JC, Pace CN: The contribution of polar group burial to protein stability is strongly context-dependent. J Biol Chem 2003, 278: 31790–31795. 10.1074/jbc.M304177200
- Persistence of Vision Raytracer [http://www.povray.org/]
- Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE: UCSF Chimera – A visualization system for exploratory research and analysis. J Comput Chem 2004, 25: 1605–1612. 10.1002/jcc.20084
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.