- Open Access
ProteinVolume: calculating molecular van der Waals and void volumes in proteins
© Chen and Makhatadze; licensee BioMed Central. 2015
- Received: 28 December 2014
- Accepted: 10 March 2015
- Published: 26 March 2015
Voids and cavities in the native protein structure determine the pressure unfolding of proteins. In addition, the volume changes due to the interaction of newly exposed atoms with solvent upon protein unfolding contribute to the pressure unfolding of proteins. Quantitative understanding of these effects is important for predicting and designing proteins with a predefined response to changes in hydrostatic pressure using computational approaches. The molecular surface volume is a useful metric that describes the contribution of the geometric volume, which includes the van der Waals volume and the volume of the voids, to the total volume of a protein in solution, thus isolating the effects of hydration for separate calculations.
We developed ProteinVolume, a highly robust and easy-to-use tool to compute geometric volumes of proteins. ProteinVolume generates the molecular surface of a protein and uses an innovative flood-fill algorithm to calculate the individual components of the molecular surface volume, van der Waals and intramolecular void volumes. ProteinVolume is user friendly and is available as a web-server or a platform-independent command-line version.
ProteinVolume is a highly accurate and fast application to interrogate geometric volumes of proteins. ProteinVolume is a free web server available at http://gmlab.bio.rpi.edu. A free-standing, platform-independent, Java-based ProteinVolume executable is also freely available at this web site.
- Volume calculations
- Void volume
- van der Waals volume
Currently there are several algorithms to calculate geometric volumes of proteins. They can be divided into three distinct categories. The first comprises 3D grid-based calculations and includes VOIDOO, AVP, 3V, and Voronoia. The second category uses analytical methods and includes MSROLL, VORLUME, and ALPHAVOL. The third category includes calculations based on Delaunay triangulation, such as VADAR, or on the Monte Carlo method, such as MCVOL. Each of these methods has its own advantages but, more importantly, some disadvantages. For example, 3D-grid methods have irreproducibility issues due to the positioning of the protein structure on the grid. Delaunay triangulation performs well in the protein interior but suffers from uncertainty in how protein boundaries are delineated. These issues are sometimes further amplified upon implementation in software packages that are usually written to evaluate a particular property (see comparison in Additional file 1: Table S1).
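The grid-positioning issue mentioned above can be illustrated with a single sphere. The following is a minimal sketch, not code from any of the cited programs: the function name `grid_volume`, the 0.5 Å grid spacing, and the 1.7 Å radius are assumptions chosen for illustration. It counts the nodes of a fixed cubic grid that fall inside the sphere and shows that the estimated volume depends on where the sphere sits relative to the grid.

```python
import math

def grid_volume(center, radius, spacing):
    """Estimate a sphere's volume by counting fixed grid nodes inside it."""
    cx, cy, cz = center
    # Index ranges of grid nodes (integer multiples of `spacing`) that could lie inside.
    ranges = [
        range(math.floor((c - radius) / spacing), math.ceil((c + radius) / spacing) + 1)
        for c in center
    ]
    inside = 0
    for i in ranges[0]:
        for j in ranges[1]:
            for k in ranges[2]:
                dx, dy, dz = i * spacing - cx, j * spacing - cy, k * spacing - cz
                if dx * dx + dy * dy + dz * dz <= radius * radius:
                    inside += 1
    # Each counted node stands for one cubic cell of volume spacing**3.
    return inside * spacing ** 3

radius, spacing = 1.7, 0.5  # illustrative values, not taken from the paper
exact = 4.0 / 3.0 * math.pi * radius ** 3
on_node = grid_volume((0.0, 0.0, 0.0), radius, spacing)   # sphere centered on a grid node
shifted = grid_volume((0.25, 0.25, 0.25), radius, spacing)  # same sphere, shifted half a cell
print(f"exact {exact:.3f}  on node {on_node:.3f}  shifted {shifted:.3f}")
```

The two grid estimates differ from each other even though the sphere is identical, which is the kind of placement dependence the text refers to. Finer grids reduce the discrepancy but do not remove it.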
Several methods calculate VVDW and VSA. VOIDOO is a 3D grid-based algorithm that calculates the VVDW and/or VSA of a protein. VORLUME and ALPHAVOL are analytical alpha-shape methods that also calculate VVDW and/or VSA. Another method to calculate protein volume involves partitioning the space around each atom into Voronoi polyhedra, as implemented by Finney in 1970 and Richards in 1974. However, this method does not calculate any of the volumes individually, but instead calculates the sum of the VVDW, VVoid, and portions of the VE. Parts of the VE are assigned to surface atoms because the boundary separating protein and bulk solvent is drawn between the surface atoms and neighboring solvent molecules. Thus, the boundary separating protein and bulk solvent is highly dependent on the method used for the placement of the solvent molecules. Depending on the placement method, the volume and packing density of surface atoms will vary. Since parts of the VE are grouped with protein atoms, it is impossible to separate hydration or geometric volume components from the total volume computed using Voronoi polyhedra methods.
It is crucial to separate the geometric and hydration volumes of a protein to understand the magnitude of the contribution of each of these components to the total volume of a protein in solution. Therefore, it is necessary to calculate the VMS of a protein instead of VSA and VVDW. Unfortunately, only a limited number of non-grid-based programs can calculate VMS. MCVOL uses a Monte Carlo algorithm to approximate the VMS of a protein, whereas MSROLL analytically calculates VMS. However, both programs have inherent limitations. MCVOL will underestimate VVoid when the diameter along the shortest axis of a cavity is larger than 2.8 Å, because a point is considered part of the solvent if it is more than 1.4 Å away from the surface of any protein atom. MSROLL is extremely fast, but it suffers from lower robustness when encountering degenerate geometry. Finally, neither is available as a web server. We present ProteinVolume, a robust method to numerically calculate VMS, VVDW and VVoid using a flood-fill algorithm to generate the molecular surface and fill the surface interior with high-resolution probes. Volume probes can dynamically reduce their radius when needed, increasing the accuracy of the numerical approximation.
ProteinVolume is available as free-standing software as well as via a web-based interface at http://gmlab.bio.rpi.edu. Below we describe the overall properties of ProteinVolume, followed by a description of the web server.
The surface of a protein is generated from the user-provided Protein Data Bank (PDB) coordinates using a flood-fill algorithm operating in the spherical coordinate system, analogous to rolling a ball on the surface of a protein. The atom furthest from the protein center of mass is selected as the starting atom. Then, an exhaustive ray-sphere intersection test is carried out over all angles around the starting atom to find an unoccupied position for a probe with a 1.4 Å radius. This is the starting position for the surface algorithm. The starting spherical coordinates are converted into Cartesian coordinates, and the surface is then grown from that starting point using a flood-fill algorithm. A hashset stores all previously visited locations on each atom surface to prevent backtracking. To detect inter-atom surface probe collisions, all surface probes within nearby spatial bins are tested for a distance below a minimum cutoff, the surface probe minimum distance (default value set to 0.1 Å). For reference, this method generates approximately 500,000 surface probes for the native structure of ubiquitin (1UBQ, 76 residues, 1,231 atoms) in ~2 seconds on a single core of an i7-3630QM.
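The search for a starting probe position can be sketched as follows. This is an illustrative approximation under stated assumptions, not the ProteinVolume implementation: it uses the unweighted centroid instead of the center of mass, replaces the ray-sphere intersection test with a plain centre-to-centre distance check, and scans angles in 10° steps. The function names and the `(x, y, z, radius)` tuple layout are hypothetical.

```python
import math

def centroid(atoms):
    """Unweighted centroid of atoms given as (x, y, z, radius) tuples (assumption: no masses)."""
    n = len(atoms)
    return tuple(sum(a[i] for a in atoms) / n for i in range(3))

def farthest_atom(atoms):
    """Index of the atom whose centre is farthest from the centroid."""
    c = centroid(atoms)
    return max(range(len(atoms)), key=lambda i: math.dist(atoms[i][:3], c))

def spherical_to_cartesian(origin, r, theta, phi):
    """Convert (r, polar angle theta, azimuth phi) around `origin` to Cartesian coordinates."""
    return (
        origin[0] + r * math.sin(theta) * math.cos(phi),
        origin[1] + r * math.sin(theta) * math.sin(phi),
        origin[2] + r * math.cos(theta),
    )

def find_start_probe(atoms, probe_radius=1.4, step_deg=10.0):
    """Scan angles around the starting atom for a probe position touching it
    that does not overlap any other atom. Returns (start_index, position or None)."""
    start = farthest_atom(atoms)
    reach = atoms[start][3] + probe_radius  # probe centre sits on this shell
    for i in range(int(180 / step_deg) + 1):
        theta = math.radians(i * step_deg)
        for j in range(int(360 / step_deg)):
            phi = math.radians(j * step_deg)
            pos = spherical_to_cartesian(atoms[start][:3], reach, theta, phi)
            if all(
                math.dist(pos, a[:3]) >= a[3] + probe_radius
                for k, a in enumerate(atoms)
                if k != start
            ):
                return start, pos
    return start, None

# Three overlapping atoms on a line, 1.7 Å radius each (toy input, not a real structure).
toy_atoms = [(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7), (3.0, 0.0, 0.0, 1.7)]
start_index, start_position = find_start_probe(toy_atoms)
print(start_index, start_position)
```

A full surface generator would continue from `start_position` by flood fill, recording visited locations in a set as described in the text.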
Grid-based spatial binning is employed to reduce the number of collision checks when placing a new volume probe in the protein. The entire 3D coordinate space is divided into cubic spatial bins 2 Å in size. This value is slightly larger than the radius of the largest protein atom, which minimizes the number of possible bins an atom can occupy. Each existing protein atom and generated surface probe is added to a hashmap of spatial bins before volume calculation. The hashmap maps each spatial bin index to an ArrayList of the atoms and probes in that bin. The spatial bin index is calculated from the 9 possible extreme edges of each sphere, and duplicate bin indices are ignored. When testing for a collision between volume probes and surface atoms or nearby protein atoms, only spatial bins surrounding the volume probe are selected for collision testing, which reduces computational time. This results in an overall runtime complexity of O(n), where n is the number of atoms in the system.
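The binning idea can be sketched with a dictionary keyed by integer bin indices. This is a minimal sketch and not the ProteinVolume code: the paper describes a Java hashmap of ArrayLists, whereas here a Python `defaultdict` is used, and each sphere is registered in every bin overlapped by its bounding box, which is an assumption that may differ from the bin-assignment rule described above. All function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

BIN_SIZE = 2.0  # Å, the bin size quoted in the text

def covered_bins(sphere, size=BIN_SIZE):
    """Indices of all cubic bins overlapped by the sphere's axis-aligned bounding box."""
    x, y, z, r = sphere
    spans = [
        range(math.floor((c - r) / size), math.floor((c + r) / size) + 1)
        for c in (x, y, z)
    ]
    return product(*spans)

def build_bins(spheres, size=BIN_SIZE):
    """Map each bin index to the list of spheres (x, y, z, radius) touching that bin."""
    bins = defaultdict(list)
    for s in spheres:
        for key in covered_bins(s, size):
            bins[key].append(s)
    return bins

def collides(probe, bins, size=BIN_SIZE):
    """True if the probe overlaps any stored sphere; only nearby bins are inspected."""
    px, py, pz, pr = probe
    for key in covered_bins(probe, size):
        for (x, y, z, r) in bins.get(key, ()):
            if math.dist((px, py, pz), (x, y, z)) < pr + r:
                return True
    return False

# Toy input: two well-separated spheres.
stored = [(0.0, 0.0, 0.0, 1.7), (10.0, 0.0, 0.0, 1.5)]
bins = build_bins(stored)
print(collides((1.0, 0.0, 0.0, 0.5), bins), collides((5.0, 0.0, 0.0, 0.5), bins))
```

Because two overlapping spheres share at least one point, and that point lies in a bin covered by both bounding boxes, the lookup never misses a true collision. Each query touches a bounded number of bins, which is what keeps the total work roughly linear in the number of atoms.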
Language and libraries
ProteinVolume was programmed in Java (JDK 1.7) using the Trove collections library for higher performance and overall lower memory usage. ProteinVolume is platform independent and can be run on any platform with a Java runtime environment.
ProteinVolume web interface
The ProteinVolume web interface allows users to upload PDB files and run ProteinVolume from any device without expending their local computing resources. We have strived to create a clean, user-friendly, and responsive interface for ease of use. All interactions with the server are AJAX-powered, which provides a native feel to the application. Users are presented with a form that allows them to upload file(s) of interest and fill in their names and email addresses. Anonymous users are allowed to upload one PDB file, whereas users providing their name are allowed to upload up to ten PDB files. After the PDB files are uploaded, users are placed into a queue. As resources become available, the job is executed, and the program output is displayed to the user in real time together with a progress bar. The progress bar shows the percent completion, estimated from the total number of atoms in all submitted PDB files and the selected ProteinVolume options.
Input structure preparation
The default option of ProteinVolume uses explicit hydrogen atoms and Bondi van der Waals radii for all atoms, because van der Waals volumes are overestimated when united-atom radii are used. It is highly recommended to energy minimize all structures before volume processing to reduce unfavorable steric clashes that will skew volume results and make volume comparisons inaccurate. For example, we routinely energy minimize our proteins using the CHARMM27 all-atom force field in GROMACS for 1 ps using the steepest descent method in implicit solvent and a 1 nm cutoff for electrostatic interactions. This will also add all hydrogen atoms to the structure. The user can add minimization as a preprocessing option to web server calculations. Alternatively, the hydrogen atoms can be explicitly modeled using the REDUCE software. In the executable version of ProteinVolume, the user can modify the van der Waals radii set by editing the parameter file. If the hydrogen atom radius is set to zero, hydrogens will be ignored in the calculations.
The volume calculation of a protein takes from seconds to minutes depending on protein size and program options. On a single core of an i7-3630QM @ 2.4 GHz, the structure of ubiquitin (1UBQ, 76 residues) takes ~1 minute to calculate with a 0.08 Å starting probe size, a 0.02 Å ending probe size, and a 0.1 Å surface probe minimum distance. With the current server hardware, the same protein with the same parameter settings takes ~9 min. The computational complexity of the algorithm is O(n), i.e. linear, where n is the number of atoms in the system, due to spatial binning optimizations that limit the number of pairwise distance calculations.
The native ensembles of a set of 1,379 high-resolution (<1.7 Å) crystal structures were modeled, and their volumes were calculated with ProteinVolume. MODELLER was used to model each native ensemble, which contained 11 structures per protein. Protein sizes ranged from 40 to 1,052 amino acid residues. The total number of structures tested was 15,169. For all structures, ProteinVolume successfully calculated volumes without runtime errors.
The effects of the probe size parameters
Three parameters, starting probe size, ending probe size, and surface probe minimum distance, have a significant effect on the running time and accuracy of the algorithm.
The surface probe minimum distance is the minimum distance at which two surface probes can be placed next to each other. When this value is increased, the surface probe density decreases, which significantly reduces the number of pairwise distance calculations and therefore the processing time. The default value for the surface probe minimum distance is 0.1 Å. Increasing it up to 0.4 Å will decrease computational time at the expense of the accuracy of the calculations (see Figure 3B). A surface probe minimum distance of 0.1 Å generates a very high-resolution surface of approximately 5,000 probes per isolated atom.
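The effect of the minimum distance on probe density can be illustrated with a toy thinning procedure. This is a sketch under stated assumptions and does not reproduce how ProteinVolume places probes: candidate positions come from a Fibonacci lattice on a sphere of radius 3.1 Å (a 1.7 Å atom plus a 1.4 Å probe, chosen for illustration), and a candidate is accepted only if it is at least the minimum distance from every probe accepted so far. The distances 0.3 Å and 0.6 Å are coarser than the program defaults so that the quadratic-time check stays fast.

```python
import math

def fibonacci_sphere(n, radius):
    """Return n roughly evenly spaced points on a sphere centred at the origin."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        phi = i * golden_angle
        points.append((radius * rho * math.cos(phi), radius * rho * math.sin(phi), radius * z))
    return points

def thin_by_min_distance(candidates, min_distance):
    """Greedily keep candidates that are at least min_distance from all kept points."""
    accepted = []
    for p in candidates:
        if all(math.dist(p, q) >= min_distance for q in accepted):
            accepted.append(p)
    return accepted

candidates = fibonacci_sphere(800, 3.1)
fine = thin_by_min_distance(candidates, 0.3)
coarse = thin_by_min_distance(candidates, 0.6)
print(len(fine), len(coarse))
```

Doubling the minimum distance leaves far fewer accepted probes, so later collision checks have fewer neighbours to examine. The naive all-pairs test used here is only for clarity; the text describes restricting such checks to nearby spatial bins.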
ProteinVolume was benchmarked against two volume calculation programs, MCVOL and MSROLL. These two programs were selected because they directly compute VMS. MCVOL uses a Monte Carlo algorithm to approximate the VMS and VVoid of a protein. MSROLL analytically calculates the VMS of a protein: triangles occupying the intersection volume between atoms are discarded, and VMS is calculated by summing the volume of each triangular pyramid formed by the tessellated surface to the center of each atom. A set of 217 ultra-high resolution (0.7-1.2 Å) crystal structures [23,24] was selected for benchmarking volume calculations (Additional file 2: Ultra High Resolution Protein Set, 0.73-1.20 Å). The average VMS deviation between ProteinVolume and MCVOL or MSROLL was 0.2% and 0.7%, respectively (Additional file 3: Figure S1). The excellent agreement of ProteinVolume with MSROLL and MCVOL shows that ProteinVolume calculates VMS accurately. Since VOIDOO, VORLUME, and ALPHAVOL directly compute VSA instead of VMS, direct comparison with ProteinVolume volumes is not possible; nevertheless, the VVDW computed by, for example, VOIDOO is in excellent agreement with the VVDW computed by ProteinVolume (see Additional file 3: Figure S1). To test whether ProteinVolume accuracy depends on crystallographic resolution, calculations performed on a set of proteins solved to ultra-high resolution (0.7-1.2 Å, n = 217) were compared with those on a set solved to high resolution (1.2-1.7 Å, n = 1,161). As expected, both sets display the same slope and intercept for the dependence of volume on protein size (Additional file 3: Figure S1). This indicates that the accuracy of ProteinVolume is independent of crystallographic resolution.
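The pyramid-summation idea described for MSROLL can be illustrated on a generic closed triangle mesh. The sketch below is a simplification and is not MSROLL's algorithm: it uses a single reference point rather than per-atom centers, assumes the mesh is closed with outward-facing (counter-clockwise) triangles, and sums signed tetrahedron volumes. The function name `mesh_volume` and the test mesh are hypothetical.

```python
def sub(a, b):
    """Component-wise difference of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def mesh_volume(vertices, triangles, reference=(0.0, 0.0, 0.0)):
    """Volume of a closed, outward-oriented triangle mesh as a sum of signed
    tetrahedra, each formed by one triangle and the reference point."""
    total = 0.0
    for i, j, k in triangles:
        a = sub(vertices[i], reference)
        b = sub(vertices[j], reference)
        c = sub(vertices[k], reference)
        total += dot(a, cross(b, c)) / 6.0
    return total

# Unit right tetrahedron; its volume is 1/6.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tris = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]  # outward-facing winding
volume_origin = mesh_volume(verts, tris)
volume_inside = mesh_volume(verts, tris, reference=(0.3, 0.2, 0.1))
print(volume_origin, volume_inside)
```

For a closed mesh the result does not depend on the reference point, because contributions from outside the surface cancel in sign. A tessellated molecular surface would be handled the same way, provided the triangle orientation is consistent.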
Scaling behavior of geometric volumes of proteins
The void volumes inside the proteins, i.e. the magnitude of VVoid, have been implicated in determining the pressure unfolding of proteins [5,6]. The prediction based on the scaling behavior of VVoid is that larger proteins will be more prone to unfold under pressure. This prediction still awaits experimental validation.
We present ProteinVolume, a volume calculator that reports the van der Waals volume (VVDW), the void volume (VVoid), and the total volume (VMS) enclosed within the molecular surface of a protein. The VMS, or solvent-excluded volume, can be thought of as the geometric volume contribution of a protein, which consists of van der Waals and intramolecular void volume. This allows us to clearly separate the volume contribution of the protein geometry (VMS) from that of the protein-solvent interactions (hydration volume). The sum of these two components should result in a better approximation of the apparent volume of a protein molecule in solution than other computational models, which are based on the volume enclosed by the accessible surface area. Finally, partitioning the volume into geometric (VMS) and hydration components will lead to quantitative insight into each term and will allow rational engineering of volume changes in proteins.
Project name: ProteinVolume
Project home page: http://gmlab.bio.rpi.edu
Operating system(s): Platform independent
Programming language: Java
Other requirements: Java Runtime Environment 1.7 and above
License: Closed source proprietary
Any restrictions to use by non-academics: none
This work was supported by the US National Science Foundation grant CHE-1145407 (Chemistry of Life Processes).
- Royer CA. Revisiting volume changes in pressure-induced protein unfolding. Biochim Biophys Acta. 2002;1595:201–9.
- Chalikian TV. On the molecular origins of volumetric data. J Phys Chem B. 2008;112:911–7.
- Schweiker KL, Fitz VW, Makhatadze GI. Universal convergence of the specific volume changes of globular proteins upon unfolding. Biochemistry. 2009;48:10846–51.
- Richards FM. Areas, volumes, packing and protein structure. Annu Rev Biophys Bioeng. 1977;6:151–76.
- Frye KJ, Royer CA. Probing the contribution of internal cavities to the volume change of protein unfolding under pressure. Protein Sci. 1998;7:2217–22.
- Roche J, Caro JA, Norberto DR, Barthe P, Roumestand C, Schlessman JL, et al. Cavities determine the pressure unfolding of proteins. Proc Natl Acad Sci U S A. 2012;109:6945–50.
- Kleywegt GJ, Jones TA. Detection, delineation, measurement and display of cavities in macromolecular structures. Acta Crystallogr D Biol Crystallogr. 1994;50:178–85.
- Cuff AL, Martin AC. Analysis of void volumes in proteins and application to stability of the p53 tumour suppressor protein. J Mol Biol. 2004;344:1199–209.
- Voss NR, Gerstein M. 3V: cavity, channel and cleft volume calculator and extractor. Nucleic Acids Res. 2010;38:W555–62.
- Rother K, Hildebrand PW, Goede A, Gruening B, Preissner R. Voronoia: analyzing packing in protein structures. Nucleic Acids Res. 2009;37:D393–5.
- Connolly ML. Computation of Molecular Volume. J Am Chem Soc. 1985;107:1118–24.
- Cazals F, Kanhere H, Loriot S. Computing the Volume of a Union of Balls: A Certified Algorithm. ACM. 2011;38:1–25.
- Edelsbrunner H, Koehl P. The weighted-volume derivative of a space-filling diagram. Proc Natl Acad Sci U S A. 2003;100:2203–8.
- Willard L, Ranjan A, Zhang H, Monzavi H, Boyko RF, Sykes BD, et al. VADAR: a web server for quantitative evaluation of protein structure quality. Nucleic Acids Res. 2003;31:3316–9.
- Till MS, Ullmann GM. McVol - A program for calculating protein volumes and identifying cavities by a Monte Carlo algorithm. J Mol Model. 2010;16:419–29.
- Finney JL. Random Packings and Structure of Simple Liquids. 1. Geometry of Random Close Packing. Proc R Soc Lon Ser-A. 1970;319:479–93.
- Richards FM. Interpretation of Protein Structures - Total Volume, Group Volume Distributions and Packing Density. J Mol Biol. 1974;82:1–14.
- Bondi A. Van Der Waals Volumes + Radii. J Phys Chem-Us. 1964;68:441–7.
- Brooks BR, Brooks CL, Mackerell AD, Nilsson L, Petrella RJ, Roux B, et al. CHARMM: The Biomolecular Simulation Program. J Comput Chem. 2009;30:1545–614.
- Pronk S, Pall S, Schulz R, Larsson P, Bjelkmar P, Apostolov R, et al. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics. 2013;29:845–54.
- Word JM, Lovell SC, Richardson JS, Richardson DC. Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. J Mol Biol. 1999;285:1735–47.
- Fiser A, Sali A. MODELLER: Generation and refinement of homology-based protein structure models. Method Enzymol. 2003;374:461–91.
- Bush J, Makhatadze GI. Statistical analysis of protein structures suggests that buried ionizable residues in proteins are hydrogen bonded or form salt bridges. Proteins. 2011;79:2027–32.
- Loladze VV, Makhatadze GI. Energetics of charge-charge interactions between residues adjacent in sequence. Proteins. 2011;79:3494–9.
- Fleming PJ, Richards FM. Protein packing: Dependence on protein size, secondary structure and amino acid composition. J Mol Biol. 2000;299:487–98.
- Trifonov EN, Berezovsky IN. Evolutionary aspects of protein structure and folding. Curr Opin Struct Biol. 2003;13:110–4.
- Sandhya S, Rani SS, Pankaj B, Govind MK, Offmann B, Srinivasan N, et al. Length variations amongst protein domain superfamilies and consequences on structure and function. PLoS One. 2009;4:e4981.
- Privalov PL. Stability of proteins. Proteins which do not present a single cooperative system. Adv Protein Chem. 1982;35:1–104.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.