Volume 8 Supplement 1
ProCMD: a database and 3D web resource for protein C mutants
© D'Ursi et al; licensee BioMed Central Ltd. 2007
Published: 8 March 2007
Activated Protein C (ProC) is an anticoagulant plasma serine protease which also plays an important role in controlling inflammation and cell proliferation. Several mutations of the gene are associated with phenotypic functional deficiency of protein C, and with the risk of developing venous thrombosis. Structure prediction and computational analysis of the mutants have proven to be a valuable aid in understanding the molecular aspects of clinical thrombophilia.
We have built a specialized relational database and a search tool for natural mutants of protein C. It contains 195 entries that include 182 missense and 13 stop mutations. A menu driven search engine allows the user to retrieve stored information for each variant, that include genetic as well as structural data and a multiple alignment highlighting the substituted position. Molecular models of variants can be visualized with interactive tools; PDB coordinates of the models are also available for further analysis. Furthermore, an automatic modelling interface allows the user to generate multiple alignments and 3D models of new variants.
ProCMD is an up-to-date interactive mutant database that integrates phenotypical descriptions with functional and structural data obtained by computational approaches. It will be useful in the research and clinical fields to help elucidate the chain of events leading from a molecular defect to the related disease. It is available for academics at the URL http://www.itb.cnr.it/procmd/.
Activated protein C (APC) is a vitamin K-dependent serine protease that plays a central role in the regulation of blood coagulation. It is the key component of an anticoagulant system that provides an essential mechanism in thrombosis prevention and in inflammatory response control.
APC binds to its cofactor protein S, and the complex inactivates the two cofactors involved in the clotting cascade, factors Va and factor VIIIa, leading to the efficient inhibition of the coagulation process. Furthermore APC has direct and indirect anti-inflammatory actions. It prevents leukocyte rolling, tissue factor exposure and tumour necrosis factor production by monocytes, thrombin-mediated inflammatory actions and apoptosis of endothelial cells . The anti-thrombotic and anti-inflammatory actions of APC have been therapeutically exploited in severe sepsis .
Protein C, the zymogen of APC, is synthesized as a single chain precursor containing an amino-terminal leader sequence followed by a propeptide, which are cleaved upon secretion. It circulates in plasma mostly as a two-chain zymogen obtained by removal of the dipeptide Lys198 and Arg199 that results in the formation of two chains (light, 21 kDa and heavy, 41 kDa) linked by a disulphide bridge between Cys183 and Cys320. The zymogen is activated by thrombin through the proteolytic cleavage of a 12 amino acid peptide (Asp200-Arg211) . The activation occurs on the endothelium of blood vessels by the thrombin-thrombomodulin complex  and this process is enhanced by the endothelial cell protein C receptor (EPCR) .
Protein C has a multi-domain structure: the light chain contains a γ-carboxyglutamic acid (Gla)-rich membrane binding domain and two epidermal growth factor (EGF)-like modules, while the heavy chain has the form of a typical trypsin-like serine protease domain .
The structure of the Gla-domainless APC has been solved by X-ray crystallography , while the structure of the Gla-domain is available in the complex with endothelial protein C receptor . Protein C shares homologies with other vitamin K-dependent coagulation proteins as a result of a common evolutionary pathway.
Mutations on the gene have been found in patients with protein C deficiency (OMIM 176860), a disorder associated with development of purpura fulminans and severe recurrent thrombotic events in the homozygous form while, in the heterozygous form, it is responsible for an increased risk of venous thromboembolism in early adulthood [8, 9].
Phenotypically, two distinct types of protein C deficiency are recognized: type I deficiency, the most common, is characterized by a parallel reduction in protein C concentration and function (measured in plasma by amidolytic or anticoagulant methods); type II deficiency is identified by normal or increased concentration and reduced function .
A large number of the mutations that contribute to the protein C deficiency fall in the structurally solved domains and thus are amenable to homology modelling and computational analysis.
Structural models can be useful in the research and clinical fields to elucidate how a mutation may interfere with enzymatic activity, ligand binding and cofactor interaction and relate the effect to patient phenotype.
In the last published database of mutations of the protein C gene [10, 11], 161 different mutations (corresponding to 351 entries) are reported. Additional variants can be obtained from other sources (Human Gene Mutation Database [12, 13] and the Swiss-Prot Variant Page [14, 15]. However, to our knowledge, none of the available collections include data on the structural-functional interpretation of reported variants.
We describe here an updated, 3D-structure oriented database of protein C that associates clinical and phenotypical descriptions with functional and structural data obtained by computational approaches. It includes the description of 21 new variants that we have identified and analyzed in a previous work . The database is integrated with an interactive search interface and with tools for structure visualization and mutant modelling.
Construction and content
We collected a total of 195 naturally occurring mutations in the coding region of the protein C gene that include 182 missense and 13 stop mutations. Of these, a set of 21 variants were identified by our group through the screening of the protein C gene of 42 patients with a phenotypic functional deficiency of protein C . The remaining portion of the dataset consists of missense and nonsense variants already reported, obtained from 3 different sources: the database of mutations of protein C gene [10, 11], the Human Gene Mutation Database [12, 13] and the Swiss-Prot Variant Page [14, 15]. Additional variants, not included in the above databases, were obtained from literature. The entries were manually extracted and filtered to avoid duplicates.
General information associated with each entry was derived from literature sources. A multiple alignment obtained with CLUSTALW  using a set of orthologous and paralogous sequences was also associated with each entry.
Variants with substitutions in the structurally solved regions of the protein C, have been modelled starting from X-ray coordinates (PDB entry 1AUT). Molecular modelling of the 21 variants identified by our group, were achieved by residue replacement using InsightII (Accelrys INC., San Diego, Ca, USA). The lowest energy rotamer was chosen as the starting side chain position, followed by energy minimization calculation consisting of 500 steps of steepest descent keeping the backbone fixed, followed by 500 steps of conjugate gradient on the whole structure. A detailed computational analysis of these variants, such as electrostatic potential calculations, was formerly carried out and the results are also stored in the database.
All other variants were modelled using an automatic approach based on an adapted Python script of Modeller . The script replaces the side chain of the mutated residue in the PDB file (1AUT) and optimizes the conformation by energy minimization and molecular dynamics. For each variant modelled, a set of 3D molecular representation were constructed using the programs PyMol (PyMOL Molecular Graphics System, DeLano Scientific, San Carlos, CA, USA) and MOLSCRIPT  to obtain fixed images and VRML (Virtual Reality Modelling Language) files. VRMLs can be viewed through a browser in a dynamic, interactive way using a player like CORTONA2 .
Database and User Interface
The data are stored in a relational database managed by a MySQL Database Management System . The database at the moment contains four tables: a "Mutations" table with all the single point mutation entries, a "Variants" table with clinical comments and, if available, literature data concerning each mutant; and two other tables with structural information about domains, chains, secondary structures, molecular modelling results and structure-function relationships.
We created a web based interface with the aim of helping users search for specific information, to browse the entire database or to visualize 2D and 3D images of the variants. The web site is also a source of documentation about protein C, and a point of access to external resources and databases related to protein C. This user interface has been built with PHP language scripts  on an Apache Web Server .
sequence position of the mutated residues numbered according to:UniProtKB/Swiss-Prot entry P04070, Foster's codon numbering  and the chymotrypsin numbering used in PDB entry 1AUT when the substituted residue is included in the X-ray structure of protein C;
wild type and mutated residues;
clinical or laboratory phenotype data as obtained from other database reports or from the literature;
links to PUBMED database;
cross-references to other databases reporting the mutation.
There is also a structural information section that assigns the mutated residue to its specific chain, secondary structure and domain localization. A multiple alignment of homologous sequences, a 3D gallery of structural images and the PDB coordinates of the mutants are also present. Results of computational analysis are collected in a 3D-notes page that includes considerations on the physico-chemical properties of the mutant residue compared to the wild-type, (i.e. charge, hydrophobicity, solvent accessibility), a list of hydrogen bonds and hydrophobic interactions. Additional information resulting from further computational studies, such as electrostatic potential calculation, and the prediction of structural-functional effects of the mutation are associated with some entries.
Utility and Discussion
The ProCMD database aims to provide a summary of the sequence and structure information on variants with substitutions in the coding region of the protein C gene. The database is interfaced with a fully interactive website, through which the user can retrieve entries of interest, find cross-references and visualize structural models with interactive tools.
Data search and retrieval
A query page allows the user to retrieve entries by the position in the sequence of a mutated residue, by amino acidic substitution, and by domain localization. Results of the query are listed in a table showing the amino acidic substitution and the sequence position for each entry, and provide links to details pages summarizing all the associated data.
Taken together, all the data related to each variant are useful to understand the relationship between the mutation and phenotype and help to elucidate the role of specific residues for protein function.
Analysis tools for new mutations
For other user-defined missense mutants, not present in the database, the site provides tools for evaluating the residue conservation and for the homology modelling of the variant.
Selecting "Multiple alignments" from the home page (figure 1), the user can promptly visualize the residue of interest in the homologous alignment to have hints on its evolutionary conservation.
Molecular models can be obtained, if the residue falls in the 3D-structure, by the same automatic procedure based on the script of Modeller  used for entries preparation. Models can be visualized and 3D coordinates can be downloaded for further studies.
As the models are obtained in a completely automatic way, the user is cautioned about the possibility of having obtained non-accurate results. Careful inspection of the outcome is therefore recommended.
ProCMD features an online submission of mutation data. New mutations regarding the coding region of the protein C gene can be sent by filling out the fields on the online submission form. A text mail will be automatically generated by the server and sent to the authors after submission. The database curators verify the submitted data and will incorporate them after annotation according to the database format.
This database provides a tool, complementary to other mutant collections of protein C, that is especially devoted to structural analysis and interpretation. A great effort has been put into the production of 3D-images associated with the molecular models and input files for interactive viewers which visualize the models in 3D space. The availability of structural models can be useful in the research and clinical fields both to elucidate how a mutation may interfere with enzymatic activity, ligand binding and cofactor interaction, and to relate the effect to patient phenotype. The present resource can be valuable to help predict the effect of a mutation, to clarify the role of specific residues in protein function and hopefully to give hints for the rational design of specific variants of protein C for therapeutic use.
Availability and requirements
The database is maintained on the server of the Institute of Biomedical Technology -National Research Council (Segrate – MI, Italy) and is available at the following URL http://www.itb.cnr.it/procmd
protein C mutation database
Protein Data Bank
Activated Protein C
Human Gene Mutation Database
Virtual Reality Modelling Language.
We are indebted to Chiara Bishop for the critical reading of the manuscript.
This work was supported by European Project BioinfoGRID (Bioinformatics Application for Life Science).
This article has been published as part of BMC Bioinformatics Volume 8, Supplement 1, 2007: Italian Society of Bioinformatics (BITS): Annual Meeting 2006. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/8?issue=S1.
- Van De Wouwer M, Collen D, Conway EM: Thrombomodulin-protein C-EPCR system. Integrated to regulate coagulation and inflammation. Arterioscler Thromb Vasc Biol 2004, 24: 1374–1383. 10.1161/01.ATV.0000112102.28859.39View ArticlePubMedGoogle Scholar
- Bernard GR, Vincent JL, Laterre PF, LaRosa SP, Dhainaut JF, Lopez-Rodriguez A, Steingrub JS, Garber GE, Helterbrand JD, Ely EW, Fisher CJ: Efficacy and safety of recombinant human activated protein C for severe sepsis. N Engl J Med 2001, 344: 699–709. 10.1056/NEJM200103083441001View ArticlePubMedGoogle Scholar
- Stenflo J: Structure and function of protein C. Semin Thromb Hemost 1984, 10: 109–112.View ArticlePubMedGoogle Scholar
- Esmon CT: The regulation of natural anticoagulant pathways. Science 1987, 235: 1348–1352. 10.1126/science.3029867View ArticlePubMedGoogle Scholar
- Fukudome K, Ye X, Tsuneyoshi N, Tokunaga O, Sugawara K, Mizokami H, Kimoto M: Activation mechanism of anticoagulant protein C in large blood vessels involving the endothelial cell protein C receptor. J Exp Med 1998, 187: 1029–1035. 10.1084/jem.187.7.1029PubMed CentralView ArticlePubMedGoogle Scholar
- Mather T, Oganessyan V, Hof P, Huber R, Foundling S, Esmon C, Bode W: The 2.8 Å crystal structure of Gla-domainless activated protein C. The EMBO Journal 1996, 15: 6822–6831.PubMed CentralPubMedGoogle Scholar
- Oganesyan V, Oganesyan N, Terzyan N, Qu D, Dauter Z, Esmon NL, Esmon CT: The crystal structure of the endothelial protein C receptor and a bound phospholipid. J Biol Chem 2002, 277: 24851–24854. 10.1074/jbc.C200163200View ArticlePubMedGoogle Scholar
- Aiach M, Borgel D, Gaussem P, Emmerich J, Alhenc-Gelas M, Gandrille S: Protein C and protein S deficiencies. Semin Hematol 1997, 34: 205–216.PubMedGoogle Scholar
- Dahlback B: The protein C anticoagulant system: Inherited defects as basis for venous thrombosis. Thromb Res 1995, 77: 1–43. 10.1016/0049-3848(94)00138-4View ArticlePubMedGoogle Scholar
- Reitsma PH, Bernardi F, Doig RG, Gandrille S, Greengard JS, Ireland H, Krawczak M, Lind B, Long GL, Bertina RM: Protein C deficiency: a database of mutations, 1995 update. On behalf of the subcommittee on plasma coagulation inhibitors of the scientific and standardization committee of the ISTH. Thromb Haemost 1995, 73: 876–889.PubMedGoogle Scholar
- Database of mutations – Protein C[http://www.xs4all.nl/~reitsma/Prot_c_intro.html]
- Human Gene Mutation Database, HGMD[http://www.hgmd.cf.ac.uk/.]
- Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NS, Abeysinghe S, Krawczak M, Cooper DN: Human Gene Mutation Database (HGMD ® ): 2003 Update. Hum Mutat 2003, 21: 577–581. 10.1002/humu.10212View ArticlePubMedGoogle Scholar
- Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A: The Swiss-Prot Variant Page and the ModSNP Database: A Resource for Sequence and Structure information on Human Protein Variants. Hum Mutat 2004, 23: 464–470. 10.1002/humu.20021View ArticlePubMedGoogle Scholar
- Faioni EM, Hermida J, Rovida E, Razzari C, Asti D, Zeinali S, Mannucci PM: Type II protein C deficiency: Identification and molecular modelling of two natural mutants with low anticoagulant and normal amidolytic activity. Br J Haematol 2000, 108: 265–271. 10.1046/j.1365-2141.2000.01848.xView ArticlePubMedGoogle Scholar
- Rovida E, Merati G, D'Ursi P, Zanardelli S, Marino F, Fontana G, Castaman G, Faioni EM: Identification and computationally based structural interpretation of naturally occurring variants of human protein C. Hum Mutat 2006, in press.Google Scholar
- Higgins DG, Sharp PM: CLUSTAL: a package for performing multiple sequence alignments on a microcomputer. Gene 1988, 73: 237–244. 10.1016/0378-1119(88)90330-7View ArticlePubMedGoogle Scholar
- Sali A, Blundell TL: Comparative protein modelling by satisfaction of spatial restraint. J Mol Biol 1993, 234: 779–815. 10.1006/jmbi.1993.1626View ArticlePubMedGoogle Scholar
- Kraulis J: MOLSCRIPT: A Program to Produce Both Detailed and Schematic Plots of Protein Structures. J Appl Cryst 1991, 24: 946–950. 10.1107/S0021889891004399View ArticleGoogle Scholar
- Cortona VRML Client[http://www.parallelgraphics.com/products/cortona]
- Apache Web Server[http://www.apache.org/]
- Foster DC, Yoshitake S, Davie EW: The nucleotide sequence of the gene for human protein C. Proc Natl Acad Sci USA 1985, 82: 4673–4677. 10.1073/pnas.82.14.4673PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.