SKPDB: a structural database of shikimate pathway enzymes
© Arcuri et al; licensee BioMed Central Ltd. 2010
Received: 12 May 2009
Accepted: 7 January 2010
Published: 7 January 2010
The functional and structural characterisation of enzymes that belong to microbial metabolic pathways is very important for structure-based drug design. Shikimate pathway enzymes are of particular interest because they are essential for bacteria but do not occur in humans, which makes them selective targets for drugs that do not directly affect the human host.
The ShiKimate Pathway DataBase (SKPDB) is a relational database applied to the study of shikimate pathway enzymes in microorganisms and plants. The current database is updated regularly with the addition of new data; there are currently 8902 enzymes of the shikimate pathway from different sources. The database contains extensive information on each enzyme, including detailed descriptions about sequence, references, and structural and functional studies. All files (primary sequence, atomic coordinates and quality scores) are available for downloading. The modeled structures can be viewed using the Jmol program.
The SKPDB provides a large number of structural models to be used in docking simulations, virtual screening initiatives and drug design. It is freely accessible at http://lsbzix.rc.unesp.br/skpdb/.
The shikimate pathway links the metabolism of carbohydrates to the biosynthesis of aromatic compounds through seven metabolic steps, in which phosphoenolpyruvate (PEP) and erythrose 4-phosphate are converted to chorismate. Chorismate in turn is the common precursor for the synthesis of a series of aromatic compounds, naphthoquinones, menaquinones, and mycobactins [3, 4].
Inhibition of the shikimate pathway has been effective in controlling bacterial growth, and in mycobacteria this pathway has been shown to be essential for the viability of Mycobacterium tuberculosis [6–8].
The functional and structural characterisation of a protein sequence is one of the most frequent problems in structural molecular biology. This task is usually facilitated by an accurate three-dimensional (3D) structure of the studied protein, which is best determined by experimental methods such as X-ray crystallography and NMR spectroscopy. In the absence of an experimentally determined 3D structure, comparative (homology) modeling can sometimes provide a useful 3D model for a target protein. In the present work, we used large-scale comparative modeling to predict protein structures with the program MODELLER.
The automation of large-scale comparative modeling involves assembling a software pipeline, which consists of modules for fold assignment, template selection, target-template alignment, model generation, and model evaluation. Computer programs for these individual operations already exist, and it may seem trivial to combine them [11, 12]. One example of large-scale comparative modeling for complete genomes has been described for sequences encoded in the Mycobacterium tuberculosis and Xylella fastidiosa genomes in the DBMODELING database [13, 14]. The challenge in large-scale comparative modeling is to build an automated, fast, robust, sensitive, and accurate pipeline applicable to whole genomes; such a pipeline should perform at least as well as a human expert on individual proteins.
However, since the accuracy of structural models is highly dependent on sequence identity between template and target, it is necessary to make clear to the user that only models presenting high structural quality should be used in such efforts. Molecular modeling of these enzymes generated the SKPDB database, in which all structural models were built by using alignments presenting more than 30% sequence identity, generating models with medium and high accuracy [10, 15].
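The 30% identity criterion above can be illustrated with a minimal sketch. It assumes that identity is counted over alignment columns in which neither sequence has a gap; other denominators are in common use, and the paper does not state which one was applied. The function names and the example sequences are hypothetical.

```python
def percent_identity(target: str, template: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: only columns where neither sequence has a gap ('-')
    are counted in the denominator.
    """
    if len(target) != len(template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns without a gap in either sequence.
    aligned = [(a, b) for a, b in zip(target, template) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for a, b in aligned if a == b)
    return 100.0 * matches / len(aligned)


def passes_cutoff(target: str, template: str, cutoff: float = 30.0) -> bool:
    """True when the alignment has more than `cutoff` percent identity."""
    return percent_identity(target, template) > cutoff
```

An alignment that fails this test would simply not be used to build a model under the stated criterion.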
SKPDB is a relational database of protein structures predicted by comparative modeling or solved by X-ray crystallography, applied to the study of shikimate pathway enzymes of microorganisms and plants. The database is freely accessible to all users on the Web and provides a large number of structural models for use in structure-based virtual screening and molecular docking analysis. Furthermore, SKPDB also provides a docking interface, which allows the user to carry out geometric docking simulations against the molecular models available in the database.
Construction and Content
Molecular modeling in large scale
Homology modeling is usually the method of choice when there is a clear relationship of homology between the sequences of a target protein and at least one experimentally determined three-dimensional structure. This computational technique is based on the assumption that tertiary structures of two proteins will be similar if their sequences are related, and it is the approach most likely to give accurate results.
The number of protein sequences that can be modeled and the accuracy of the predictions are increasing steadily due to the growth in the number of experimentally determined protein structures and because of the improvements in the modeling software. It is currently possible to model with useful accuracy significant parts of approximately one half of all known protein sequences.
The molecular modeling in this work was performed with MODELLER version 9v4 [10, 18], a computer program for comparative protein structure modeling (http://salilab.org/modeller). The program extracts atom-atom distance and dihedral angle restraints on the target from the template structure, and combines them with general rules of protein structure such as bond length and angle preferences. The model is then calculated by an optimisation procedure that minimises violations of the spatial restraints. In the simplest case, the input is an alignment of the sequence to be modeled with the template structures, the atomic coordinates of the templates and a short script file. MODELLER then automatically calculates a model containing all non-hydrogen atoms, without any user intervention and within minutes on a processor.
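The idea of minimising violations of spatial restraints can be illustrated with a toy sketch. This is not MODELLER's objective function or optimiser: it assumes a simple sum of squared distance violations and plain gradient descent, and every name and parameter below is hypothetical.

```python
import math


def objective(coords, restraints):
    """Sum of squared violations of distance restraints.

    `restraints` is a list of (i, j, target_distance) tuples.
    """
    total = 0.0
    for i, j, target in restraints:
        total += (math.dist(coords[i], coords[j]) - target) ** 2
    return total


def minimise(coords, restraints, step=0.05, iterations=500):
    """Reduce restraint violations by fixed-step gradient descent."""
    coords = [list(p) for p in coords]  # work on a mutable copy
    for _ in range(iterations):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, target in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # gradient direction is undefined for coincident points
            factor = 2.0 * (d - target) / d
            for k in range(3):
                g = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        # Move every point against its accumulated gradient.
        for point, grad in zip(coords, grads):
            for k in range(3):
                point[k] -= step * grad[k]
    return coords
```

In this toy setting the restraints would be given directly; in comparative modeling they are derived from the template structure and from general stereochemical preferences, as described above.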
The MODELLER program was completely automated to calculate comparative models for a large number of protein sequences, by using many different template structures and sequence-structure alignments [12, 16, 17]. Sequence-structure matches are established by aligning the SALIGN sequence profile of the target sequence against each of the template sequences extracted from the PDB. Significant alignments covering distinct regions of the target sequence are chosen for modeling. Models are calculated for each of the sequence-structure matches by using MODELLER. The models consist of coordinates for all non-hydrogen atoms in the modeled part of a protein. For each enzyme in the SKPDB, a total of 1000 models were generated, and the final models were selected based on stereochemical quality and the MODELLER objective function. The final models were then evaluated by composite model quality criteria (see topic Analysis tools).
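The selection of a final model from many candidates can be sketched as follows. The exact rule used for SKPDB is not given in the text, so this sketch assumes a two-stage rule: discard candidates whose stereochemistry is below a threshold, then keep the one with the lowest objective function value. The class, the field names and the 0.90 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CandidateModel:
    name: str
    objective: float          # objective function value; lower is better
    favoured_fraction: float  # fraction of residues in most favoured Ramachandran regions


def select_final_model(
    models: Iterable[CandidateModel], min_favoured: float = 0.90
) -> Optional[CandidateModel]:
    """Return the acceptable model with the lowest objective value, or None."""
    acceptable = [m for m in models if m.favoured_fraction >= min_favoured]
    if not acceptable:
        return None
    return min(acceptable, key=lambda m: m.objective)
```

Filtering before ranking keeps a model with a low objective value but poor stereochemistry from being selected.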
Difficult cases in homology modeling correspond to protein sequences that only possess distant homologues of known structure. In such cases, incorrect alignment can lead to regions of a model that have significant structural errors. The quality of the predicted model determines the information that can be extracted from it. Thus, estimating the accuracy of 3D protein models is essential for interpreting them. The model can be evaluated as a whole as well as in the individual regions.
The overall stereochemical quality of the final model was evaluated with the programs PROCHECK and WHATCHECK. These programs were used to check bond lengths, bond angles, peptide bonds and side-chain ring planarities, chirality, and main-chain and side-chain torsion angles. Another quality score used in the analysis of the structural model was the G-factor, computed by PROCHECK, which is essentially a log-odds score based on the observed distributions of the stereochemical parameters. The root mean square deviations (RMSD) from ideal geometries for bond lengths, bond angles, dihedrals and impropers were extracted for each model by using the program X-PLOR, and the program VERIFY-3D was used to measure the compatibility of a protein model with its sequence by using a 3D profile [24, 25]. These programs were used to assess the quality of the available models, and their results can be accessed by any user on the SKPDB web page for each enzyme.
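The RMSD from ideal geometry mentioned above is a simple statistic. The sketch below shows the calculation for any one class of parameter (for example bond lengths); it is a generic formula and not X-PLOR code, and the function name is hypothetical.

```python
import math
from typing import Sequence


def rmsd_from_ideal(observed: Sequence[float], ideal: Sequence[float]) -> float:
    """Root mean square deviation of observed values from their ideal values.

    Both sequences must list the same parameters in the same order,
    for example bond lengths in angstroms.
    """
    if len(observed) != len(ideal):
        raise ValueError("observed and ideal must have the same length")
    if not observed:
        raise ValueError("at least one value is required")
    squared = sum((o - i) ** 2 for o, i in zip(observed, ideal))
    return math.sqrt(squared / len(observed))
```

The same function would be applied separately to bond lengths, bond angles, dihedrals and impropers, each against its own set of ideal values.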
Web SKPDB platform
All entries in SKPDB were sourced from the Swiss-Prot/UniprotKB protein sequence database and the PDB protein structure database. Initially, exhaustive queries were made to Swiss-Prot/UniprotKB, returning more than 10,000 shikimate pathway enzymes from different organisms. The process of building SKPDB is shown in Figure 3. The enzyme data were then filtered to exclude redundancy, errors, and incomplete data, and the remaining data were merged into a single composite non-redundant database.
SKPDB is a relational database of protein structures predicted by comparative modeling or solved by crystallography, applied to the study of shikimate pathway enzymes. Each entry in SKPDB provides information about a given enzyme, including: (1) a detailed description of the enzyme, (2) the primary sequence of the enzyme, (3) the structure model of the enzyme, (4) the chemical properties of the enzyme, (5) references about the enzyme, and (6) comments and miscellaneous information. All files (primary sequence, atomic coordinates and quality values) are available for downloading. This database is available for all users on the Web, providing a large amount of structural models to be used in virtual screening initiatives and molecular docking.
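The six kinds of information listed for each entry map naturally onto a few relational tables. The sketch below is only an illustration of such a layout using SQLite; the actual SKPDB schema and database engine are not described in the text, so every table and column name here is an assumption.

```python
import sqlite3

# Hypothetical schema: one row per enzyme, with models and literature
# references stored in separate tables keyed by the enzyme accession.
SCHEMA = """
CREATE TABLE enzyme (
    accession        TEXT PRIMARY KEY,  -- sequence database accession number
    name             TEXT NOT NULL,
    organism         TEXT NOT NULL,
    sequence         TEXT NOT NULL,     -- primary sequence
    description      TEXT,
    molecular_weight REAL,              -- one example of a chemical property
    comments         TEXT
);
CREATE TABLE model (
    model_id   INTEGER PRIMARY KEY,
    accession  TEXT NOT NULL REFERENCES enzyme(accession),
    source     TEXT NOT NULL CHECK (source IN ('comparative', 'xray')),
    coord_file TEXT NOT NULL            -- path to the atomic coordinates
);
CREATE TABLE literature (
    ref_id    INTEGER PRIMARY KEY,
    accession TEXT NOT NULL REFERENCES enzyme(accession),
    citation  TEXT NOT NULL
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the illustrative tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Keeping models in their own table allows one enzyme to have both a comparative model and an experimentally determined structure.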
The SKPDB is regularly updated with the addition of new data and tools about shikimate pathway enzymes. A click on the links opens a new window that displays more detailed information for the selected enzyme in different biological databases such as Swiss-Prot/UniprotKB, PDB, KEGG, BRENDA, IUBMB, and PubMed, among others. The enzyme records page contains the primary sequence and structure of the model, information about the alignment, and analyses of the target models such as PROCHECK results, the G-factor and the RMSD values from ideal geometry.
Description table content in SKPDB
Utility and Discussion
How to use the SKPDB?
Searching tool in the SKPDB
The homepage offers the user several ways to search the database (Figure 5). The user can search the SKPDB for a specific enzyme by using its Swiss-Prot/UniprotKB accession number. SKPDB can also be searched using the field "organism" or "name of enzyme", or a combination of both fields. A query is formed by selecting one radio button and entering keywords or strings, such as a partial or full name of the enzyme or organism, in the text fields.
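The search options described above (exact accession number, or partial organism and enzyme names alone or combined) can be sketched as a parameterised query. The table layout, function names and demo records below are hypothetical and are not taken from the SKPDB implementation.

```python
import sqlite3


def search_enzymes(conn, accession=None, organism=None, name=None):
    """Search by exact accession and/or partial organism and enzyme name."""
    clauses, params = [], []
    if accession:
        clauses.append("accession = ?")
        params.append(accession)
    if organism:
        clauses.append("organism LIKE ?")  # partial match
        params.append(f"%{organism}%")
    if name:
        clauses.append("name LIKE ?")      # partial match
        params.append(f"%{name}%")
    if not clauses:
        return []  # an empty query returns nothing
    sql = (
        "SELECT accession, name, organism FROM enzyme WHERE "
        + " AND ".join(clauses)
        + " ORDER BY accession"
    )
    return conn.execute(sql, params).fetchall()


def demo_connection():
    """In-memory database with made-up records, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE enzyme (accession TEXT PRIMARY KEY, name TEXT, organism TEXT)"
    )
    conn.executemany(
        "INSERT INTO enzyme VALUES (?, ?, ?)",
        [
            ("DEMO0001", "Shikimate kinase", "Mycobacterium tuberculosis"),
            ("DEMO0002", "Chorismate synthase", "Mycobacterium tuberculosis"),
            ("DEMO0003", "Shikimate kinase", "Xylella fastidiosa"),
        ],
    )
    return conn
```

User input is passed as query parameters rather than concatenated into the SQL string, which avoids SQL injection from the search form.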
When there is neither an experimentally determined structure nor templates to generate the homology model, a warning message is given.
Output is also generated as an HTML page. The tab "Sequence Info" displays general information about the enzyme such as molecular weight, organism, taxonomy, and size of the enzyme (aa), together with a download link for the primary sequence and a link to Swiss-Prot/UniprotKB that gives access to other types of information on the enzyme (Figure 8).
Large-scale protein homology modeling, in which whole sequence databases or whole genomes are used as inputs into automated modeling algorithms, has been reported by several groups [14, 27]. By utilising powerful computer systems with multiple processors, these efforts have allowed the creation of large databases of homology models of proteins. This work resulted in the development of SKPDB, a tool that provides annotated sequence information to support structural and functional studies of shikimate pathway enzymes for drug development purposes. If a 3D model of the protein of interest can be derived, it may be usable as the basis for a structure-based drug-design study. In addition, such models can also be useful for the rational design of experiments such as site-directed mutagenesis.
Availability and requirements
Project Home Page: http://lsbzix.rc.unesp.br/skpdb
Operating Systems: Linux Fedora
License: the SKPDB is publicly accessible
This research was supported by grants from FAPESP (Proc. no. 04/10318-9), CNPq and Instituto do Milênio/FINEP-PRONALMO (CNPq-MCT) (Ref. 3717/06); Instituto Nacional de Ciência e Tecnologia (MCT-CNPq, Brazil). MSP and WFAJ are researchers of the Brazilian Council for Scientific and Technological Development (CNPq).
- Jacobson M, Sali A: Comparative Protein Structure Modeling and its Applications to Drug Discovery. Annu Rep Medic Chem 2004, 39: 259–276.
- Bentley R: The shikimate pathway - A metabolic tree with many branches. Crit Rev Biochem Mol Biol 1990, 25: 307–384. 10.3109/10409239009090615
- Herman KM, Weaver LM: The shikimate pathway. Annu Rev Plant Mol Biol 1999, 50(3):473–503. 10.1146/annurev.arplant.50.1.473
- De Azevedo WF Jr: Targets for development of drugs against orphan diseases. Current Drug Targets 2007, 8(3):387–387. 10.2174/138945007780058960
- Roberts CW, Roberts F, Lyons RE, Kirisits MJ, Mui EJ, Finnerty J, Johnson JJ, Ferguson DJ, Coggins JR, Krell T, Coombs GH, Milhous WK, Kyle DE, Tzipori S, Barnwell J, Dame JB, Carlton J, McLeod R: The shikimate pathway and its branches in apicomplexan parasites. J Infect Dis 2002, 185: S25-S36. 10.1086/338004
- Ratledge C, Stanford JL: The biology of the Mycobacteria. In Nutrition, growth and metabolism. Volume 1. Edited by: Ratledge C, Stanford JL. London: Academic Press; 1982:185–271.
- Arcuri HA, Borges JC, Fonseca IO, Pereira JH, Neto JR, Basso LA, Santos DS, de Azevedo WF Jr: Structural studies of shikimate 5-dehydrogenase from Mycobacterium tuberculosis. Proteins 2008, 72: 720–730. 10.1002/prot.21953
- Marques MR, Vaso A, Neto JR, Fossey MA, Oliveira JS, Basso LA, dos Santos DS, de Azevedo WF Jr, Palma MS: Dynamics of glyphosate-induced conformational changes of Mycobacterium tuberculosis 5-enolpyruvylshikimate-3-phosphate synthase (EC 2.5.1.19) determined by hydrogen-deuterium exchange and electrospray mass spectrometry. Biochemistry 2008, 47(28):7509–7522. 10.1021/bi800134y
- Brenner SE, Levitt M: Expectations from structural genomics. Protein Sci 2000, 9: 197–200.
- Xiang Z: Advances in Homology Protein Structure Modeling. Curr Protein Pept Sci 2006, 7(3):217–227. 10.2174/138920306777452312
- Sali A, Blundell TL: Comparative Protein Modelling by Satisfaction of Spatial Restraints. J Mol Biol 1993, 234: 779–815. 10.1006/jmbi.1993.1626
- Sánchez R, Sali A: Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. Proc Natl Acad Sci USA 1998, 95: 13597–13602. 10.1073/pnas.95.23.13597
- Silveira NJ, Uchôa HB, Pereira JH, Canduri F, Basso LA, Palma MS, Santos DS, de Azevedo WF Jr: Molecular models of protein targets from Mycobacterium tuberculosis. J Mol Model 2005, 11: 160–166. 10.1007/s00894-005-0240-2
- Allen FH, Bellard S, Brice MD, Cartwright BA, Doubleday A, Higgs H, Hummelink T, Hummelink-Peters BG, Kennard O, Motherwell WDS, Rodgers JR, Watson DG: The Cambridge Crystallographic Data Centre: computer-based search, retrieval, analysis and display of information. Acta Cryst 1979, B35: 2331–2339.
- Bourne PE, Weissig H: Homology Modeling. In Structural Bioinformatics. Edited by: Bourne PE, Weissig H. Hoboken, NJ: Wiley-Liss; 2003:509–524.
- Martí-Renom MA, Stuart AC, Fiser A, Sánchez R, Melo F, Sali A: Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct 2000, 29: 291–325. 10.1146/annurev.biophys.29.1.291
- Pieper U, Eswar N, Braberg H, Madhusudhan MS, Davis FP, Stuart AC, Mirkovic N, Rossi A, Marti-Renom MA, Fiser A, Webb B, Greenblatt D, Huang CC, Ferrin TE, Sali A: ModBase, a database of annotated comparative protein structure models. Nucleic Acids Res 2002, 30: 255–259. 10.1093/nar/30.1.255
- Fiser A, Do RK, Sali A: Modeling of loops in protein structures. Protein Sci 2000, 9: 1753–1773. 10.1110/ps.9.9.1753
- Eswar N, John B, Mirkovic N, Fiser A, Ilyin VA, Pieper U, Stuart AC, Marti-Renom MA, Madhusudhan MS, Yerkovich B, Sali A: Tools for comparative protein structure modeling and analysis. Nucleic Acids Res 2003, 31(13):3375–3380. 10.1093/nar/gkg543
- Needleman S, Wunsch C: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970, 48: 443–453. 10.1016/0022-2836(70)90057-4
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The protein data bank. Nucleic Acids Res 2000, 28: 235–242. 10.1093/nar/28.1.235
- Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK - A program to check the stereochemical quality of protein structures. J Appl Cryst 1993, 26: 283–291. 10.1107/S0021889892009944
- Brünger AT: X-PLOR version 3.1. A System for X-ray Crystallography and NMR. New Haven: Yale University Press; 1992.
- Lüthy R, Bowie JU, Eisenberg D: Assessment of protein models with three-dimensional profiles. Proteins 1991, 10: 229–239. 10.1002/prot.340100307
- Bowie JU, Lüthy R, Eisenberg D: A method to identify protein sequences that fold into a known three-dimensional structure. Science 1991, 253: 164–170. 10.1126/science.1853201
- The UniProt Consortium: The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res 2009, 37: D169-D174. 10.1093/nar/gkn664
- da Silveira NJF, Bonalumi CE, Arcuri HA, de Azevedo WF Jr: Molecular Modeling Databases: A New Way in the Search of Protein Targets for Drug Development. Current Bioinformatics 2007, 2: 1–10. 10.2174/157489307779314320
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.