ASMPKS: an analysis system for modular polyketide synthases
© Tae et al; licensee BioMed Central Ltd. 2007
Received: 19 April 2007
Accepted: 03 September 2007
Published: 03 September 2007
Polyketides are secondary metabolites of microorganisms with diverse biological activities, including pharmacological functions such as antibiotic, antitumor and agrochemical properties. Polyketides are synthesized by serialized reactions of a set of enzymes called polyketide synthase(PKS)s, which coordinate the elongation of carbon skeletons by the stepwise condensation of short carbon precursors. Due to their importance as drugs, the volume of data on polyketides is rapidly increasing and creating a need for computational analysis methods for efficient polyketide research. Moreover, the increasing use of genetic engineering to research new kinds of polyketides requires genome wide analysis.
We describe a system named ASMPKS (Analysis System for Modular Polyketide Synthesis) for computational analysis of PKSs against genome sequences. It also provides overall management of information on modular PKS, including polyketide database construction, new PKS assembly, and chain visualization. ASMPKS operates on a web interface to construct the database and to analyze PKSs, allowing polyketide researchers to add their data to this database and to use it easily. In addition, the ASMPKS can predict functional modules for a protein sequence submitted by users, estimate the chemical composition of a polyketide synthesized from the modules, and display the carbon chain structure on the web interface.
ASMPKS has powerful computation features to aid modular PKS research. As various factors, such as starter units and post-processing, are related to polyketide biosynthesis, ASMPKS will be improved through further development for study of the factors.
As many infectious microorganisms have been acquiring tolerance to antibiotics, the need for novel antibiotics is increasing. Modification of known antibiotics is a more efficient approach than finding microorganisms with new kinds of antibiotic activities, and various methods are being used to develop novel antibiotics, including gene manipulation for biosynthesis of antibiotics such as polyketides.
Polyketides are secondary metabolites of many kinds of microorganisms [1, 2] with diverse biological functions, including pharmacological activities such as antibiotic, antitumor and agrochemical properties. While the polyketide antibiotics are important clinical drugs such as avermectin [3, 4] and erythromycin , new kinds of polyketides are still being discovered. Polyketides are synthesized by serialized reactions of a set of enzymes called polyketide synthase(PKS)s , which coordinate the elongation of carbon skeletons by the stepwise condensation of short carbon precursors . PKSs are classified into two types, depending on the organization of their active sites, namely modular [8, 9] and iterative [10, 11] PKSs. Modular PKSs are multifunctional enzymes composed of so-called modules [12, 13], which noniteratively process serialized steps of polyketide chain elongation. Iterative PKSs are multiple enzyme complexes that iteratively perform a set of activities used in various condensation and chain elongation steps. PKS genes that synthesize a polyketide are usually clustered on a genome, and so the chemical structure of the polyketide can be modified through gene manipulation [14, 15]. For example, the chemical structure of a polyketide can be modified by changing the genes related to starting units [16, 17], extender units , the reduction of carbonyl carbons , the length of the carbon chain , or post-processing. In addition, genetic engineering, whether by the fusion of different polyketides  or their biosynthesis in heterologous hosts , has been shown recently to improve polyketide activities.
Although polyketides have been shown to be important antibiotics, few computational analysis systems have been developed. A lack of adequate polyketide databases has caused researchers to study many chemical databases to collect polyketide data. Computational analysis for polyketide data can reduce the time and the cost of the research. PKSDB , developed by the National Institute of Immunology (NII) of India, contains data on 20 modular polyketides. While it has some visualization and analysis components, more features and functions are needed, including an efficient database system to manage and analyze polyketide data.
Catalytic domains, which organize PKS modules with their linkers , are defined according to their function. The core domains related to carbon chain elongation include an acyltransferase domain (AT), which selects and transfers extender units; an acyl carrier protein (ACP), which tethers the growing polyketide chain to the PKS for condensations; and a ketoacyl synthase (KS) domain, for decarboxylative condensations. Additional domains participate in the modification of the carbonyl group; these include ketoreductase (KR), dehydratase (DH) and enoyl reductase (ER) domains. A thioesterase (TE) domain catalyzes the release of the polyketide product from the last PKS participating in chain elongation. We have developed ASMPKS (an Analysis System for Modular Polyketide Synthesis) to efficiently support computational analysis of the modular PKS in genome sequences. ASMPKS operates as a web application and provides various features including visualization of polyketide structures, new PKS assembly simulation and management of the modular polyketide database. Homology search, multi-alignment of domains and polyketide prediction are also available. BLAST  and ClustalW  were used for the homology search and multi-alignment, respectively. An important feature of ASMPKS is its genome analysis component, which finds known PKS clusters and predicts putative PKS clusters in given microbial genome sequences. This component analyzes several genomes and stores the result in the database, which is then displayed by the genome browser. PKS analysis against genome sequences can make it very easy to find known PKSs and to select novel candidate PKS genes.
PKS search against microbial genome sequences
ASMPKS searches microbial genome sequences for modular PKSs. It detects PKS gene clusters producing known polyketides, which are included in the database, by measuring the homology between protein sequences of an annotated genome and all PKS sequences, and predicts unknown gene clusters to produce putative polyketide candidates by identifying domains. It can accept genome sequences or GenBank format data, including gene information. If genome sequences are submitted, genes are predicted by Glimmer , and their sequences are converted to proteins.
In the second process, the remaining proteins are aligned with template sequences of KS, AT, DH, ER and KR domains, each of which has the highest similarity with other sequences belonging to the same domain type. From the result of BLAST, the fragments with e-values less than 0.000001 are taken as domains, and the proteins with valid domain compositions, at least 'KS-AT-ACP', are selected. If the distance between neighbouring proteins is less than 15 kilobases, they belong to the same cluster and the cluster is marked as producing a new kind of polyketide. If a protein has domains and does not belong to a cluster, it is marked as an ambiguous PKS candidate.
Domain identification from protein sequences
ASMPKS predicts domain information from protein sequences. Domain identification is based on the homology search method with template sequences of domains. To detect domains, BLAST is used. Template sequences that represent each domain type are formatted into the BLAST database file. To select each template sequence representing each domain type, homology scores between every pair of sequences of that type are measured, and the sequence with the highest score, which is the sum of its top 10 homology scores with other sequences, is selected. KS and AT from module 8 of amphotericin , DH from module 5 of nystatin , KR from module 15 of nystatin, ER from module 15 of amphotericin, ACP from module 3 of nystatin and TE from module 6 of erythromycin , were selected.
Accuracy of ASMPKS in the domain identification
No of total
No of predicted
No of correct
Results and discussion
Database of ASMPKS
ASMPKS has been developed for the overall management of modular PKS data. As it operates on the web interface to construct the database and to analyze polyketides, the extension of database is very accessible for multiple users. Researcher can add and delete their data in the database easily. The database system of ASMPKS is divided into two parts. The first part, which has been constructed with published data, contains information regarding PKS genes, modules, domains and assembly. It is used to search and to align domains of protein sequences. The second part, which contains genome data of microorganisms and polyketide information related to the genomes, allows researchers to study synthesis of polyketides in a specified microorganism.
The ASMPKS database includes the PKSDB data, and more entries are being added by an updating interface. ASMPKS searches domain and module data against genome sequences, reports known PKS clusters producing known polyketides and predicts unknown gene clusters of putative polyketide candidates. The analysis results are stored in the database to be used in the next PKS analysis.
Visualization of detailed polyketide information
The domain button has a hyperlink to homology search and multiple sequence alignment components for the analysis of domain similarity relationships. BLAST is used for homology analysis between the same type domains. When a domain is selected, ASMPKS uses BLAST to search all domains of the same type in the database and displays a list of all homologous domains (Figure 2B). ClustalW is used for multiple sequence alignments. When a domain type is selected, all domain sequences of the type in the database can be added to the list to be aligned (Figure 2C). These two similarity analysis features are also applied to the domains of unidentified PKSs in domain identification and genome analysis components. Researchers can compare their polyketide data with the public data in the ASMPKS and predict the structure or activity of their novel polyketides.
Simulation of the assembly for predicted PKSs and construction of an expected chain
If the module sequence of PKSs for a predicted polyketide is completed, the carbon body of the polyketide can be predicted. The chemical structure of a polyketide makes its chemical activity easily understood. The ASMPKS provides a PKS assembly component, which assembles a set of modules and shows the construction of an expected carbon body for a predicted polyketide. The biosynthesis of a polyketide begins by selecting a starter unit and continues by adding many extender units onto the carbon chain until a TE domain appears. As there are various kinds of starter and extender units, diverse polyketides can be constructed by their combination. In this version, the starter units include 2-Methylbutyryl-CoA, 3,4-DHCHC-CoA, 3,5-AHBA-CoA, 3-Amino-2-Methylpropionate, 3-Methylbutyryl-CoA, Acetoacetyl-CoA, Acetyl-CoA, Benzoyl-CoA, Butyryl-CoA, Cyclohexanecarboxylic acid, Glycine, Glycolate, p-Aminobenzoate, p-Coumaroyl-CoA, p-Nitrobenzoate, Propionyl-CoA and trans-1,2-CPDA, and the extenders include malonyl-CoA, methylmalonyl-CoA, Ethylmalonyl-CoA and Hydroxymalonyl-CoA, both of which are stored in the database. The feature to adding new starter and extender units to the database via the user interface will be available in ASMPKS.
PKS visualization on a genome browser
The detailed genome and gene information can be shown by our genome annotation system called WeGAS. The integration of two systems aids researchers to get useful data of PKSs without searching public databases such as NCBI.
Updating polyketide entry data of the database
ASMPKS has been developed for computational analysis of the modular PKS for genome sequences. It also provides overall management of information on modular PKS, including PKS database construction, new PKS assembly, and visualization of polyketide structures. It is a useful system to analyze known polyketides and to predict new polyketides. The PKS assembly and genome analysis components are especially powerful computation tools for polyketide research.
As various factors are related to polyketide biosynthesis, ASMPKS can be improved through further study. Biological activities of many polyketides are critically related to post-synthetic processing steps, such as O-methylation, alkylation, cyclization and glycosylation, which give polyketides diverse attributes. ASMPKS will include the information and analysis features for post-synthetic processing. In addition, while the current version constructs only the carbon body under the PKS assembly component, the procedure displaying a completed chemical structure of an expected polyketide from the new module composition will be added later. With the addition of new features, it will aid in more efficient research on polyketides. ASMPKS is a system to analyze modular PKSs, and it does not provide the analysis features for other PKS types. Because many polyketides are produced by PKS-NRPS hybrid enzymes or iterative PKSs, the analysis components for these types are necessary. ASMPKS will be improved as an overall management system of polyketides including other PKS types.
Availability and requirements
Project name: modular PKS analysis system
Project home page: http://gate.smallsoft.co.kr:8008/~hstae/asmpks/index.html
Operating system(s): Linux
Programming language: Perl
Other requirements: Apache web server, Mysql 4.1 or higher, GD-2.30 or higher
Any restrictions to use by non-academics: yes, contact the author HT for details.
acyl carrier protein
nonribosomal peptide synthetases
This work was supported by the grant from MOCIE(Ministry Of Commerce, Industry and Energy) of south Korea.
- Silakowski B, Schairer HU, Ehret H, Kunze B, Weinig S, Nordsiek G, Brandt P, Blöcker H, Höfle G, Beyer S, Müller R: New lessons for combinatorial biosynthesis from myxobacteria. The myxothiazol biosynthetic gene cluster of Stigmatella aurantiaca DW4/3–1. J Biol Chem 1999, 274: 37391–37399. 10.1074/jbc.274.52.37391View ArticlePubMedGoogle Scholar
- Kakavas SJ, Katz L, Stassi D: Identification and characterization of the niddamycin polyketide synthase genes from Streptomyces caelestis . J Bacteriol 1997, 179: 7515–7522.PubMed CentralPubMedGoogle Scholar
- Ikeda H, Nonomiya T, Usami M, Ohta T, Omura S: Organization of the biosynthetic gene cluster for the polyketide antihelmintic macrolide avermectin in Streptomyces avermitilis . Proc Natl Acad Sci USA 1999, 96: 9509–9514. 10.1073/pnas.96.17.9509PubMed CentralView ArticlePubMedGoogle Scholar
- Omura S, Ikeda H, Ishikawa J, Hanamoto A, Takahashi C, Shinose M, Takahashi Y, Horikawa H, Nakazawa H, Osonoe T, Kikuchi H, Shiba T, Sakaki Y, Hattori M: Genome sequence of an industrial microorganism Streptomyces avermitilis : deducing the ability of producing secondary metabolites. Proc Nat Acad Sci USA 2001, 98: 12215–12220. 10.1073/pnas.211433198PubMed CentralView ArticlePubMedGoogle Scholar
- Aparicio JF, Caffrey P, Marsden AFA, Staunton J, Leadlay PF: Limited proteolysis and active-site studies of the first multienzyme component of the erythromycin-producing polyketide synthase. J Biol Chem 1994, 269: 8524–8528.PubMedGoogle Scholar
- Staunton J, Weissman KJ: Polyketide biosynthesis: a millennium review. Nat Prod Rep 2001, 18: 380–416. 10.1039/a909079gView ArticlePubMedGoogle Scholar
- Cheng YQ, Tang GL, Shen B: Type I polyketide synthase requiring a discrete acyltransferase for polyketide biosynthesis. Proc Natl Acad Sci USA 2003, 100: 3149–3154. 10.1073/pnas.0537286100PubMed CentralView ArticlePubMedGoogle Scholar
- Wiesmann KE, Cortes J, Brown MJ, Cutter AL, Staunton J, Leadlay PF: Polyketide synthesis in vitro on a modular polyketide synthase. Chem Biol 1995, 2: 583–589. 10.1016/1074-5521(95)90122-1View ArticlePubMedGoogle Scholar
- Gaitatzis M, Silakowski B, Kunze B, Nordsiek G, Blöcker H, Höfle G, Müller R: The biosynthesis of the aromatic myxobacterial electron transport inhibitor stigmatellin is directed by a novel type of modular polyketide synthase. J Biol Chem 2002, 277: 13082–13090. 10.1074/jbc.M111738200View ArticlePubMedGoogle Scholar
- Ansari MZ, Yadav G, Gokhale RS, Mohanty D: NRPS-PKS: a knowledge-based resource for analysis of NRPS/PKS megasynthases. Nucleic Acids Res 2004, 32: W405-W413. 10.1093/nar/gkh359PubMed CentralView ArticlePubMedGoogle Scholar
- Shen B, Hutchinson CR: Deciphering the mechanism for the assembly of aromatic polyketides by a bacterial polyketide synthase. Proc Natl Acad Sci USA 1996, 93: 6600–6604. 10.1073/pnas.93.13.6600PubMed CentralView ArticlePubMedGoogle Scholar
- Rawlings BJ: Type I polyketide biosynthesis in bacteria (Part A-erythromycin biosynthesis). Nat Prod Rep 2001, 18: 190–227. 10.1039/b009329gView ArticlePubMedGoogle Scholar
- Rawlings BJ: Type I polyketide biosynthesis in bacteria (Part B). Nat Prod Rep 2001, 18: 231–281. 10.1039/b100191oView ArticlePubMedGoogle Scholar
- McDaniel R, Thamchaipenet A, Gustafsson C, Fu H, Betlach M, Betlach M, Ashley G: Multiple genetic modifications of the erythromycin polyketide synthase to produce a library of novel "unnatural" natural products. Proc Natl Acad Sci USA 1999, 96: 1846–1851. 10.1073/pnas.96.5.1846PubMed CentralView ArticlePubMedGoogle Scholar
- McDaniel R, Kao CM, Hwang SJ, Khosla C: Engineered intermodular and intramodular polyketide synthase fusions. Chem Biol 1997, 4: 667–674. 10.1016/S1074-5521(97)90222-2View ArticlePubMedGoogle Scholar
- Long P, Wilkinson C, Bisang C, Cortes J, Dunster N, Oliynyl M, McCormick E, McArthur H, Mendez C, Salas JA, Staunton J, Leadley PF: Engineering specificity of starter unit selection by the erythromycin-producing polyketide synthase. Mol Microbiol 2002, 43: 1215–1225. 10.1046/j.1365-2958.2002.02815.xView ArticlePubMedGoogle Scholar
- Moore BS, Hertweck C: Biosynthesis and attachment of novel bacterial polyketide synthase starter units. Nat Prod Rep 2002, 19: 70–99. 10.1039/b003939jView ArticlePubMedGoogle Scholar
- Wu K, Chung L, Revill WP, Katz L, Reeves CD: The Fk520 gene cluster of Streptomycetes hygroscopicus var. ascomyceticus ( ATCC 14891 ) contains genes for biosynthesis of unusual polyketide extender units. Gene 2000, 251: 81–90. 10.1016/S0378-1119(00)00171-2View ArticlePubMedGoogle Scholar
- Watanabe K, Wang CC, Boddy CN, Cane DE, Khosla C: Understanding substrate specificity of polyketide synthase modules by generating hybrid multimodular synthases. J Biol Chem 2003, 278: 42020–42026. 10.1074/jbc.M305339200View ArticlePubMedGoogle Scholar
- Broadhurst RW, Nietlispach D, Wheatcroft MP, Leadlay PF, Weissman KJ: The structure of docking domains in modular polyketide synthases. Chem Biol 2003, 10: 723–731. 10.1016/S1074-5521(03)00156-XView ArticlePubMedGoogle Scholar
- Kim BS, Sherman DH, Reynolds KA: An efficient method for creation and functional analysis of libraries of hybrid type I polyketide synthases. Protein Eng Des Sel 2004, 17: 277–284. 10.1093/protein/gzh032View ArticlePubMedGoogle Scholar
- Pfeifer BA, Khosla C: Biosynthesis of polyketides in heterologous hosts. Microbiol Mol Biol Rev 2001, 65: 106–118. 10.1128/MMBR.65.1.106-118.2001PubMed CentralView ArticlePubMedGoogle Scholar
- Yadav G, Gokhale RS, Mohanty D: Computational approach for prediction of domain organization and substrate specificity of modular polyketide synthases. J Mol Biol 2003, 328: 335–363. 10.1016/S0022-2836(03)00232-8View ArticlePubMedGoogle Scholar
- Udwary DW, Merski M, Townsend CA: A method for prediction of the locations of linker regions within large multifunctional proteins, and application to a type I polyketide synthase. J Mol Biol 2002, 323: 585–598. 10.1016/S0022-2836(02)00972-5PubMed CentralView ArticlePubMedGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215: 403–410.View ArticlePubMedGoogle Scholar
- Thompson JD, Higgins DG, Gibson TJ: CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22: 4673–4680. 10.1093/nar/22.22.4673PubMed CentralView ArticlePubMedGoogle Scholar
- Salzberg SL, Delcher AL, Kasif S, White O: Microbial gene identification using interpolated Markov models. Nucleic Acids Res 1998, 26: 544–548. 10.1093/nar/26.2.544PubMed CentralView ArticlePubMedGoogle Scholar
- Caffrey P, Lynch S, Flood E, Finnan S, Oliynyk M: Amphotericin biosynthesis in Streptomyces nodosus : deductions from analysis of polyketide synthase and late genes. Chem Biol 2001, 8: 713–723. 10.1016/S1074-5521(01)00046-1View ArticlePubMedGoogle Scholar
- Brautaset T, Sekurova ON, Sletta H, Ellingsen TE, Strøm AR, Valla S, Zotchev SB: Biosynthesis of the polyene antifungal antibiotic nystatin in Streptomyces noursei ATCC 11455 : analysis of the gene cluster and deduction of the biosynthetic pathway. Chem Biol 2000, 7: 395–403. 10.1016/S1074-5521(00)00120-4View ArticlePubMedGoogle Scholar
- Lee DS, Seo HJ, Tae HS, Nam HW, Park KJ: Development of a Web-based Genome Annotation System. RECOMB2003 10–13 April 2003; Berlin: poster section 10–13 April 2003; Berlin: poster section
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.