AntiBP2: improved version of antibacterial peptide prediction
© Lata et al. 2010
Published: 18 January 2010
Antibacterial peptides are one of the effector molecules of the innate immune system. Over the last few decades, several antibacterial peptides have been approved as drugs by the FDA, which has prompted interest in these peptides. In our recent study we analyzed 999 antibacterial peptides collected from the Antimicrobial Peptide Database (APD). We have also developed methods to predict and classify these antibacterial peptides using Support Vector Machines (SVM).
During the analysis we observed that certain residues are preferred over others in antibacterial peptides, particularly at the N and C termini. This observation, together with the increased number of antibacterial peptides in APD, encouraged us to develop a new and more robust method for predicting antibacterial peptides within a protein from its amino acid sequence, or for deciding whether a given peptide has antibacterial properties. First, the binary patterns of the 15 N-terminal residues were used to predict antibacterial peptides with SVM, achieving an accuracy of 85.46% with a Matthews correlation coefficient (MCC) of 0.705. Then we used the binary patterns of the 15 C-terminal residues and achieved an accuracy of 85.05% with an MCC of 0.701. Later we developed a prediction method combining the N and C termini and achieved an accuracy of 91.64% with an MCC of 0.831. Finally, we developed an SVM-based model using the amino acid composition of the whole peptide and achieved 92.14% accuracy with an MCC of 0.843. All models were developed using five-fold cross-validation, and their performance was tested on an independent dataset. We further classified antibacterial peptides according to their sources and achieved an overall accuracy of 98.95%. We also classified antibacterial peptides into their respective families with satisfactory results.
Among antibacterial peptides there is a preference for certain residues at the N and C termini, which helps to discriminate them from non-antibacterial peptides. The amino acid composition of antibacterial peptides helps to separate them from non-antibacterial peptides and to classify them further by source and family. AntiBP2 will be helpful in discovering efficacious antibacterial peptides, which we hope will be useful against antibiotic-resistant bacteria. We also developed a user-friendly web server for the biological community.
In the past few decades, a large number of bacterial strains have evolved ways to adapt or become resistant to the currently available antibiotics. The widespread resistance of bacterial pathogens to conventional antibiotics has prompted renewed interest in the use of alternative natural microbial inhibitors such as antimicrobial peptides. Antimicrobial peptides (AMPs) are a family of host-defense peptides, most of which are gene-encoded and produced by living organisms of all types [2–8]. They are small molecular weight proteins with broad-spectrum antimicrobial activity against bacteria, viruses, and fungi [3, 10]. These evolutionarily conserved peptides are usually positively charged and have both a hydrophobic and a hydrophilic side, which enables the molecule to be soluble in aqueous environments yet also enter lipid-rich membranes. Once in a target microbial membrane, the peptide kills target cells through diverse mechanisms.
Antimicrobial peptides have a broad spectrum of activity and can act as antibacterial, antifungal, antiviral and sometimes even anticancer peptides. Besides their antibacterial and pathogen-lytic activities, these peptides can show mitogenic activity or act as signaling molecules. Extensive work has been done in the field of antibacterial peptides, describing their identification, characterization and mechanism of action, keeping in mind their numerous biotechnological applications [11–13]. Considerable effort has also gone into collecting and compiling these peptides in the form of databases [14–17].
These antibacterial peptides have very low sequence homology despite their common function. Previously we developed a robust method, AntiBP, for predicting antibacterial peptides using SVM, quantitative matrices (QM) and artificial neural networks (ANN). The growth of antibacterial peptides in the APD database over the last two years motivated us to develop a prediction method based on the newer and larger (almost double) dataset. We once again analyzed the antibacterial peptides and developed SVM-based models to predict them, because our previous study showed that SVM outperforms the other methods. In AntiBP2 we also extracted a clean dataset of antibacterial peptide families from Swiss-Prot and developed classification models for them. In the following text, we first discuss the method developed to distinguish antibacterial peptides from non-antibacterial peptides (prediction part) and then describe the method for classifying these peptides on the basis of source and family (classification part).
Performance of prediction methods developed on NT15, CT15, NTCT15 and whole peptide dataset.
Performance of NT15, CT15, NTCT15 and whole peptide dataset model on independent dataset.
Performance of SVM models in classification of antibacterial peptides according to their source.
Classification of insect antibacterial peptides into families.
Classification of frog antibacterial peptides into families.
Classification of mammal antibacterial peptides into families.
A great deal of interest is shown nowadays in antibacterial peptides, the so-called "nature's antibiotics", which seem promising for overcoming the growing problem of antibiotic resistance [23–25]. The design of novel peptides with antimicrobial activities requires methods for narrowing down the candidate peptides so as to enable rational experimentation by wet-lab scientists. Attempts have been made to develop methods and strategies for designing effective antimicrobial peptides [26, 27]. AntiBP is one such method, meant to discover efficacious antibacterial peptides that we hope could help combat antibiotic-resistant bacteria. The enormous growth of antibacterial peptide data in the databases motivated us to develop an improved version of AntiBP using the same strategy. The new version was named AntiBP2.
The N- and C-terminal sequence logos of the AntiBP2 dataset were very similar to those of the previous method, AntiBP. This indicates that, although there is little overall homology or conservation among antibacterial peptides, the pattern of positional preference for certain residues remains constant. We once again developed a prediction method to discriminate antibacterial peptides from non-antibacterial peptides, this time using training data about double the size of that used previously. We developed both whole-peptide compositional models and binary-pattern-based terminal approaches. We retained the whole-peptide method because peptides shorter than 15 residues are difficult to predict with the binary-pattern-based terminal models. We achieved impressive results with all of these approaches, but the best performers were the NTCT15 and whole-peptide prediction models (about 91 to 92% accuracy). These were followed by the NT15 model, while the CT15 model was the poorest performer. This trend matches what was seen in AntiBP. The performance of the prediction models on the independent dataset followed the trend shown during model development, again in line with AntiBP: the NTCT15 model performed best, followed by the NT15 and CT15 models, in that order.
In AntiBP2 we have also developed models that classify antibacterial peptides further into families with high accuracy. First we developed classification models that assign the source of origin to predicted antibacterial peptides. We then developed models that classify the antibacterial peptides further into their corresponding families. The results of all the classification methods indicate that, although antibacterial peptides as a whole do not show much conservation or homology, they become increasingly conserved at the level of a particular family. This is evident from the high accuracies achieved for each family in the various classification models. AntiBP2 is therefore an efficient method that can predict and classify antibacterial peptides. We hope that our method will help wet-lab scientists to design improved and efficacious antibacterial peptides in future.
The field of antibacterial peptide research is growing rapidly in response to the demand for novel antibacterial agents. AntiBP2 is an efficient method that can predict and classify antibacterial peptides and help to find new antibacterial peptides more quickly and conveniently. We hope that our method will promote research on the design of improved and efficacious antibacterial peptides in future.
The positive dataset for this method was once again fetched from the antimicrobial peptide database APD. We retrieved a total of 999 unique antibacterial peptides from this database. We used this dataset to build the whole-peptide composition-based SVM models, which predict antibacterial peptides of any length.
As there is no source of experimentally proven non-antibacterial peptides, we adopted the same strategy that was used to generate the negative dataset in AntiBP. We chose to extract random peptides from proteins belonging to all intracellular locations, excluding secretory proteins (because antibacterial peptides are mostly secreted outside the cell). Some of these randomly selected peptides could be antibacterial in nature, but the possibility is remote. For this we used the data from MitPred, which contains proteins belonging to various intracellular locations (nucleus, cytoplasm, endoplasmic reticulum, Golgi complex, mitochondria). These proteins were mixed and shuffled thoroughly so that the negative dataset would not over-represent proteins from any particular location. We then selected those proteins that were >100 amino acids in length, because many of the antibacterial peptides in the positive dataset are more than 90 residues long. For each peptide in the positive dataset, we calculated its length and cut a random peptide of the corresponding length from a protein in the negative dataset. This gave 999 negative peptides.
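The length-matched sampling described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `sample_negative_peptides`, the `seed` parameter and the choice of one random protein per positive peptide are assumptions made for the example.

```python
import random

def sample_negative_peptides(positive_lengths, proteins, min_protein_len=100, seed=0):
    """Cut one random peptide per positive peptide length from a shuffled
    pool of non-secretory proteins (hypothetical helper, for illustration)."""
    rng = random.Random(seed)
    # Keep only proteins longer than the minimum length, then shuffle the pool
    pool = [p for p in proteins if len(p) > min_protein_len]
    rng.shuffle(pool)
    negatives = []
    for length in positive_lengths:
        # Only proteins at least as long as the requested peptide can be cut
        candidates = [p for p in pool if len(p) >= length]
        protein = rng.choice(candidates)  # raises IndexError if none qualifies
        start = rng.randrange(len(protein) - length + 1)
        negatives.append(protein[start:start + length])
    return negatives
```

Each returned peptide has the same length as its positive counterpart, so the positive and negative sets share one length distribution.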
We created the NT15 and CT15 datasets by taking the first fifteen and last fifteen residues, respectively, from the antibacterial peptides, as done in AntiBP. For the NTCT15 dataset we concatenated the CT15 peptides with their corresponding NT15 counterparts. To reduce redundancy in the positive dataset, duplicates were removed, leaving 782 NT15 peptides, 786 CT15 peptides and 861 NTCT15 peptides.
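A short sketch of how such terminal datasets could be derived is given below. It assumes that peptides shorter than 15 residues are skipped and that the NTCT15 pattern is the N-terminal segment followed by the C-terminal segment; neither detail is stated explicitly in the text, and `build_terminal_datasets` is a hypothetical name.

```python
def build_terminal_datasets(peptides, n=15):
    """Return de-duplicated N-terminal, C-terminal and combined segments."""
    # Assumption: peptides shorter than n residues cannot contribute
    eligible = [p for p in peptides if len(p) >= n]
    # dict.fromkeys removes duplicates while keeping first-seen order
    nt = list(dict.fromkeys(p[:n] for p in eligible))
    ct = list(dict.fromkeys(p[-n:] for p in eligible))
    # Assumption: combined pattern is NT segment followed by CT segment
    ntct = list(dict.fromkeys(p[:n] + p[-n:] for p in eligible))
    return nt, ct, ntct
```

Because duplicates are removed separately in each dataset, the three sets can end up with different sizes, as with the 782, 786 and 861 peptides reported here.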
The strategy for generating the negative datasets for NT15, CT15 and NTCT15 was the same as in AntiBP. Once again we took the dataset of thoroughly mixed and shuffled proteins belonging to various subcellular locations. For the NT15 and CT15 negative datasets, peptides 15 residues long were cut randomly from this dataset. From these peptides we selected 786 to be used as the negative dataset against both the NT15 and CT15 datasets. The negative dataset for NTCT15 was created by extracting 861 random peptides (30 residues in length) from the non-secretory protein dataset.
The datasets for classification of antibacterial peptides were extracted from the protein sequence database Swiss-Prot. They include peptides from bacteria, insects, frogs, mammals and plants. The insect antibacterial peptides belonged to 5 families: apidaecin, attacin, cecropin, invertebrate defensin and lebocin. The mammalian antibacterial peptides comprised alpha-defensin, beta-defensin, cathelicidin, hepcidin and histatin. The frog antibacterial peptides included sequences from bombinin, brevinin, caerin, dermaseptin, dermorphin, phylloseptin, pleurain and tryptophillin. As the numbers of peptides in dermorphin, phylloseptin, pleurain and tryptophillin were very small, these were combined into a single class named "Other".
From the family classification dataset (fetched from Swiss-Prot) we took 466 peptides that were not present in our main dataset (taken from the APD database). These peptides were used neither for training nor for testing the method and served as the independent dataset for evaluating the performance of the prediction models.
As the SVM-based technique performed best in AntiBP, we used SVM to develop the prediction method in this case as well. In this study, all SVM models were developed using the freely available program SVM_Light, which allows users to run SVM with various kernels and parameters. The accuracy was computed at the cut-off score where sensitivity and specificity are nearly equal.
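One way to find the cut-off at which sensitivity and specificity are nearly equal is to scan the observed SVM scores and keep the threshold with the smallest gap between the two. The sketch below is illustrative only; `balanced_cutoff` is a hypothetical helper and the rule "predict positive when score >= cutoff" is an assumption.

```python
def balanced_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) minimizing |Sn - Sp|.

    scores: SVM output scores; labels: +1 (antibacterial) or -1 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    best = None
    for cutoff in sorted(set(scores)):
        sn = sum(s >= cutoff for s in pos) / len(pos)  # fraction of positives kept
        sp = sum(s < cutoff for s in neg) / len(neg)   # fraction of negatives rejected
        gap = abs(sn - sp)
        if best is None or gap < best[0]:
            best = (gap, cutoff, sn, sp)
    return best[1:]
```

Reporting accuracy at this balanced point avoids favoring either class when the score distribution is skewed.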
In the performance measures, TP and TN are the correctly predicted antibacterial and non-antibacterial peptides, respectively. FP is the number of non-antibacterial peptides wrongly predicted as antibacterial, and FN is the number of antibacterial peptides wrongly predicted as non-antibacterial. Sensitivity (Sn), or percent coverage of antibacterial peptides, is the percentage of antibacterial peptides predicted as antibacterial; specificity (Sp), or percent coverage of non-antibacterial peptides, is the percentage of non-antibacterial peptides predicted as non-antibacterial; overall accuracy (Ac) is the percentage of correctly predicted antibacterial and non-antibacterial peptides. The five-fold cross-validation technique was used for evaluation of all three methods.
Although the terminal approaches are useful for scanning for antibacterial peptides in a larger protein sequence, they make it difficult to predict peptides shorter than 15 residues. Therefore, a whole-peptide SVM model was also developed in order to predict antibacterial peptides of any length. The amino acid composition of each peptide was used as the input for training the SVM.
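The composition feature can be illustrated with the following sketch, which turns a peptide into a 20-dimensional vector of percentage frequencies and writes it in the sparse `label index:value` line format read by SVM-light. The helper names and the four-decimal formatting are assumptions for the example, not details taken from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(peptide):
    """Percentage of each of the 20 standard residues in the peptide."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in peptide.upper():
        if residue in counts:  # non-standard residues are ignored
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("peptide contains no standard residues")
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]

def to_svmlight_line(label, features):
    """Format one example as '<label> <index>:<value> ...' (1-based, sparse)."""
    pairs = [f"{i}:{v:.4f}" for i, v in enumerate(features, start=1) if v != 0]
    return f"{label:+d} " + " ".join(pairs)
```

Since the vector always has 20 entries, peptides of any length map to the same input dimension.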
Again, the binary patterns of the NT15, CT15 and NTCT15 datasets were used to develop prediction methods as described in AntiBP. The performance was evaluated using the five-fold cross-validation technique.
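A binary pattern of this kind can be sketched as a one-hot encoding, assuming each residue is represented by a 20-dimensional vector with a single 1 at the position of that amino acid. Under that assumption a 15-residue segment gives 300 features and a 30-residue NTCT15 pattern gives 600. The residue ordering and the function name are illustrative choices.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def binary_pattern(peptide, alphabet=AMINO_ACIDS):
    """One-hot encode a peptide: len(alphabet) bits per residue, concatenated."""
    vector = []
    for residue in peptide.upper():
        bits = [0] * len(alphabet)
        index = alphabet.find(residue)
        if index >= 0:  # unknown residues are encoded as all zeros
            bits[index] = 1
        vector.extend(bits)
    return vector
```

Unlike composition, this encoding keeps positional information, which is what allows the models to exploit residue preferences at specific terminal positions.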
Multiclass SVM was used to develop the classification models, which classify antibacterial peptides by source: bacteria, insects, frogs, mammals and plants. N SVM models were constructed for N-class classification. For antibacterial peptide classification the number of classes was 5, so five one-versus-rest (1-v-r) SVM models were constructed. The ith SVM was trained with all the samples of the ith class labelled positive and all other samples labelled negative. An unknown example was assigned to the class whose SVM gave the highest output score. The results for the family prediction are given in Table 2.
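The one-versus-rest scheme described above can be sketched in two small steps: relabelling the training set for the ith SVM, and picking the class with the highest score at prediction time. The function names and the example scores are hypothetical; the scores would come from the five trained SVM models.

```python
def one_vs_rest_labels(sample_classes, target_class):
    """Labels for the SVM of target_class: +1 for that class, -1 otherwise."""
    return [1 if c == target_class else -1 for c in sample_classes]

def classify_one_vs_rest(scores_by_class):
    """Assign the class whose SVM produced the highest output score."""
    return max(scores_by_class, key=scores_by_class.get)

# Usage with made-up scores from five source-specific SVMs
example_scores = {"bacteria": -0.8, "insect": 0.4, "frog": 1.2,
                  "mammal": -0.1, "plant": -1.5}
predicted_source = classify_one_vs_rest(example_scores)
```

With these example scores the peptide is assigned to the frog class, since that SVM has the largest output.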
Antibacterial peptides belonging to various sources were further classified into families. Classification models were developed for peptides belonging to insects, frogs and mammals. To classify insect antibacterial peptides into families, five 1-v-r SVMs were developed. In a similar way, five 1-v-r SVM models each were developed to classify frog and mammalian antibacterial peptides into their respective families. The detailed results of the classification of insect, frog and mammalian peptides are given in the results section (Tables 3, 4 and 5).
We developed a freely available web server, AntiBP2, for predicting and classifying antibacterial peptides using the models developed in this study. The web server was developed on a SUN server (model T-1000) under the Solaris environment using the PERL programming language.
The authors are thankful to the Council of Scientific and Industrial Research (CSIR) and the Department of Biotechnology (DBT), Govt. of India, for financial support. Sneh Lata and Nitish Kumar Mishra are senior research fellows financially supported by CSIR, New Delhi, India.
This article has been published as part of BMC Bioinformatics Volume 11 Supplement 1, 2010: Selected articles from the Eighth Asia-Pacific Bioinformatics Conference (APBC 2010). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/11?issue=S1.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.