Predicting the outer membrane proteome of Pasteurella multocida based on consensus prediction enhanced by results integration and manual confirmation
© E-komon et al; licensee BioMed Central Ltd. 2012
Received: 1 July 2011
Accepted: 27 April 2012
Published: 27 April 2012
Outer membrane proteins (OMPs) of Pasteurella multocida have various functions related to virulence and pathogenesis and represent important targets for vaccine development. Various bioinformatic algorithms can predict outer membrane localization and discriminate OMPs by structure or function. Designing a confident prediction framework that integrates different predictors, followed by consensus prediction, results integration and manual confirmation, will improve the prediction of the outer membrane proteome.
In the present study, we used 10 different predictors classified into three groups (subcellular localization, transmembrane β-barrel protein and lipoprotein predictors) to identify putative OMPs from two available P. multocida genomes: those of avian strain Pm70 and porcine non-toxigenic strain 3480. Predicted proteins in each group were filtered by optimized criteria for consensus prediction: at least two positive predictions for the subcellular localization predictors, three for the transmembrane β-barrel protein predictors and one for the lipoprotein predictors. The consensus predicted proteins were integrated from each group into a single list of proteins. We further incorporated a manual confirmation step including a public database search against PubMed and sequence analyses, e.g. sequence and structural homology, conserved motifs/domains, functional prediction, and protein-protein interactions to enhance the confidence of prediction. As a result, we were able to confidently predict 98 putative OMPs from the avian strain genome and 107 OMPs from the porcine strain genome with 83% overlap between the two genomes.
The bioinformatic framework developed in this study has increased the number of putative OMPs identified in P. multocida and allowed these OMPs to be identified with a higher degree of confidence. Our approach can be applied to investigate the outer membrane proteomes of other Gram-negative bacteria.
The Gram-negative bacterium Pasteurella multocida is responsible for economically significant infections of a wide range of animal species. The organism causes a variety of disease syndromes which include pneumonic pasteurellosis of ruminants and pigs, porcine progressive atrophic rhinitis (PAR), fowl cholera, bovine haemorrhagic septicaemia (HS), and human infections via carnivore bites or scratches. As in all Gram-negative bacteria, the cell envelope of P. multocida consists of a symmetrical inner membrane and an asymmetrical outer membrane, separated by the periplasmic space and peptidoglycan layer. The outer membrane consists of an inner phospholipid layer and an outer LPS leaflet. It serves as a selective barrier that controls the passage of nutrients and waste products into and out of the cell and is the interface between pathogen and host. The outer membrane harbours two classes of proteins, integral membrane proteins and peripheral lipoproteins, which together account for 2-3% of the total encoded proteins [4, 5]. Integral outer membrane proteins (OMPs) typically have a β-barrel structure whereas lipoproteins are mostly anchored to the inner leaflet of the outer membrane [6, 7]. The biosynthesis and translocation of these two groups of proteins have previously been reviewed [6, 8–10]. Outer membrane proteins play varied and important roles for bacteria, allowing them to adapt to different environments and host niches. These roles include biogenesis and integrity of the outer membrane, nonspecific porin activity, energy-dependent transport, adherence and membrane-associated enzymatic activity. In P. multocida, certain OMPs play important roles in virulence and have been utilized as vaccine antigens.
The majority of OMPs can be bioinformatically differentiated and predicted by using their amino acid compositions [12–14], specific protein modifications and sorting mechanisms [15, 16], and unique sequences and structural patterns [17–21]. Predictors of outer membrane-located proteins employ a variety of algorithms and methods having different accuracies and sensitivity levels of prediction [22–44]. These predictors can be categorized into three groups: (1) subcellular localization or global predictors which can differentiate between proteins from different compartments; (2) transmembrane β-barrel protein predictors which distinguish β-barrel structures from transmembrane α-helical proteins predominantly found in the inner membrane; and (3) lipoprotein predictors which can discriminate between inner membrane and outer membrane lipoprotein signal peptides.
Bioinformatic predictors have been used to identify OMPs in several Gram-negative bacterial species [5, 32, 46, 47]. However, the results predicted by individual programs frequently disagree. A combination of different predictors, together with consensus prediction, has been shown to increase the coverage and accuracy of the predicted outer membrane proteome [45, 48] including that of transmembrane β-barrel proteins. Heinz et al. also employed a manual confirmation step to remove false positives and increase the confidence of the predicted outer membrane proteome.
In a previous P. multocida study, three predictors (two subcellular localization and one lipoprotein) were used to predict 129 proteins as secreted, outer membrane or lipoprotein from the publicly available genome of P. multocida avian strain Pm70. However, certain predicted proteins were not confirmed as OMPs by all three predictors. Boyce et al. identified 35 proteins by proteomics from the P. multocida avian strain X-73 but only one third of these proteins were predicted to be OMPs by a combination of two subcellular localization predictors. Therefore, our understanding of the outer membrane proteome of P. multocida remains incomplete.
In the present study, we used 10 different predictors classified into three groups (subcellular localization, transmembrane β-barrel protein and lipoprotein predictors) to identify putative OMPs from two available P. multocida genomes: the avian strain Pm70 and the unannotated genome of porcine non-toxigenic strain 3480. The predicted proteins in each group were filtered by optimized criteria for the consensus prediction and the consensus predicted proteins from each group were integrated into a single list of proteins. We further incorporated a manual confirmation step which included a public database search against PubMed together with various sequence analyses, e.g. sequence and structural homology, conserved motifs/domains, functional prediction, and protein-protein interaction to enhance the confidence of prediction. Using these approaches, we were able to confidently predict the outer membrane proteomes of the two P. multocida strains.
Prediction of OMPs using different predictors
Bioinformatic predictors used for the OMP prediction
Method of predictor
Proteome Analyst v. 3.0 (PA)
PSORTb v. 2.0
CELLO v. 2.5
Trans-membrane β-barrel structure
Outer membrane lipoprotein
LIPOP v. 1.0
The transmembrane β-barrel protein predictors identified 329 putative β-barrel proteins from the avian strain genome and 336 proteins from the porcine strain genome (Figure 2B). TMB-Hunt identified the highest number of predicted proteins (168) from the avian strain genome, while MCMBB identified the highest number of predicted proteins (184) from the porcine strain genome. BOMP identified the lowest number of predicted proteins (40 and 48) from the avian and porcine strain genomes, respectively. For the avian strain genome, 231 proteins were predicted by only a single program: 70, 113, 43 and 5 proteins by MCMBB, TMB-Hunt, TMBETADISC-RBF and BOMP, respectively. Similarly, 225 proteins were identified by only a single predictor from the porcine strain genome: 84, 84, 46 and 11 proteins by MCMBB, TMB-Hunt, TMBETADISC-RBF and BOMP, respectively. Nineteen proteins were predicted by all four programs in both the avian and porcine strain genomes. The use of two or three programs predicted a total of 79 proteins from the avian strain genome and 92 proteins from the porcine strain genome.
The lipoprotein predictors identified 86 proteins from the avian strain genome and 82 proteins from the porcine strain genome (Figure 2C). LIPO predicted 73 proteins from the avian strain genome and 75 from the porcine strain genome whereas LIPOP predicted 69 proteins from the avian strain genome and 67 from the porcine strain genome. Together, LIPO and LIPOP predicted 56 and 60 proteins from the avian and porcine strain genomes, respectively. However, LIPO identified 17 unique lipoproteins from the avian strain genome and 15 from the porcine strain genome, whereas LIPOP identified 13 unique lipoproteins from the avian strain genome and seven from the porcine strain genome.
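The lipoprotein counts above are related by inclusion-exclusion for two sets: the combined total equals the two individual totals minus the shared proteins. The following minimal Python sketch checks this with the figures quoted in the text; the function name `two_predictor_summary` is illustrative and not part of the original study.

```python
def two_predictor_summary(n_a: int, n_b: int, n_shared: int) -> dict:
    """Summarise agreement between two predictors from their hit counts."""
    if n_shared > min(n_a, n_b):
        raise ValueError("shared count cannot exceed either predictor's total")
    union = n_a + n_b - n_shared  # inclusion-exclusion for two sets
    return {
        "union": union,
        "only_a": n_a - n_shared,
        "only_b": n_b - n_shared,
        "shared_fraction": n_shared / union,
    }

# Counts quoted in the text: LIPO total, LIPOP total, proteins found by both.
avian = two_predictor_summary(73, 69, 56)
porcine = two_predictor_summary(75, 67, 60)
print(avian)
print(porcine)
```

The `union`, `only_a` and `only_b` values reproduce the totals and predictor-specific counts reported for each genome.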
Agreement between pairs of predictors
Comparison of statistical parameters used for selection of the consensus criteria obtained with the training dataset of 526 Gram-negative bacterial protein sequences of known localization and for validation of the consensus criteria obtained with the test dataset of 529 Gram-negative bacterial protein sequences of known localization
Subcellular localization predictor group
At least one positive prediction
At least two positive predictions
At least three positive predictions
All four positive predictions
Transmembrane β-barrel protein predictor group
At least one positive prediction
At least two positive predictions
At least three positive predictions
All four positive predictions
Outer membrane lipoprotein predictor group
At least one positive prediction
Both positive predictions
At least two positive predictions for subcellular localization predictors
At least three positive predictions for transmembrane β-barrel protein predictors
At least one positive prediction for outer membrane lipoprotein predictors
Integration of the three predictor groups
The P. multocida proteins predicted by each group of predictors were filtered using these optimized criteria and resulted in 65 consensus predicted proteins from the avian strain genome and 73 proteins from the porcine strain genome for the subcellular localization predictors; 47 and 53 proteins from the avian and porcine strain genomes, respectively, for the β-barrel transmembrane protein predictors; and 86 and 82 proteins from the avian and porcine strain genomes, respectively, for the lipoprotein predictors (Figure 1). Integration of the consensus-predicted proteins from these three groups subsequently yielded 140 proteins from the avian strain genome and 147 proteins from the porcine strain genome (Figure 1). Of the proteins predicted from the avian and porcine strain genomes, 27 proteins from the avian strain genome and 24 proteins from the porcine strain genome were filtered out by the subcellular localization predictor group but not by the β-barrel transmembrane protein and/or the lipoprotein predictor groups. Similarly, 36 proteins from the avian strain genome and 34 proteins from the porcine strain genome were filtered out by the β-barrel transmembrane protein predictor group but not by the subcellular localization and/or the lipoprotein predictor groups. No proteins from either genome were removed from the lipoprotein predictor group.
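The filtering and integration described above amount to a per-group vote count followed by a set union. The sketch below illustrates this in Python under stated assumptions: the thresholds are those given in the text, but the function names, the input layout and the protein identifiers `P1` to `P6` are hypothetical placeholders, not the authors' implementation or real loci.

```python
from collections import Counter

# Minimum number of positive predictions required per predictor group,
# as stated in the text.
MIN_VOTES = {"localization": 2, "beta_barrel": 3, "lipoprotein": 1}

def consensus(predictions: dict[str, set[str]], min_votes: int) -> set[str]:
    """Return proteins called positive by at least `min_votes` predictors."""
    votes = Counter(p for hits in predictions.values() for p in hits)
    return {protein for protein, n in votes.items() if n >= min_votes}

def integrate(groups: dict[str, dict[str, set[str]]]) -> set[str]:
    """Union of the consensus sets obtained from each predictor group."""
    merged: set[str] = set()
    for group, predictions in groups.items():
        merged |= consensus(predictions, MIN_VOTES[group])
    return merged

# Hypothetical toy input (placeholder protein IDs, invented predictions).
toy = {
    "localization": {
        "PA": {"P1", "P2"}, "PSORTb": {"P1"},
        "SOSUI-GramN": {"P3"}, "CELLO": {"P1", "P3"},
    },
    "beta_barrel": {
        "BOMP": {"P1", "P4"}, "MCMBB": {"P4", "P5"},
        "TMB-Hunt": {"P4", "P5"}, "TMBETADISC-RBF": {"P1"},
    },
    "lipoprotein": {"LIPO": {"P6"}, "LIPOP": set()},
}
candidates = integrate(toy)
print(sorted(candidates))
```

In this toy case `P1` fails the β-barrel criterion (two votes) but is retained because it passes the subcellular localization criterion, mirroring how a protein filtered out by one group can still enter the integrated list through another.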
Manual confirmation of the predicted proteins
In the final step, published information available on the predicted proteins was searched, using text mining and sequence analysis, to confirm their outer membrane location. Forty-two proteins (30%) from the avian strain genome and 40 proteins (27%) from the porcine strain genome were removed at the manual confirmation stage. Thirty-one of these proteins were identified in both genomes and included 19 cytoplasmic or inner membrane proteins, 11 periplasmic proteins, two secreted proteins and one phage protein. In this way, 98 proteins from the avian strain genome and 107 proteins from the porcine strain genome were confirmed as being confidently-predicted OMPs (Figure 1). These proteins accounted for 4.9% of the avian strain genome and 4.7% of the porcine strain genome. Details of the confidently predicted OMPs from the avian strain genome are given in Additional File 5. Eighty-nine (91%) of the predicted OMPs in the avian strain genome were also detected in the porcine strain genome, indicating that these two outer membrane proteomes are very similar. Eighteen (17%) of the predicted OMPs from the porcine strain genome had no homologous proteins in the avian strain genome; most of these were hypothetical proteins. However, these proteins included an Omp100 adhesin/invasin homologue in Aggregatibacter aphrophilus and two uncharacterized TonB-dependent receptors.
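The percentages in this paragraph can be rechecked from the raw counts. The short Python sketch below does so; the helper `pct` is illustrative, and the genome sizes (2,015 and 2,260 protein-coding genes) are those reported in the Methods.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)

avian_omps, porcine_omps, shared = 98, 107, 89
avian_genes, porcine_genes = 2015, 2260  # protein-coding genes per genome

summary = {
    "avian removed at manual confirmation": pct(42, 140),
    "porcine removed at manual confirmation": pct(40, 147),
    "avian OMPs as share of genome": pct(avian_omps, avian_genes),
    "porcine OMPs as share of genome": pct(porcine_omps, porcine_genes),
    "avian OMPs also found in porcine": pct(shared, avian_omps),
    "porcine OMPs also found in avian": pct(shared, porcine_omps),
    "porcine OMPs without avian homologue": pct(porcine_omps - shared, porcine_omps),
}
for label, value in summary.items():
    print(f"{label}: {value}%")
```

Note that the overlap depends on the denominator: the 89 shared proteins are about 91% of the avian set but about 83% of the porcine set.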
Analysis of filtered-out predicted proteins
Proteins of the avian strain genome that were filtered out at the consensus prediction stage were integrated and further analyzed by manual confirmation (Figure 1). Ninety-seven (60%) of the proteins predicted by the subcellular localization predictors and 277 (84%) of the proteins predicted by the transmembrane β-barrel predictors were filtered out; no proteins were filtered out by the lipoprotein predictors. In total, 339 of the proteins predicted by the subcellular localization and transmembrane β-barrel protein predictors were filtered out by consensus prediction. Further analyses of these proteins by manual confirmation revealed that 39 were putative OMPs and/or periplasmic proteins. However, 20 of these had previously been predicted by the lipoprotein predictor group and were therefore removed from the list of filtered-out proteins. A further six proteins that had previously been filtered out by the transmembrane β-barrel predictor group had passed the criteria of the subcellular localization predictor group and were also removed from the list of filtered-out proteins. Thus, 13 (4%) of the filtered-out proteins potentially represented true OMPs. Manual confirmation of these 13 proteins showed that seven were indeed putative OMPs. These included HbpA/DppA (PM0592), NlpD (PM1507), RcpB (PM0851), MltA (PM0928), ComEA (PM1665), NlpI (PM1113) and a putative OMP (PM1623). The remaining six proteins were putative periplasmic proteins. Thus, only seven (2%) of the 339 filtered-out proteins were putative OMPs, whereas 332 (98%) proteins could be confidently classified as non-OMPs. These results demonstrate that the vast majority (98%) of the proteins removed by consensus prediction are non-OMPs, and that consensus prediction therefore substantially reduces the time that needs to be spent on manual confirmation.
Functions of the confidently predicted OMPs
The functions of the 98 confidently predicted OMPs in the avian strain genome are shown in Additional File 6. These functions include outer membrane biogenesis and integrity (12 proteins), transport and receptor (25 proteins), adherence (7 proteins) and enzymatic activity (9 proteins). Forty-one proteins have unknown functions (although 17 are named) and 27 of these are lipoproteins. Interestingly, two or three copies of genes encoding certain proteins were predicted. For example, three ompH genes and two genes of lspB, hsf, hgbB, plpE and hlpB were predicted. Similar observations, including three ompH genes and two genes of lspB, hsf, hgbA, and plpE, were made for the porcine strain genome. Two proteins, HexD and Wza, were predicted from both genomes but they appear to have similar functions in capsular polysaccharide transport. Twelve TonB-dependent receptors including HemR (hemin receptor), PfhR and HasR (heme receptors), HmbR, HgbA and two HgbB (haemoglobin receptors), and PM0803, PM1075, PM1081, PM1282 and PM1428 were predicted in the avian strain genome; notably, most of these are involved in iron uptake. Similarly, 14 TonB-dependent receptors were identified in the porcine strain genome including HemR, PfhR and HasR, HmbR, two HgbAs, HgbB, PM0803, PM1075, PM1081, PM1282, PM1428 and two uncharacterized porcine strain-specific proteins (PMpPor1882 and PMpPor2194).
Physicochemical properties of putative OMPs
Analysis of physicochemical parameters (Additional File 5) highlighted the properties of the putative OMPs. The predicted proteins had molecular masses ranging from 7.1 to 276.2 kDa (mean 52.4 ± 43 kDa) and an average pI value of 8.1 ± 1.5. The average size of the predicted lipoproteins was smaller than that of the other proteins. Some proteins had very large sizes such as Hsf_1 (276 kDa) and the hypothetical lipoprotein PM0659 (214 kDa). The average GRAVY score was -0.35 ± 0.24 which indicated that the proteins were relatively hydrophilic compared to the predicted inner membrane proteins (data not shown). The predicted OMPs had more β-sheet strands (3-44 strands) than α-helices (0-3 helices).
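For readers unfamiliar with the GRAVY (grand average of hydropathy) score, it is the mean of the per-residue hydropathy values over the whole sequence, with negative values indicating a hydrophilic protein. The Python sketch below uses the standard Kyte-Doolittle hydropathy scale; it is an illustration only, since the software the authors used to compute GRAVY is not stated in this section, and the example sequences are arbitrary.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy over all residues of `sequence`."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

# Arbitrary example peptides (not P. multocida sequences).
print(gravy("KDEKRS"))   # charged and polar residues give a negative score
print(gravy("LIVAFL"))   # aliphatic residues give a positive score
```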
Different prediction methods
Each prediction method used in the present study (Table 1) is based on different algorithms and training datasets. The subcellular localization predictors aimed to determine all cellular components (secreted, outer membrane, periplasmic, inner membrane and cytoplasmic proteins) of the genome of P. multocida. PA analyzes keywords obtained from various databases using machine-learned classifiers and provides a user-friendly graphical explanation of each prediction . PSORTb combines multiple prediction components and each component performs a specific function including homology prediction, transmembrane prediction, a signal peptide prediction, and a specific motif prediction . SOSUI-GramN uses only the total sequence and physicochemical properties of the N- and C-terminal signal sequences for its prediction . CELLO uses a supervised-learning method (support vector machines, SVMs) to detect specific amino acid compositions and motifs . Of 162 proteins predicted by the subcellular localization predictors from the avian strain genome, 15% were predicted by all four predictors, 25% by two or three predictors and 60% by a single predictor. Similarly, of 197 predicted proteins from the porcine strain genome, 11% of proteins were predicted by all four predictors, 26% by two or three predictors and 63% were predicted by a single predictor. Therefore, approximately 40% of the proteins predicted by the subcellular localization predictors were predicted by at least two predictors. Although PA and PSORTb have been widely used and reported as highly efficient predictors , SOSUI-GramN and CELLO identified additional OMPs (e.g., RcpC, NanB, TadD, LppC and PM1515) which the first two predictors did not. Overall, the use of multiple subcellular localization predictors increased both the prediction coverage and the confidence of prediction.
Conversely, the transmembrane β-barrel protein and lipoprotein predictors identified specific groups of OMPs. The four transmembrane β-barrel protein predictors discriminate between β-barrel proteins and non-β-barrel proteins. BOMP detects the C-terminal signal sequence and typical β-barrel pattern of the total amino acid sequence. MCMBB uses a fast algorithm to determine alternating patterns of the transmembrane β-barrel proteins. TMB-Hunt and TMBETADISC-RBF identify transmembrane β-barrel proteins based on amino acid composition profiles using different algorithms [29, 41]. MCMBB and TMB-Hunt predicted more proteins than BOMP and TMBETADISC-RBF (Figure 2, Additional Files 1 and 2). This could be due to differences in the algorithms, scoring schemes and performance levels. By using these four transmembrane β-barrel protein predictors, 30% and 33% of transmembrane proteins were predicted by at least two predictors from the avian and porcine strain genomes, respectively; the remaining transmembrane proteins were predicted by a single predictor. The use of multiple transmembrane β-barrel protein predictors again resulted in an increase in the confidence of prediction.
For the lipoprotein predictors, LIPO and LIPOP detect outer membrane lipoproteins by using their conserved lipo-box sequences. Together, LIPO and LIPOP predicted 65% of lipoproteins from the avian strain genome and 73% of lipoproteins from the porcine strain genome. These results indicate a high level of agreement between the two predictors and a high level of confidence.
Our findings confirm results obtained with Escherichia coli which showed that the use of multiple predictors increases the efficiency of subcellular localization prediction as well as specific-feature (β-barrel and lipid modified proteins) prediction when compared with the use of a single program. Mirus and Schleiff compared different transmembrane β-barrel protein predictors and showed that the combinatory approach improved the reliability of the prediction. Moreover, we have also confirmed that the combined use of different predictors improves the coverage of predicted OMPs and our findings are consistent with previous work in other bacterial species [5, 32, 47]. However, a higher number of predictors was used in the present study.
Filtration, integration and confirmation of the prediction results
In the present study, we used a combination of subcellular localization, transmembrane β-barrel protein and lipoprotein predictors, followed by consensus prediction with optimized criteria, integration and manual confirmation (data mining and sequence analyses) to predict OMPs in the available avian and porcine P. multocida genomes. Consensus prediction was validated on a modified PA dataset containing 1055 Gram-negative bacterial protein sequences (of < 25% similarity) which were divided into training and test datasets. The consensus criteria were selected by comparing statistical parameters obtained using various criteria in the training dataset; the selected criteria were validated using the test dataset. The criteria were chosen to optimize the accuracy, recall/sensitivity, specificity and MCC scores. The criterion selected for the subcellular localization predictors (at least two positive predictions) maximized the accuracy, specificity and MCC scores (Figure 5 and Table 2). For the transmembrane β-barrel protein predictor group, the accuracy, specificity and MCC scores increased as the number of required positive predictions increased (i.e. maximum values were achieved when all four predictors agreed). However, there was a corresponding reduction in the recall/sensitivity scores (Figure 5). Thus, although more false-positive proteins were removed as the number of required predictions increased, more true-positive proteins were also lost. Therefore, the consensus criterion (at least three predictors) for the transmembrane β-barrel protein predictors was selected by optimization of recall/sensitivity in conjunction with the other statistical parameters. The MCC scores of the selected criteria for the transmembrane β-barrel protein and the outer membrane lipoprotein predictors were moderate (Figure 5). These moderate scores may arise because these predictors detect only specific subgroups of OMPs (e.g. transmembrane β-barrel proteins or outer membrane lipoproteins). When the three groups of predictors were combined, the prediction performance was enhanced (Table 2).
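The four statistical parameters named above can all be derived from a binary confusion matrix (OMP versus non-OMP). The Python sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not the values behind Table 2 or Figure 5.

```python
import math

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, recall/sensitivity, specificity and MCC from confusion counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no observations")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # MCC is conventionally set to 0 when any marginal total is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Hypothetical counts for a 526-sequence training set (illustration only).
lenient = binary_metrics(tp=50, fp=40, tn=430, fn=6)
strict = binary_metrics(tp=35, fp=5, tn=465, fn=21)
for name, m in (("lenient", lenient), ("strict", strict)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

With these invented counts, the stricter criterion gains specificity and loses recall, which is the trade-off described for the transmembrane β-barrel predictor group.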
Applying the consensus prediction to the predicted OMPs of P. multocida significantly reduced the number of false positive proteins, but this advantage has to be weighed against the loss of a small number (seven) of true positive proteins. If absolutely necessary, the identities and locations of the filtered-out proteins can be checked using the additional manual confirmation step (Figure 1). Applying the consensus method and manual confirmation enhances the confidence and reliability of the predicted proteins [45, 48, 50]. Viratyosin et al. developed a computational framework incorporating consensus prediction of the subcellular localization predictors and homology information for subcellular localization prediction of the Leptospira interrogans genome and identified 63 putative OMPs. Similarly, Heinz et al. used multiple prediction phases, including screening of the inner membrane proteins, manual confirmation of the PSORTb database, and prediction of β-barrel, β-helix and lipoproteins, to identify the OMPs in Chlamydiae. Our study provides a simple framework which improves the confidence of prediction of the outer membrane proteome of P. multocida compared to previous studies.
By using the consensus prediction followed by integration of the results for three predictor groups (Figure 1), the number of predicted proteins decreased from approximately 400 to 140 for the avian strain genome and to 147 for the porcine strain genome. The manual confirmation step further reduced the numbers to 98 and 107 confidently-predicted putative OMPs for the avian and porcine P. multocida genomes, respectively. These values represent an average of 4.8% of the total proteome. The two predicted outer membrane proteomes were very similar, sharing 89 (83%) proteins. The majority (64) of these proteins had sequence identities above 99%, whereas 22 proteins had sequence identities in the range of 55.9 to 98%. Twelve proteins were present in either the avian or porcine genomes but not both. Of these, only three, namely, the Omp100 adhesin/invasin and two uncharacterized TonB-dependent receptors, were annotated as having putative function, in adherence and transport. The presence of these proteins in porcine isolates alone suggests a possible role in host adaptation.
We compared the confidently predicted putative OMPs from the avian strain genome obtained by our prediction framework with those predicted by a recently published subcellular localization predictor ClubSub-P which was developed based on cluster-based and consensus prediction methods. Fifty-eight proteins were predicted by the ClubSub-P program including 34 inner or outer membrane lipoproteins, 20 transmembrane β-barrel proteins and four lipidified transmembrane β-barrel proteins. Forty-eight of these proteins were predicted by our prediction framework, whereas ten proteins were not predicted by our method. The ClubSub-P predictor did not discriminate between outer and inner membrane lipoproteins. The ten proteins not predicted by our method were removed during either the consensus prediction or manual confirmation step. Therefore, our prediction framework provided better coverage of the predicted outer membrane proteome of P. multocida compared to ClubSub-P.
Of the 98 confidently predicted putative OMPs from the avian strain genome, 48 proteins were predicted by at least two groups of predictors, while the remainder were identified by only one approach (Figure 6). We were able to classify the predicted OMPs into transmembrane β-barrel, lipidified transmembrane β-barrel, and lipidified proteins. The subcellular localization predictors predicted four potential β-barrel proteins that were filtered out by the β-barrel predictor group. The loss of these potential true OMPs in the prediction may have occurred due to the stringent criteria used during the consensus prediction as increased stringency reduces the rate of false positives at the cost of an increased rate of false negatives. The manual confirmation of individual predicted proteins helped in the elimination of the false-positive proteins, such as some secreted and periplasmic proteins, and confidently confirmed that predicted proteins were targeted to the outer membrane. Moreover, it also assigned relevant functions to about 60% of the predicted proteins whose roles included outer membrane biogenesis and integrity, transport and receptor functions, adherence, and enzymatic activity. However, the remainder of the proteins, especially the lipoproteins, are hypothetical and require further characterization.
Hatfaludi et al.  reviewed the functions and classification of the OMPs of P. multocida and reported that 73 proteins were outer membrane-located based on previously published experimental research. We have confidently predicted 45 of these proteins, whereas 28 proteins were not predicted in the present study (Figure 8). One protein, TbpA, was not identified because of its absence from the avian and porcine strain genomes. The remaining 27 proteins were not included in our list of confidently predicted OMPs for a number of reasons. Nine proteins were filtered out by consensus prediction (five proteins) or shown to be non-OMPs by manual confirmation (four proteins). The proteins that were filtered out by consensus prediction included lipoprotein PM0979, a competence-related DNA-binding and uptake protein ComE1, an outer membrane-bounded sialic acid-binding protein NanP or YiaO, and Flp (Tad) operon proteins Flp1 and RcpB. The remaining 18 proteins were not identified as OMPs by any of the ten predictors in the present study. These included cytoplasmic proteins (3), inner membrane proteins (4), a periplasmic protein (1) and extracellular proteins (2). There are a number of explanations for the presence of these proteins in the list assembled by Hatfaludi et al.  including contamination during outer membrane extraction and multiple subcellular localizations of certain proteins. Significantly, of the 98 OMPs predicted from the avian strain genome in the present study, 53 OMPs (Figure 8) were not reported by Hatfaludi et al. . These included OmpH_3, Opa, Hsf_1 and _2, LolB, LppA, RlpB, PlpE, SmpA, Plp4, LppC, HexD and Wza. Clearly, these findings indicate that there is still a lack of experimental evidence relating to the structures and functions of the majority of the predicted outer membrane proteome.
Both Hatfaludi et al  and Al-hasani et al  identified the same 44 proteins that were also predicted in the present study (Figure 8). However, a further 40 proteins in our list were only predicted by Al-hasani et al  whereas one protein was only reported by Hatfaludi et al . In the present study, we have predicted 13 proteins that were not described by Hatfaludi et al  or predicted by Al-hasani et al . These include the Flp operon protein RcpC, the paraquat-inducible protein Mce/PqiB, YccT, a RplA-like protein PM1986, and nine hypothetical proteins (PM1515, PM0519, PM1772, PM1002, PM1798, PM1939, PM0015, PM0234 and PM1323). However, the functions of certain of these proteins have not been determined. Overall, the present study has improved the coverage of the predicted outer membrane proteome of P. multocida by 15% compared to that of Al-hasani et al . Our simple prediction framework has allowed us to confidently predict and increase the coverage of the outer membrane sub-proteome of P. multocida by using currently available predictors and databases. However, our ten selected predictors could be replaced or modified as improved subcellular localization or specific-feature predictors are made available. For example, the Freeman-Wimley prediction algorithm was developed to improve the prediction of transmembrane β-barrel proteins over previous algorithms to an accuracy of 99% and MCC score of 0.75 . Freeman and Wimley  demonstrated that their prediction algorithm was more accurate than BOMP and TMBETADISC-RBF, two of the methods used in the present study. This method could potentially be incorporated into our transmembrane β-barrel protein predictor group to further improve the prediction performance of our prediction method if it was available as an online user-friendly tool for genome-scale prediction. 
However, it should be pointed out that of the seven putative OMPs that were filtered out by consensus prediction and subsequently identified by data-mining, five (HbpA/DppA, NlpD, ComEA, NlpI and PM1623) were transmembrane β-barrel proteins that were identified by the transmembrane β-barrel protein predictors used - BOMP (3 proteins), MCMBB (2), TMB-Hunt (1) or TMBETADISC-RBF (1). If the Freeman-Wimley method identified all of these, only ComEA and PM1623 would pass the consensus prediction criteria (i.e. prediction by at least three predictors). Recently, Goudenege et al. created a subcellular localization database, CoBaltDB, for Bacteria and Archaea by incorporating 43 different predictors and 784 complete proteomes, but they did not give consensus localization of the predicted proteins and a decision for protein location has to be made by the users themselves. By using this database, our prediction framework could also be applied to confidently predict subcellular localization in other bacterial species.
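The effect of adding a fifth β-barrel predictor can be worked through as a small vote-counting exercise. The Python sketch below does this under a stated assumption: the per-protein vote counts are not given in the text but are inferred from it (seven positive calls spread over five proteins, none reaching three, and only ComEA and PM1623 reaching three after one extra call), so they should be read as a reconstruction rather than reported data.

```python
MIN_VOTES = 3  # consensus criterion for the transmembrane β-barrel group

# Inferred positive calls per protein from the four predictors used
# (reconstructed from the totals in the text, not reported directly).
votes = {"HbpA/DppA": 1, "NlpD": 1, "ComEA": 2, "NlpI": 1, "PM1623": 2}

def passing(vote_counts: dict[str, int], extra: int = 0) -> list[str]:
    """Proteins meeting the criterion after `extra` additional positive calls."""
    return sorted(p for p, n in vote_counts.items() if n + extra >= MIN_VOTES)

print(sum(votes.values()))       # total calls: BOMP 3 + MCMBB 2 + TMB-Hunt 1 + TMBETADISC-RBF 1
print(passing(votes))            # with the four predictors used in this study
print(passing(votes, extra=1))   # if a fifth predictor also called every protein positive
```

Even in this best case for the additional predictor, three of the five proteins would still fall below the threshold, which is why manual confirmation of filtered-out proteins remains useful.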
In the present study, we have designed a simple prediction framework that allows the prediction of putative OMPs from the available P. multocida genomes with a high level of confidence. Our approach involves the use of multiple predictors divided into three groups, together with consensus prediction followed by results integration and manual confirmation. We have confidently identified 98 putative OMPs from the avian strain genome and 107 putative OMPs from the porcine strain genome of P. multocida, with 83% overlap between the two genomes. The coverage of the outer membrane proteome of this bacterium has been improved relative to previous research. The identification of previously unrecognized OMPs in strains of P. multocida from different host species will stimulate further studies into the molecular basis of the pathogenesis of this organism. In a separate study, the authors have analyzed the outer membrane proteomes of eight P. multocida isolates using complementary proteomic methods. In that study, more than half of the predicted outer membrane proteome was demonstrated experimentally: fifty-four putative OMPs were identified, representing 52% of the predicted avian and 48% of the predicted porcine outer membrane sub-proteomes. The present study not only provides a basis for furthering our understanding of the outer membrane proteome of P. multocida but can also be applied to other Gram-negative bacteria.
P. multocida genomes
The publicly available genome of the avian P. multocida subsp. multocida strain Pm70 [GenBank: AE004439] and the unannotated genome of the porcine non-toxigenic P. multocida strain 3480 [Project ID: 32177] were used for all bioinformatic analyses. The avian strain genome containing 2,015 protein-coding genes was retrieved from NCBI. The 2,260 protein-coding genes of the unannotated porcine genome (kindly provided by Dr. A. Gillaspy, University of Oklahoma) were manually predicted using GeneMark and automatically named using Blast2GO.
Selection of bioinformatic predictors
The scheme for the bioinformatic prediction of the OMPs is shown in Figure 1. Three groups of predictors, comprising 10 genome-scale programs (Table 1), were used to predict putative OMPs from the two genomes. Subcellular localization (global) predictors included Proteome Analyst (PA), PSORTb, CELLO and SOSUI-GramN; transmembrane β-barrel protein predictors included TMB-Hunt, TMBETADISC-RBF, BOMP and MCMBB; and outer membrane lipoprotein predictors included LIPO and LIPOP. The results for each protein, returned by the individual programs in HTML or Excel format, were parsed using in-house Perl scripts.
Analysis of agreement between pairs of bioinformatic predictors
The advantages of using multiple predictors over a single predictor can be evaluated by analysis of agreement between pairs of selected programs. This analysis determines the level of agreement between different predictors by use of the following formula:
$$A = \frac{(P_1 \cap P_2)_L}{P'_L}$$
where $(P_1 \cap P_2)_L$ is the number of predicted proteins shared between predictors $P_1$ and $P_2$ for a subcellular location, L, and $P'_L$ is the total number of proteins predicted for that location by the lower-coverage program of the predictor pair. An agreement score (A) of one means that every protein assigned to location L by the lower-coverage program is also assigned to L by the other program. A score of zero means that the two predictors share no predicted proteins for location L. In-house Perl scripts were used to analyze the predicted results of each program before pairwise comparison and calculation of the agreement score.
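As an illustration only, the agreement score described above can be sketched in a few lines of Python. This is not the authors' Perl implementation: the function name, the protein identifiers, and the choice to return zero when a predictor makes no predictions are all assumptions made for the example. "Lower coverage" is taken here to mean the program that predicts fewer proteins for the location.

```python
def agreement_score(predicted_1, predicted_2):
    """Agreement between two predictors for one subcellular location.

    Number of shared predictions divided by the size of the smaller
    prediction set (taken here as the 'lower coverage' program).
    """
    set_1, set_2 = set(predicted_1), set(predicted_2)
    smaller = min(len(set_1), len(set_2))
    if smaller == 0:
        # Assumption: a predictor with no predictions gives zero agreement.
        return 0.0
    return len(set_1 & set_2) / smaller


# Hypothetical identifiers, for illustration only (not real P. multocida data).
predictor_1_om = {"prot_A", "prot_B", "prot_C", "prot_D"}
predictor_2_om = {"prot_B", "prot_C", "prot_E"}

# Two shared proteins, three proteins in the smaller set.
score = agreement_score(predictor_1_om, predictor_2_om)
```

Normalizing by the smaller set means that a program whose predictions are entirely contained in those of a higher-coverage program still scores one.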
Criteria optimization, consensus prediction and integration
The criteria that maximized the above parameters were selected and evaluated on the test dataset. The selected optimal criteria were then used for filtering the predicted P. multocida proteins. Subsequently, consensus predictions from the three groups were integrated (Figure 1) and a single list of predicted OMPs generated.
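The filtering and integration steps can be sketched as follows. This is a minimal Python illustration, not the authors' code. The thresholds are those stated in the abstract (at least two positive predictions among the subcellular localization predictors, three among the transmembrane β-barrel protein predictors and one among the lipoprotein predictors). The data layout, function name and protein identifiers are hypothetical, and integration is assumed to be a simple union of the three consensus lists.

```python
# Minimum number of positive predictions required within each group
# (values as stated in the abstract of this article).
THRESHOLDS = {"subcellular": 2, "beta_barrel": 3, "lipoprotein": 1}


def consensus_omps(votes, thresholds=THRESHOLDS):
    """Return the integrated set of proteins passing any group's threshold.

    votes maps group name -> {protein id -> number of positive predictions}.
    """
    integrated = set()
    for group, minimum in thresholds.items():
        for protein, positives in votes.get(group, {}).items():
            if positives >= minimum:
                integrated.add(protein)
    return integrated


# Hypothetical vote counts, for illustration only.
example_votes = {
    "subcellular": {"prot_A": 3, "prot_B": 1},
    "beta_barrel": {"prot_A": 4, "prot_B": 2, "prot_C": 3},
    "lipoprotein": {"prot_D": 1},
}

putative_omps = consensus_omps(example_votes)
```

In this toy input, prot_B is supported by one localization predictor and two β-barrel predictors, so it meets neither threshold and is excluded from the integrated list.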
Manual confirmation of the predicted proteins
After the consensus prediction, each predicted protein was manually confirmed as being outer membrane-associated by using public database searches and sequence analyses (Figure 1). The PubMed database (http://www.ncbi.nlm.nih.gov/pubmed) was used to retrieve published experimental information relevant to the predicted proteins. The UniProt protein database (http://www.uniprot.org/) was searched for homology, domains/motifs, protein-protein interactions, and functional and structural predictions. Structural homology was examined by using the HHpred program (http://toolkit.tuebingen.mpg.de/hhpred). The STRING protein interaction database (http://string-db.org/) was used to identify whether the predicted proteins interacted with any known proteins or were members of any characterized pathways. Taken together, these analyses confirmed the proteins as confidently predicted putative OMP candidates (Figure 1).
Analysis of filtered-out predicted proteins
The use of filtering criteria aims to reduce the number of false positive proteins. However, it may increase the probability of losing a small proportion of true positive proteins. Therefore, predicted proteins that were filtered out by the consensus criteria were integrated and also examined by manual confirmation to identify such lost true positive proteins (Figure 1).
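One way to assemble the list of filtered-out proteins for manual re-examination is sketched below. This is an assumed interpretation rather than the authors' procedure: a protein is treated as "filtered out" if at least one predictor in any group flagged it but it did not reach the consensus threshold in any group. The function name, data layout and identifiers are hypothetical.

```python
def filtered_out(votes, thresholds):
    """Proteins with some positive prediction that fail every group threshold.

    votes maps group name -> {protein id -> number of positive predictions};
    thresholds maps group name -> minimum positives needed for consensus.
    """
    flagged, passed = set(), set()
    for group, counts in votes.items():
        for protein, positives in counts.items():
            if positives > 0:
                flagged.add(protein)
            if positives >= thresholds[group]:
                passed.add(protein)
    return flagged - passed


# Hypothetical vote counts, for illustration only.
thresholds = {"subcellular": 2, "beta_barrel": 3, "lipoprotein": 1}
votes = {
    "subcellular": {"prot_A": 3, "prot_B": 1, "prot_E": 0},
    "beta_barrel": {"prot_B": 2, "prot_C": 3},
    "lipoprotein": {"prot_D": 1},
}

to_recheck = filtered_out(votes, thresholds)
```

Proteins returned by such a function would then go through the same manual confirmation as the consensus list.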
Physicochemical properties of the predicted OMPs
Physicochemical properties of the putative OMPs, e.g. molecular weight, length of protein sequence, theoretical pI, grand average of hydropathicity (GRAVY) score, aliphatic index, charge, and number of β-strands and helices, were predicted by the ProtParam program (http://expasy.org/tools/protparam.html), TMBETA-NET and TMHMM.
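For readers unfamiliar with the GRAVY score, the sketch below shows how it is conventionally computed: the Kyte-Doolittle hydropathy values of all residues are summed and divided by the sequence length. This is an illustrative Python re-implementation, not the ProtParam code; the function name and the example peptide are assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy per residue.

    Raises KeyError for non-standard residue codes and ValueError for
    an empty sequence.
    """
    residues = sequence.upper()
    if not residues:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


# Hypothetical short peptide, for illustration only.
example_score = gravy("AR")  # (1.8 + -4.5) / 2
```

Positive scores indicate an overall hydrophobic sequence and negative scores an overall hydrophilic one.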
Availability and requirements
Project name: None
Project home page: None
Operating system: Platform independent
Programming language: Java, Perl
Other requirements: Excel
License: None for usage
Any restrictions to use by non-academics: None
TE was supported by a Higher Educational Strategic Scholarship for the Frontier Research Network awarded by the Royal Thai Government, Thailand.
- Harper M, Boyce JD, Adler B: Pasteurella multocida pathogenesis: 125 years after Pasteur. FEMS Microbiol Lett 2006, 265: 1–10. 10.1111/j.1574-6968.2006.00442.x
- St Michael F, Li J, Vinogradov E, et al.: Structural analysis of the lipopolysaccharide of Pasteurella multocida strain VP161: identification of both Kdo-P and Kdo-Kdo species in the lipopolysaccharide. Carbohydr Res 2005, 340: 59–68. 10.1016/j.carres.2004.10.017
- Costerton JW, Ingram JM, Cheng KJ: Structure and function of the cell envelope of Gram-negative bacteria. Bacteriological Rev 1974, 38: 87–110.
- Lin J, Huang S, Zhang Q: Outer membrane proteins: key players for bacterial adaptation in host niches. Microbes Infect 2002, 4: 325–331. 10.1016/S1286-4579(02)01545-9
- Chung JW, Ng-Thow-Hing C, Budman LI, et al.: Outer membrane proteome of Actinobacillus pleuropneumoniae: LC-MS/MS analyses validate in silico predictions. Proteomics 2007, 7: 1854–1865. 10.1002/pmic.200600979
- Ruiz N, Kahne D, Silhavy TJ: Advances in understanding bacterial outer-membrane biogenesis. Nat Rev Microbiol 2006, 4: 57–66. 10.1038/nrmicro1322
- Schulz GE: The structure of bacterial outer membrane proteins. Biochim Biophys Acta 2002, 1565: 308–317. 10.1016/S0005-2736(02)00577-1
- Bos MP, Tommassen J: Biogenesis of the Gram-negative bacterial outer membrane. Curr Opin Microbiol 2004, 7: 610–616. 10.1016/j.mib.2004.10.011
- Tokuda H, Matsuyama S: Sorting of lipoproteins to the outer membrane in E. coli. Biochim Biophys Acta 2004, 1693: 5–13. 10.1016/j.bbamcr.2004.02.005
- Knowles TJ, Scott-Tucker A, Overduin M, Henderson IR: Membrane protein architects: the role of the BAM complex in outer membrane protein assembly. Nat Rev Microbiol 2009, 7: 206–214. 10.1038/nrmicro2069
- Dabo SM, Taylor JD, Confer AW: Pasteurella multocida and bovine respiratory disease. Anim Health Res Rev 2007, 8: 129–150. 10.1017/S1466252307001399
- Gromiha MM: Motifs in outer membrane protein sequences: applications for discrimination. Biophys Chem 2005, 117: 65–71. 10.1016/j.bpc.2005.04.005
- Gromiha MM, Suwa M: Influence of amino acid properties for discriminating outer membrane proteins at better accuracy. Biochim Biophys Acta 2006, 1764: 1493–1497. 10.1016/j.bbapap.2006.07.005
- Gao Q-B, Ye X-F, Jin Z-C, He J: Improving discrimination of outer membrane proteins by fusing different forms of pseudo amino acid composition. Anal Biochem 2010, 398: 52–59. 10.1016/j.ab.2009.10.040
- Juncker AS, Willenbrock H, Von Heijne G, et al.: Prediction of lipoprotein signal peptides in Gram-negative bacteria. Protein Sci 2003, 12: 1652–1662. 10.1110/ps.0303703
- Fariselli P: SPEPlip: the detection of signal peptide and lipoprotein cleavage sites. Bioinformatics 2003, 19: 2498–2499. 10.1093/bioinformatics/btg360
- Mirus O, Schleiff E: Prediction of beta-barrel membrane proteins by searching for restricted domains. BMC Bioinformatics 2005, 6: 254. 10.1186/1471-2105-6-254
- Jackups R, Cheng S, Liang J: Sequence motifs and antimotifs in beta-barrel membrane proteins from a genome-wide analysis: the Ala-Tyr dichotomy and chaperone binding motifs. J Mol Biol 2006, 363: 611–623. 10.1016/j.jmb.2006.07.095
- Waldispuhl J, Berger B, Clote P, Steyaert J-M: Predicting transmembrane beta-barrels and interstrand residue interactions from sequence. Proteins 2006, 65: 61–74. 10.1002/prot.21046
- Valavanis IK, Bagos PG, Emiris IZ: Beta-barrel transmembrane proteins: Geometric modelling, detection of transmembrane region, and structural properties. Comput Biol Chem 2006, 30: 416–424. 10.1016/j.compbiolchem.2006.09.001
- Emanuelsson O, Brunak S, von Heijne G, Nielsen H: Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2007, 2: 953–971. 10.1038/nprot.2007.131
- Zhai Y, Saier MH: The beta-barrel finder (BBF) program, allowing identification of outer membrane beta-barrel proteins encoded within prokaryotic genomes. Protein Sci 2002, 11: 2196–2207.
- Bagos PG, Liakopoulos TD, Spyropoulos IC, Hamodrakas SJ: A Hidden Markov Model method, capable of predicting and discriminating beta-barrel outer membrane proteins. BMC Bioinformatics 2004, 5: 29. 10.1186/1471-2105-5-29
- Szafron D, Lu P, Greiner R, et al.: Proteome Analyst: custom predictions with explanations in a web-based tool for high-throughput proteome annotations. Nucleic Acids Res 2004, 32: W365-W371. 10.1093/nar/gkh485
- Yu C-S, Lin C-J: Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions. Protein Sci 2004, 13: 1402–1406. 10.1110/ps.03479604
- Bhasin M, Garg A, Raghava GPS: PSLpred: prediction of subcellular localization of bacterial proteins. Bioinformatics 2005, 21: 2522–2524. 10.1093/bioinformatics/bti309
- Gardy JL: PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria. Nucleic Acids Res 2003, 31: 3613–3617. 10.1093/nar/gkg602
- Gardy JL, Laird MR, Chen F, et al.: PSORTb v. 2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis. Bioinformatics 2005, 21: 617–623. 10.1093/bioinformatics/bti057
- Garrow AG, Agnew A, Westhead DR: TMB-Hunt: a web server to screen sequence sets for transmembrane beta-barrel proteins. Nucleic Acids Res 2005, 33: W188-W192. 10.1093/nar/gki384
- Bulashevska A, Eils R: Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains. BMC Bioinformatics 2006, 7: 298. 10.1186/1471-2105-7-298
- Berven FS, Flikka K, Jensen HB, Eidhammer I: BOMP: a program to predict integral beta-barrel outer membrane proteins encoded within genomes of Gram-negative bacteria. Nucleic Acids Res 2004, 32: W394-W399. 10.1093/nar/gkh351
- Berven FS, Karlsen OA, Straume AH, et al.: Analysing the outer membrane subproteome of Methylococcus capsulatus (Bath) using proteomics and novel biocomputing tools. Arch Microbiol 2006, 184: 362–377. 10.1007/s00203-005-0055-7
- Yu C-S, Chen Y-C, Lu C-H, Hwang J-K: Prediction of protein subcellular localization. Proteins 2006, 64: 643–651. 10.1002/prot.21018
- Gromiha MM, Yabuki Y, Suwa M: TMB finding pipeline: novel approach for detecting beta-barrel membrane proteins in genomic sequences. J Chem Inf Model 2007, 47: 2456–2461. 10.1021/ci700222s
- Wu Z, Feng E, Wang Y, Chen L: Discrimination of outer membrane proteins by a new measure of information discrepancy. Protein Peptide Lett 2007, 14: 37–44. 10.2174/092986607779117254
- Fyshe A, Liu Y, Szafron D, Greiner R, Lu P: Improving subcellular localization prediction using text classification and the gene ontology. Bioinformatics 2008, 24: 2512–2517. 10.1093/bioinformatics/btn463
- Gromiha MM, Yabuki Y: Functional discrimination of membrane proteins using machine learning techniques. BMC Bioinformatics 2008, 9: 135. 10.1186/1471-2105-9-135
- Hu J, Yan C: A method for discovering transmembrane beta-barrel proteins in Gram-negative bacterial proteomes. Comput Biol Chem 2008, 32: 298–301. 10.1016/j.compbiolchem.2008.03.010
- Imai K, Asakawa N, Tsuji T, et al.: SOSUI-GramN: high performance prediction for sub-cellular localization of proteins in Gram-negative bacteria. Bioinformation 2008, 2: 417–421. 10.6026/97320630002417
- Yan C, Hu J, Wang Y: Discrimination of outer membrane proteins with improved performance. BMC Bioinformatics 2008, 9: 47. 10.1186/1471-2105-9-47
- Ou Y-Y, Gromiha MM, Chen S-A, Suwa M: TMBETADISC-RBF: Discrimination of beta-barrel membrane proteins using RBF networks and PSSM profiles. Comput Biol Chem 2008, 32: 227–231. 10.1016/j.compbiolchem.2008.03.002
- Remmert M, Linke D, Lupas AN, Söding J: HHomp-prediction and classification of outer membrane proteins. Nucleic Acids Res 2009, 37: W446-W451. 10.1093/nar/gkp325
- Shen H-B, Chou K-C: Gneg-mPLoc: a top-down strategy to enhance the quality of predicting subcellular localization of Gram-negative bacterial proteins. J Theor Biol 2010, 264: 326–333. 10.1016/j.jtbi.2010.01.018
- Yu NY, Wagner JR, Laird MR, et al.: PSORTb 3.0: Improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics 2010, 26: 1608–1615. 10.1093/bioinformatics/btq249
- Díaz-Mejía JJ, Babu M, Emili A: Computational and experimental approaches to chart the Escherichia coli cell-envelope-associated proteome and interactome. FEMS Microbiol Rev 2009, 33: 66–97. 10.1111/j.1574-6976.2008.00141.x
- Boyce JD, Cullen PA, Nguyen V, Wilkie I, Adler B: Analysis of the Pasteurella multocida outer membrane sub-proteome and its response to the in vivo environment of the natural host. Proteomics 2006, 6: 870–880. 10.1002/pmic.200401342
- Huntley JF, Conley PG, Hagman KE, Norgard MV: Characterization of Francisella tularensis outer membrane proteins. J Bacteriol 2007, 189: 561–574. 10.1128/JB.01505-06
- Viratyosin W, Ingsriswang S, Pacharawongsakda E, Palittapongarnpim P: Genome-wide subcellular localization of putative outer membrane and extracellular proteins in Leptospira interrogans serovar Lai genome using bioinformatics approaches. BMC Genomics 2008, 9: 181. 10.1186/1471-2164-9-181
- Bagos PG, Liakopoulos TD, Hamodrakas SJ: Evaluation of methods for predicting the topology of beta-barrel outer membrane proteins and a consensus prediction method. BMC Bioinformatics 2005, 6: 7. 10.1186/1471-2105-6-7
- Heinz E, Tischler P, Rattei T, et al.: Comprehensive in silico prediction and analysis of chlamydial outer membrane proteins reflects evolution and life style of the Chlamydiae. BMC Genomics 2009, 10: 634. 10.1186/1471-2164-10-634
- Al-Hasani K, Boyce J, McCarl VP, et al.: Identification of novel immunogens in Pasteurella multocida. Microb Cell Fact 2007, 6: 3. 10.1186/1475-2859-6-3
- May BJ, Zhang Q, Li LL, et al.: Complete genomic sequence of Pasteurella multocida, Pm70. Proc Natl Acad Sci USA 2001, 98: 3460–3465. 10.1073/pnas.051634598
- Kyte J, Doolittle RF: A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982, 157: 105–132. 10.1016/0022-2836(82)90515-0
- Gardy JL, Brinkman FSL: Methods for predicting bacterial protein subcellular localization. Nat Rev Microbiol 2006, 4: 741–751. 10.1038/nrmicro1494
- Paramasivam N, Linke D: ClubSub-P: cluster-based subcellular localization prediction for Gram-negative bacteria and archaea. Front Microbio 2011, 2: 1–14.
- Hatfaludi T, Al-Hasani K, Boyce JD, Adler B: Outer membrane proteins of Pasteurella multocida. Vet Microbiol 2010, 144: 1–17. 10.1016/j.vetmic.2010.01.027
- Freeman TC Jr, Wimley WC: A highly accurate statistical approach for the prediction of transmembrane beta-barrels. Bioinformatics 2010, 26: 1965–1974. 10.1093/bioinformatics/btq308
- Goudenège D, Avner S, Lucchetti-Miganeh C, Barloy-Hubler F: CoBaltDB: Complete bacterial and archaeal orfeomes subcellular localization database and associated resources. BMC Microbiol 2010, 10: 88. 10.1186/1471-2180-10-88
- E-komon T, Burchmore R, Davies R: Comparative outer membrane proteomic analyses of Pasteurella multocida isolates from different host species. Proteomics, manuscript submitted.
- Besemer J, Borodovsky M: GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res 2005, 33: W451-W454. 10.1093/nar/gki487
- Conesa A, Götz S, García-Gómez JM, et al.: Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 2005, 21: 3674–3676. 10.1093/bioinformatics/bti610
- Campanella JJ, Bitincka L, Smalley J: MatGAT: An application that generates similarity/identity matrices using protein or DNA sequences. BMC Bioinformatics 2003, 4: 29. 10.1186/1471-2105-4-29
- Söding J, Biegert A, Lupas AN: The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 2005, 33: W244-W248. 10.1093/nar/gki408
- Krogh A, Larsson B, Von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 2001, 305: 567–580. 10.1006/jmbi.2000.4315
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.