Skip to main content

A large scale prediction of bacteriocin gene blocks suggests a wide functional spectrum for bacteriocins



Bacteriocins are peptide-derived molecules produced by bacteria, whose recently-discovered functions include virulence factors and signaling molecules as well as their better known roles as antibiotics. To date, close to five hundred bacteriocins have been identified and classified. Recent discoveries have shown that bacteriocins are highly diverse and widely distributed among bacterial species. Given the heterogeneity of bacteriocin compounds, many tools struggle with identifying novel bacteriocins due to their vast sequence and structural diversity. Many bacteriocins undergo post-translational processing or modifications necessary for the biosynthesis of the final mature form. Enzymatic modification of bacteriocins as well as their export is achieved by proteins whose genes are often located in a discrete gene cluster proximal to the bacteriocin precursor gene, referred to as context genes in this study. Although bacteriocins themselves are structurally diverse, context genes have been shown to be largely conserved across unrelated species.


Using this knowledge, we set out to identify new candidates for context genes which may clarify how bacteriocins are synthesized, and identify new candidates for bacteriocins that bear no sequence similarity to known toxins. To achieve these goals, we have developed a software tool, Bacteriocin Operon and gene block Associator (BOA) that can identify homologous bacteriocin associated gene blocks and predict novel ones. BOA generates profile Hidden Markov Models from the clusters of bacteriocin context genes, and uses them to identify novel bacteriocin gene blocks and operons.

Results and conclusions

We provide a novel dataset of predicted bacteriocins and context genes. We also discover that several phyla have a strong preference for bacteriocin genes, suggesting distinct functions for this group of molecules.

Software Availability


Natural Product discovery has been a cornerstone of many pharmaceuticals and therapeutics. It is estimated that about 80 % of all drugs are either natural products or derived analogs [1]. These compounds encompass antibiotics (penicillin, tetracycline, erythromycin), anti-infectives (avermectin, quinine, artemisinin), pharmaceuticals (lovastatin, cyclosporine, rapamycin) and anticancer drugs (taxol, doxorubicin) [2]. Yet, despite this long history of success, pharmaceutical efforts in natural products research has decreased steadily between 2001 and 2008 [3]. Financial pressure from drug companies as well as difficulties in isolation and identification of natural compounds have severely limited the discovery rate of these important sources.

Bacteriocins are a large class of peptide-based antibiotics that encompass an extraordinary amount of chemical, structural, and functional diversity [4]. The structures of these compounds have revealed that a large number comprise a class of highly modified polypeptides. Interestingly, many bacteriocins are synthesized ribosomally as precursor peptides and are subsequently modified post-translationally to yield their biologically active forms. Post-translational modification serves to confer specific chemical properties that could not be obtained by peptide synthesis alone. Furthermore, post-translational modifications can be used as a mechanism to control the activation of the toxic activities of the bacteriocin, and thus exert a level of control and host immunity [5].

Genome mining has been an important technological resource in the discovery of novel natural products, including bacteriocins. Bacteriocin-like peptides are highly attractive candidates for genome mining, as these natural products are genetically encoded with nearby genes encoding their corresponding modifying enzymes. Proximity to genes encoding known modifying enzymes can often aid in the identification of peptide biosynthesis gene clusters [6]. In many cases, several metabolites have been identified from “cryptic” or “orphan gene clusters” [7]. These cryptic gene clusters have demonstrated that new, as yet uncharacterized enzymology is likely to be involved in the assembly of the final natural product, likely leading to a greater diversity of bacterial peptides than previously appreciated.

Several web-based gene mining and annotation tools have been developed to aid in the identification, characterization, and classification of novel bacteriocins. These include mining tools such as BAGEL [8] and bacteriocin repositories such as BACTIBASE [9]. Anti-SMASH is a recently developed website that expands genome mining to not only bacteriocins but a host of other genetically-identifiable antibiotics and other secondary metabolites [10] Although gene mining resources such as BAGEL have existed for many years, one major limitation in bacteriocin gene mining that is becoming evident as more bacteriocins are identified is the lack of homology in the genes encoding the actual precursor peptides for bacteriocins. Detecting novel bacteriocin producing genes through bioinformatic methods focused on precursor peptide discovery therefore remains a significant challenge.

The bacteriocins’ structure and sequence diversity make them difficult to detect using sequence homology algorithms such as BLAST [11]. In addition, their short sequence length makes them difficult to detect with ORF calling tools. Programs such as BAGEL have made great strides in identifying bacteriocins by searching for well-defined conserved motifs within the bacteriocin toxin sequences and adjacent context genes using Pfam databases [8]. However, the number of hypothetical bacteriocin genes that can be identified using such a method is highly limiting as these motifs are not necessarily known or conserved. Given that bacteriocin genes are highly diverse, it is likely that mining genomes for potential bacteriocin genes using sequence or profile similarity are likely to miss a large number of as-yet identified bacteriocin compounds.

Genes that encode bacteriocin precursors are often proximal in genomic sequence to accessory genes that are required to modify and secrete the bacteriocin peptide. These co-located gene blocks often contain genes that encode enzymes which perform post-translational modification and maturation of the bacteriocin product. Additionally, many of these gene blocks contain genes encoding various transporter proteins, presumably linked to specific export of the mature bacteriocin. Unlike the bacteriocin genes, these context genes are conserved across taxa, as they belong to conserved families of enzymes and other modifying proteins, as well as to transporters such as ATP-binding cassette (ABC) family of transporters. In the case of the large family of thiazole and oxazole modified bacteriocins, such as Microcin B17 and Streptolysin S, each gene block shares both conserved modification proteins as well as transport machinery genes that are located close to the bacteriocin gene [6, 12].

Here we present a new methodology that takes advantage of the conservation of context genes to identify locations of gene blocks associated with bacteriocins. Our approach is to identify context genes in addition to toxin genes without restricting our tool to finding homologs in annotated databases. To the best of our knowledge, there is no available public database that contains the complete bacteriocin gene clusters and their associated context genes. Therefore, BOA is useful for the construction of such a tool as well as for mining genomes for putative bacteriocins. We provide bacteriocin gene block predictions for 2773 genomes, in which the bacteriocin gene blocks may be browsed. The method is implemented as a software tool named Bacteriocin Operon and gene block Associator (BOA).


We used the bacterial and archaeal genome files from GenBank 2014 [13] for our dataset. Since we are interested in identifying bacteriocin associated gene clusters, we did not analyze partial contig files, and only whole bacterial chromosomes and plasmids were analyzed. Bacteriocins identified by BAGEL3 and seven experimentally identified bacteriocin associated gene blocks were used as a standard of truth or a “gold standard” data set. All of the context genes within these gene blocks were placed into five functional categories toxins, modifiers, immunity, transport, and regulation. Toxin genes refer to the open reading frames encoding the toxin precursor peptide; modifiers perform post-translational modification to the protoxin, and can include enzymes which are involved in amino acid modification (cyclodehydration, lanthionine synthesis, as well as leader peptide processing enzymes); immunity genes that prevent the toxin from affecting the host bacterial cells; transport genes create transporter proteins to move the toxin outside of the cell, and regulator genes control the expression of toxin proteins and other genes in the operon [6].

An overview of the pipeline is shown in Fig. 1. The method in detail is as follows:

  1. 1.

    Construct a set of experimentally-verified context genes. This we call the Literature Curated Set. The LC Set includes seven known bacteriocin associated gene blocks. The known gene blocks comprised enterococcal cytolysin and enterocin AS-48 from Enterococcus faecalis, microcin J25 from Escherichia coli, Nisin A from Lactococcus lactis subsp. lactis, streptolysin S and salivaricin A from Streptococcus pyogenes, and thiocillin from Bacillus cereus [1420]. These blocks are representative of several important classes of bacteriocins, namely lantibiotics (enterococcal cytolysin, Nisin A, salivaricin A), thiopeptides (thiocillin), thiazole/oxazole-modified microcins (TOMMs; streptolysin S), lassoed tail peptides (microcin J25), and circular bacteriocins (enterocin AS-48). While bacteriocin biosynthetic gene blocks are widely distributed among prokarya, the major structurally-related groups are few and their relevant context genes are largely represented in the LC Set. The genes were categorized as toxins, modifiers, immunity, transport, or regulation. The genes making up the LC Set are available in the supplemental material.

    Fig. 1
    figure 1

    An overview of the BOA Pipeline. The stages in the pipeline are elaborated upon in the ‘Methods’ section. (1): construction of the LC-Set and the BAGEL set; (2) BLAST LC and BAGEL genes against all bacterial & archaeal genomes evalue= 10−5; (3) select the ORFs within ±50 kb of homologs to toxin genes (4) assign ORFs to one of the following classes (left to right): toxin, modifier, immunity, transport, regulation; (5) build pHMMs from each category: cluster sequences using CD-HIT, align sequences in each cluster using MAFFT, then use hmmbuild from the HMMER suite to construct HMMs; (6) run hmmsearch from the HMMER suite against the genome files to extract more sequences from each category, remove predicted false positives using a threshold score as explained in Methods (7) use a clique filter to identify genes that are close together

  2. 2.

    BLAST the LC Set genes and the BAGEL toxin genes against the bacterial genome set.

  3. 3.

    Select ORFs with (a) e-value <10−5 and (b) within ± 50 kb of the homologs to the toxin genes (whether from the LC set or the BAGEL set).

  4. 4.

    Assign each homologous ORF according to the category of the gene to which it is found to be similar: toxin, modifier, immunity, transport, or regulation. Ambiguities are resolved by taking the best hit based on the BLAST E-value.

  5. 5.

    Build profile HMMs: cluster the sequences in each bin independently using CD-HIT [21], then use MAFFT [22] to perform a multiple alignment in each homology cluster, and HMMER [23] to build pHMMs from the multiple alignments.

  6. 6.

    Run the resulting pHMM’s against the bacterial genome files. Since most of the HMMER hits are probably false positives, we set a score threshold to filter them out. We determined this threshold by obtaining HMMER scores for all of the BAGEL bacteriocins found on the bacterial genomes and choosing the lowest BAGEL score as the threshold. See Fig. 2

    Fig. 2
    figure 2

    Determining the threshold for similarity-based search of toxin genes. Toxin gene candidates were derived as described in the text. To determine an adequate threshold for inferring homology, we examined the distribution of HMMER scores for homologs for predicted toxin genes (red) and BAGEL-derived toxin genes (blue). BAGEL toxin gene scores were used to set a minimum threshold of acceptance for HMMER scores for predicted genes

  7. 7.

    Use a clique filter (see below) to identify those genes that are close together and therefore candidates for bacteriocin gene blocks. See Fig. 3.

    Fig. 3
    figure 3

    Using a clique filter to identify putative bacteriocin gene blocks. Drawing is not to scale, and other genes may exist in between those shown. a, b and c are all cliques formed from these genes. Clique a has an identifiable homolog to a known toxin, and is considered a viable candidate for a toxin gene block. Clique b does not have all necessary functions, and therefore is not considered to be a candidate. Clique c contains all necessary functions and therefore is considered a candidate for a bacteriocin gene block or operon even though there is no homology-detected toxin gene

To identify gene blocks that are candidates for bacteriocin biosynthesis, we used a clique filter. A clique is a complete subgraph where any two nodes are connected by an edge. We created graphs from the ±50 kb regions where genes are represented by nodes and for every pair of genes that are within 25 kb of each other, an edge is created between them. The detected cliques are estimated gene blocks such that all of the genes are within 25 kb of each other. This 25 kb threshold was based on the size of known bacteriocin gene clusters given in our gold standard data set. Figure 3 illustrates the use of the clique filter to identify potential bacteriocin gene blocks.

A major challenge in finding toxin genes is that due to the short length and low complexity of these peptides, many genes will be missed because of ORF calling errors or lack of similarity to known toxin genes. To overcome this problem, we organized all detected cliques into gene blocks with homologs to known toxin genes and gene blocks without known toxin genes. Cliques with known toxin genes are required to have at least one toxin gene and one transport gene. Cliques without any known toxin genes are required to have at least one of each of a modifier, transport, immunity, and regulator genes. In this way, we ensure that the cliques we identify have the needed components to present a putative bacteriocin biosynthetic locus, and the toxin gene can later be searched using less restrictive procedures.

Results and Discussion

The many characterized bacteriocins have seldom been experimentally validated in parallel in the multiple species which putatively code for their production, restricting our standard-of-truth data set to a small group of well-studied bacteriocins relative to the large number of organisms that produce them. In addition, estimating the false positive rate is difficult, requiring excessive de novo experimental validation. Therefore, to evaluate BOA’s performance, we compared the bacteriocins that we have found to BAGEL bacteriocins found in bacterial genomes.

To compare BOA against BAGEL, the toxins shared between BOA and BAGEL were identified using BLAST using the default parameters. As shown in Fig. 4 and Table 1, BOA only missed 22 (5 %) bacteriocins that BAGEL detected, while predicting 457 (95 %) in agreement with BAGEL. In addition, BOA was able to predict over 1003 more putative bacteriocins on bacterial genomes than BAGEL. In addition, BOA identified 83 regions that are highly likely to be associated with bacteriocin production.

Fig. 4
figure 4

95 % (457 out of 479) of BAGEL toxins were predicted by BOA. BOA predicted an additional 1003 toxins throughout bacterial genomes that are not listed in BAGEL. Twenty-two BAGEL toxins were not predicted by BOA

Table 1 Number of gene blocks detected by BAGEL and BOA

The detected gene blocks were classified into five groups: (1) gene blocks with all five functional classes (toxin, modifier, immunity, regulator, transport), (2) gene blocks with only four functions, (3) three functions, (4) two functions, and (5) unknown toxins. Figure 5 shows these findings.

Fig. 5
figure 5

Gene blocks were classified by the number of detected functions. a number of total genes found; b gene counts per detected block

Previously it has been established that every bacteriocin locus needs at minimum a toxin gene and an immunity gene [24]. The gene blocks in the first four groups in Fig. 5 have at least one toxin and one transport gene. The final group of gene blocks do not have any identified bacteriocin genes, but each detected gene block is required to have all of the other genes. This final group contains likely candidates for bacteriocin-associated gene blocks that do not yet have a known, identified bacteriocin. From these findings, it is evident that genes categorized as transport genes were identified to be the most common type of context gene.

Interestingly, the three species harboring the greatest number of predicted bacteriocin-associated gene blocks in our screen inhabit different ecological niches (Table 2). Streptococcus equi subsp. zooepidemicus is a common colonizer of the respiratory tract with the capacity for opportunistic infection in a variety of domesticated animals and sometimes severe infections in humans following zoonotic transmission [25]. Streptomyces griseus is a soil-dwelling bacterium that has been studied and utilized in the biotechnology industry for production of numerous secondary metabolites including the first aminoglycoside antibiotic, streptomycin [26]. Finally, Leifsonia xyli is a pathogenic obligate colonizer of the xylem of host plants, causing economically damaging ratoon stunting disease in sugarcane [27]. The radically different environments in which these bacteria reside suggest that predicted bacteriocins must have distinct functions, specific organism targets, or both. Likewise, functional validation and characterization of these kinds of predicted bacteriocins must take into account the niche in which its producing organism resides, probing functionalities that target ecologically relevant target organisms.

Table 2 Bacterial species with the highest number of BOA-detected gene blocks

Among the gene blocks identified by BOA was the recently described and experimentally characterized caynothecamide biosynthetic locus of Cyanothece sp. PCC 7425 (GenBank: CP001344.1) [28]. This gene block, part of the patellamide family, has nine predicted precursor peptide ORFs with conserved N-termini and divergent C-termini likely resulting from repeated precursor duplication and divergence. Other bacteriocin clusters have been described with only one precursor peptide duplication, and such examples may represent an early step toward the substrate elaboration displayed by the cyanothecamides [29]. Interestingly, most of the cyanothecamide putative precursors lack the canonical pentapeptide motif required for patellamide maturation, suggesting that inclusion of these sequences in the BOA gold standard set could expand the set of identified putative toxin genes to include other non-canonical substrates. Only two of the nine cyanothecamide putative toxin genes have been experimentally implicated as precursors to identifiable mature patellamide-like compounds [30]. Yet, the capacity for biosynthetic machinery to modify substrate peptides with suitable N-terminal domains despite drastic variability in the C-terminal portion of the peptide has been demonstrated in other bacteriocins [31, 32]. The features of this particular gene block raise the possibility that bacteriocin loci encoding post-translationally modified peptides could, through elaboration of sequence diversity in multiple cognate peptide substrates, confer a greater breadth of functional diversity to producing organisms than previously appreciated [33].

Within the genome of the important human pathogen Group A Streptococcus from which two members of our gold standard set were obtained (Streptolysin S and Salivaricin A), BOA also identified the gallidermin-related lantibiotic Streptin [34]. Despite the experimental validation of Streptin as an active bacteriocin, little further insight has been gained into the role of Streptin with respect to pathogenic infection or colonization dynamics [35]. Identification and subsequent experimental validation of bacteriocins in important human pathogens like Group A Streptococcus will likely yield insights into the biology and biochemistry of pathogenic colonization, especially given the current explosion of interest in the human microbiome and probiotic disease interventions.

Of the 1054 species with identified bacteriocins, only 11 species were from the domain Archaea out of 360 Archaea genomes in GenBank. From the 1043 bacterial species with bacteriocins identified, the majority of them are identified as either Proteobacteria or Firmicutes. The exact breakdown of the phyla and their corresponding mean function counts is shown in Table 3, while a breakdown at the class level is shown in Table 4. It is important to note that our finding does not imply that most bacteriocin producing bacteria are Firmicutes and Proteobacteria, or that bacteriocins are rare in Archaea. It is more likely that previous research in identifying bacteriocins was biased towards the former two phyla.

Table 3 Mean counts per gene block over all phyla that have more than 5 genomes
Table 4 Mean counts per gene block over all classes that have more than 10 genomes

From Table 3, it is apparent that mean distribution of gene types between phyla are very different. For instance, Firmicutes have significantly more immunity genes than any of the other phyla. Also, gene blocks found in Firmicutes, Actinobacteria and Cyanobacteria have a higher toxin gene count than other bacterial phyla. A full survey of putative bacteriocin gene blocks is provided in the supplementary data. Figure 6 shows an overview of the distribution of putative bacteriocin blocks identified by BOA.

Fig. 6
figure 6

A tree of all of the species with detected bacteriocins. The five inner rings show gene abundances for immunity, modifier, regulator, toxin and transport genes. The outer ring shows the total number of bacteriocin-associated gene blocks detected for each bacterial species. This information is available in tabular form in the supplemental material


To our knowledge, BOA is the first time a curated data set has been established for bacteriocin context genes. Even with seven different bacteriocin gene blocks as a gold standard set, our method has identified several hundred putative bacteriocin gene blocks, most of which have not been previously annotated. We believe that even more homologous gene blocks can be identified with a larger validated database of context genes. Additionally, upon manual inspection of some predicted blocks, some nearby putative ORFs appeared likely to be involved in predicted bacteriocin biosynthesis but were not identified by BOA. This may permit a manually curated strategy whereby one may subjectively designate putative context genes from a BOA-predicted bacteriocin gene block and feed the more richly-annotated gene block back into BOA as a new member of the now-expanded gold standard set. Such an approach could serve to iteratively extend the phylogenetic boundaries of BOA in a controlled way each time the limits of similarity are reached. We are currently exploring the merits of this approach. The widespread prevalence and diversity of bacteria having bacteriocins and their highly varied lifestyles suggest early ancestry and a subsequent adaptation of these gene blocks to the specific functional needs of the bacteria producing them. Previous studies have shown that bacteriocin context genes tend to be in bacteria that share an environmental niche despite phylogenetic disparity, suggesting that functional adaptation is likely to be a major mechanism for bacteriocin design and production [12].

BOA was able to identify the majority of bacteriocin gene clusters that BAGEL identified. BOA also predicted over seven times more bacteriocins in whole bacterial genomes than BAGEL, including many identifiable bacteriocin gene blocks with experimental validation. Because BOA encompasses a large number of taxa, the information in BOA can also be used to explore the evolutionary development of bacteriocin gene blocks and how different biosynthetic loci have evolved in different clades. Finally, BOA has assembled the first dataset that contains information about homologous bacteriocin genes and their associated gene clusters.


  1. Farnsworth NR, Akerele O, Bingel AS, Soejarto DD, Guo Z. Medicinal plants in therapy. Bull World Health Organ. 1985; 63:965–81.

    CAS  PubMed  PubMed Central  Google Scholar 

  2. AL H. Natural products in drug discovery. Drug Discov Today. 2008; Oct;13(19–20):894–901.

    Google Scholar 

  3. Li JW-HW, Vederas JC. Drug discovery and natural products: end of an era or an endless frontier?Sci (New York, NY). 2009; 325(5937):161–5. doi:10.1126/science.1168243.

    Article  CAS  Google Scholar 

  4. Willey JM, van der Donk WA. Lantibiotics: peptides of diverse structure and function. Ann Rev Microbiol. 2007; 61:477–501.

    Article  CAS  Google Scholar 

  5. Guder A, Wiedemann I, Sahl HG. Posttranslationally modified bacteriocins–the lantibiotics. Biopolymers. 2000; 55(1):62–73. doi:10.1002/1097-0282.

    Article  CAS  PubMed  Google Scholar 

  6. Arnison PG, Bibb MJ, Bierbaum G, Bowers AA, Bugni TS, Bulaj G, et al. Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat Prod Rep. 2013; 30(1):108–60. doi:10.1039/c2np20085f.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Perez RH, Zendo T, Sonomoto K. Novel bacteriocins from lactic acid bacteria (LAB): various structures and applications. Microb Cell Fact. 2014 Suppl1:S3.

  8. van Heel AJ, de Jong A, Montalbán-López M, Kok J, Kuipers OP. Bagel3: automated identification of genes encoding bacteriocins and (non-)bactericidal posttranslationally modified peptides. Nucleic Acids Res. 2013; 41(W1):448–53. doi:10.1093/nar/gkt391.

    Article  Google Scholar 

  9. Hammami R, Zouhir A, Hamida JB, Fliss I. Bactibase: a new web-accessible database for bacteriocin characterization. BMC Microbiol. 2007; 7(1):89. doi:10.1186/1471-2180-7-89.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Blin K, Medema MH, Kazempour D, Fischbach MA, Breitling R, Takano E, et al. antismash 2.0–a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res. 2013; 41(W1):204–12. doi:10.1093/nar/gkt449.

    Article  Google Scholar 

  11. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic Acids Res. 1997; 25(17):3389–402. doi:10.1093/nar/25.17.3389.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Lee SW, Mitchell DA, Markley AL, Hensler ME, Gonzalez D, Wohlrab A, et al. Discovery of a widely distributed toxin biosynthetic gene cluster. Proc Nat Acad Sci. 2008; 105(15):5879–84. doi:10.1073/pnas.0801338105.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Benson DA, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. Genbank. Nucleic Acids Res. 2014; 42(D1):32–7. doi:10.1093/nar/gkt1030.

    Article  CAS  Google Scholar 

  14. Coburn PS, Gilmore MS. The enterococcus faecalis cytolysin: a novel toxin active against eukaryotic and prokaryotic cells. Cell Microbiol. 2003; 5(10):661–9.

    Article  CAS  PubMed  Google Scholar 

  15. Gálvez A, Maqueda M, Martínez-Bueno M, Valdivia E. Bactericidal and bacteriolytic action of peptide antibiotic as-48 against gram-positive and gram-negative bacteria and other organisms. Res Microbiol. 1989; 140(1):57–68.

    Article  PubMed  Google Scholar 

  16. Salomón RA, Farías RN. Microcin 25, a novel antimicrobial peptide produced by escherichia coli. J Bacteriol. 1992; 174(22):7428–35.

    Article  PubMed  PubMed Central  Google Scholar 

  17. Kaletta C, Entian KD. Nisin, a peptide antibiotic: cloning and sequencing of the nisa gene and posttranslational processing of its peptide product. J Bacteriol. 1989; 171(3):1597–601.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Nizet V, Beall B, Bast DJ, Datta V, Kilburn L, Low DE, et al. Genetic locus for streptolysin s production by group a streptococcus. Infect Immun. 2000; 68(7):4245–54.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Ross KF, Ronson CW, Tagg JR. Isolation and characterization of the lantibiotic salivaricin a and its structural gene sala from streptococcus salivarius 20p3. Appl Environ Microbiol. 1993; 59(7):2014–21.

    CAS  PubMed  PubMed Central  Google Scholar 

  20. Shoji J, Hinoo H, Wakisaka Y, Koizumi K, Mayama M. Isolation of three new antibiotics, thiocillins i, ii and iii, related to micrococcin p. studies on antibiotics from the genus bacillus. viii. J Antibiot. 1976; 29(4):366–74.

    Article  CAS  PubMed  Google Scholar 

  21. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics (Oxford, England). 2006; 22(13):1658–9. doi:10.1093/bioinformatics/btl158.

    Article  CAS  Google Scholar 

  22. Katoh K, Standley DM. Mafft multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013; 30(4):772–80. doi:10.1093/molbev/mst010.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Eddy SR. Accelerated profile hmm searches. PLoS Comput Biol. 2011; 7(10):1002195. doi:10.1371/journal.pcbi.1002195.

    Article  CAS  Google Scholar 

  24. Dimov SHN, Ivanova P. Genetics of bacteriocins biosynthesis by lactic acid bacteria. Biotechnol Biotechnol Equip. 2005; 19(supplement 2):4–10.

    Article  CAS  Google Scholar 

  25. Minces LR, Brown PJ, Veldkamp PJ. Human meningitis from streptococcus equi subsp. zooepidemicus acquired as zoonoses. Epidemiol Infect. 2011; 139(3):406–10.

    Article  CAS  PubMed  Google Scholar 

  26. Ohnishi Y, Ishikawa J, Hara H, Suzuki H, Ikenoya M, Ikeda H, et al. Genome sequence of the streptomycin-producing microorganism streptomyces griseus ifo 13350. J Bacteriol. 2008; 190(11):4050–60. doi:10.1128/jb.00204-08.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Ghai M, Singh V, Martin LA, McFarlane SA, van Antwerpen T, Rutherford RS. A rapid and visual loop-mediated isothermal amplification assay to detect leifsonia xyli subsp. xyli targeting a transposase gene. Lett Appl Microbiol. 2014; 59(6):648–57.

    Article  CAS  PubMed  Google Scholar 

  28. Donia MS, Schmidt EW. Linking chemistry and genetics in the growing cyanobactin natural products family. Chem Biol. 2011; 18(4):508–19.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Tabata A, Nakano K, Ohkura K, Tomoyasu T, Kikuchi K, Whiley RA, et al. Novel twin streptolysin s-like peptides encoded in the sag operon homologue of beta-hemolytic streptococcus anginosus. J Bacteriol. 2013; 195(5):1090–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Houssen WE, Koehnke J, Zollman D, Vendome J, Raab A, Smith MC, et al. The discovery of new cyanobactins from cyanothece pcc 7425 defines a new signature for processing of patellamides. Chembiochem : Eur J Chem Biol. 2012; 13(18):2683–9.

    Article  CAS  Google Scholar 

  31. Mitchell DA, Lee SW, Pence MA, Markley AL, Limm JD, Nizet V, et al. Structural and functional dissection of the heterocyclic peptide cytotoxin streptolysin s. J Biol Chem. 2009; 284(19):13004–12.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Haft DH, Basu MKK, Mitchell DA. Expansion of ribosomally produced natural products: a nitrile hydratase- and Nif11-related precursor family. BMC Biology. 2010; 8:70.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Sardar D, Pierce E, McIntosh JA, Schmidt EW. Recognition sequences and substrate evolution in cyanobactin biosynthesis. ACS Synth Biol. 2015; 4(2):167–76. doi:10.1021/sb500019b.

    Article  CAS  PubMed  Google Scholar 

  34. Karaya K, Shimizu T, Taketo A. New gene cluster for lantibiotic streptin possibly involved in streptolysin s formation. J Biochem. 2001; 129(5):769–75.

    Article  CAS  PubMed  Google Scholar 

  35. Wescombe PA, Tagg JR. Purification and characterization of streptin, a type a1 lantibiotic produced by streptococcus pyogenes. Appl Environ Microbiol. 2003; 69(5):2737–47. doi:10.1128/aem.69.5.2737-2747.2003.

    Article  PubMed  PubMed Central  Google Scholar 

Download references


We are grateful to Sean Eddy and Rob Finn for the use of the HMMER logo. Some images used in Fig. 1 are reproduced from Wikimedia Commons under CC-BY 3.0 or 4.0 license. We gratefully acknowledge the support of the Miami University High Performance Computing Facility. This work was supported, in part, by National Science Foundation grant ABI-1146960 (IF) and NIH 1DP2OD008468-01. (SWL). SDF was supported, in part by training grant NIH T32GM075762. IF dedicates his contribution to the memory of Gary R Janssen.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Iddo Friedberg.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

IF, SWL and JTM conceived the idea. IF and JTM designed the experiment. JTM wrote the code and ran the analyses. SWL and SDF provided the gold-standard data and interpreted the results. All authors wrote the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Morton, J.T., Freed, S.D., Lee, S.W. et al. A large scale prediction of bacteriocin gene blocks suggests a wide functional spectrum for bacteriocins. BMC Bioinformatics 16, 381 (2015).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: