Research Article | Open | Published:
A large scale prediction of bacteriocin gene blocks suggests a wide functional spectrum for bacteriocins
BMC Bioinformaticsvolume 16, Article number: 381 (2015)
Bacteriocins are peptide-derived molecules produced by bacteria, whose recently-discovered functions include virulence factors and signaling molecules as well as their better known roles as antibiotics. To date, close to five hundred bacteriocins have been identified and classified. Recent discoveries have shown that bacteriocins are highly diverse and widely distributed among bacterial species. Given the heterogeneity of bacteriocin compounds, many tools struggle with identifying novel bacteriocins due to their vast sequence and structural diversity. Many bacteriocins undergo post-translational processing or modifications necessary for the biosynthesis of the final mature form. Enzymatic modification of bacteriocins as well as their export is achieved by proteins whose genes are often located in a discrete gene cluster proximal to the bacteriocin precursor gene, referred to as context genes in this study. Although bacteriocins themselves are structurally diverse, context genes have been shown to be largely conserved across unrelated species.
Using this knowledge, we set out to identify new candidates for context genes which may clarify how bacteriocins are synthesized, and identify new candidates for bacteriocins that bear no sequence similarity to known toxins. To achieve these goals, we have developed a software tool, Bacteriocin Operon and gene block Associator (BOA) that can identify homologous bacteriocin associated gene blocks and predict novel ones. BOA generates profile Hidden Markov Models from the clusters of bacteriocin context genes, and uses them to identify novel bacteriocin gene blocks and operons.
Results and conclusions
We provide a novel dataset of predicted bacteriocins and context genes. We also discover that several phyla have a strong preference for bacteriocin genes, suggesting distinct functions for this group of molecules.
Natural Product discovery has been a cornerstone of many pharmaceuticals and therapeutics. It is estimated that about 80 % of all drugs are either natural products or derived analogs . These compounds encompass antibiotics (penicillin, tetracycline, erythromycin), anti-infectives (avermectin, quinine, artemisinin), pharmaceuticals (lovastatin, cyclosporine, rapamycin) and anticancer drugs (taxol, doxorubicin) . Yet, despite this long history of success, pharmaceutical efforts in natural products research has decreased steadily between 2001 and 2008 . Financial pressure from drug companies as well as difficulties in isolation and identification of natural compounds have severely limited the discovery rate of these important sources.
Bacteriocins are a large class of peptide-based antibiotics that encompass an extraordinary amount of chemical, structural, and functional diversity . The structures of these compounds have revealed that a large number comprise a class of highly modified polypeptides. Interestingly, many bacteriocins are synthesized ribosomally as precursor peptides and are subsequently modified post-translationally to yield their biologically active forms. Post-translational modification serves to confer specific chemical properties that could not be obtained by peptide synthesis alone. Furthermore, post-translational modifications can be used as a mechanism to control the activation of the toxic activities of the bacteriocin, and thus exert a level of control and host immunity .
Genome mining has been an important technological resource in the discovery of novel natural products, including bacteriocins. Bacteriocin-like peptides are highly attractive candidates for genome mining, as these natural products are genetically encoded with nearby genes encoding their corresponding modifying enzymes. Proximity to genes encoding known modifying enzymes can often aid in the identification of peptide biosynthesis gene clusters . In many cases, several metabolites have been identified from “cryptic” or “orphan gene clusters” . These cryptic gene clusters have demonstrated that new, as yet uncharacterized enzymology is likely to be involved in the assembly of the final natural product, likely leading to a greater diversity of bacterial peptides than previously appreciated.
Several web-based gene mining and annotation tools have been developed to aid in the identification, characterization, and classification of novel bacteriocins. These include mining tools such as BAGEL  and bacteriocin repositories such as BACTIBASE . Anti-SMASH is a recently developed website that expands genome mining to not only bacteriocins but a host of other genetically-identifiable antibiotics and other secondary metabolites  Although gene mining resources such as BAGEL have existed for many years, one major limitation in bacteriocin gene mining that is becoming evident as more bacteriocins are identified is the lack of homology in the genes encoding the actual precursor peptides for bacteriocins. Detecting novel bacteriocin producing genes through bioinformatic methods focused on precursor peptide discovery therefore remains a significant challenge.
The bacteriocins’ structure and sequence diversity make them difficult to detect using sequence homology algorithms such as BLAST . In addition, their short sequence length makes them difficult to detect with ORF calling tools. Programs such as BAGEL have made great strides in identifying bacteriocins by searching for well-defined conserved motifs within the bacteriocin toxin sequences and adjacent context genes using Pfam databases . However, the number of hypothetical bacteriocin genes that can be identified using such a method is highly limiting as these motifs are not necessarily known or conserved. Given that bacteriocin genes are highly diverse, it is likely that mining genomes for potential bacteriocin genes using sequence or profile similarity are likely to miss a large number of as-yet identified bacteriocin compounds.
Genes that encode bacteriocin precursors are often proximal in genomic sequence to accessory genes that are required to modify and secrete the bacteriocin peptide. These co-located gene blocks often contain genes that encode enzymes which perform post-translational modification and maturation of the bacteriocin product. Additionally, many of these gene blocks contain genes encoding various transporter proteins, presumably linked to specific export of the mature bacteriocin. Unlike the bacteriocin genes, these context genes are conserved across taxa, as they belong to conserved families of enzymes and other modifying proteins, as well as to transporters such as ATP-binding cassette (ABC) family of transporters. In the case of the large family of thiazole and oxazole modified bacteriocins, such as Microcin B17 and Streptolysin S, each gene block shares both conserved modification proteins as well as transport machinery genes that are located close to the bacteriocin gene [6, 12].
Here we present a new methodology that takes advantage of the conservation of context genes to identify locations of gene blocks associated with bacteriocins. Our approach is to identify context genes in addition to toxin genes without restricting our tool to finding homologs in annotated databases. To the best of our knowledge, there is no available public database that contains the complete bacteriocin gene clusters and their associated context genes. Therefore, BOA is useful for the construction of such a tool as well as for mining genomes for putative bacteriocins. We provide bacteriocin gene block predictions for 2773 genomes, in which the bacteriocin gene blocks may be browsed. The method is implemented as a software tool named Bacteriocin Operon and gene block Associator (BOA).
We used the bacterial and archaeal genome files from GenBank 2014  for our dataset. Since we are interested in identifying bacteriocin associated gene clusters, we did not analyze partial contig files, and only whole bacterial chromosomes and plasmids were analyzed. Bacteriocins identified by BAGEL3 and seven experimentally identified bacteriocin associated gene blocks were used as a standard of truth or a “gold standard” data set. All of the context genes within these gene blocks were placed into five functional categories toxins, modifiers, immunity, transport, and regulation. Toxin genes refer to the open reading frames encoding the toxin precursor peptide; modifiers perform post-translational modification to the protoxin, and can include enzymes which are involved in amino acid modification (cyclodehydration, lanthionine synthesis, as well as leader peptide processing enzymes); immunity genes that prevent the toxin from affecting the host bacterial cells; transport genes create transporter proteins to move the toxin outside of the cell, and regulator genes control the expression of toxin proteins and other genes in the operon .
An overview of the pipeline is shown in Fig. 1. The method in detail is as follows:
Construct a set of experimentally-verified context genes. This we call the Literature Curated Set. The LC Set includes seven known bacteriocin associated gene blocks. The known gene blocks comprised enterococcal cytolysin and enterocin AS-48 from Enterococcus faecalis, microcin J25 from Escherichia coli, Nisin A from Lactococcus lactis subsp. lactis, streptolysin S and salivaricin A from Streptococcus pyogenes, and thiocillin from Bacillus cereus [14–20]. These blocks are representative of several important classes of bacteriocins, namely lantibiotics (enterococcal cytolysin, Nisin A, salivaricin A), thiopeptides (thiocillin), thiazole/oxazole-modified microcins (TOMMs; streptolysin S), lassoed tail peptides (microcin J25), and circular bacteriocins (enterocin AS-48). While bacteriocin biosynthetic gene blocks are widely distributed among prokarya, the major structurally-related groups are few and their relevant context genes are largely represented in the LC Set. The genes were categorized as toxins, modifiers, immunity, transport, or regulation. The genes making up the LC Set are available in the supplemental material.
BLAST the LC Set genes and the BAGEL toxin genes against the bacterial genome set.
Select ORFs with (a) e-value <10−5 and (b) within ± 50 kb of the homologs to the toxin genes (whether from the LC set or the BAGEL set).
Assign each homologous ORF according to the category of the gene to which it is found to be similar: toxin, modifier, immunity, transport, or regulation. Ambiguities are resolved by taking the best hit based on the BLAST E-value.
Build profile HMMs: cluster the sequences in each bin independently using CD-HIT , then use MAFFT  to perform a multiple alignment in each homology cluster, and HMMER  to build pHMMs from the multiple alignments.
Run the resulting pHMM’s against the bacterial genome files. Since most of the HMMER hits are probably false positives, we set a score threshold to filter them out. We determined this threshold by obtaining HMMER scores for all of the BAGEL bacteriocins found on the bacterial genomes and choosing the lowest BAGEL score as the threshold. See Fig. 2
Use a clique filter (see below) to identify those genes that are close together and therefore candidates for bacteriocin gene blocks. See Fig. 3.
To identify gene blocks that are candidates for bacteriocin biosynthesis, we used a clique filter. A clique is a complete subgraph where any two nodes are connected by an edge. We created graphs from the ±50 kb regions where genes are represented by nodes and for every pair of genes that are within 25 kb of each other, an edge is created between them. The detected cliques are estimated gene blocks such that all of the genes are within 25 kb of each other. This 25 kb threshold was based on the size of known bacteriocin gene clusters given in our gold standard data set. Figure 3 illustrates the use of the clique filter to identify potential bacteriocin gene blocks.
A major challenge in finding toxin genes is that due to the short length and low complexity of these peptides, many genes will be missed because of ORF calling errors or lack of similarity to known toxin genes. To overcome this problem, we organized all detected cliques into gene blocks with homologs to known toxin genes and gene blocks without known toxin genes. Cliques with known toxin genes are required to have at least one toxin gene and one transport gene. Cliques without any known toxin genes are required to have at least one of each of a modifier, transport, immunity, and regulator genes. In this way, we ensure that the cliques we identify have the needed components to present a putative bacteriocin biosynthetic locus, and the toxin gene can later be searched using less restrictive procedures.
Results and Discussion
The many characterized bacteriocins have seldom been experimentally validated in parallel in the multiple species which putatively code for their production, restricting our standard-of-truth data set to a small group of well-studied bacteriocins relative to the large number of organisms that produce them. In addition, estimating the false positive rate is difficult, requiring excessive de novo experimental validation. Therefore, to evaluate BOA’s performance, we compared the bacteriocins that we have found to BAGEL bacteriocins found in bacterial genomes.
To compare BOA against BAGEL, the toxins shared between BOA and BAGEL were identified using BLAST using the default parameters. As shown in Fig. 4 and Table 1, BOA only missed 22 (5 %) bacteriocins that BAGEL detected, while predicting 457 (95 %) in agreement with BAGEL. In addition, BOA was able to predict over 1003 more putative bacteriocins on bacterial genomes than BAGEL. In addition, BOA identified 83 regions that are highly likely to be associated with bacteriocin production.
The detected gene blocks were classified into five groups: (1) gene blocks with all five functional classes (toxin, modifier, immunity, regulator, transport), (2) gene blocks with only four functions, (3) three functions, (4) two functions, and (5) unknown toxins. Figure 5 shows these findings.
Previously it has been established that every bacteriocin locus needs at minimum a toxin gene and an immunity gene . The gene blocks in the first four groups in Fig. 5 have at least one toxin and one transport gene. The final group of gene blocks do not have any identified bacteriocin genes, but each detected gene block is required to have all of the other genes. This final group contains likely candidates for bacteriocin-associated gene blocks that do not yet have a known, identified bacteriocin. From these findings, it is evident that genes categorized as transport genes were identified to be the most common type of context gene.
Interestingly, the three species harboring the greatest number of predicted bacteriocin-associated gene blocks in our screen inhabit different ecological niches (Table 2). Streptococcus equi subsp. zooepidemicus is a common colonizer of the respiratory tract with the capacity for opportunistic infection in a variety of domesticated animals and sometimes severe infections in humans following zoonotic transmission . Streptomyces griseus is a soil-dwelling bacterium that has been studied and utilized in the biotechnology industry for production of numerous secondary metabolites including the first aminoglycoside antibiotic, streptomycin . Finally, Leifsonia xyli is a pathogenic obligate colonizer of the xylem of host plants, causing economically damaging ratoon stunting disease in sugarcane . The radically different environments in which these bacteria reside suggest that predicted bacteriocins must have distinct functions, specific organism targets, or both. Likewise, functional validation and characterization of these kinds of predicted bacteriocins must take into account the niche in which its producing organism resides, probing functionalities that target ecologically relevant target organisms.
Among the gene blocks identified by BOA was the recently described and experimentally characterized caynothecamide biosynthetic locus of Cyanothece sp. PCC 7425 (GenBank: CP001344.1) . This gene block, part of the patellamide family, has nine predicted precursor peptide ORFs with conserved N-termini and divergent C-termini likely resulting from repeated precursor duplication and divergence. Other bacteriocin clusters have been described with only one precursor peptide duplication, and such examples may represent an early step toward the substrate elaboration displayed by the cyanothecamides . Interestingly, most of the cyanothecamide putative precursors lack the canonical pentapeptide motif required for patellamide maturation, suggesting that inclusion of these sequences in the BOA gold standard set could expand the set of identified putative toxin genes to include other non-canonical substrates. Only two of the nine cyanothecamide putative toxin genes have been experimentally implicated as precursors to identifiable mature patellamide-like compounds . Yet, the capacity for biosynthetic machinery to modify substrate peptides with suitable N-terminal domains despite drastic variability in the C-terminal portion of the peptide has been demonstrated in other bacteriocins [31, 32]. The features of this particular gene block raise the possibility that bacteriocin loci encoding post-translationally modified peptides could, through elaboration of sequence diversity in multiple cognate peptide substrates, confer a greater breadth of functional diversity to producing organisms than previously appreciated .
Within the genome of the important human pathogen Group A Streptococcus from which two members of our gold standard set were obtained (Streptolysin S and Salivaricin A), BOA also identified the gallidermin-related lantibiotic Streptin . Despite the experimental validation of Streptin as an active bacteriocin, little further insight has been gained into the role of Streptin with respect to pathogenic infection or colonization dynamics . Identification and subsequent experimental validation of bacteriocins in important human pathogens like Group A Streptococcus will likely yield insights into the biology and biochemistry of pathogenic colonization, especially given the current explosion of interest in the human microbiome and probiotic disease interventions.
Of the 1054 species with identified bacteriocins, only 11 species were from the domain Archaea out of 360 Archaea genomes in GenBank. From the 1043 bacterial species with bacteriocins identified, the majority of them are identified as either Proteobacteria or Firmicutes. The exact breakdown of the phyla and their corresponding mean function counts is shown in Table 3, while a breakdown at the class level is shown in Table 4. It is important to note that our finding does not imply that most bacteriocin producing bacteria are Firmicutes and Proteobacteria, or that bacteriocins are rare in Archaea. It is more likely that previous research in identifying bacteriocins was biased towards the former two phyla.
From Table 3, it is apparent that mean distribution of gene types between phyla are very different. For instance, Firmicutes have significantly more immunity genes than any of the other phyla. Also, gene blocks found in Firmicutes, Actinobacteria and Cyanobacteria have a higher toxin gene count than other bacterial phyla. A full survey of putative bacteriocin gene blocks is provided in the supplementary data. Figure 6 shows an overview of the distribution of putative bacteriocin blocks identified by BOA.
To our knowledge, BOA is the first time a curated data set has been established for bacteriocin context genes. Even with seven different bacteriocin gene blocks as a gold standard set, our method has identified several hundred putative bacteriocin gene blocks, most of which have not been previously annotated. We believe that even more homologous gene blocks can be identified with a larger validated database of context genes. Additionally, upon manual inspection of some predicted blocks, some nearby putative ORFs appeared likely to be involved in predicted bacteriocin biosynthesis but were not identified by BOA. This may permit a manually curated strategy whereby one may subjectively designate putative context genes from a BOA-predicted bacteriocin gene block and feed the more richly-annotated gene block back into BOA as a new member of the now-expanded gold standard set. Such an approach could serve to iteratively extend the phylogenetic boundaries of BOA in a controlled way each time the limits of similarity are reached. We are currently exploring the merits of this approach. The widespread prevalence and diversity of bacteria having bacteriocins and their highly varied lifestyles suggest early ancestry and a subsequent adaptation of these gene blocks to the specific functional needs of the bacteria producing them. Previous studies have shown that bacteriocin context genes tend to be in bacteria that share an environmental niche despite phylogenetic disparity, suggesting that functional adaptation is likely to be a major mechanism for bacteriocin design and production .
BOA was able to identify the majority of bacteriocin gene clusters that BAGEL identified. BOA also predicted over seven times more bacteriocins in whole bacterial genomes than BAGEL, including many identifiable bacteriocin gene blocks with experimental validation. Because BOA encompasses a large number of taxa, the information in BOA can also be used to explore the evolutionary development of bacteriocin gene blocks and how different biosynthetic loci have evolved in different clades. Finally, BOA has assembled the first dataset that contains information about homologous bacteriocin genes and their associated gene clusters.
Farnsworth NR, Akerele O, Bingel AS, Soejarto DD, Guo Z. Medicinal plants in therapy. Bull World Health Organ. 1985; 63:965–81.
AL H. Natural products in drug discovery. Drug Discov Today. 2008; Oct;13(19–20):894–901.
Li JW-HW, Vederas JC. Drug discovery and natural products: end of an era or an endless frontier?Sci (New York, NY). 2009; 325(5937):161–5. doi:10.1126/science.1168243.
Willey JM, van der Donk WA. Lantibiotics: peptides of diverse structure and function. Ann Rev Microbiol. 2007; 61:477–501.
Guder A, Wiedemann I, Sahl HG. Posttranslationally modified bacteriocins–the lantibiotics. Biopolymers. 2000; 55(1):62–73. doi:10.1002/1097-0282.
Arnison PG, Bibb MJ, Bierbaum G, Bowers AA, Bugni TS, Bulaj G, et al. Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat Prod Rep. 2013; 30(1):108–60. doi:10.1039/c2np20085f.
Perez RH, Zendo T, Sonomoto K. Novel bacteriocins from lactic acid bacteria (LAB): various structures and applications. Microb Cell Fact. 2014 Suppl1:S3.
van Heel AJ, de Jong A, Montalbán-López M, Kok J, Kuipers OP. Bagel3: automated identification of genes encoding bacteriocins and (non-)bactericidal posttranslationally modified peptides. Nucleic Acids Res. 2013; 41(W1):448–53. doi:10.1093/nar/gkt391.
Hammami R, Zouhir A, Hamida JB, Fliss I. Bactibase: a new web-accessible database for bacteriocin characterization. BMC Microbiol. 2007; 7(1):89. doi:10.1186/1471-2180-7-89.
Blin K, Medema MH, Kazempour D, Fischbach MA, Breitling R, Takano E, et al. antismash 2.0–a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res. 2013; 41(W1):204–12. doi:10.1093/nar/gkt449.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic Acids Res. 1997; 25(17):3389–402. doi:10.1093/nar/25.17.3389.
Lee SW, Mitchell DA, Markley AL, Hensler ME, Gonzalez D, Wohlrab A, et al. Discovery of a widely distributed toxin biosynthetic gene cluster. Proc Nat Acad Sci. 2008; 105(15):5879–84. doi:10.1073/pnas.0801338105.
Benson DA, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. Genbank. Nucleic Acids Res. 2014; 42(D1):32–7. doi:10.1093/nar/gkt1030.
Coburn PS, Gilmore MS. The enterococcus faecalis cytolysin: a novel toxin active against eukaryotic and prokaryotic cells. Cell Microbiol. 2003; 5(10):661–9.
Gálvez A, Maqueda M, Martínez-Bueno M, Valdivia E. Bactericidal and bacteriolytic action of peptide antibiotic as-48 against gram-positive and gram-negative bacteria and other organisms. Res Microbiol. 1989; 140(1):57–68.
Salomón RA, Farías RN. Microcin 25, a novel antimicrobial peptide produced by escherichia coli. J Bacteriol. 1992; 174(22):7428–35.
Kaletta C, Entian KD. Nisin, a peptide antibiotic: cloning and sequencing of the nisa gene and posttranslational processing of its peptide product. J Bacteriol. 1989; 171(3):1597–601.
Nizet V, Beall B, Bast DJ, Datta V, Kilburn L, Low DE, et al. Genetic locus for streptolysin s production by group a streptococcus. Infect Immun. 2000; 68(7):4245–54.
Ross KF, Ronson CW, Tagg JR. Isolation and characterization of the lantibiotic salivaricin a and its structural gene sala from streptococcus salivarius 20p3. Appl Environ Microbiol. 1993; 59(7):2014–21.
Shoji J, Hinoo H, Wakisaka Y, Koizumi K, Mayama M. Isolation of three new antibiotics, thiocillins i, ii and iii, related to micrococcin p. studies on antibiotics from the genus bacillus. viii. J Antibiot. 1976; 29(4):366–74.
Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics (Oxford, England). 2006; 22(13):1658–9. doi:10.1093/bioinformatics/btl158.
Katoh K, Standley DM. Mafft multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013; 30(4):772–80. doi:10.1093/molbev/mst010.
Eddy SR. Accelerated profile hmm searches. PLoS Comput Biol. 2011; 7(10):1002195. doi:10.1371/journal.pcbi.1002195.
Dimov SHN, Ivanova P. Genetics of bacteriocins biosynthesis by lactic acid bacteria. Biotechnol Biotechnol Equip. 2005; 19(supplement 2):4–10.
Minces LR, Brown PJ, Veldkamp PJ. Human meningitis from streptococcus equi subsp. zooepidemicus acquired as zoonoses. Epidemiol Infect. 2011; 139(3):406–10.
Ohnishi Y, Ishikawa J, Hara H, Suzuki H, Ikenoya M, Ikeda H, et al. Genome sequence of the streptomycin-producing microorganism streptomyces griseus ifo 13350. J Bacteriol. 2008; 190(11):4050–60. doi:10.1128/jb.00204-08.
Ghai M, Singh V, Martin LA, McFarlane SA, van Antwerpen T, Rutherford RS. A rapid and visual loop-mediated isothermal amplification assay to detect leifsonia xyli subsp. xyli targeting a transposase gene. Lett Appl Microbiol. 2014; 59(6):648–57.
Donia MS, Schmidt EW. Linking chemistry and genetics in the growing cyanobactin natural products family. Chem Biol. 2011; 18(4):508–19.
Tabata A, Nakano K, Ohkura K, Tomoyasu T, Kikuchi K, Whiley RA, et al. Novel twin streptolysin s-like peptides encoded in the sag operon homologue of beta-hemolytic streptococcus anginosus. J Bacteriol. 2013; 195(5):1090–9.
Houssen WE, Koehnke J, Zollman D, Vendome J, Raab A, Smith MC, et al. The discovery of new cyanobactins from cyanothece pcc 7425 defines a new signature for processing of patellamides. Chembiochem : Eur J Chem Biol. 2012; 13(18):2683–9.
Mitchell DA, Lee SW, Pence MA, Markley AL, Limm JD, Nizet V, et al. Structural and functional dissection of the heterocyclic peptide cytotoxin streptolysin s. J Biol Chem. 2009; 284(19):13004–12.
Haft DH, Basu MKK, Mitchell DA. Expansion of ribosomally produced natural products: a nitrile hydratase- and Nif11-related precursor family. BMC Biology. 2010; 8:70.
Sardar D, Pierce E, McIntosh JA, Schmidt EW. Recognition sequences and substrate evolution in cyanobactin biosynthesis. ACS Synth Biol. 2015; 4(2):167–76. doi:10.1021/sb500019b.
Karaya K, Shimizu T, Taketo A. New gene cluster for lantibiotic streptin possibly involved in streptolysin s formation. J Biochem. 2001; 129(5):769–75.
Wescombe PA, Tagg JR. Purification and characterization of streptin, a type a1 lantibiotic produced by streptococcus pyogenes. Appl Environ Microbiol. 2003; 69(5):2737–47. doi:10.1128/aem.69.5.2737-2747.2003.
We are grateful to Sean Eddy and Rob Finn for the use of the HMMER logo. Some images used in Fig. 1 are reproduced from Wikimedia Commons under CC-BY 3.0 or 4.0 license. We gratefully acknowledge the support of the Miami University High Performance Computing Facility. This work was supported, in part, by National Science Foundation grant ABI-1146960 (IF) and NIH 1DP2OD008468-01. (SWL). SDF was supported, in part by training grant NIH T32GM075762. IF dedicates his contribution to the memory of Gary R Janssen.
The authors declare that they have no competing interests.
IF, SWL and JTM conceived the idea. IF and JTM designed the experiment. JTM wrote the code and ran the analyses. SWL and SDF provided the gold-standard data and interpreted the results. All authors wrote the manuscript. All authors read and approved the final manuscript.