Table 1 Regular expression patterns used for the nine selected genes.

From: Gene function prediction based on genomic context clustering and discriminative learning: an application to bacteriophages

Gene                         Search pattern
Major head                   (?<!minor)\b(head|capsid)\b
Major tail                   (?<!minor)\btail\b
Terminase (large subunit)    terminase|\bterL\b
Holin                        \bholin\b
Lysin                        \blysin\b
Tape measure                 \btape\b|minor tail
Integrase                    integrase
Portal protein               \bportal\b
Prohead protease             prohead AND protease †
  1. † Not a direct regular expression; "Prohead" and "protease" were searched separately and the results were combined using the AND operation provided by SynFPS.
  2. These patterns were matched against the CDS annotations of the phages retrieved from GenBank. The search results were then refined by manual inspection. Notation: \w – alphanumeric character; \b – word boundary; | – 'or'; * – zero or more of the preceding character; (?<!x) – negative lookbehind (the match must not be immediately preceded by x).
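
The matching described in the footnotes can be sketched in Python as follows. This is a minimal illustration, not the SynFPS implementation: the function name `matching_genes`, the case-insensitive flag, and the sample annotation strings are assumptions introduced here, and Python's `re` module may differ in detail from the regular expression engine the authors used. The patterns are copied from the table as printed.

```python
import re

# Patterns copied verbatim from Table 1 (Prohead protease is handled separately).
PATTERNS = {
    "Major head": r"(?<!minor)\b(head|capsid)\b",
    "Major tail": r"(?<!minor)\btail\b",
    "Terminase (large subunit)": r"terminase|\bterL\b",
    "Holin": r"\bholin\b",
    "Lysin": r"\blysin\b",
    "Tape measure": r"\btape\b|minor tail",
    "Integrase": r"integrase",
    "Portal protein": r"\bportal\b",
}

# Assumption: annotations are matched case-insensitively.
COMPILED = {gene: re.compile(p, re.IGNORECASE) for gene, p in PATTERNS.items()}


def matching_genes(annotation):
    """Return the genes whose search pattern matches a CDS annotation string."""
    hits = [gene for gene, rx in COMPILED.items() if rx.search(annotation)]
    # "prohead AND protease": two separate searches combined with a logical AND.
    has_prohead = re.search("prohead", annotation, re.IGNORECASE)
    has_protease = re.search("protease", annotation, re.IGNORECASE)
    if has_prohead and has_protease:
        hits.append("Prohead protease")
    return hits


# Hypothetical annotation strings, for illustration only.
for text in ["major capsid protein", "endolysin", "prohead protease",
             "minor tail protein"]:
    print(text, "->", matching_genes(text))
```

Two behaviours are worth noting. The word boundary \b keeps "endolysin" from matching the Lysin pattern and "prohead" from matching the Major head pattern. With the patterns exactly as printed here, "minor tail protein" matches both the Tape measure and the Major tail patterns in Python, because the lookbehind `(?<!minor)` does not account for the space before "tail"; cases like this are the kind that the manual inspection step would need to resolve.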