
Table 1 Regular expression patterns used for the nine selected genes.

From: Gene function prediction based on genomic context clustering and discriminative learning: an application to bacteriophages

Gene                        Search pattern
Major head                  (?<!minor)\b(head|capsid)\b
Major tail                  (?<!minor)\btail\b
Terminase (large subunit)   terminase|\bterL\b
Holin                       \bholin\b
Lysin                       \blysin\b
Tape measure                \btape\b|minor tail
Integrase                   integrase
Portal protein              \bportal\b
Prohead protease            prohead AND protease†

  1. † Not a direct regular expression; "Prohead" and "protease" were searched separately and the results were combined using the AND operation provided by SynFPS.
  2. These patterns were matched against the CDS annotations of the phage genomes retrieved from GenBank, and the search results were then refined by manual inspection. (?<!minor) – not immediately preceded by 'minor' (negative lookbehind); \w – word character (letter, digit or underscore); \b – word boundary; | – 'or'; * – zero or more of the preceding character.
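The matching step described in the notes above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it treats each CDS annotation as a plain string (for example, a GenBank product description), assumes case-insensitive matching because the table does not specify case handling, and emulates the † AND search by requiring both of its separate patterns to hit the same annotation. The function name `classify`, the dictionaries `PATTERNS` and `AND_SEARCHES`, and the sample annotations are hypothetical.

```python
import re

# Search patterns transcribed from Table 1 (gene label -> regular expression).
PATTERNS = {
    "Major head": r"(?<!minor)\b(head|capsid)\b",
    "Major tail": r"(?<!minor)\btail\b",
    "Terminase (large subunit)": r"terminase|\bterL\b",
    "Holin": r"\bholin\b",
    "Lysin": r"\blysin\b",
    "Tape measure": r"\btape\b|minor tail",
    "Integrase": r"integrase",
    "Portal protein": r"\bportal\b",
}

# Prohead protease (marked with a dagger in the table) is not one regex: two
# separate searches whose hits are intersected, emulating the AND operation.
AND_SEARCHES = {
    "Prohead protease": [r"prohead", r"protease"],
}

# Assumption: case-insensitive matching; the table does not say which was used.
FLAGS = re.IGNORECASE


def classify(annotation: str) -> list[str]:
    """Return every gene label whose search matches one CDS annotation string."""
    hits = [gene for gene, pattern in PATTERNS.items()
            if re.search(pattern, annotation, FLAGS)]
    for gene, patterns in AND_SEARCHES.items():
        # AND: every separate search must hit the same annotation.
        if all(re.search(p, annotation, FLAGS) for p in patterns):
            hits.append(gene)
    return hits


if __name__ == "__main__":
    # Hypothetical annotation strings, not taken from the paper's data.
    for note in ["major capsid protein", "prohead protease", "minor tail protein"]:
        print(f"{note!r} -> {classify(note)}")
```

Run as a script, the sketch assigns 'major capsid protein' to Major head and 'prohead protease' to Prohead protease, but assigns 'minor tail protein' to both Major tail and Tape measure. The reason is that in Python's `re`, as in other Perl-style engines, `(?<!minor)` inspects only the five characters immediately before `tail`, which here are `inor ` (ending in a space), so the lookbehind does not fire when a space separates the two words. A lookbehind that includes the space, such as `(?<!minor )`, would exclude that phrase.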