Table 1 Sequence datasets used to generate training sets.

From: TMB-Hunt: An amino acid composition based method to screen proteomes for beta-barrel transmembrane proteins

| Training dataset | Sources | Initial number of sequences | Sequences >120 AA | Size after redundancy removal |
|---|---|---|---|---|
| ntm | PDB-REPRDB [32] | 3159 | 2290 | 1763 |
| ahtm | Sanger all-alpha membrane datasets A, B and C [33] | 189 | 166 | 132 |
| bbtm | TC-DB [35], Uniprot [34] and PDB [5] | 1126 | 1107 | 196 |

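1. Three training datasets were generated from the sources listed: non-transmembrane (ntm), alpha-helical transmembrane (ahtm) and beta-barrel transmembrane (bbtm) proteins. Each dataset was filtered to retain only sequences longer than 120 AA and was then clustered to remove redundancy.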
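The two preparation steps in the footnote (length filtering, then redundancy removal by clustering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the clustering tool and identity cutoff are not given in this table, so `IDENTITY_THRESHOLD` is a hypothetical value, and `identity()` uses `difflib.SequenceMatcher.ratio()` as a crude stand-in for an alignment-based percent identity. The greedy "longest sequence first" strategy is one common way to pick cluster representatives; all function names here are hypothetical.

```python
from difflib import SequenceMatcher

MIN_LENGTH = 120          # from the table header ("Sequences >120 AA")
IDENTITY_THRESHOLD = 0.9  # hypothetical cutoff; not stated in the table


def length_filter(seqs, min_length=MIN_LENGTH):
    """Keep only sequences longer than min_length residues."""
    return {name: s for name, s in seqs.items() if len(s) > min_length}


def identity(a, b):
    """Crude similarity proxy (difflib ratio), NOT alignment-based percent identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def remove_redundancy(seqs, threshold=IDENTITY_THRESHOLD):
    """Greedy clustering: visit sequences longest first and keep one as a
    representative only if it is below `threshold` similarity to every
    representative kept so far."""
    representatives = {}
    for name, s in sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True):
        if all(identity(s, r) < threshold for r in representatives.values()):
            representatives[name] = s
    return representatives


# Toy, made-up sequences for illustration only.
toy = {
    "a": "M" + "ACDEFGHIKL" * 13,  # 131 AA
    "b": "M" + "ACDEFGHIKL" * 13,  # identical to "a", so it is redundant
    "c": "MKT" * 50,               # 150 AA
    "d": "MKV" * 10,               # 30 AA, dropped by the length filter
}

filtered = length_filter(toy)
kept = remove_redundancy(filtered)
print(sorted(filtered), sorted(kept))
```

In a real screen, a dedicated clustering tool with a true sequence-identity measure would replace `identity()`; the pairwise comparison here grows quadratically with dataset size and is only meant to show the order of operations.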