Table 1 Characteristics of the experimental datasets. n is the number of mutated positions and k is the number of residues at each position

From: Application of fourier transform and proteochemometrics principles to protein engineering

| Dataset | Size of dataset | n | k | Theoretical size of sequence space | Length of protein sequence |
|---|---|---|---|---|---|
| Cyt P450 | 242 | 8 | 3 | 6561 | 464–466 |
| GLP-2 | 31 | 31 | 2 | 2.147 billion | 33 |
| Enterotoxin | 12 | 40 | 2 | 1099.5 billion | 233 |
| TNF | 21 | 17 | [2, 7, 4, 6, 2, 9, 9, 9, 9, 9, 2, 2, 2, 2, 6, 8, 7] | 213.3 billion | 157 |

  1. The theoretical size of sequence space S is calculated as the product of the k values over all mutated positions; when k is the same at every position, this reduces to S = k^n
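
The calculation described in the footnote can be checked with a short script. This is a minimal sketch, not code from the article: the function name `sequence_space_size` and the `datasets` dictionary are illustrative assumptions, and a single k value from the table is expanded to n identical entries.

```python
from math import prod


def sequence_space_size(k_values):
    """Return the product of the number of residues k at each mutated position."""
    return prod(k_values)


# n and k are taken from Table 1. Where the table gives a single k,
# it is assumed to apply to all n mutated positions.
datasets = {
    "Cyt P450": [3] * 8,
    "GLP-2": [2] * 31,
    "Enterotoxin": [2] * 40,
    "TNF": [2, 7, 4, 6, 2, 9, 9, 9, 9, 9, 2, 2, 2, 2, 6, 8, 7],
}

sizes = {name: sequence_space_size(ks) for name, ks in datasets.items()}

for name, size in sizes.items():
    print(f"{name}: {size:,}")
# Cyt P450: 6,561
# GLP-2: 2,147,483,648
# Enterotoxin: 1,099,511,627,776
# TNF: 213,324,668,928
```

The exact products agree with the rounded values reported in the table (2.147 billion, 1099.5 billion, and 213.3 billion).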