Table 1 B. subtilis mutants and activity fold changes

From: Addressing the unmet need for visualizing conditional random fields in biological data

 

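| Mutant | Pos. 4 | Pos. 7 | Pos. 10 | Pos. 24 | Pos. 27 | Pos. 29 | Fold | CRF4 | CRF6 | CRF12 |
|---|---|---|---|---|---|---|---|---|---|---|
| BsADK | C | C | T | C | D | G | 1 | 1 | 1 | 1 |
| Chim | C | C | T | **D** | **T** | G | - | 10^-68 | 10^-150 | 10^-248 |
| Tetra | **H** | **S** | T | **D** | **T** | G | inactive ++ | 10^1 | 10^-70 | 10^-146 |
| Di | C | C | **R** | C | D | **E** | normal- | 10^1 | 10^-2 | 10^-2 |
| Hexa | **H** | **S** | **R** | **D** | **T** | **E** | normal-- | 10^3 | 10^2 | 10^-46 |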

  1. Residues changed from wild-type are indicated in bold.
  2. The relative activities of these mutants show that not only the identities of the residues at each position, but also the relationships between these residues, play a key role in enzyme activity. Position 24, for example, has an almost equal probability of containing a C or a D residue across the ADK family. The functional consequences of changing a C to a D in a specific protein, however, must be calculated in the context of the other residues in that protein with which relational dependencies exist. Our assays show that activity correlates well with the predictions of a CRF defined using the network formed from these relationships. The Di mutant retains activity, only slightly impaired relative to wild-type. The Tetra mutant shows barely detectable activity. The Hexa mutant recovers a significant amount of activity, but remains an order of magnitude less active than wild-type. CD thermal denaturation shows little difference in stability between the wild-type protein and the Di mutant, and only a small destabilization in the Tetra and Hexa mutants. All of these activity changes agree with the predictions made for the modified B. subtilis sequence by a CRF defined by the interdependencies between the residues of these motifs, with one caveat. If only these residues are used to define the CRF, it predicts that the Hexa mutant will have better activity than the wild-type protein. This caveat highlights the danger of assuming that only the very strongest co-evolutions are necessary to define an adequate CRF. The CRF defined with the 6 residues most obviously involved (CRF6) fails to evaluate those residues in the context of the other residues specific to the B. subtilis protein. Because the hydrophilic residue motif is more prevalent in the training set, the CRF predicts that a mutant containing it will be more likely to be functional. This failure is exactly why network models of interdependency are critical for developing accurate methods for predicting protein function from sequence.
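
The comparison described in this footnote can be checked directly against the tabulated values. The following is a minimal sketch in Python, not the authors' method: the residues are transcribed from Table 1, the predicted fold changes are stored as base-10 exponents on the assumption that entries such as "10-68" and "102" are powers of ten whose superscripts were lost (10^-68 and 10^2), and the names `substitutions` and `predicted_better_than_wild_type` are hypothetical helpers introduced only for illustration.

```python
# Residues at the six tabulated positions, transcribed from Table 1.
POSITIONS = (4, 7, 10, 24, 27, 29)
RESIDUES = {
    "BsADK": "CCTCDG",  # wild type
    "Chim": "CCTDTG",
    "Tetra": "HSTDTG",
    "Di": "CCRCDE",
    "Hexa": "HSRDTE",
}

# log10 of the predicted fold change relative to wild type.
# Assumption: table entries are powers of ten with lost superscripts.
LOG10_PREDICTED = {
    "Chim": {"CRF4": -68, "CRF6": -150, "CRF12": -248},
    "Tetra": {"CRF4": 1, "CRF6": -70, "CRF12": -146},
    "Di": {"CRF4": 1, "CRF6": -2, "CRF12": -2},
    "Hexa": {"CRF4": 3, "CRF6": 2, "CRF12": -46},
}


def substitutions(mutant, wild_type="BsADK"):
    """List the substitutions in a mutant relative to wild type, e.g. 'T10R'."""
    wt = RESIDUES[wild_type]
    mt = RESIDUES[mutant]
    return [f"{w}{pos}{m}" for pos, w, m in zip(POSITIONS, wt, mt) if w != m]


def predicted_better_than_wild_type(model):
    """Mutants that a given CRF predicts to be more active than wild type."""
    return [name for name, pred in LOG10_PREDICTED.items() if pred[model] > 0]


if __name__ == "__main__":
    for name in ("Chim", "Tetra", "Di", "Hexa"):
        print(name, substitutions(name))
    for model in ("CRF4", "CRF6", "CRF12"):
        print(model, predicted_better_than_wild_type(model))
```

Under that reading of the exponents, CRF6 ranks only the Hexa mutant above wild type, which is the caveat the footnote describes, while CRF12 ranks no mutant above wild type.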