Table 5 Correspondence between extreme superfamily-specific words and Swiss-Prot annotations in the initial data set

From: Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs

   

Columns Word to Superfamilies report statistics in the initial dataset. Columns Annot to Sensitivity report the comparison with Swiss-Prot.

| Word | Occ^a | Lp max | nb sf*/nb sf^b | Superfamilies^c | Annot | Match/total (Precision (%))^d | Sensitivity (%) |
|---|---|---|---|---|---|---|---|
| URNH | 43 | 54.95 | 1/17 | 48726* | Disulfide | 7/14 (50) | 4 |
| RNHB | 59 | 51.33 | 1/28 | 48726* | Disulfide | 9/20 (45) | 6 |
| UQHS | 53 | 75.07 | 1/16 | 52058* | Repeat | 12/22 (55) | 41 |
| SUQH | 70 | 63.42 | 1/25 | 52058* | Repeat | 11/26 (42) | 38 |
| QHSG | 37 | 51.75 | 1/12 | 52058* | Repeat | 4/10 (40) | 14 |
| HSGI | 63 | 76.26 | 1/18 | 52058* | Repeat | 5/12 (42) | 17 |
| QXUS | 43 | 52.05 | 1/10 | 51735* | Repeat | 1/15 (7) | |
| ZSGI | 99 | 52.22 | 1/49 | 52058* | Repeat | 7/36 (19) | |
| GSUS | 169 | 140.49 | 3/59 | 141571*, 52047, 52058 | Repeat | 6/38 (16) | |
| GZDO | 115 | 84.72 | 3/49 | 47473*, 52833, 52935 | Repeat | 1/35 (3) | |
| DODQ | 73 | 157.01 | 1/17 | 47473* | CA_BIND | 15/23 (65) | 75 |
| ZDOD | 48 | 91.27 | 1/13 | 47473* | CA_BIND | 11/16 (69) | 58 |
| YUOD | 111 | 184.67 | 1/11 | 52540* | NP_BIND | 39/41 (95) | 35 |
| UODO | 142 | 210.14 | 4/14 | 52540*, 53659, 54211, 55729 | NP_BIND | 49/60 (82) | 38 |
| OEIJ | 33 | 53.84 | 1/4 | 51735* | NP_BIND | 6/7 (86) | 14 |
| EIJU | 48 | 51.68 | 1/13 | 51735* | NP_BIND | 7/15 (47) | 20 |
| USLG | 121 | 137.35 | 2/47 | 141571*, 51206 | NP_BIND | 2/22 (9) | |
| UZCI | 99 | 63.70 | 2/28 | 103025*, 56784 | NP_BIND | 1/13 (8) | |
| RUDO | 27 | 55.55 | 1/4 | 53335* | Binding | 5/10 (50) | 18 |
| UGRU | 37 | 60.07 | 1/8 | 53335* | Binding | 4/12 (33) | |
| EGZD | 48 | 51.68 | 1/5 | 51735* | | | |
| GRUD | 33 | 70.55 | 1/6 | 53335* | | | |
| SLGS | 60 | 118.45 | 1/17 | 141571* | | | |
1. This comparison is made on a subset of the initial set: 1487 proteins that can be mapped to biological annotations using the PDB/UniProt Mapping database.
a: word occurrence.
b: nb sf denotes the number of SCOP superfamilies in which the structural word is seen.
c: superfamilies in which the word is over-represented.
d: match and total denote the number of annotated fragments and the total number of fragments, respectively. Bold font indicates a match/total ratio greater than 40%; italic font indicates a match/total ratio lower than 40%.
Abbreviations used: NP_BIND = nucleotide phosphate-binding site, CA_BIND = calcium-binding site.
SCOP ids: 103025 = Folate-binding domain, 141571 = Pentapeptide repeat-like, 47473 = EF-hand, 48726 = Immunoglobulin, 51206 = cAMP-binding domain-like, 51735 = NAD(P)-binding Rossmann-fold domains, 52047 = RNI-like, 52058 = L domain-like, 52540 = P-loop-containing nucleoside triphosphate hydrolases, 52833 = Thioredoxin-like, 52935 = PK C-terminal domain-like, 53335 = S-adenosyl-L-methionine-dependent methyltransferases, 53659 = Isocitrate/isopropylmalate dehydrogenase-like, 54211 = Ribosomal protein S5 domain 2-like, 55729 = Acyl-CoA N-acyltransferases (Nat), 56784 = HAD-like.
"*" denotes the superfamily in which the word is most over-represented.
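Footnote d defines the precision column as the ratio match/total. The Python sketch below recomputes that column from the Match/total counts, applies the 40% bold/italic threshold from the same footnote, and expands a Superfamilies cell into SCOP names using the legend. It is an illustration, not the authors' code. The names `ROWS`, `precision_pct`, `font_class` and `expand_superfamilies` are hypothetical. Half-up rounding to an integer is an assumption, although it reproduces every reported value.

```python
import math

# Match/total counts transcribed from Table 5 as (word, match, total, reported precision %).
# Rows with no Swiss-Prot annotation (EGZD, GRUD, SLGS) are left out.
ROWS = [
    ("URNH", 7, 14, 50), ("RNHB", 9, 20, 45),
    ("UQHS", 12, 22, 55), ("SUQH", 11, 26, 42),
    ("QHSG", 4, 10, 40), ("HSGI", 5, 12, 42),
    ("QXUS", 1, 15, 7), ("ZSGI", 7, 36, 19),
    ("GSUS", 6, 38, 16), ("GZDO", 1, 35, 3),
    ("DODQ", 15, 23, 65), ("ZDOD", 11, 16, 69),
    ("YUOD", 39, 41, 95), ("UODO", 49, 60, 82),
    ("OEIJ", 6, 7, 86), ("EIJU", 7, 15, 47),
    ("USLG", 2, 22, 9), ("UZCI", 1, 13, 8),
    ("RUDO", 5, 10, 50), ("UGRU", 4, 12, 33),
]

# SCOP superfamily ids and names, copied from the table legend.
SCOP_NAMES = {
    103025: "Folate-binding domain",
    141571: "Pentapeptide repeat-like",
    47473: "EF-hand",
    48726: "Immunoglobulin",
    51206: "cAMP-binding domain-like",
    51735: "NAD(P)-binding Rossmann-fold domains",
    52047: "RNI-like",
    52058: "L domain-like",
    52540: "P-loop-containing nucleoside triphosphate hydrolases",
    52833: "Thioredoxin-like",
    52935: "PK C-terminal domain-like",
    53335: "S-adenosyl-L-methionine-dependent methyltransferases",
    53659: "Isocitrate/isopropylmalate dehydrogenase-like",
    54211: "Ribosomal protein S5 domain 2-like",
    55729: "Acyl-CoA N-acyltransferases (Nat)",
    56784: "HAD-like",
}


def precision_pct(match: int, total: int) -> int:
    """Precision = match / total as a percentage, rounded half-up (assumed) to an integer."""
    return math.floor(100 * match / total + 0.5)


def font_class(match: int, total: int) -> str:
    """Footnote d: bold if match/total > 40%, italic if < 40%.

    The legend does not say how a ratio of exactly 40% is shown, so that case
    is reported as "threshold". Integer arithmetic avoids floating-point ties.
    """
    if 100 * match > 40 * total:
        return "bold"
    if 100 * match < 40 * total:
        return "italic"
    return "threshold"


def expand_superfamilies(cell: str) -> list[tuple[int, str, bool]]:
    """Split a cell like '52540*,53659' into (id, name, most_over_represented) tuples."""
    result = []
    for token in cell.split(","):
        token = token.strip()
        starred = token.endswith("*")  # "*" marks the most over-represented superfamily
        sf_id = int(token.rstrip("*"))
        result.append((sf_id, SCOP_NAMES.get(sf_id, "unknown"), starred))
    return result


if __name__ == "__main__":
    for word, match, total, reported in ROWS:
        pct = precision_pct(match, total)
        print(f"{word}: {match}/{total} -> {pct}% (reported {reported}%), {font_class(match, total)}")
    for sf_id, name, starred in expand_superfamilies("52540*,53659, 54211, 55729"):
        print(sf_id, name, "(most over-represented)" if starred else "")
```

Every recomputed precision matches the value in parentheses. This supports reading that number as the rounded match/total ratio. QHSG (4/10) sits exactly on the 40% boundary, which the legend's "greater than" and "lower than" wording does not assign to either font.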