Table 1 Quantification of the structural word extraction from the non-redundant data set.

From: Mining protein loops using a structural alphabet and statistical exceptionality

Words                                     1         2         3         4
Number of words                           3310      166       2214      930
(%)                                       (11.7%)   (5.0%)    (66.9%)   (28.1%)
Number of fragments                       249953    11435     129781    108737
(%)                                       (60.2%)   (4.6%)    (51.9%)   (43.5%)
Number of fragments per word              75.5      68.9      58.6      116.9*
All-loop coverage rate                    72.7%     5.1%      46.5%     40.2%
Short-loop coverage rate                  70.3%     4.4%      38.9%     39.3%
Long-loop coverage rate                   74.9%     5.7%      53.9%     41.1%
Loops containing at least one word        84.8%     9.8%      60.3%     58.2%
Short loops containing at least one word  79.7%     6.1%      48.1%     49.4%
Long loops containing at least one word   97.8%     19.1%     90.9%     80.4%
1: words seen more than 30 times; 2: under-represented words; 3: non-significant words; 4: over-represented words; '*': significantly higher occurrence according to a Kruskal-Wallis test. Coverage rates are given on a per structural letter basis. Numbers within brackets denote the percentage of words/fragments with respect to the 28274 words/415071 fragments of the whole data set (column 1) and with respect to the 3310 words/249953 fragments in W set≥30 (columns 2 to 4).
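
The bracketed percentages and the fragments-per-word row follow directly from the counts in the table. As a minimal sketch of that arithmetic, the Python snippet below recomputes them. The counts are copied from Table 1; the dictionary keys and function names are illustrative assumptions, not names used by the authors.

```python
# Counts copied from Table 1. Keys and function names are illustrative only.
TOTAL_WORDS = 28274        # words in the whole data set
TOTAL_FRAGMENTS = 415071   # fragments in the whole data set

WORDS = {"col1": 3310, "col2": 166, "col3": 2214, "col4": 930}
FRAGMENTS = {"col1": 249953, "col2": 11435, "col3": 129781, "col4": 108737}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def bracketed_percentages(counts, grand_total):
    """Column 1 is relative to the whole data set; columns 2 to 4 are
    relative to column 1 (the words seen more than 30 times)."""
    result = {"col1": percent(counts["col1"], grand_total)}
    for col in ("col2", "col3", "col4"):
        result[col] = percent(counts[col], counts["col1"])
    return result


def fragments_per_word(fragments, words):
    """Average number of fragments per word in each column."""
    return {col: round(fragments[col] / words[col], 1) for col in words}


if __name__ == "__main__":
    print(bracketed_percentages(WORDS, TOTAL_WORDS))
    print(bracketed_percentages(FRAGMENTS, TOTAL_FRAGMENTS))
    print(fragments_per_word(FRAGMENTS, WORDS))
```

Columns 2 to 4 partition column 1, so their word counts sum to 3310 and their fragment counts sum to 249953.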