Figure 1 | BMC Bioinformatics

Figure 1

From: Recruitment of rare 3-grams at functional sites: Is this a mechanism for increasing enzyme specificity?

Distribution of 3-grams in UniProt. Histogram of the counts C_i for all types (1 ≤ i ≤ 20^3) of 3-grams, shown for grids of size ΔC_i = 10^3. The mean <C_i> and standard deviation σ_C are 7,822 and 6,682, respectively. The inset shows the portion of the curve for C_i ≤ 15,000 in more detail. The ordinate is the percentage of 3-grams within intervals of ΔC_i = 500. The peak (6.07%) occurs at the interval 2,500 ≤ C_i ≤ 3,000. 3-grams that are one standard deviation away from the mean towards lower counts are termed 'rare' 3-grams. Their counts are lower than [<C_i> - σ_C] = 1,140. There is a total of 480 such 3-grams (i.e. 6% of all the 8,000 types of 3-grams), and their cumulative frequency of occurrence evaluated from the ratio of their total count to C_tot is 0.595%.
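
The 'rare' criterion described in the caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, and it assumes that 3-grams are counted as overlapping windows over the 20 standard amino acids, that windows containing any other character are skipped, and that σ_C is the population standard deviation taken over all 20^3 types. The caption does not specify any of these choices.

```python
from itertools import product
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def count_3grams(sequences):
    """Count overlapping 3-grams (assumed windowing); all 20^3 types get an entry."""
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=3)}
    for seq in sequences:
        for k in range(len(seq) - 2):
            gram = seq[k:k + 3]
            if gram in counts:  # assumption: skip windows with non-standard residues
                counts[gram] += 1
    return counts


def rare_3grams(counts):
    """Return (threshold, rare set, cumulative frequency of the rare 3-grams)."""
    values = list(counts.values())
    # 'Rare' means a count below <C_i> - sigma_C; population std is an assumption.
    threshold = mean(values) - pstdev(values)
    rare = {gram for gram, c in counts.items() if c < threshold}
    total = sum(values)  # C_tot
    cumulative = sum(counts[g] for g in rare) / total if total else 0.0
    return threshold, rare, cumulative
```

With the values reported in the caption, the threshold is 7,822 - 6,682 = 1,140, and 480 rare types out of 8,000 correspond to 6% of all 3-gram types.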
