Figure 1 | BMC Bioinformatics

Figure 1

From: Subfamily specific conservation profiles for proteins based on n-gram patterns

(a) Logarithmic histogram of NP{4,2} patterns in UniProt. All theoretically possible NP{4,2} patterns are present, and the distribution is relatively flat over most of the range. (b) Histogram of the overlap between shared NP{4,2} patterns. Most shared patterns have an overlap of 5, and the count decreases exponentially as the overlap approaches 0. (c) Noise level for carbonic anhydrase (P00918). The y-axis shows the number of matches expected by random chance in the SPT data set for pairs of NP{4,2} patterns with overlaps ranging from 0 to 5. The noise level is significantly lower for overlapping pairs than for single NP{4,2} patterns (overlap = 0). (d) Distribution of offset differences between shared NP{4,2} patterns in different sequences. More than 80% of shared NP{4,2} pairs have an offset of 0. The remainder are distributed randomly over the range of possible offsets. Pairs of shared NP{4,2} patterns with zero offset represent n-gram pattern local alignments (NPLAs).
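The offset tally in panel (d) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on assumptions that the caption does not confirm: the offset of a pair is taken to be the difference between the spacing of the two patterns in one sequence and their spacing in the other, each pattern is assumed to occur once per sequence, and plain 4-letter strings stand in for NP{4,2} patterns. The function name `offset_differences` and its input format are hypothetical.

```python
from collections import Counter
from itertools import combinations


def offset_differences(pos_a, pos_b):
    """Tally offset differences for pairs of patterns shared by two sequences.

    pos_a, pos_b: dict mapping a pattern to its start position in
    sequence A / sequence B (hypothetical input format; assumes each
    pattern occurs at most once per sequence).
    """
    # Only patterns found in both sequences can form shared pairs.
    shared = sorted(set(pos_a) & set(pos_b))
    counts = Counter()
    for p, q in combinations(shared, 2):
        gap_a = pos_a[q] - pos_a[p]  # spacing of the pair in sequence A
        gap_b = pos_b[q] - pos_b[p]  # spacing of the pair in sequence B
        # Assumed definition: offset = difference between the two spacings.
        counts[gap_a - gap_b] += 1
    return counts


# Illustrative positions only; these are not data from the paper.
pos_a = {"ACDE": 3, "CDEF": 4, "WXYZ": 20}
pos_b = {"ACDE": 10, "CDEF": 11, "WXYZ": 30}
print(offset_differences(pos_a, pos_b))
```

Under this assumed definition, a pair with offset 0 keeps the same relative spacing in both sequences, which is the property the caption associates with an NPLA.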
