Fig. 2
From: Recognizing RNA structural motifs in HT-SELEX data for ribosomal protein S15
a Histogram of normalized Levenshtein distance from the top 4 high-frequency sequences (Seq. ID: 98, 101, 290, 669), showing a clear cluster cutoff at a distance of 10%. Within the cluster, sequence frequency decreases with distance from the center, indicating that sequence clusters containing high-frequency sequences are valid.
b Plot of the CD-HIT clustering data, shown as cluster size vs. mean percent identity to the cluster seed (diffuseness). In red are clusters containing high-frequency sequences with more than 100 read counts. In blue are clusters containing high-frequency sequences with more than 100 read counts that have been experimentally examined for binding to S15 (Table 6). In green are experimentally tested sequences from clusters that do not contain high-frequency sequences.
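The distance measure in panel a can be illustrated with a minimal sketch. The caption does not state how the Levenshtein distance was normalized, so dividing by the length of the longer sequence is an assumption here, and the function names `levenshtein` and `normalized_levenshtein` are hypothetical, not taken from the article.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest
```

Under this assumed normalization, a read that differs from a 10-nucleotide cluster center at a single position has a distance of 0.1, which sits exactly at the 10% cutoff described in the caption.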