
Table 5. A typical substring-cluster caused by a low-complexity sequence

From: Simultaneous identification of long similar substrings in large sets of sequences

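CLU = cluster number, SEQ = sequence number, POS = 1-based start position of the occurrence; the identical substring is enclosed between the bars, with its flanking context on either side.

CLU  SEQ  POS  Identical substring
  1    1    2       t|aaaaaaaaaaaaaaaaaaaa|aaaaat...
  1    1    3      ta|aaaaaaaaaaaaaaaaaaaa|aaaat...
  1    1    4     taa|aaaaaaaaaaaaaaaaaaaa|aaat...
  1    1    5    taaa|aaaaaaaaaaaaaaaaaaaa|aat...
  1    1    6   taaaa|aaaaaaaaaaaaaaaaaaaa|at...
  1    1    7  taaaaa|aaaaaaaaaaaaaaaaaaaa|t...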
  1. The sequence "taaaaaaaaaaaaaaaaaaaaaaaaat" generates the left-maximal substring-cluster 1 for match length 20. The common substring is a run of 20 letters "a".
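The footnote's example can be reproduced with a small sketch. This is a brute-force illustration, not the method of the paper: it groups every length-20 substring occurrence by content and then checks left maximality, assuming "left maximal" means that the occurrences are not all preceded by the same letter. The helper names `substring_clusters`, `is_left_maximal`, and `format_row` are hypothetical.

```python
from collections import defaultdict


# Hypothetical helper (not the paper's algorithm): brute-force grouping of
# every length-k substring occurrence by its content.
def substring_clusters(seqs, k):
    """Map each length-k substring occurring at least twice to its
    occurrences as (sequence number, start position), both 1-based."""
    occ = defaultdict(list)
    for s_no, s in enumerate(seqs, start=1):
        for i in range(len(s) - k + 1):
            occ[s[i:i + k]].append((s_no, i + 1))
    return {sub: hits for sub, hits in occ.items() if len(hits) >= 2}


# Assumed definition: a cluster is left maximal if its occurrences cannot
# all be extended one character to the left by the same letter.
def is_left_maximal(seqs, hits):
    preceding = set()
    for s_no, pos in hits:
        if pos == 1:  # starts at the sequence start, so nothing to extend into
            return True
        preceding.add(seqs[s_no - 1][pos - 2])
    return len(preceding) > 1


def format_row(seq, pos, k):
    """Render one occurrence as left-context|substring|right-context."""
    start = pos - 1
    return seq[:start] + "|" + seq[start:start + k] + "|" + seq[start + k:]


seq = "t" + "a" * 25 + "t"  # the low-complexity sequence from the footnote
k = 20
clusters = substring_clusters([seq], k)
hits = clusters["a" * k]
max_pos = max(pos for _, pos in hits)
for s_no, pos in hits:
    # Left-pad so the bars line up, as in the table.
    print(1, s_no, pos, " " * (max_pos - pos) + format_row(seq, pos, k))
print("left maximal:", is_left_maximal([seq], hits))
```

On the footnote's sequence this finds a single cluster with six occurrences at positions 2 to 7, matching the rows of the table. Positions 2 and 3 are preceded by "t" and "a" respectively, so the cluster cannot be extended to the left. Hashing every k-mer in this way is only practical for small inputs.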