As stated above, Crochemore's algorithm cannot be applied directly to weighted sequences, but it suggests borrowing the idea of partitioning. Improving on the method for computing repeated patterns in weighted sequences that we proposed in [22], we adapt the definition of $E_p$-classes of non-weighted strings to obtain the corresponding weighted version:

**Definition 7** Consider a factor $f$ of length $p$ in a weighted sequence $X[1, n]$. An $E_p$-class associated with $f$ is the set $C_f(p)$ of all position-probability pairs, denoted by $(i, \pi_i(f))$, such that $f$ occurs at position $i$ with probability $\pi_i(f) \ge 1/k$.

$C_f(p)$ is an ordered list that contains all the positions of $X$ where $f$ occurs. Note that only occurrences of real factors are considered. For this reason, the probability of each occurrence of a factor must be recorded and carried over to the next iteration.
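To make the data structure concrete, the following minimal sketch builds the classes $C_\sigma(1)$ for single characters. The encoding of the weighted sequence as a list of per-position dictionaries and the name `build_e1` are assumptions made for illustration only; positions are 1-indexed as in the text.

```python
from collections import defaultdict

def build_e1(X, k):
    """Build the classes C_sigma(1) of a weighted sequence.

    X is a list of dicts mapping each character to its probability at
    that position (an assumed encoding, not the paper's). Each class is
    an ordered list of (position, probability) pairs, 1-indexed.
    """
    threshold = 1.0 / k
    classes = defaultdict(list)
    for i, dist in enumerate(X, start=1):
        for sigma, prob in dist.items():
            if prob >= threshold:          # keep only real occurrences
                classes[sigma].append((i, prob))
    return dict(classes)

# Hypothetical weighted sequence of length 4.
X = [{"A": 1.0}, {"A": 0.5, "C": 0.5}, {"A": 0.75, "T": 0.25}, {"T": 1.0}]
e1 = build_e1(X, k=2)
```

With $k = 2$, position 2 appears in both the class of `A` and the class of `C`, while the occurrence of `T` at position 3 is dropped because its probability is below $1/k$. Singleton classes are kept in this sketch.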

Although tandem repeats are special cases of repeats in weighted sequences, the following facts distinguish the algorithm for computing tandem repeats from the algorithm for repeats that we proposed before.

**Fact 1**
*The occurrences of a tandem repeat are not overlapping*.

**Fact 2**
*If a factor f is a tandem repeat of X, any consecutive alignment of f should not be reported as a tandem repeat again*.

For instance, the string *AT AT AT AT* yields the tandem repeat (1, *AT*, 4), not (1, *AT AT*, 2). By the above facts, tandem repeats can be filtered promptly during the construction of equivalence classes.
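One possible way to realize the two facts on an ordered position list is sketched below. It scans for maximal runs of occurrences spaced exactly $|f|$ apart and skips any factor that is a power of a shorter string. The function names and the primitivity test are illustrative assumptions, not the authors' procedure.

```python
def is_primitive(f):
    """True if f is not a power of a shorter string (e.g. 'ATAT' is not)."""
    return f not in (f + f)[1:-1]

def tandem_runs(positions, f):
    """Report maximal runs of adjacent occurrences of f as (start, f, count).

    positions is a sorted list of 1-indexed start positions of f.
    """
    if not is_primitive(f):
        return []                      # Fact 2: do not report ff, fff, ...
    p = len(f)
    runs, start, count = [], None, 0
    for pos in positions:
        if count and pos == start + count * p:
            count += 1                 # next copy starts right after the run
        else:
            if count >= 2:
                runs.append((start, f, count))
            start, count = pos, 1
    if count >= 2:
        runs.append((start, f, count))
    return runs
```

For the example string, `tandem_runs([1, 3, 5, 7], "AT")` returns `[(1, "AT", 4)]`, and `tandem_runs([1, 3, 5], "ATAT")` returns an empty list.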

Note that in this construction process, a position $i$ may go to several, but no more than $|\Sigma|$, different $E_p$-classes, owing to the uncertainty of weighted sequences. Nevertheless, we keep the term "partition" for the process of building $E_p$-classes from $E_{p-1}$-classes, which can be computed based on the following corollary:

**Corollary 1**
*Let* $p \in \{1, 2, \ldots, n\}$, $i, j \in \{1, 2, \ldots, n - p\}$. *Then:*

$$\big((i, \pi_i(f)), (j, \pi_j(f))\big) \in C_f(p) \quad \textit{iff} \quad \big((i, \pi_i(f')), (j, \pi_j(f'))\big) \in C_{f'}(p-1)$$

$$\textit{and} \quad \big((i+p-1, \pi_{i+p-1}(\sigma)), (j+p-1, \pi_{j+p-1}(\sigma))\big) \in C_\sigma(1)$$

*where* $\sigma \in \Sigma$, $f$ *and* $f'$ *are two factors of length* $p$ *and* $p - 1$ *respectively, such that* $f = f'\sigma$ *and* $\pi_i(f) \ge 1/k$, $\pi_j(f) \ge 1/k$.
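The corollary can be read as a one-character refinement step. The sketch below extends every occurrence of a factor $f'$ by each character available at position $i + p - 1$ and keeps the extension only if the product of probabilities stays at or above $1/k$. The dictionary encoding of $X$ and the name `refine` are assumptions for illustration.

```python
from collections import defaultdict

def refine(cls, f_prev, X, k):
    """Split the class of f_prev (length p-1) into classes of length p.

    cls is an ordered list of (position, probability) pairs, 1-indexed.
    X is a list of dicts {character: probability}, one per position.
    """
    threshold = 1.0 / k
    p = len(f_prev) + 1
    out = defaultdict(list)
    for i, prob in cls:
        end = i + p - 1                # position of the appended character
        if end > len(X):
            continue                   # no room to extend this occurrence
        for sigma, q in X[end - 1].items():
            if prob * q >= threshold:  # the extended factor is still real
                out[f_prev + sigma].append((i, prob * q))
    return dict(out)

# Hypothetical example with k = 2.
X = [{"A": 1.0}, {"A": 0.5, "C": 0.5}, {"A": 1.0}, {"C": 1.0}]
e2 = refine([(1, 1.0), (2, 0.5), (3, 1.0)], "A", X, k=2)
```

Here position 1 lands in both the class of `AA` and the class of `AC`, which is why the result is not a partition in the strict sense.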

Our algorithm for finding all the tandem repeats of $X$ then operates as follows:

1. "Partition" all the $n$ positions of $X$ to build $E_1$ and detect all the tandem repeats of length 1: for every character $\sigma \in \Sigma$, create a class $C_\sigma(1)$ that is an ordered list of pairs $(i, \pi_i(\sigma))$, where $i$ is an occurrence of $\sigma$ in $X$ with probability not less than $1/k$. The classes composed of more than one element form $E_1$. Those classes $C_\sigma(1)$ in which the distance between two or more adjacent positions is 1 report the tandem repeats of length 1.

2. Iteratively compute $E_p$-classes from $E_{p-1}$-classes using the above corollary for $p \ge 2$, and find all the tandem repeats of length $p$: take each class $C(p-1)$ of $E_{p-1}$ and partition it so that any two positions $i, j \in C(p-1)$ go to the same $E_p$-class if positions $i + p - 1$ and $j + p - 1$ belong to the same $E_1$-class and this $E_p$-class represents a real factor of $X$.

3. For each $E_p$-class $C(p)$ partitioned from $C(p-1)$, test whether the factor associated with $C(p)$ is a tandem repeat of $X$: if the cardinality of $C(p)$ is at least two and the distance between two or more adjacent positions in $C(p)$ equals $p$, add the corresponding triple to the tandem repeat set. Eliminate those classes $C(p)$ that are singletons, and keep the rest for the iterative computation at stage $p + 1$.

4. The computation stops at stage $L$, once no new $E_{L+1}$-classes can be created or each $E_L$-class is a singleton.
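The four steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the weighted sequence is encoded as a list of dictionaries, each occurrence is judged on its own probability as in the steps, and powers of a shorter factor are skipped to avoid reporting the same repeat twice. Factors are stored as Python strings, so the sketch does not aim at any particular running time.

```python
from collections import defaultdict

def tandem_repeats(X, k):
    """Return triples (start, factor, count) found in weighted sequence X."""
    threshold = 1.0 / k
    n = len(X)
    found = []

    def report(f, cls):
        if f in (f + f)[1:-1]:
            return                      # f is a power of a shorter factor
        p = len(f)
        start, count = cls[0][0], 1
        for pos, _ in cls[1:]:
            if pos == start + count * p:
                count += 1              # adjacent copy extends the run
            else:
                if count >= 2:
                    found.append((start, f, count))
                start, count = pos, 1
        if count >= 2:
            found.append((start, f, count))

    # Stage 1: one class per character, positions are 1-indexed.
    level = defaultdict(list)
    for i, dist in enumerate(X, start=1):
        for sigma, q in dist.items():
            if q >= threshold:
                level[sigma].append((i, q))

    p = 1
    while level:                        # stops when no class can be created
        nxt = defaultdict(list)
        for f, cls in level.items():
            if len(cls) < 2:
                continue                # singletons are eliminated
            report(f, cls)
            for i, prob in cls:
                end = i + p             # position of the appended character
                if end > n:
                    continue
                for sigma, q in X[end - 1].items():
                    if prob * q >= threshold:
                        nxt[f + sigma].append((i, prob * q))
        level = nxt
        p += 1
    return found

# A non-weighted string is the special case where every probability is 1.
plain = [{c: 1.0} for c in "ATATATAT"]
result = sorted(tandem_repeats(plain, k=2))
```

On this input the sketch reports `(1, "AT", 4)` and also `(2, "TA", 3)`, the run of the rotated factor starting one position later; `ATAT` is not reported.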

**Algorithm 1** Compute all the tandem repeats of a weighted sequence

**Input:** a weighted sequence $X[1, n]$ and a real constant $k \ge 2$

**Output:** all the tandem repeats of $X$

1: **Algorithm** Compute-Tandem-Repeats($X$, $k$)

2: **for** $i \leftarrow 1$ **to** $n$ **do**

3: $l \leftarrow 0$

4: **for** $j \leftarrow 1$ **to** $|\Sigma|$ **do**

5: **for** each $\sigma_j \in X[i]$ **do**

6: **while** **do**

7: add $(i + l, \pi_{i+l}(\sigma_j))$ to $C_{\sigma_j}(1)$

8: $l \leftarrow l + 1$

9: **if** $l > 1$ **then**

10:

11: $p \leftarrow 1$

12: **while** **and** there is a non-singleton class $C(p-1)$ of $E_{p-1}$ or $E_{p-1} \ne \emptyset$ **do**

13: $(C_f(p-1), f) \leftarrow$ extract a pair from the $E_{p-1}$ list

14: SUB $\leftarrow$ Create-Equiv-Class($C_f(p-1)$, $f$)

15: $p \leftarrow p + 1$

16: add SUB to $E_p$

We store each equivalence class in a doubly linked list, which needs $O(n)$ space for a bounded-size alphabet. The computation of tandem repeats is given as Algorithm 1, which repeatedly calls the function Create-Equiv-Class. Algorithm 2 describes the procedure that constructs all possible $E_p$-classes from a given $E_{p-1}$-class and reports the tandem repeats of length $p$. Algorithm 1 takes $O(n^2)$ time for a constant-size alphabet, since each refinement of $E_p$ from $E_{p-1}$ costs linear time and there are $O(n)$ stages in total. The running time of Algorithm 2 is proportional to the size of the given $E_{p-1}$-class, since tandem repeats of length $p$ are reported while that class is being partitioned. Taking all the $E_{p-1}$-classes into account, stage $p$ requires $O(n)$ time and $O(n)$ extra space. Thus the overall time complexity of finding all tandem repeats of every possible length amounts to $O(n^2)$.
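The bound can be written out as a sum over stages. Here $T(n)$ denotes the total running time, $c$ is an assumed constant bounding the per-position cost of one stage for a constant-size alphabet, and $L \le n$ is the last stage reached; all three symbols are introduced only for this illustration.

$$T(n) = \sum_{p=1}^{L} c\,n = c\,n\,L \le c\,n^2 = O(n^2)$$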

**Algorithm 2** Identify tandem repeats of length $p$

**Input:** An $E_{p-1}$-pair: a class $C_f(p-1)$ and the factor $f$ corresponding to $C_f(p-1)$

**Output:** All the $E_p$-pairs derived from the input

1: **Function** Create-Equiv-Class($C_f(p-1)$, $f$)

2: **for** each $(i, \pi_i(f)) \in C_f(p-1)$ **do**

3: $l \leftarrow 0$

4: **for** each $\sigma_j \in X[i + p - 1]$ **do**

5: $f_j \leftarrow f\sigma_j$

6: $\pi_i(f_j) \leftarrow \pi_i(f) \times \pi_{i+p-1}(\sigma_j)$

7: **while** **do**

8: add $(i + l, \pi_{i+l}(f_j))$ to $C_{f_j}(p)$

9: $l \leftarrow l + 1$

10: **if** $l > 1$ **then**

11:

12: **for** each $j$ **do**

13: **if** $|C_{f_j}(p)| = 1$ **then**

14: delete $C_{f_j}(p)$

15: **else**

16: add $(C_{f_j}(p), f_j)$ to $E_p$

17: **return** $E_p$

**Theorem 1**
*The all-tandem-repeats problem can be solved in* $O(n^2)$ *time.*