- Methodology article
- Open Access

# A method for rapid similarity analysis of RNA secondary structures

- Na Liu
^{1, 2}Email author and - Tianming Wang
^{2, 3}

**7**:493

https://doi.org/10.1186/1471-2105-7-493

© Liu and Wang; licensee BioMed Central Ltd. 2006

**Received:**24 March 2006**Accepted:**08 November 2006**Published:**08 November 2006

## Abstract

### Background

Owing to the rapid expansion of RNA structure databases in recent years, efficient methods for structure comparison are in demand for function prediction and evolutionary analysis. Usually, the similarity of RNA secondary structures is evaluated based on tree models and dynamic programming algorithms. We present here a new method for the similarity analysis of RNA secondary structures.

### Results

Three sets of real data have been used as input for the example applications. Set I includes the structures from *5S rRNAs.* Set II includes the secondary structures from *RNase P* and *RNase MRP*. Set III includes the structures from *16S rRNAs*. Reasonable phylogenetic trees are derived for these three sets of data by using our method. Moreover, our program runs faster as compared to some existing ones.

### Conclusion

The famous Lempel-Ziv algorithm can efficiently extract the information on repeated patterns encoded in RNA secondary structures and makes our method an alternative to analyze the similarity of RNA secondary structures. This method will also be useful to researchers who are interested in evolutionary analysis.

## Keywords

- Dynamic Programming Algorithm
- Linear Sequence
- Rhodospirillum Rubrum
- Tree Edit Distance
- Base Pairing Probability

## Background

RNA secondary structures play an important role in determining the functions of RNA molecules. Some of them have been accepted as good data for evolutionary analysis. With the completion of the sequencing of the genomes of human and other species, major structural biology resources have been harnessed to predict functions. More and more RNA structures are accumulated and we know little about their functions. This calls for the development of cost-effective computational methods to predict RNA functions, which will provide preliminary information for biologists and guide biological experiments. Earlier studies usually adopt dynamic programming algorithms and tree models. Shapiro et al [1] proposed to compare RNA secondary structures by using tree models. Hofacker et al [2] compared RNA secondary structures by aligning the corresponding base pairing probability matrices that were computed by McCaskill's partition function algorithm [3]. Because these methods rely on dynamic programming algorithms, they are compute-intensive. Constructing tree models is based on the idea that the stems or helices dominantly stabilize the secondary structures. So they ignore their primary sequences and focus on so-called elementary units (stem and loop, etc) for the similarity analysis. There are other works, in which tree models were constructed to analyze the similarity of RNA secondary structures [4–8]. Recently Liao et al [9] have proposed to use graphs to represent RNA secondary structures and then derive some invariants from graphs to compare RNA secondary structures. This idea is from the study of DNA sequences [10–13]. It has been stated [10] that invariants actually reflect some characterizations of biological structures or sequences and may be regarded as indicators. Some information will be lost, however, and how to obtain and select suitable invariants to characterize biological sequences so as to compare DNA sequences effectively is still unsolved. What's more, the graphical representations don't work well when the size of the RNA secondary structure is large. Obviously, for complex RNA secondary structures, more information is lost, which will affect the similarity analysis. Popular tools for optimal alignment of RNA secondary structures include RNAdistance [1], RNAforester [14] etc. RNAdistance uses the tree models to coarsely represent RNA secondary structures, and compares RNA secondary structures based on tree edit distance measure. RNAforester supports the computation of pairwise and multiple alignment of structures based on tree alignment measure.

In this paper we propose a novel method for the similarity analysis of RNA secondary structures, where pseudoknots are also taken into account. In our approach, each secondary structure is transformed into a linear sequence. The linear sequence not only contains the information on the corresponding RNA primary structure, but also contains the information on the base pairing.

Furthermore, standard and famous Lempel-Ziv algorithm [15] is employed for the similarity analysis. Of course, we have tested the validity of our method by analyzing three sets of real data. The results obtained by our method are comparable to those given by other authoritative methods. What's more, the whole process is easy to operate. It can yield results rapidly.

## Results

### Materials

Three sets of real data are used to test our method. RNA secondary structures in set II are from *RNase P* and *RNase MRP*. They are distantly related and there is little sequence homology between them. These secondary structures are used to test distant RNA secondary structures. They are mainly obtained from the *RNase P* Database [16] and the remaining secondary structures are obtained from [17]. The names of the RNA secondary structures from *RNase P* are: *Synechocystis sp.PCC6803, Anacystis nidulans PCC6301, Pseudoanabaena sp.PCC6903, Anabaena sp.PCC7120, Porphyra purpurea chloroplast, Thermotoga maritima, Agrobacterium tumefaciens, Rhodospirillum rubrum, Bacillus subtilis, Reclinomonas americana mitochondria, Sulfolobus acidocaldarius, Methanococcus jannaschii, Halobacterium cutirubrum, Human (nuclear) P*. The RNA secondary structures from *RNase MRP* are obtained from [17], whose names are: *Human, Bovine, Mouse, Rat*. RNA secondary structures in set I are from *5S rRNAs*. They are provided by Maciej Szymanski, who has developed the 5S Ribosomal RNA Database [18]. The names of the *5S rRNAs* used in our study are *Halobacterium spl, Pyrodictium occultum, Sulfolobus spl, Actinia equina, Dicyema misakiense, Basidiobolus magnus, Chrysaora quinque, Christiansenis pallida* and *Planocera recticulata*. RNA secondary structures in set III are from *16S rRNAs*. The names of the *16S rRNAs* are *Thermoproteus tenax, Halobacterium, Bacteoides, Bacillus, Mus musculus, Synechococcus, Thermotoga, Saccharomyces cerevisiae, Homo sapiens, Escherichia coli, Methanococus vannielli, Thermococcus celer, Vairimorpha* and *Methanobacterium*.

### The similarity analysis of set I and set II by using our method

## Discussion

Lempel-Ziv algorithm is an algorithm that is related to minimal length encoding. Its successful application to the evolutionary analysis of DNA sequences has indicated that Lempel-Ziv algorithm is an alternative to the similarity analysis of biological sequences. To our knowledge, the concept of applying Lempel-Ziv algorithm to the similarity analysis of RNA secondary structures hasn't been adopted by any other researcher. The introduction of our method in Method section indicates that this is a relatively simple and rapid method for the similarity analysis of RNA secondary structures. We owe the efficiency of this method mainly to the Lemple-Ziv algorithm, which can effectively extract the repeated patterns encoded in linear sequences.

At first, we compare Figure 1 with Figure 3. From Figure 1, we observe that: 1. *Actinia equina, Chrysaora quinque and Planocera recticulata* are grouped closely (they belong to *Animalia*); 2. *Basidiobolus magnus* and *Christiansenis pallida* are grouped closely (they belong to *fungi*); 3. *Pyrodictium occultum, Halobacterium spl* and *Sulfolobus spl* (they belong to *Archaebacteria*) are clearly separated from the rest; 4. *Dicyema misakiense* is placed closer to *Animalia* than to *fungi* (it belongs to *mesozoa*). The relationship described by our method is in accordance with the one described in [21, 22]. In contrast to Figure 1, we find in Figure 3, obtained by using RNAforester program, that *Halobacterium spl* is separated from the cluster that *Pyrodictium occultum* and *Sulfolobus spl* belong to. Obviously this is not reasonable. Then we compare Figure 2 with Figure 4. From Figure 2, we observe that our result is consistent with the theory that is suggested in [23–26]: MRP evolved from a Eukaryotic Nuclear P in the nucleus of an early Eukaryote. Figure 2 indicates that mrpRNA are more similar to eukaryotic pRNA than to prokaryotic pRNA. Furthermore, *Synechocystis sp.PCC6803*, *Anacystis nidulans PCC6301*, *Pseudoanabaena sp.PCC6903*, *Anabaena sp.PCC7120* and *Porphyra purpurea chloroplast* are grouped closely, named cluster I for convenience; *Thermotoga maritima, Agrobacterium tumefaciens* and *Rhodospirillum rubrum* are grouped closely, named cluster II. Cluster I and cluster II are adjacent. In Figure 4, *Halobacterium cutirubrum* is put far away from *Methanococcus jannaschii*. Furthermore, *Anacystis nidulans P* is separated far from *Synechocystis sp.P* and *Anabaena sp.P. Bacillus subtilis* and *Reclinomonas americana mitochondria* aren't placed closely. This conformation doesn't accord with the one demonstrated by Collins et al.

In general, our method can compare secondary structures reasonably, with the results consistent with those from [23–26]. For the two data sets, our algorithm performs better than RNAforester program. Additionally, our analysis results favor the proposal that RNA secondary structures are useful materials for evolutionary analysis.

It's obvious that there exists unreasonable topology that depicts the similarity relationship of these RNA secondary structures in Figure 5. For example, *Thermotoga maritima, Agrobacterium tumefaciens* and *Rhodospirillum rubrum* are placed close to the RNase MRP RNAs and are separated far away from the branch for *Synechocystis sp.P* and *Anabaena sp.P*, etc, which simultaneously leads to the separation of *Sulfolobus acidocaldarius* from *Methanococcus jannaschii* and *Halobacterium cutirubrum*. In nature, *Thermotoga maritima, Agrobacterium tumefaciens* and *Rhodospirillum rubrum* belong to Eubacterial RNase P and should be grouped close to *Synechocystis sp.P* and *Anabaena sp.P*, etc. Figure 5 has favored our claim, i.e. our characteristic sequences do grasp some information on RNA secondary structures (base pairing).

*O*(|

*F*

_{1}||

*F*

_{2}|

*deg*(

*F*

_{1}))

*deg*(

*F*

_{2}))[27] is equivalent to

*O*(

*n*

_{1}

*n*

_{2}), where

*n*

_{1}

*n*

_{2}is the product of the sizes of the two RNA secondary structures being compared.

Time/Space complexities of our method and the RNAforester program.

Algorithm Name | Running Time | Space requirement | Reference |
---|---|---|---|

| O(| | O(| | [27] |

| O( | O( |

Execution times required by our algorithm and the RNAforester program.

Species Name | Execution Time required by RNAforester | Execution Time required by our method |
---|---|---|

| 12.62 s | 1.52 s |

| 35.36 s | 6.96 s |

| 1583.12 s | 176.68 s |

*16S rRNAs*, whose secondary structures are more complex and the sizes of which are relatively larger. The result is shown in Figure 6, drawn by Treeview. Their similarity relationship has been reasonably derived by our method.

*Thermoproteus tenax, Halobacterium, Methanococus vannielli, Thermococcus celer*and

*Methanobacterium*have been clustered together, which is consistent with the fact that they are of

*Archaea*.

*Mus musculus, Saccharomyces cerevisiae, Homo sapiens, Vairimorpha*are clustered together, which is consistent with the fact that they are of

*Eucaya*. The left are of

*Bacteria*.

## Conclusion

Here we have proposed a new method to analyze the similarity of RNA secondary structures (pseudoknots are taken into account). It is a simple method that yields results reasonably and rapidly. Our algorithm is not necessarily an improvement as compared to some existing methods, but an alternative for the similarity analysis of RNA secondary structures. The new method doesn't require sequence alignment and the construction of tree models. It is based on linear characteristic sequences that we define for RNA secondary structures and the famous Lempel-Ziv algorithm that relates to minimal length encoding. The characteristic sequences contain the information from RNA primary structures and the base pairs formed in RNA secondary structures. The Lempel-Ziv algorithm effectively extracts the information on the repeated patterns encoded in long sequences. The example applications of our method to three sets of real data and its comparison with other methods verify the validity of our method. From the comparisons, we conclude that our method performs well on distantly related RNA secondary structures. In our approach, complicated computation is avoided. The whole process is easy to operate. What's more, the size of RNA secondary structure is not problematic.

Of course, there is defect in our approach: when non-linear RNA secondary structures are transformed into linear characteristic sequences, some information may be lost. However, our test has indicated that our method can yield results reasonably, i.e. our method can extract some key information from RNA secondary structures.

## Methods

### Lempel-Ziv algorithm and LZ complexity

Let *S*, *Q* and *R* be sequences over a finite alphabet Λ, *l*(*S*) be the length of *S, S*(*i*) be the *ith* element of *S* and *S*(*i,j*) be the subsequence of *S* that starts at position *i* and ends at position *j*. Note that *S*(*i,j*) = ∅, for *i* > *j*. The contatenation of *Q* and *R* forms a new sequence *S* = *QR*, where *Q* is called a prefix of *S*, and *S* is called an extension of *Q* if there exists an integer *i* such that *Q* = *S*(1, *i*).

An extension *S* = *QR* of *Q* is reproducible from *Q* denoted by *Q* → *S*, if there exists an integer *p* ≤ *l*(*Q*) such that *R*(*k*) = *S*(*p*+*k*-1), for *k* = 1,2,.., *l*(*R*). For example: *AACUT* → *AACUTACU* with *p* = 2. A non-null sequence *S* is producible from its prefix *S*(1, *j*), denoted by *S*(1, *j*) ⇒ *S*, if *S*(1, *j*) → *S*(1, *l*(*S*) - 1). For example: *CCUA* ⇒ *CCU AU AUT* with *p* = 3.

The difference between producibility and reproducibility is that the former allows for an extra "different" symbol at the end of the extension process which is not permitted in the latter. Therefore an extension which is reproducible is always producible but the reverse may not always be true.

Any non-null sequence *S* can be built from a production process by iterative self-deleting-building process where at the *ith* step *S*(1, *h*_{i-1}) ⇒ *S*(1, *h*_{
i
}), ∅ = *S*(1, 0) ⇒ *S*(1, 1). An m-step production process of *S* leads to a parsing of *S* into *H*(*S*) = *S*(1, *h*_{1}) • *S*(*h*_{1} + 1, *h*_{2}) •......• *S*(*h*_{m-1}+ 1, *h*_{
m
}), which is called the history of *S*, and *H*_{
i
}(*S*)= *S*(*h*_{i-1}+ 1, *h*_{
i
}) is called the *ith* component of *H*(*S*).

A component *H*_{
i
}(*S*) and the corresponding production step *S*(1, *h*_{i-1}) ⇒ *S*(1, *h*_{
i
}) are called exhaustive if *S*(1, *h*_{i-1}) → *S*(1, *h*_{
i
}) is not true. A history is called exhaustive if each of its components (with a possible exception of the last one) is exhaustive. What's more important, the exhaustive history of any non-null sequence is unique. For example, for the sequence *S* = *UUCGAGGUCGGA*, its exhaustive history is *EH*(*S*)= *U*•*UC*•*G*•*A*•*GG*•*UCGG*•*A*.

Let *c*(*S*) be the number of components in the exhaustive history of *S*. It is the least possible number of steps needed to generate *S* according to the whole Lempel-Ziv algorithm, so *c*(*S*) becomes an important complexity indicator.

### Linear characteristic sequences of RNA secondary structures

*R*=

*r*

_{1}

*r*

_{2}.....

*r*

_{ n }, where

*r*

_{ i }is called the

*i*

^{ th }(ribo)nucleotide. Each

*r*

_{ i }belongs to the alphabet {A, C, G, U}. The secondary structure of an RNA molecule is the collection of base pairs that occur in its 3D structure. For each secondary structure, there are two terminals: 5'-terminal and 3'-terminal. Figure 7 shows a simulated RNA secondary structure. Its RNA sequence (from 5'-terminal to 3'-terminal)is:

*GGGAAACUGGAAGGCGGGGCGAACGUCGGCCCC* *AGUGAAGUCAAAUGGAGCGUACACGGACCAUUAUGGGCUAA*.

In this section, we will define linear characteristic sequences for RNA secondary structures. In other words, we will transform non-linear RNA secondary structures into linear sequences. In our research, a group of consecutive base pairs (including one base pair) is called a helix and *i* is called the position of nucleotide *r*_{
i
}. The open regions surrounded by single stranded bases are called loops. Each helix is numbered from the 5'-terminal to 3'-terminal: The first helix is called Helix1, and the second helix is called Helix2,......etc. The rule for transformation is as such: For an RNA secondary structure S with N helices, write its RNA sequence with the letters in Helix1 upper case and the rest small case; then write in succession its RNA sequence with the letters in Helix2 upper case and the rest small case;.......then write in succession its RNA sequence with the letters in HelixN upper case and the rest small case. Now a linear sequence has been obtained by following the above-mentioned rule. We call it linear characteristic sequence of *S*, abbreviated to L(S). Take the simulated secondary structure for example, it has five Helices.

According to our rule, the linear characteristic sequence of the simulated secondary structure is as follows:

*gggaaacuggaaggcGGGGCgaacgucgGCCCCagugaagucaaauggagcguacacggaccauuaugggcuaagggaaacuggaaggcggggcgaACGUCGgccccagugaagucaaaUGGAGCguacacggaccauuaugggcuaagggaaa*

*cuggaaggcggggcgaacgucggccccagugaAGucaaauggagcguacacggaccauuaugggCUaagggaaacuggaa*

*ggcggggcgaacgucggccccagugaaguCaaauggagcguacacggaccauuaugGgcuaagggaaacuggaaggcgggg*

*cgaacgucggccccagugaagucaAAUGgagcguacacggacCAUUaugggcuaa*

which is obtained by adjoining the following five sequences in succession.

Helix1: *gggaaacuggaaggcGGGGCgaacgucgGCCCCagugaagucaaauggagcguacacggaccauuaugggcuaa*

Helix2: *gggaaacuggaaggcggggcgaACGUCGgccccagugaagucaaaUGGAGCguacacggaccauuaugggcuaa*

Helix3: *gggaaacuggaaggcggggcgaacgucggccccagugaAGucaaauggagcguacacggaccauuaugggCUaa*

Helix4: *gggaaacuggaaggcggggcgaacgucggccccagugaaguCaaauggagcguacacggaccauuaugGgcuaa*

Helix5: *gggaaacuggaaggcggggcgaacgucggccccagugaagucaAAUGgagcguacacggacCAUUaugggcuaa*

### Distance computation and pair-wise distance matrix

Lempel et al have proposed that, for any given sequences *Q* and *S*, *c*(*QS*) ≤ *c*(*Q*)*+ c*(*S*) always remains valid. This formula shows that the steps required to extend *Q* to *QS* are always less than the steps required to build *S* from ∅. Recently, Otu et al [28] concluded that the more similar the sequence *S* is to sequence *Q*, the smaller *c*(*QS*) - *c*(*Q*) is. That is *c*(*QS*) - *c*(*Q*) depends on how much *S* is similar to *Q*.

For example, let *Q*, *S*, *R* represent three short RNA sequences defined over the alphabet {*A*, *C*, *G*, *U*}, where *S* = *UUACGUAAUGU,Q* = *AGUCCCUAGGA, R* = *UACCGAUAAG*. By the rule mentioned above, the corresponding exhaustive histories of *S*, *Q*, *R*, *SR*, *QR*, *SQ* are: *EH*(*S*) = *U*•*UA*•*C*•*G*•*UAA*•*UG*•*U, EH*(*Q*) = *A*•*G*•*U*•*C*•*CCU*•*AGG*•*A, EH*(*R*) = *U*•*A*•*C*•*CG*•*AU*•*AA*•*G, EH*(*SR*) = *U*•*UA*•*C*•*G*•*UAA*•*UG*•*UUACC*•*GA*•*UAAG, EH*(*QR*) = *A*•*G*•*U*•*C*•*CCU*•*AGG*•*AU*•*AC*•*CG*•*AUAA*•*G, EH*(*SQ*) = *U*•*UA*•*C*•*G*•*UAA*•*UG*•*UAG*•*UC*•*CC*•*UAGG*•*A*. We can find that we need 2 steps to build *R* from *S*, 4 steps to build *R* from *Q*, 4 steps to build *Q* from *S*. So we say *R* is more similar to *S* than to *Q*. The reason is that *S* and *R* share the common patterns *UAC* and *UAA*.

Based on this hypothesis, Otu et al have used the Lempel-Ziv algorithm to successfully construct phylogenetic trees from DNA sequences, which verifies the efficiency of Lempel-Ziv algorithm in analyzing the similarity of linear biological sequences.

Therefore we adopt the following formula to evaluate the distance between secondary structures S and Q, which is slightly different from [28]:

The denominator in [28] is equivalent to [*c*(*L*(*S*)*L*(*Q*))+*c*(*L*(*S*)*L*(*Q*))]/2, which leads to the fact that the distance calculated by the formula proposed in [28] will always be twice as much as the distance calculated by our formula. As you know, a constant will not affect the similarity analysis at all. We choose to use the formula mentioned above mainly because its expression is simpler. The formula in [28] has been proven to be a distance metric by Out et al. Thus *Rd*(*S, Q*) also satisfies the conditions required by a distance metric.

It's obvious that the more similar S is to Q, the smaller *c*(*L*(*S*)*L*(*Q*))-*c*(*L*(*S*)) and *c*(*L*(*Q*)*L*(*S*))-*c*(*L*(*Q*)) are, and then the smaller *Rd*(*S,Q*) is.

Generally, given *n* RNA secondary structures *S*_{1}, *S*_{2},......, *S*_{
n
}, we can obtain their linear characteristic sequences by the above-mentioned rule, which are *L*(*S*_{1}), *L*(*S*_{2}),......, *L*(*S*_{
n
}). They are linear sequences defined over alphabet {A, C, G, U, a, c, g, u,} and carry the information on RNA secondary structures. Then, by using Lempel-Ziv algorithm, the distance between any pair of structures, *Rd*(*S*_{
i
}, *S*_{
j
}), may be rapidly computed. By arranging them into a matrix, a pair-wise distance matrix is obtained, denoted by *RD*. *RD* =(*Rd*(*S*_{
i
}, *S*_{
j
})) contains the information on the similarity/dissimilarity between any pair of RNA secondary structures.

Based on what Otu et al have proven, we can easily conclude that our distance metric is also valid for inferring phylogeny of species because it satisfies all the conditions for phylogeny analysis. Hence our pair-wise distance matrix may be input the programmes for phylogeny inference to study the phylogeny for RNA molecules based on their secondary structures.

## Declarations

### Acknowledgements

The authors thank all the anonymous reviewers for their support and suggestions. In particular, the authors thank Maciej Szymanski for providing all the secondary structures of 5S RNAs. The authors also thank Jiang Tian for the technical help. The work is supported by the National Natural Science Foundation of China(10571019).

## Authors’ Affiliations

## References

- Shapiro B, Zhang K:
**Comparing multiple RNA secondary structures using tree comparisons.***Comput Appl Biosci*1990,**6:**309–318.PubMedGoogle Scholar - Hofacker IL, Bernhart SHF, Stadler PF:
**Alignment of RNA base pairing probability matrices.***Bioinformatics*2004,**20:**2222–2227. 10.1093/bioinformatics/bth229View ArticlePubMedGoogle Scholar - McCaskill JS:
**The equilibrium partition function and base pair binding probabilities for RNA secondary structure.***Biopolymers*1990,**29:**1105–1119. 10.1002/bip.360290621View ArticlePubMedGoogle Scholar - Le SY, Nussinov R, Maizel JV:
**Tree graphs of RNA secondary structures and their comparisons.***Comput Biomed Res*1989,**22:**461–473. 10.1016/0010-4809(89)90039-6View ArticlePubMedGoogle Scholar - Shapiro B:
**An algorithm for comparing multiple RNA secondary structures.***Comput Appl Biosci*1988,**4**(3):387–393.PubMedGoogle Scholar - Sankoff D:
**Simultaneous solution of the RNA folding, alignment, and protosequence problems.***SIAM J Appl Math*1985,**45:**810–825. 10.1137/0145048View ArticleGoogle Scholar - Le SY, Owens J, Nussinov R, Chen JH, Shapiro B, Maizel JV:
**RNA secondary structures: comparisons and determination of frequently recurring substructures by consensus.***Comput Appl Biosci*1989,**5:**205–210.PubMedGoogle Scholar - Jiang T, Lin GH, Ma B, Zhang K:
**A general edit distance between RNA structures.***Journal of Computational Biology*2002,**9:**371–388. 10.1089/10665270252935511View ArticlePubMedGoogle Scholar - Liao B, Wang TM:
**A 3D graphical representation of RNA secondary structure.***J Biomol Struc Dynamics*2004,**21:**827–832.View ArticleGoogle Scholar - Randic M, Basak SC:
**Characterization of DNA primary sequences based on the average distances between bases.***J Chem Inf Comput Sci*2001,**41:**561–568. 10.1021/ci0000981View ArticlePubMedGoogle Scholar - Zupan J, Randic M:
**Algorithm for coding DNA sequences into "spectrum-like" and "zigzag" representations.***J Chem Inf Comput Sci*2005,**45:**309–313.View ArticleGoogle Scholar - Guo XF, Nandy A:
**Numerical characterization of DNA sequences in a 2-D graphical representation scheme of low degeneracy.***Chemical Physics Letters*2003,**369:**361–366. 10.1016/S0009-2614(02)02029-8View ArticleGoogle Scholar - Randic M, Vracko M, Lers N, Plavsic D:
**Analysis of similarity/dissimilarity of DNA sequences based on novel 2-D graphical representation.***Chemical Physics Letters*2003,**371:**202–207. 10.1016/S0009-2614(03)00244-6View ArticleGoogle Scholar - Hochsmann M, Toller T, Giegerich R, Kurtz S:
**Local similarity in RNA secondary structures.***Proceedings of the IEEE Bioinformatics Conference 2003 (CSB 2003)*2003, 159–168.Google Scholar - Lempel A, Ziv J:
**On the complexity of finite sequences.***IEEE T Inform Theory*1976,**22:**75–81. 10.1109/TIT.1976.1055501View ArticleGoogle Scholar - Brown J:
**The ribonuclease P database.***Nucleic Acids Res*1998,**26:**351–352. 10.1093/nar/26.1.351PubMed CentralView ArticlePubMedGoogle Scholar - Schmitt M, Bennett J, Dairaghi D:
**Secondary structure of RNase MRP RNA as predicted by phylogenetic comparison.***FASEB J*1993,**7:**208–213.PubMedGoogle Scholar - Szymanski M, Miroslawa Z, Barciszewska VA, Barciszewski EJ:
**5S ribosomal RNA database.***Nucleic Acids Res*2002,**30:**176–178. 10.1093/nar/30.1.176PubMed CentralView ArticlePubMedGoogle Scholar - Felsenstein J:
**PHYLIP-Phylogeny inference package (version 3.2).***Cladistics*1989,**5:**164–166.Google Scholar - Page RDM:
**TREEVIEW: An application to display phylogenetic trees on personal computers.***Computer Applications in the Biosciences*1996,**12:**357–358.PubMedGoogle Scholar - Hiro H, Osawa S:
**Evolutionary change in 5S rRNA secondary structure and a phylogenetic tree of 54 5S rRNA species.***Proc Natl Acad Sci USA*1979,**76:**381–385. 10.1073/pnas.76.1.381View ArticleGoogle Scholar - Hiro H, Osawa S:
**Evolutionary change in 5S rRNA secondary structure and a phylogenetic tree of 352 5S rRNA species.***Proc Natl Acad Sci USA*1986,**19:**163–172.Google Scholar - Morrissey JP, Tollervey D:
**Birth of the snoRNPs: the evolution of RNase MRP and the eukaryotic pre-rRNA-processing system.***TIBS*1995,**20:**78–82.PubMedGoogle Scholar - Reddy R, Shimba S:
**Structural and functional similarities between MRP and RNase P.***Mol Biol Rep*1996,**22:**81–85. 10.1007/BF00988710View ArticleGoogle Scholar - Chamberlain J, Pagan-Ramos E, Kindelberger D, Engelke D:
**An RNase P RNA subunit mutation affects ribosomal RNA processing.***Nucleic Acids Res*1996,**24:**3158–3166. 10.1093/nar/24.16.3158PubMed CentralView ArticlePubMedGoogle Scholar - Collins LJ, Moulton V, Penny D:
**Use of RNA secondary structure for studying the evolution of RNase P and RNase MRP.***J Mol Evol*2000,**51:**194–204.PubMedGoogle Scholar - Höchsmann M, Voss B, Giegerich R:
**Pure Multiple RNA Secondary Structure Alignments: A Progressive Profile Approach.***IEEE ICBB*2004,**1:**53–62.Google Scholar - Otu HH, Sayood K:
**A new sequence distance measure for phylogenetic tree construction.***Bioinformatics*2003,**19:**2122–2130. 10.1093/bioinformatics/btg295View ArticlePubMedGoogle Scholar

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.