- Methodology article
- Open Access
RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure
- Qi Liu†1, 2, 3,
- Yu Yang†4,
- Chun Chen4,
- Jiajun Bu4,
- Yin Zhang4 and
- Xiuzi Ye4, 5
© Liu et al; licensee BioMed Central Ltd. 2008
Received: 18 November 2007
Accepted: 31 March 2008
Published: 31 March 2008
With the rapid emergence of RNA databases and newly identified non-coding RNAs, an efficient compression algorithm for RNA sequence and structural information is needed for the storage and analysis of such data. Although several algorithms for compressing DNA sequences have been proposed, none of them are suitable for the compression of RNA sequences with their secondary structures simultaneously. This kind of compression not only facilitates the maintenance of RNA data, but also supplies a novel way to measure the informational complexity of RNA structural data, raising the possibility of studying the relationship between the functional activities of RNA structures and their complexities, as well as various structural properties of RNA based on compression.
RNACompress employs an efficient grammar-based model to compress RNA sequences and their secondary structures. The main goals of this algorithm are twofold: (1) to present a robust and effective way of compressing RNA structural data; (2) to design a suitable model to represent RNA secondary structure and to derive the informational complexity of the structural data from the compression. Our extensive tests have shown that RNACompress achieves a better compression ratio than sequence-specific or general text compression algorithms, such as Gencompress, winrar and gzip, on most of the benchmark files. Moreover, a test of the activities of distinct GTP-binding RNAs (aptamers) compared with their structural complexity shows that our defined informational complexity can be used to describe how complexity varies with activity. These results lead to an objective means of comparing the functional properties of heteropolymers from the information perspective.
A universal algorithm for the compression of RNA secondary structure as well as the evaluation of its informational complexity is discussed in this paper. We have developed RNACompress, as a useful tool for academic users. Extensive tests have shown that RNACompress is a universally efficient algorithm for the compression of RNA sequences with their secondary structures. RNACompress also serves as a good measurement of the informational complexity of RNA secondary structure, which can be used to study the functional activities of RNA molecules.
Ribonucleic acid (RNA) is an important class of molecules which performs a wide range of biological and chemical functions. Traditionally, most RNA molecules were regarded as being involved in the process of translation, including transfer RNA (tRNA) and ribosomal RNA (rRNA). Since the late 1990s, it has been widely acknowledged that other types of functional RNA molecules exist, such as non-protein-coding RNAs. These RNAs are found in organisms ranging from bacteria to mammals and affect a wide variety of processes including plasmid replication, phage development, bacterial virulence, chromosome structure, DNA transcription and RNA modification [1–5]. RNA has recently become the center of much attention because of its functions as well as its catalytic properties, leading to a substantially increased interest in identifying new RNAs and obtaining their structural information [6–8]. Furthermore, RNA databases such as NONCODE, Rfam, RNaseP and RNAdb have grown two- to three-fold annually.
To facilitate the maintenance and analysis of such RNA data, an efficient compression algorithm for RNA sequences is needed. Algorithms for compressing DNA sequences include GenCompress, DNACompress, Biocompress and Cfact. However, these algorithms are only suitable for compressing the primary sequences of DNA. For RNA, we are more interested in designing a novel compression algorithm that compresses the RNA primary sequence together with its secondary structure information. RNA secondary structure is similar to an alignment of nucleic acid sequences, except that the sequence folds back on itself and "complementary bases" pair (commonly A-U, G-C, G-U) rather than identical or similar bases. The functions of RNA are closely related to its structural characteristics, and as such obtaining RNA secondary structure information (either experimentally or computationally) has been an important and interesting problem for several decades.
From a strictly mathematical point of view, compression implies understanding and comprehension. Biological sequence compression is a useful tool to recover information from biological sequences. Better compression often implies better understanding. Compressing an RNA sequence with its secondary structure means that we can capture the essence of the RNA sequence information and its structural information simultaneously. From an application point of view, we can derive the informational complexity of RNA structural data from the compression, which can be used to study the structural features and various other properties of RNAs.
In our study, we have developed an efficient grammar-based algorithm to compress an RNA sequence and its secondary structure. The RNACompress software, developed for Windows and Linux platforms, is freely accessible at our website. We have also defined the informational complexity of RNA structural data based on compression coupled with the theory of Kolmogorov complexity. This informational complexity is used to study the relationship between the binding activities and the structural complexity of RNA aptamers.
To the best of our knowledge, this is the first published study of the compression of biological sequences together with structural information. Additionally, we apply the results to study the functional activities of RNAs. The key idea of our compression algorithm is to use dot-bracket notation to represent the secondary structure of RNA and to define specific context-free grammars (CFGs) that model RNA secondary structure together with its primary sequence during compression (and decompression). Furthermore, several parsing and coding approaches are incorporated to facilitate the whole procedure, including (1) using an LL(1) parser to obtain the left-most derivation, under the defined grammars, of the RNA primary sequence and its secondary structure, and (2) using Huffman coding to encode the symbol stream of the left-most derivation to achieve an economical compression result. Extensive tests have shown that our algorithm is fast, robust and effective, and that it generally obtains a better compression ratio than common text-based compression tools or primary-sequence-specific compression tools when compressing RNA sequences with their structures. These results show that our program is a useful tool for RNA data maintenance and analysis.
A. Context-free grammars of RNA sequence and structure
We have defined two concise context-free grammars (CFGs), G1 and G2, to model the RNA primary sequence and its secondary structure information. A CFG is a formalism related to the finite automaton, and has proved to be an efficient model for studying RNA secondary structure. It contains the following elements:
(1). Terminals – a symbol that represents a constant value
(2). Non-terminals – a symbol that has the capability of being further defined in terms of terminals and/or non-terminals, usually denoted by a capital letter.
(3). Production rules – rules by which non-terminals can be replaced.
In our study, two grammars are defined as:
G1:
S: LS | e
L: aSu | uSa | cSg | gSc | uSg | gSu | a | u | c | g
G2:
S: LS | e
L: (S) | •
For both grammars, S and L are non-terminals, e is the empty string, and the symbols a, u, c, g, (, ) and • are terminals representing the four bases, the left bracket, the right bracket and the dot, respectively.
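To make the two grammars concrete, the following minimal Python sketch checks whether a sequence and a dot-bracket string can be derived together: the structure must be balanced (G2) and every bracketed pair must be one of the six pairs listed in the L rules of G1. The names `ALLOWED_PAIRS`, `pair_table` and `fits_grammars` are hypothetical helpers introduced for illustration, and the ASCII character `.` stands in for •.

```python
# Pairs allowed by the L rules of G1 (Watson-Crick and wobble pairs).
ALLOWED_PAIRS = {
    ("a", "u"), ("u", "a"),
    ("c", "g"), ("g", "c"),
    ("u", "g"), ("g", "u"),
}


def pair_table(structure):
    """Return {open_index: close_index} for a dot-bracket string.

    Raises ValueError if the string is not derivable from G2.
    """
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs[stack.pop()] = i
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def fits_grammars(sequence, structure):
    """True if the structure fits G2 and every paired base fits G1."""
    sequence = sequence.lower()
    if len(sequence) != len(structure) or set(sequence) - set("acgu"):
        return False
    try:
        pairs = pair_table(structure)
    except ValueError:
        return False
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in pairs.items())


print(fits_grammars("gcaaagc", "((...))"))
print(fits_grammars("aaaa", "(..)"))
```

The first call pairs g with c and c with g, both of which G1 allows; the second asks for an a-a pair, for which G1 has no rule.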
B. Compression algorithm
Based on the two grammars we have defined, we are able to perform the compression as shown in Figure 1. In the following we also take the RNA sequence in Figure 3 as an example to demonstrate the whole compression procedure. First we discuss several computational approaches used in our work.
We start by parsing the dot-bracket sequence of the RNA secondary structure using G2, and an LL(1) parser is used to obtain the left-most derivation of the input sequence. An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from left to right, and constructs a left-most derivation of the sentence. Practically, there are two common ways to describe how a given string can be derived from the start symbol of a given grammar. The simplest way is to list the consecutive strings of symbols, beginning with the start symbol and ending with the string, together with the rules that have been applied. If we introduce a strategy such as "always replace the left-most non-terminal first", then for context-free grammars the list of applied grammar rules is by itself sufficient. This is defined as the left-most derivation of a string.
The parser consists of:
an input buffer, holding a string from the grammar;
a stack on which to store the terminals and non-terminals from the grammar yet to be parsed;
a parsing table which tells it what (if any) grammar rule to apply given the symbol on top of its stack and the next input token.
LL(1) parser table for dot-bracket sequence of RNA secondary structure.
At each step the parser reads the symbol on top of the stack and the next symbol on the input stream, and applies one of three steps:
If the top of the stack is a non-terminal symbol, the non-terminal symbol and the symbol on the input stream are looked up in the parsing table to determine which rule of the grammar to use. The number of the rule is written to the output stream. If the parsing table indicates that there is no such rule, the parser reports an error and stops.
If the top of the stack is a terminal symbol, it is compared to the symbol on the input stream. If they are equal, both are removed. If they are not equal, the parser reports an error and stops.
If the top of the stack is # and the symbol on the input stream is also #, the parser reports that it has successfully parsed the input; otherwise it reports an error. In both cases the parser stops.
These steps are repeated until the parser stops, at which point it has either completely parsed the input and written a left-most derivation to the output stream, or reported an error.
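The three steps can be sketched as a small table-driven parser for G2. This is a minimal illustration rather than the RNACompress implementation: the rule numbering (1 to 4) is our own assumption, the parsing table is the one that follows from the FIRST and FOLLOW sets of G2, and `.` stands in for •.

```python
# Hypothetical numbering of the G2 production rules.
RULES = {
    1: ("S", ["L", "S"]),       # S -> L S
    2: ("S", []),               # S -> e (empty string)
    3: ("L", ["(", "S", ")"]),  # L -> ( S )
    4: ("L", ["."]),            # L -> .
}

# LL(1) table: (non-terminal on top of the stack, next input symbol) -> rule.
TABLE = {
    ("S", "("): 1, ("S", "."): 1,
    ("S", ")"): 2, ("S", "#"): 2,
    ("L", "("): 3, ("L", "."): 4,
}


def leftmost_derivation(structure):
    """Return the rule numbers of the left-most derivation of a dot-bracket string."""
    if set(structure) - set("()."):
        raise ValueError("input may only contain '(', ')' and '.'")
    tokens = list(structure) + ["#"]  # '#' marks the end of the input
    stack = ["#", "S"]                # '#' marks the bottom of the stack
    pos, output = 0, []
    while True:
        top, look = stack[-1], tokens[pos]
        if top == "#":
            if look == "#":
                return output         # stack and input are both exhausted
            raise ValueError(f"unexpected {look!r} at position {pos}")
        if top in ("S", "L"):
            rule = TABLE.get((top, look))
            if rule is None:
                raise ValueError(f"no rule for {top} on {look!r} at position {pos}")
            output.append(rule)
            stack.pop()
            # Push the right-hand side reversed so its first symbol is on top.
            stack.extend(reversed(RULES[rule][1]))
        elif top == look:
            stack.pop()               # terminal matches the input: consume both
            pos += 1
        else:
            raise ValueError(f"expected {top!r} but found {look!r} at position {pos}")


print(leftmost_derivation("(.)"))  # [1, 3, 1, 4, 2, 2]
```

Because G2 is LL(1), one symbol of lookahead always selects a unique rule, so the list of rule numbers alone is enough to rebuild the structure.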
Map left-most derivation of G2 to G1
G2 is used to guide the left-most derivation of G1, since G1 is ambiguous: a sequence alone can be derived from G1 in many ways, one for each admissible structure. The mapping of the left-most derivation of G2 to G1 is straightforward: '()' is mapped to the corresponding base pair of the RNA secondary structure and '•' is mapped to the corresponding un-paired base. After this mapping, a left-most derivation of G1 is obtained, and Huffman coding is performed on the symbol stream of this left-most derivation to encode it into a bit stream, as discussed below.
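As a hedged illustration of this mapping, the sketch below walks a dot-bracket string from left to right and emits the G1 rules in left-most order, substituting the actual bases for each bracket pair and each dot. The function name `g1_derivation` and the textual rule labels such as `"L->gSc"` are our own conventions, not part of RNACompress, and `.` stands in for •.

```python
def g1_derivation(sequence, structure):
    """Return the left-most derivation of (sequence, structure) as G1 rule labels."""
    seq = sequence.lower()
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    # Find the closing partner of every opening bracket.
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            partner[stack.pop()] = i
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")

    rules = []
    for i, ch in enumerate(structure):
        if ch == "(":
            # G2 rule L -> (S) becomes the G1 rule for this base pair.
            rules += ["S->LS", f"L->{seq[i]}S{seq[partner[i]]}"]
        elif ch == ".":
            # G2 rule L -> . becomes the G1 rule for this un-paired base.
            rules += ["S->LS", f"L->{seq[i]}"]
        else:
            # A closing bracket means the S inside the pair derives e.
            rules.append("S->e")
    rules.append("S->e")  # the outermost S finally derives e
    return rules


print(g1_derivation("gac", "(.)"))
# ['S->LS', 'L->gSc', 'S->LS', 'L->a', 'S->e', 'S->e']
```

This sketch does not check that each pair is one of the six pairs allowed by G1; a full implementation would reject, for example, an a-a pair.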
Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol, where the variable-length code table has been derived in a particular way from the estimated probability of occurrence of each possible value of the source symbol. Huffman coding yields the most efficient code of this type: no other mapping of individual source symbols to unique strings of bits produces a smaller average output size when the actual symbol frequencies agree with those used to create the code.
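The construction can be sketched in a few lines of Python using a priority queue. This is a generic Huffman builder and not the code table used by RNACompress; the weights in the usage example are made-up values chosen only to show how rarer symbols receive longer codewords.

```python
import heapq
from itertools import count


def huffman_code(weights):
    """Build a prefix-free code {symbol: bit string} from {symbol: weight}."""
    if not weights:
        raise ValueError("at least one symbol is required")
    if len(weights) == 1:
        return {next(iter(weights)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; prefix their codes with 0 and 1.
        w0, _, low = heapq.heappop(heap)
        w1, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]


# Made-up weights for four hypothetical rules.
example = {"rule_a": 0.5, "rule_b": 0.25, "rule_c": 0.125, "rule_d": 0.125}
code = huffman_code(example)
print({sym: len(bits) for sym, bits in code.items()})
```

With these weights the codeword lengths are 1, 2, 3 and 3 bits, which matches -log2 of each weight exactly, the case in which a Huffman code reaches the entropy of the source.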
Huffman coding of production rules of grammar G1
It should be noted that for different types of RNA, or RNA in different species, the frequency distributions of base pairs and un-paired bases differ, and thus the production probabilities of the rules differ. However, from a statistical perspective, we aim to design a universal compression algorithm for all types of RNA, and thus we use these general probabilities here. For more specific RNA types, more specific probabilities can be used.
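An example of compression
Table columns: Stack, Production Rules of G2, Map to G1. The rows below give the stack contents, with the end marker # at the bottom of the stack (left) and the top of the stack on the right.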
# S L
# S) S (
# S) S
# S) S L
# S) S) S (
# S) S) S
# S) S) S L
# S) S) S) S (
# S) S) S) S
# S) S) S) S L
# S) S) S) S .
# S) S) S) S
# S) S) S) S L
# S) S) S) S .
# S) S) S) S
# S) S) S) S L
# S) S) S) S .
# S) S) S) S
# S) S) S)
# S) S) S
# S) S)
# S) S
# S L
# S .
The final bit stream of this left-most derivation is 0 01111 0 1110 0 101 0 00 0 010 0 1111 1 1 1 0 010 1, for a total of 35 bits.
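The bit count can be checked directly from the stream quoted above. The short sketch below only splits the quoted string into codewords and adds up their lengths; it makes no assumption about which codeword belongs to which production rule.

```python
# Bit stream of the left-most derivation, copied from the text above.
stream = "0 01111 0 1110 0 101 0 00 0 010 0 1111 1 1 1 0 010 1"

codewords = stream.split()
total_bits = sum(len(word) for word in codewords)

print(len(codewords), "codewords,", total_bits, "bits")  # 18 codewords, 35 bits
```

Each codeword corresponds to one applied production rule, so the derivation in this example consists of 18 rule applications.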
Definition of compression ratio
In our work, the compression ratio of the compression algorithm can be computed in two ways:
R1 = uncompressed_file_bytesize / compressed_file_bytesize, or R2 = (n × (H1 + H2)) / o, where n is the number of bases in the input RNA sequence, H1 and H2 are the information entropies of the RNA primary sequence and of its secondary structure, respectively, and o is the number of bits in the compressed file. The entropies are
$$H_1 = -\sum_{i=1}^{4} P_i \log_2 P_i, \qquad H_2 = -\sum_{i=1}^{3} P'_i \log_2 P'_i,$$
where P_i and P'_i are the occurrence probabilities of each base and of each character in the dot-bracket notation, respectively. If we consider an RNA sequence of infinite length, then P_i = 1/4 and P'_i = 1/3, assuming independent, uniform distributions over the 4 bases and the 3 characters; thus H1 = 2 and H2 ≈ 1.585. This means that 2 bits are enough to encode each base of the RNA primary sequence and 1.585 bits are enough to encode each character of the secondary structure in dot-bracket notation. Note that in our implementation, the occurrence probabilities of the 4 bases and 3 characters are computed for the particular RNA.
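A minimal sketch of the R2 computation, estimating the occurrence probabilities from the particular RNA as described above. The helper names `entropy_bits` and `ratio_r2` are hypothetical, and `.` stands in for •.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy, in bits per symbol, of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ratio_r2(sequence, structure, compressed_bits):
    """R2 = n * (H1 + H2) / o, with o the size of the compressed output in bits."""
    n = len(sequence)
    h1 = entropy_bits(sequence)   # entropy of the primary sequence
    h2 = entropy_bits(structure)  # entropy of the dot-bracket string
    return n * (h1 + h2) / compressed_bits


print(entropy_bits("acgu"))           # 2.0 (four equally likely bases)
print(round(entropy_bits("()."), 3))  # 1.585 (three equally likely characters)
```

The two printed values reproduce the limiting case H1 = 2 and H2 ≈ 1.585; for a real RNA the empirical entropies are usually lower, which lowers the numerator of R2.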
C. Informational complexity
The definition of the informational complexity of RNA structural data is based on the concept of Kolmogorov complexity. The Kolmogorov complexity K(o) of an object o is defined as the length of the shortest program P for a universal Turing machine U that outputs o. Intuitively, K(x) represents the minimal amount of information required to generate x by an algorithm.
It is well known that there is a relationship between the Kolmogorov complexity of sequences and Shannon information theory: the expected Kolmogorov complexity of a sequence x is asymptotically close to the entropy of the information source emitting x. However, Kolmogorov complexity is non-computable in the Turing sense, and in practical applications it is approximated by the length of the compressed sequence calculated by a compression algorithm.
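The approximation can be illustrated with any lossless compressor. The sketch below uses zlib from the Python standard library purely as a stand-in (it is not RNACompress and knows nothing about RNA structure), and the two test strings are artificial: one is highly repetitive, the other is drawn at random from the four bases.

```python
import random
import zlib


def compressed_bits(text):
    """Upper bound on the description length of `text`, in bits, using zlib."""
    return 8 * len(zlib.compress(text.encode("ascii"), 9))


rng = random.Random(0)                            # fixed seed for repeatability
repetitive = "gcau" * 500                         # 2000 bases, one repeated motif
shuffled = "".join(rng.choices("acgu", k=2000))   # 2000 bases, no planted pattern

print(compressed_bits(repetitive), compressed_bits(shuffled))
```

The repetitive string compresses to far fewer bits than the random one of the same length, which is the sense in which compressed length serves as a computable proxy for complexity.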
In summary, the informational complexity of a given RNA sequence with its secondary structure is approximated by the length of the compressed bit string produced by RNACompress. This definition is straightforward, yet has rigorous theoretical support. The experiments below indicate that our informational complexity can reveal the relationship between the structural complexity and the functional activity of RNA aptamers, which could be useful in predicting the functional utility of novel heteropolymers.
Our experiments are performed in two parts: first the compression ability of RNACompress is tested, and secondly the results are applied to reveal the relationship between binding activities and structural complexity of RNA aptamers.
A. Compression ability
Descriptions of benchmark data files
5S ribosomal RNA database: 45 metazoan rRNA sequences
GtRDB-Genomics tRNA Database: 14 tRNAs from various eukaryotes
1855 mammalian miRNAs obtained from the latest release of miRBase.
47509 functional RNAs identified by Evofold, utilizing a comparative genomics method based on phylogenetic stochastic context-free grammars.
97 putative antisense ncRNAs identified from cDNA and EST databases for human and mouse.
411 human snoRNAs and scaRNAs selected from snoRNA-LBME-db (release 3, August 2006)
Rfam database: 151 non-coding RNA structures downloaded from Rfam, as collected by Do et al. for CONTRAfold training
Comparisons of compression ratios and running times of RNACompress, Gencompress, winrar and gzip.
It can be seen that RNACompress achieves the best compression ratio, at comparable speed, among the four algorithms, except on two test files, rRNA.txt and miRNA.txt. In rRNA.txt the sequence identities are nearly 90%. Gencompress and the two general-purpose compression algorithms capture the repeated patterns in this file efficiently and thus achieve better results. The same reason holds for miRNA.txt. Furthermore, microRNAs are generally short RNA molecules of about 21–23 nucleotides, so they are less compressible than longer sequences. Although Gencompress is efficient at searching for approximate matches and reverse complements, its running time was found to be impractically long when the input file is large.
Essentially, our compression algorithm is based on grammar inference and Huffman coding, and currently does not consider repeated patterns in the input file. This is why RNACompress does not achieve the best compression ratio when the sequence identities within a set of RNAs are high. Our algorithm is, however, very robust to different types of RNA and is influenced little by the arrangement of the input file. For the three other algorithms, if we rearrange the same set of RNAs in a different order and artificially space out two highly identical sequences, the compression ratios decrease dramatically. In addition, there are other algorithms based on mechanisms other than searching for repeated patterns. One of these is PPM, which uses a specialized form of compression based on Markov modeling. Unfortunately, these algorithms are generally computationally intensive in exchange for higher compression.
B. Aptamer activity and complexity
Spearman correlation coefficients (r_s) of aptamer activity against the informational complexity.
Table columns: K_d (nM); our defined informational complexity. Final row: r_s of K_d.
K_d (nM) values: 9 ± 1, 17 ± 4, 30 ± 6, 76 ± 3, 250 ± 20, 300 ± 50, 300 ± 50, 300 ± 100, 400 ± 200, 900 ± 200, 8000 ± 1000.
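For readers who want to reproduce this kind of analysis, the sketch below computes a Spearman rank correlation with average ranks for ties, which matters here because three aptamers share a K_d of 300 nM. The function names are hypothetical. Only the K_d column is used, because the complexity values are not reproduced in this text, so no r_s for the paper's data is computed here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman r_s: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# K_d values (nM) from the table above.
kd = [9, 17, 30, 76, 250, 300, 300, 300, 400, 900, 8000]
kd_ranks = average_ranks(kd)
print(kd_ranks)
```

The three tied K_d values each receive rank 7.0, the mean of positions 6, 7 and 8. Calling `spearman(kd, complexity)` with the measured complexities would then give r_s; the function is undefined when either input is constant.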
Generally speaking, if we treat both RNA sequences and the representation of their secondary structures as text, any text-specific compression algorithm can be used to compress them. However, such compression has no biological meaning and disturbs the original RNA structure information, although it may achieve higher compression ratios. From a biological perspective, RNACompress has an advantage over these tools because it is not only an efficient algorithm for compressing RNAs, but also a suitable model for representing RNA data. These compression and representation abilities are based on our grammar inference, which is inherently suited to capturing the structural essence of RNA.
Currently we are focused on modeling the two dominant types of base pairs in RNA secondary structure: Watson-Crick pairs and wobble pairs. There also exist other minor variations of base-pairing in nucleic acids, such as the Hoogsteen base pair (A-T). One remaining challenge is how to incorporate the modeling of these minor base pairs while maintaining the compression ratio.
One promising way to improve the compression ability of RNACompress is to consider the repeated patterns of RNA motifs in RNA secondary structure. This is different from the repeated patterns identified in primary sequences, as used in Gencompress and similar tools. It would also help to approximate the Kolmogorov complexity and evaluate the informational complexity more accurately. RNA motifs are basic building blocks used repeatedly, and in various combinations, to form different RNA types and define their unique structural and functional properties. Many algorithms for RNA motif identification have been proposed [6, 32, 33]. However, these efforts have been only moderately successful in defining simple RNA structures. A powerful algorithm to capture complex structural domains or various non-canonical pairings in RNA motifs is still needed.
Another application of compressing RNA secondary structure is as an alignment-free tool for RNA secondary structure comparison. A universal (dis)similarity measure (USM) can be defined to measure the pair-wise distance of RNA secondary structures based on the compression, as we will demonstrate elsewhere (Qi Liu et al., RNA secondary structure comparison based on compression: a methodological study, manuscript in preparation).
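One well-known compression-based dissimilarity is the normalized compression distance (NCD). The sketch below shows it only as an illustration of the general idea: whether the USM planned for RNACompress takes this form is an assumption on our part, zlib is used as a stand-in compressor, and the two inputs are arbitrary random strings over the dot-bracket alphabet rather than real (or even balanced) structures.

```python
import random
import zlib


def compressed_size(data):
    """Length in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized compression distance between two ASCII strings."""
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # cost of describing both together
    return (cxy - min(cx, cy)) / max(cx, cy)


rng = random.Random(1)  # fixed seed for repeatability
struct_a = "".join(rng.choices("().", k=400))
struct_b = "".join(rng.choices("().", k=400))

print(ncd(struct_a, struct_a), ncd(struct_a, struct_b))
```

A string is much closer to itself than to an unrelated string, because the second copy adds almost nothing to the joint compressed size. A structure-aware compressor would be expected to sharpen this contrast for real RNA structures.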
In this article we have introduced a universal algorithm for the compression of RNA secondary structure as well as the evaluation of its informational complexity. We have developed RNACompress as a useful tool for academic users. Extensive tests have shown that RNACompress is an efficient algorithm for the compression of RNA sequences with their secondary structures. RNACompress also serves as a good measure of the informational complexity of RNA secondary structure, which can be used to study the functional activities of RNA molecules. Furthermore, we expect future studies to show that our compression algorithm can facilitate the comparison of RNA secondary structures and the study of non-coding RNA structures, providing a new way to investigate RNA properties based on compression.
Availability and Requirements
Project name: RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure
Project home page: http://www.wigs.zju.edu.cn/education/students/liuqi/RNACompress.html
Operating systems: Windows 2000/XP and Linux
The authors would like to thank Dr. Thao Tran and Dr. Ying Xu at the Computational Systems Biology Laboratory, University of Georgia, USA, for their suggestions.
1. Avner P, Heard E: X-chromosome inactivation: counting, choice and initiation. Nat Rev Genet 2001, 2(1):59–67. 10.1038/35047580
2. Frank DN, Pace NR: RIBONUCLEASE P: Unity and Diversity in a tRNA Processing Ribozyme. Annual Review of Biochemistry 1998, 67(1):153–180. 10.1146/annurev.biochem.67.1.153
3. Kiss T: Small nucleolar RNA-guided post-transcriptional modification of cellular RNAs. EMBO J 2001, 20(14):3617–3622. 10.1093/emboj/20.14.3617
4. Lankenau S, Corces VG, Lankenau DH: The Drosophila micropia retrotransposon encodes a testis-specific antisense RNA complementary to reverse transcriptase. Molecular and Cellular Biology 1994, 14(3):1764–1775.
5. Lowe TM, Eddy SR: A Computational Screen for Methylation Guide snoRNAs in Yeast. Science 1999, 283(5405):1168–1171. 10.1126/science.283.5405.1168
6. Batey RT, Rambo RP, Doudna JA: Tertiary motifs in RNA structure and folding. Angew Chem Int Ed 1999, 38: 2326–2343.
7. Nykanen A, Haley B, Zamore PD: ATP Requirements and Small Interfering RNA Structure in the RNA Interference Pathway. Cell 2001, 107(3):309–321. 10.1016/S0092-8674(01)00547-5
8. Zuker M: Computer prediction of RNA structure. Methods Enzymol 1989, 180: 262–288.
9. Liu C, Bai B, Skogerb G, Cai L, Deng W, Zhang Y, Bu D, Zhao Y, Chen R: NONCODE: an integrated knowledge database of non-coding RNAs. Nucleic Acids Research 2005, 33(Database Issue):D112-D115. 10.1093/nar/gki041
10. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR: Rfam: an RNA family database. Nucleic Acids Research 2003, 31(1):439–441. 10.1093/nar/gkg006
11. Brown JW: The ribonuclease P database. Nucleic Acids Research 2005, 26(1):351–352. 10.1093/nar/26.1.351
12. Pang KC, Stephen S, Engstrom PG, Tajul-Arifin K, Chen W, Wahlestedt C, Lenhard B, Hayashizaki Y, Mattick JS: RNAdb--a comprehensive mammalian noncoding RNA database. Nucleic Acids Research 2005, 33(Database Issue):D125. 10.1093/nar/gki089
13. Chen X, Kwong S, Li M: A compression algorithm for DNA sequences and its applications in genome comparison. Proceedings of RECOMB 2000, 107.
14. Chen X, Li M, Ma B, Tromp J: DNACompress: fast and effective DNA sequence compression. Bioinformatics 2002, 18(12):1696–1698. 10.1093/bioinformatics/18.12.1696
15. Grumbach S, Tahi F, Inria LC: Compression of DNA sequences. Data Compression Conference, 1993 DCC'93 1993, 340–350.
16. Rivals E, Delahaye JP, Dauchet M, Delgrange O: A guaranteed compression scheme for repetitive DNA sequences. Data Compression Conference, 1996 DCC'96 Proceedings 1996.
17. Higgs PG: RNA secondary structure: physical and computational aspects. Quarterly Reviews of Biophysics 2001, 33(03):199–253. 10.1017/S0033583500003620
18. Li M, Badger JH, Chen X, Kwong S, Kearney P, Zhang H: An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 2001, 17(2):149–154. 10.1093/bioinformatics/17.2.149
19. Unger SH: A global parser for context-free phrase structure grammars. Communications of the ACM 1968, 11(4):240–247. 10.1145/362991.363001
20. Knuth DE: Dynamic Huffman coding. Journal of Algorithms 1985, 6(2):163–180. 10.1016/0196-6774(85)90036-7
21. Steffen P, Voss B, Rehmsmeier M, Reeder J, Giegerich R: RNAshapes: an integrated RNA analysis package based on abstract shapes. Bioinformatics 2006, 22(4):500. 10.1093/bioinformatics/btk010
22. Voss B, Giegerich R, Rehmsmeier M: Complete probabilistic analysis of RNA shapes. BMC Biol 2006, 4(5).
23. Hashiguchi K: Limitedness Theorem on Finite Automata With Distance Functions. J COMP AND SYS SCI 1982, 24(2):233–244. 10.1016/0022-0000(82)90051-4
24. Grune D, Jacobs CJH: A programmer-friendly LL (1) parser generator. Software—Practice & Experience 1988, 18(1):29–38. 10.1002/spe.4380180105
25. Knudsen B, Hein J: RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 1999, 15: 446–454. 10.1093/bioinformatics/15.6.446
26. Murthy VL, Rose GD: RNABase: an annotated database of RNA structures. Nucleic Acids Research 2003, 31(1):502–504. 10.1093/nar/gkg012
27. Campbell J: Grammatical Man: Information, Entropy, Language, and Life. Simon and Schuster; 1982.
28. Cover TM, Thomas JA: Elements of Information Theory. Wiley; 1990.
29. Moffat A: Implementing the PPM data compression scheme. Communications, IEEE Transactions on 1990, 38(11):1917–1921. 10.1109/26.61469
30. Carothers JM, Oestreich SC, Davis JH, Szostak JW: Informational Complexity and Functional Activity of RNA Structures. networks 2001, 63(57):94.
31. Zagryadskaya EI, Doyon FR, Steinberg SV: Importance of the reverse Hoogsteen base pair 54–58 for tRNA function. Nucleic Acids Research 2003, 31(14):3946–3953. 10.1093/nar/gkg448
32. Bergig O, Barash D, Kedem K: RNA Motif Search Using the Structure to String (STR2) Method. Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB'04)-Volume 00 2004, 660–661.
33. Yao Z, Weinberg Z, Ruzzo WL: CMfinder--a covariance model based RNA motif finding algorithm. Bioinformatics 2006, 22(4):445. 10.1093/bioinformatics/btk008
34. Szymanski M, Barciszewska MZ, Erdmann VA, Barciszewski J: 5S Ribosomal RNA Database. Nucleic Acids Research 2002, 30(1):176–178. 10.1093/nar/30.1.176
35. Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ: miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Research 2006, 34(Database Issue):D140-D144. 10.1093/nar/gkj112
36. Torarinsson E, Havgaard JH, Gorodkin J: Multiple structural alignment and clustering of RNA sequences. Bioinformatics 2007, 23(8):926. 10.1093/bioinformatics/btm049
37. Engstrom PG, Suzuki H, Ninomiya N, Akalin A, Sessa L, Lavorgna G, Brozzi A, Luzi L, Tan SL, Yang L: Complex loci in human and mouse genomes. PLoS Genet 2006, 2(4):e47. 10.1371/journal.pgen.0020047
38. Lestrade L, Weber MJ: snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Research 2006, 34(Database issue):D158-D162. 10.1093/nar/gkj002
39. Do CB, Woods DA, Batzoglou S: CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics 2006, 22(14):e90. 10.1093/bioinformatics/btl246
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.