Table 4 Descriptions of benchmark data files

From: RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure

| File | Type | Source | Size | Description | Sequence identities |
| --- | --- | --- | --- | --- | --- |
| rRNA.txt | rRNA | 5S ribosomal RNA database [34] | 10.8 KB | 45 metazoan rRNA sequences | High |
| tRNA.txt | tRNA | GtRDB-Genomics tRNA Database | 2.06 KB | 14 tRNAs from various eukaryotes | Medium |
| miRNA.txt | microRNA | miRBase [35] | 328 KB | 1855 mammalian miRNAs obtained from the latest release of miRBase | High |
| evofold.txt | Mixed ncRNA | [36] | 3.72 MB | 47509 functional RNAs identified by Evofold, a comparative genomics method based on phylogenetic stochastic context-free grammars | Low |
| asRNA.txt | Mixed ncRNA | [37] | 148 KB | 97 putative antisense ncRNAs identified from cDNA and EST databases for human and mouse | Low |
| snoRNA.txt | snoRNA | snoRNA-LBME-db [38] | 82.5 KB | 411 human snoRNAs and scaRNAs selected from snoRNA-LBME-db (release 3, August 2006) | Low |
| 151Rfam.txt | Mixed ncRNA | Rfam database [10] | 40.7 KB | 151 non-coding RNA structures downloaded from Rfam, as collected by Do et al. for CONTRAfold training [39] | Low |