Table 4 Descriptions of benchmark data files

From: RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure

| File | Type | Source | Size | Description | Sequence identities |
|---|---|---|---|---|---|
| rRNA.txt | rRNA | 5S ribosomal RNA database [34] | 10.8 KB | 45 metazoan rRNA sequences | High |
| tRNA.txt | tRNA | GtRDB-Genomics tRNA Database | 2.06 KB | 14 tRNA from various eukaryotes | Medium |
| miRNA.txt | microRNA | miRBase [35] | 328 KB | 1855 mammalian miRNAs obtained from the latest release of miRBase. | High |
| evofold.txt | Mixed ncRNA | [36] | 3.72 MB | 47509 functional RNAs identified by Evofold, utilizing a comparative genomics method based on phylogenetic stochastic context-free grammars. | Low |
| asRNA.txt | Mixed ncRNA | [37] | 148 KB | 97 putative antisense ncRNAs identified from cDNA and EST databases for human and mouse. | Low |
| snoRNA.txt | snoRNA | snoRNA-LBME-db [38] | 82.5 KB | 411 human snoRNAs and scaRNAs selected from snoRNA-LBME-db (release 3, August 2006) | Low |
| 151Rfam.txt | Mixed ncRNA | Rfam database [10] | 40.7 KB | 151 non-coding RNA structures downloaded from Rfam, as collected by Do et al. for CONTRAfold training [39] | Low |