Figure 1 | BMC Bioinformatics

Figure 1

From: 3'-UTR SIRF: A database for identifying clusters of short interspersed repeats in 3' untranslated regions


Schematic representation of the information stored in the 3'-UTR SIRF database. Sequences were extracted from the Mammalian Gene Collection (NCBI) and stored in the database's insdseq table. REPFIND was then used to identify clusters of perfect repeats in the 3'-UTRs of these sequences, and the results of this analysis were stored in the 'match' table. A control table with the same layout, 'match_random', was generated from the same sequences after their nucleotides had been randomly shuffled. All fields in the insdseq table come from the NCBI database except INSDSeq_Create_release, which records when the table entry was created, and INSDSeq_Update_release, which records when it was last modified. INSDSeq_ID is the table's identifier: it plays the same role as INSDSeq_primaryAccession but is used instead because, as an integer, it is more efficient to index. In the match and match_random tables, INSDSeq_ID identifies the gene in which REPFIND found each cluster. For each cluster, the figure also shows the P-value, the sequence of the repeat (motif), the number of motifs, and the cluster's start (cluster_start) and end (cluster_end) positions; these last two fields are used to calculate the size of the cluster.
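To make the table relationships and the cluster-size calculation concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. It is an assumed rendering of the layout described above, not the database's actual DDL. Only INSDSeq_ID, INSDSeq_primaryAccession, INSDSeq_Create_release, INSDSeq_Update_release, motif, cluster_start and cluster_end come from the caption. The column names `utr3_sequence`, `p_value` and `motif_count`, the helper functions, and the toy accession `TOY_0001` are hypothetical. The sketch also assumes inclusive, 1-based cluster coordinates, so size is `cluster_end - cluster_start + 1`. The `shuffled` helper is a simple mononucleotide shuffle that preserves base composition; the exact shuffling procedure used to build match_random is not specified here.

```python
import random
import sqlite3

# Hypothetical schema. Column names not given in the figure
# (utr3_sequence, p_value, motif_count) are assumptions.
SCHEMA = """
CREATE TABLE insdseq (
    INSDSeq_ID               INTEGER PRIMARY KEY,  -- integer key, cheap to index
    INSDSeq_primaryAccession TEXT UNIQUE NOT NULL, -- NCBI accession, same role
    utr3_sequence            TEXT,                 -- assumed column name
    INSDSeq_Create_release   TEXT,                 -- when the row was created
    INSDSeq_Update_release   TEXT                  -- when the row was last modified
);
CREATE TABLE "match" (
    INSDSeq_ID    INTEGER REFERENCES insdseq(INSDSeq_ID),
    p_value       REAL,     -- assumed column name
    motif         TEXT,
    motif_count   INTEGER,  -- assumed column name
    cluster_start INTEGER,
    cluster_end   INTEGER
);
CREATE TABLE match_random (  -- same layout, clusters from shuffled sequences
    INSDSeq_ID    INTEGER REFERENCES insdseq(INSDSeq_ID),
    p_value       REAL,
    motif         TEXT,
    motif_count   INTEGER,
    cluster_start INTEGER,
    cluster_end   INTEGER
);
"""


def shuffled(seq, rng):
    """Return a copy of seq with its nucleotides randomly permuted.

    A mononucleotide shuffle: base composition is preserved, order is not.
    """
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def cluster_sizes(conn, table="match"):
    """List (accession, motif, size) per cluster, largest first.

    Assumes inclusive 1-based coordinates: size = end - start + 1.
    """
    if table not in ("match", "match_random"):
        raise ValueError("unknown table: " + table)  # guards the f-string below
    query = (
        "SELECT s.INSDSeq_primaryAccession, m.motif, "
        "m.cluster_end - m.cluster_start + 1 AS size "
        f'FROM "{table}" AS m JOIN insdseq AS s USING (INSDSeq_ID) '
        "ORDER BY size DESC"
    )
    return conn.execute(query).fetchall()


def demo_db():
    """Build an in-memory database holding one toy record (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO insdseq (INSDSeq_ID, INSDSeq_primaryAccession, utr3_sequence) "
        "VALUES (?, ?, ?)",
        (1, "TOY_0001", "AUUUAUUUAUUUAGGCCA"),
    )
    # Made-up REPFIND-style hit: AUUUA repeated 3 times over positions 1..13.
    conn.execute(
        'INSERT INTO "match" VALUES (?, ?, ?, ?, ?, ?)',
        (1, 0.01, "AUUUA", 3, 1, 13),
    )
    return conn
```

Calling `cluster_sizes(demo_db())` returns `[('TOY_0001', 'AUUUA', 13)]`. Running the same query on `"match_random"` lets cluster sizes from real and shuffled sequences be compared side by side. The table name is checked against a fixed allow-list because SQLite placeholders cannot stand in for identifiers.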
