Figure 1 | BMC Bioinformatics

Figure 1

From: Accurate and unambiguous tag-to-gene mapping in serial analysis of gene expression

Flowchart of the HGA method. The method consists of four main sequential steps.

Step 1) All potential virtual tags in the genome are extracted and compared, and the frequency of occurrence of each tag is recorded along with its location on the genome (top right). Then, using the most complete and up-to-date protein and RNA tables available for the genome, together with the assignments and predictions of the 3' and 5' UTR regions, all potential transcripts and intergenic regions in the genome are extracted and their locations recorded. The two sets of information are cross-referenced to produce a detailed genome-based annotation of virtual SAGE-tags.

Step 2) Based on its genomic position, its annotation and its frequency of occurrence on the genome, each virtual tag is assigned to one of seven possible classes (center). This classification scheme helps in the assignment of tag confidence in subsequent steps. A detailed explanation of each tag class is provided in Table 1.

Step 3) All known experimental SAGE-tags (Table 2) are cross-referenced against the previously generated classification of virtual tags, and only the experimental tags belonging to the classes platinum, copper and iron are selected (bottom left). Tags in the platinum class are further subdivided into two groups: i) tags that map to a transcript and are not located upstream from an internal poly(A) region and ii) tags that map to a transcript and are next to an internal poly(A) region. The genomic annotation and classification of each tag are used to determine its probability of being observed experimentally. A detailed description of the probability functions derived from these data is shown in Figure 2.

Step 4) The tag classification generated in step 2 is cross-referenced against the probabilities obtained in step 3 to produce a confidence assignment (high, low or undefined) for each virtual tag in the genome (bottom right).

This information can be used to unambiguously map experimental SAGE-tags to annotated transcripts and/or genomic regions, along with a confidence estimate for the mapping result. A detailed explanation of the different steps used in the HGA process is provided in the main text and in the Methods section.
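
The tag extraction and frequency counting in Step 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The caption does not state the anchoring site or the tag length, so the values used here (a CATG anchor, as for the NlaIII enzyme commonly used in SAGE, and a 10-base tag) are assumptions, and the function names `extract_virtual_tags` and `tag_frequencies` are hypothetical.

```python
from collections import defaultdict

# Assumptions, not taken from the caption: CATG anchor site and 10-base tags.
ANCHOR = "CATG"
TAG_LEN = 10

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_virtual_tags(genome):
    """Map each virtual tag to its list of (strand, position) occurrences.

    The position is the 0-based start of the anchor site, expressed in
    forward-strand coordinates for both strands.
    """
    genome = genome.upper()
    n = len(genome)
    locations = defaultdict(list)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        start = seq.find(ANCHOR)
        while start != -1:
            tag_start = start + len(ANCHOR)
            tag = seq[tag_start:tag_start + TAG_LEN]
            # Keep only full-length tags made of unambiguous bases.
            if len(tag) == TAG_LEN and set(tag) <= set("ACGT"):
                pos = start if strand == "+" else n - start - len(ANCHOR)
                locations[tag].append((strand, pos))
            start = seq.find(ANCHOR, start + 1)
    return dict(locations)


def tag_frequencies(locations):
    """Count how many times each virtual tag occurs in the genome."""
    return {tag: len(sites) for tag, sites in locations.items()}
```

In this sketch, a tag whose frequency is greater than one maps to several genomic locations. Frequency is one of the three inputs the caption names for the classification in Step 2, alongside genomic position and annotation.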
