Figure 6 | BMC Bioinformatics

Figure 6

From: Automating Genomic Data Mining via a Sequence-based Matrix Format and Associative Rule Set

Hypothetical example of how the iterative induction process would work. Extending the example offered earlier (Figure 7), analysis of GO categories finds a correlation: a particular category is statistically overrepresented among genes that have CpG islands located after the start site. The observation is recorded as a new entity, named after the parent features and analyses that produced it (10.3.8.1.6: row #10 connected by analysis #3 with reference to row #8, and compared by analysis #1 to row #6). It is added to the existing SM as a new row under the next available number (row 11 in this example) and is populated with binary values corresponding to the locations of these genomic regions; its name (10.3.8.1.6) provides a means of tracing its origin. Later, when the system re-runs its correlation analyses, it finds that row 11, which was not present during the previous run, is correlated with the presence of an unusually large number of transcript variants (arrows at bottom right). In other words, genes in this GO category also tend to have many splice variants. One might use this correlation to hypothesize that splicing proceeds by "silencing" exons through methylation changes.
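
The bookkeeping described in this caption can be sketched in a few lines of Python. This is a toy illustration only, not the authors' implementation: the `SequenceMatrix` class, the `provenance_name` and `new_correlations` helpers, the choice of row 7 as the transcript-variant row, the 0.7 threshold, and all binary values are assumptions made up for the example. The phi coefficient stands in for whatever correlation analysis the system actually runs.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class SequenceMatrix:
    """Toy stand-in for the SM: numbered rows of 0/1 values, one column per region."""
    rows: dict = field(default_factory=dict)   # row number -> list of 0/1 values
    names: dict = field(default_factory=dict)  # row number -> row name

    def add_row(self, name, values):
        """Store a row under the next available number and return that number."""
        number = max(self.rows, default=0) + 1
        self.rows[number] = list(values)
        self.names[number] = name
        return number


def provenance_name(*parts):
    """Join parent row and analysis numbers into a traceable name, e.g. '10.3.8.1.6'."""
    return ".".join(str(part) for part in parts)


def phi(x, y):
    """Phi coefficient (Pearson correlation of two binary vectors); 0.0 if undefined."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def new_correlations(sm, new_row, threshold=0.7):
    """Compare only the newly added row against every other row (positive phi only)."""
    target = sm.rows[new_row]
    hits = []
    for number, values in sm.rows.items():
        if number != new_row:
            score = phi(target, values)
            if score >= threshold:
                hits.append((number, score))
    return hits


# Ten pre-existing rows over eight regions; the patterns are invented placeholders.
sm = SequenceMatrix()
for i in range(1, 11):
    sm.add_row(f"feature_{i}", [(j >> (i % 3)) & 1 for j in range(8)])

# Assumption: pretend row 7 flags genes with unusually many transcript variants.
sm.rows[7] = [1, 1, 0, 1, 0, 0, 1, 0]
sm.names[7] = "many_transcript_variants"

# Record the new observation as a row named after its parent rows and analyses.
name = provenance_name(10, 3, 8, 1, 6)
new_number = sm.add_row(name, [1, 1, 0, 1, 0, 0, 0, 0])

# On the next run, only the new row needs to be checked against the older ones.
hits = new_correlations(sm, new_number)
for number, score in hits:
    print(f"row {new_number} ({name}) ~ row {number} ({sm.names[number]}): phi={score:.2f}")
```

Because the derived row is stored like any other row, later analysis runs can pick it up without special handling, and its dotted name still records which rows and analyses produced it.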
