Figure 2 | BMC Bioinformatics

Figure 2

From: Probabilistic topic modeling for the analysis and classification of genomic sequences

Training workflow. Words are extracted from the sequences of the input DNA dataset through k-mer decomposition; a probabilistic topic model is then learned using the Latent Dirichlet Allocation (LDA) algorithm. The model provides the topic distribution of the input dataset, which was retrieved from the Ribosomal Database Project (RDP) online repository, and the most probable topics are labeled with a taxonomic rank using a majority voting scheme.
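
The first and last steps of this workflow can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `kmer_words` and `label_topics` are hypothetical, and so are the example sequence and taxon names. The use of overlapping k-mers and the exact voting rule (each sequence votes for its single most probable topic) are assumptions the caption does not confirm. The LDA step is not shown: the per-sequence topic distributions in `doc_topic` are assumed to come from an already trained model.

```python
from collections import Counter, defaultdict


def kmer_words(sequence, k):
    """Split a DNA sequence into overlapping k-mers (assumed word definition)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def label_topics(doc_topic, taxa):
    """Label each topic by majority vote.

    doc_topic: one topic distribution (list of probabilities) per sequence,
               assumed to be produced by a trained LDA model.
    taxa:      the known taxonomic label of each sequence, in the same order.
    Each sequence votes with its taxon for its most probable topic; a topic
    receives the taxon with the most votes. Topics with no votes are omitted.
    """
    votes = defaultdict(Counter)
    for dist, taxon in zip(doc_topic, taxa):
        best_topic = max(range(len(dist)), key=dist.__getitem__)
        votes[best_topic][taxon] += 1
    return {topic: counts.most_common(1)[0][0] for topic, counts in votes.items()}


# Hypothetical toy data, for illustration only.
words = kmer_words("ACGTAC", 3)
doc_topic = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
taxa = ["Bacillus", "Bacillus", "Vibrio", "Vibrio"]
topic_labels = label_topics(doc_topic, taxa)
```

Here `words` holds the four 3-mers of the toy sequence, and `topic_labels` maps topic 0 to "Bacillus" (two votes against one) and topic 1 to "Vibrio".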
