Fig. 1 | BMC Bioinformatics

From: KEGG orthology prediction of bacterial proteins using natural language processing

Fig. 1

Schematic overview of our pipeline. We first collected KO and non-KO data from the KEGG GENES database to build our classifier (left). We then used the classifier to mine protein sequences for potential KOs and applied an embedding-based clustering module to assign each candidate a specific K number (middle). To validate the results, we performed structural alignment between the candidate KO sequences and the known sequences in the KEGG database (right).
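The caption does not say how the embedding-based clustering module maps a candidate sequence to a K number. One simple way such a step could work is nearest-centroid assignment by cosine similarity, shown below as a minimal illustrative sketch. It assumes each protein has already been converted to a fixed-length embedding vector by some upstream model. The embeddings of known sequences are averaged per K number, and a candidate receives the K number of its most similar centroid, or no assignment if the best similarity falls below a threshold. The function names, the `min_sim` threshold, and the placeholder labels `K_A`/`K_B` are hypothetical; this is not the authors' implementation.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def centroids(embeddings, labels):
    """Average the embeddings of known sequences that share the same K number."""
    groups = defaultdict(list)
    for vec, k in zip(embeddings, labels):
        groups[k].append(vec)
    return {
        k: [sum(col) / len(vecs) for col in zip(*vecs)]
        for k, vecs in groups.items()
    }


def assign_k_number(query, cents, min_sim=0.5):
    """Return (K number, similarity) for the closest centroid.

    Hypothetical rule: if the best similarity is below min_sim,
    return (None, similarity) and leave the candidate unassigned.
    """
    best_k, best_sim = None, -1.0
    for k, c in cents.items():
        s = cosine(query, c)
        if s > best_sim:
            best_k, best_sim = k, s
    if best_sim < min_sim:
        return None, best_sim
    return best_k, best_sim


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; K_A and K_B are placeholder labels, not real KOs.
    known = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
    labels = ["K_A", "K_A", "K_B", "K_B"]
    cents = centroids(known, labels)
    print(assign_k_number([0.95, 0.05, 0.0], cents)[0])  # expected: K_A
    print(assign_k_number([0.0, 0.0, 1.0], cents)[0])    # expected: None
```

A real pipeline would use high-dimensional embeddings and would likely tune the threshold against held-out KEGG annotations. Centroids keep the comparison cost proportional to the number of K numbers rather than the number of known sequences.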