Fig. 1 | BMC Bioinformatics

Fig. 1

From: Geminivirus data warehouse: a database enriched with machine learning approaches

Overview of the geminivirus.org framework. Initially, the geminivirus data were recovered from GenBank in the GenBank file format (1). The data were extracted, transformed, and standardized using algorithms based on rules and machine learning (ML) approaches (2). Next, the abstracts of the scientific publications were recovered from PubMed (https://www.ncbi.nlm.nih.gov/pubmed) (3) and the geographic coordinates of the isolates were retrieved from Google Maps (4). Data were merged and loaded into the relational database (5) in different dimensions such as the collection date, host range, geographic region, genomic data, associated publications, and organism data. The data were used to define the training set for building ML models to classify genera using Random Forest (RF), Multilayer Perceptron (MLP), and Sequential Minimal Optimization (SMO) learning algorithms (6). Information and analytical tools, such as the Basic Local Alignment Search Tool (BLAST), sequence demarcation tools, and phylogenetic reconstruction, were embedded in the system (7), and an ORF Search tool for classification of ORFs based on ML procedures (8) was implemented. All analysis results are visible and freely available (9).
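
As an illustration of the rule-based part of steps (1) and (2), the following is a minimal sketch of how qualifiers in a GenBank flat-file record might be extracted and standardized into database-ready fields. It is not the authors' implementation: the sample record, the function names `extract_qualifiers` and `standardize`, and the output field names are all assumptions made for this example, and it uses only a simple regular expression rather than a full GenBank parser.

```python
import re

# Hypothetical, abbreviated GenBank-style record (not a real accession).
SAMPLE_RECORD = '''LOCUS       EXAMPLE01               2600 bp    DNA     circular VRL
FEATURES             Location/Qualifiers
     source          1..2600
                     /organism="Example begomovirus"
                     /host="Solanum lycopersicum"
                     /country="Brazil: Minas Gerais"
                     /collection_date="12-Mar-2010"
//
'''

# Matches indented single-line qualifiers of the form /key="value".
QUALIFIER = re.compile(r'^\s+/(\w+)="([^"]*)"', re.MULTILINE)


def extract_qualifiers(record_text):
    """Return a dict of qualifier name -> value found in the record text."""
    return {key: value for key, value in QUALIFIER.findall(record_text)}


def standardize(qualifiers):
    """Apply simple rules to split and normalize fields (illustrative only)."""
    # GenBank country qualifiers often use "Country: Region"; split on the colon.
    country, _, region = qualifiers.get("country", "").partition(":")
    return {
        "organism": qualifiers.get("organism"),
        "host": qualifiers.get("host"),
        "country": country.strip() or None,
        "region": region.strip() or None,
        "collection_date": qualifiers.get("collection_date"),
    }


if __name__ == "__main__":
    row = standardize(extract_qualifiers(SAMPLE_RECORD))
    print(row)
```

A real pipeline would also need to handle qualifiers that wrap across lines, records that lack a field, and free-text values that do not follow a fixed pattern, which is presumably where the ML-based standardization mentioned in the caption comes in.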
