Fig. 6 | BMC Bioinformatics

Fig. 6

From: Machine Learning for detection of viral sequences in human metagenomic datasets

Feature importance in the Random Forest model, measured as the decrease in Gini impurity. The importance of each relative synonymous codon usage (RSCU) value is averaged over 1000 trees trained on the full metagenomics data set. Codons are grouped by the amino acid they encode, with the amino acids that have the most codons on the left. Vertical lines separate amino acids with 6, 4, 3 and 2 codons. The variance is high due to correlations among synonymous codons. Codons TCG, CGC, CGA, GCG, GTA and CCG stand out as more important than the others.
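
As a rough illustration of the features on the horizontal axis, the sketch below computes RSCU values using the conventional definition: a codon's observed count divided by the mean count of all synonymous codons for the same amino acid. It is not the authors' code. The function name `rscu`, the in-frame reading of the input, and the choice to return 0.0 for amino acids that never occur are assumptions made for this example. Stop codons and the single-codon amino acids (Met, Trp) are left out, which matches the 6-, 4-, 3- and 2-codon groups in the figure.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the usual TCAG ordering of bases.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def rscu(seq):
    """Return {codon: RSCU} for the 59 codons that have synonyms.

    Hypothetical helper: reads `seq` in frame from position 0 and
    ignores any trailing partial codon or unrecognised triplet.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    # Group codons into synonymous families, one per amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        # Skip stop codons and amino acids encoded by a single codon.
        if aa == "*" or len(codons) < 2:
            continue
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Observed count over the family's mean count.
            # Assumption: 0.0 when the amino acid is absent from seq.
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


example = rscu("GCGGCGGCAGCT")  # four alanine codons: GCG, GCG, GCA, GCT
print(example["GCG"], example["GCA"], example["GCC"])
```

Within any amino acid that occurs in the sequence, the RSCU values sum to the number of synonymous codons, so the values of one family are not independent of each other. This is in line with the caption's remark about correlations among synonymous codons.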
