From: NGS data vectorization, clustering, and finding key codons in SARS-CoV-2 variations
| Cluster (group) | Codon (amino acid) | Cluster 1 (B, Ref): Random forest | Cluster 1 (B, Ref): SHAP | Cluster 125 (A, Alpha): Random forest | Cluster 125 (A, Alpha): SHAP | Cluster 140 (C, Delta): Random forest | Cluster 140 (C, Delta): SHAP | Cluster 536 (H, 490R-GH): Random forest | Cluster 536 (H, 490R-GH): SHAP | Cluster 650 (I, Omicron): Random forest | Cluster 650 (I, Omicron): SHAP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cluster 9 (A, Ref) | AGC (SER) | 0.26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.03 | 0 |
| | CAG (GLN) | 0.21 | 1.915 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | CCU (PRO) | 0 | 0 | 0.088 | 15.69 | 0.13 | 0 | 0.032 | 0 | 0 | 0 |
| | CUG (LEU) | 0 | 0 | 0.012 | 0.018 | 0.12 | 38.2 | 0 | 0 | 0 | 0 |
| | GGU (GLY) | 0 | 0 | 0.01 | 0 | 0.08 | 5.627 | 0 | 0 | 0.05 | 0 |
| | ACU (THR) | 0 | 0 | 0 | 0 | 0.06 | 11.35 | 0.061 | 0 | 0.03 | 0 |
| | GAU (ASP) | 0 | 0 | 0.024 | 0.160 | 0.04 | 0 | 0.041 | 0 | 0 | 0 |
| | ACA (THR) | 0 | 0 | 0.137 | 22.160 | 0.02 | 8.931 | 0 | 0 | 0.05 | 0 |
| | GAC (ASP) | 0 | 0 | 0.074 | 0.297 | 0 | 0 | 0.068 | 0 | 0.03 | 0 |
| | AAU (ASN) | 0 | 0 | 0.08 | 0.181 | 0 | 0 | 0.058 | 0 | 0.05 | 0 |
| | AGA (ARG) | 0 | 0 | 0 | 0 | 0 | 0 | 0.05 | 0 | 0 | 0 |
| | CCA (PRO) | 0 | 0 | 0 | 0 | 0 | 0 | 0.04 | 0 | 0 | 0 |
| | UUU (PHE) | 0 | 0 | 0 | 0 | 0 | 0 | 0.04 | 7.758 | 0 | 0 |
| | AUU (ILE) | 0 | 0 | 0 | 0 | 0 | 0 | 0.038 | 0 | 0 | 0 |
| | UAU (TYR) | 0 | 0 | 0 | 0 | 0 | 0 | 0.032 | 0 | 0 | 0 |
| | GAG (GLU) | 0 | 0 | 0 | 0 | 0 | 0 | 0.03 | 0 | 0 | 58.34 |
| | AAG (LYS) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.05 | 0 |
| | UCA (SER) | 0 | 0 | 0.033 | 0.043 | 0 | 0 | 0 | 0 | 0.04 | 0 |
| | CAA (GLN) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.03 | 0 |
| | GCU (ALA) | 0 | 0 | 0.097 | 0.144 | 0 | 0 | 0 | 0 | 0.03 | 0 |