
Table 1 Number of collected SARS-CoV-2 genomes in (a) the main dataset (n = 1,131,185) and (b) the validation dataset (n = 67,399)

From: Utilizing genomic signatures to gain insights into the dynamics of SARS-CoV-2 through Machine and Deep Learning techniques

(a) The main dataset (n = 1,131,185)

| Clades | SARS-CoV-2 genomes | Continents | SARS-CoV-2 genomes |
|---|---|---|---|
| Clade_G | 163,511 (14.45%) | Africa | 17,986 (1.59%) |
| Clade_GH | 162,666 (14.38%) | Asia | 87,711 (7.75%) |
| Clade_GK | 154,275 (13.6%) | Europe | 576,936 (51.00%) |
| Clade_GR | 162,619 (14.37%) | North America | 389,136 (34.4%) |
| Clade_GRA | 159,190 (14.07%) | Oceania | 10,761 (0.951%) |
| Clade_GRY | 170,070 (15%) | South America | 43,548 (3.84%) |
| Clade_GV | 158,854 (14%) | Unknown | 5107 (0.45%) |

(b) The validation dataset (n = 67,399)

| Clades | SARS-CoV-2 genomes | Continents | SARS-CoV-2 genomes |
|---|---|---|---|
| Clade_G | 3161 (4.68%) | Africa | 2225 (3.3%) |
| Clade_GH | 6169 (9.15%) | Asia | 12,145 (18%) |
| Clade_GK | 22,436 (33.28%) | Europe | 28,940 (42.93%) |
| Clade_GR | 10,536 (15.63%) | North America | 13,784 (20.35%) |
| Clade_GRA | 17,844 (26.47%) | Oceania | 1781 (2.64%) |
| Clade_GRY | 6591 (9.77%) | South America | 6761 (10.03%) |
| Clade_GV | 662 (0.98%) | Unknown | 1763 (2.61%) |
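
The percentages in Table 1 are each category's share of the dataset total, so they can be recomputed from the counts as a consistency check. The following is a minimal Python sketch of that check, not code from the paper: the counts are transcribed from the table, while the variable and function names (`MAIN_N`, `shares`, `totals_match`, and so on) are illustrative assumptions.

```python
# Counts transcribed from Table 1. All names below are illustrative,
# not taken from the paper.
MAIN_N = 1_131_185
VALIDATION_N = 67_399

main_clades = {
    "G": 163_511, "GH": 162_666, "GK": 154_275, "GR": 162_619,
    "GRA": 159_190, "GRY": 170_070, "GV": 158_854,
}
main_continents = {
    "Africa": 17_986, "Asia": 87_711, "Europe": 576_936,
    "North America": 389_136, "Oceania": 10_761,
    "South America": 43_548, "Unknown": 5_107,
}
validation_clades = {
    "G": 3_161, "GH": 6_169, "GK": 22_436, "GR": 10_536,
    "GRA": 17_844, "GRY": 6_591, "GV": 662,
}
validation_continents = {
    "Africa": 2_225, "Asia": 12_145, "Europe": 28_940,
    "North America": 13_784, "Oceania": 1_781,
    "South America": 6_761, "Unknown": 1_763,
}


def shares(counts, total):
    """Return each category's percentage of `total`, rounded to two decimals."""
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


def totals_match(counts, total):
    """Check that the category counts add up to the stated dataset size."""
    return sum(counts.values()) == total


if __name__ == "__main__":
    groups = [
        ("main / clades", main_clades, MAIN_N),
        ("main / continents", main_continents, MAIN_N),
        ("validation / clades", validation_clades, VALIDATION_N),
        ("validation / continents", validation_continents, VALIDATION_N),
    ]
    for label, counts, total in groups:
        print(label, "sums to n:", totals_match(counts, total))
        print(shares(counts, total))
```

All four groups sum to their stated n. Recomputed this way, the printed percentages agree with the counts to within rounding or truncation, with one apparent exception: 13,784 of 67,399 validation genomes from North America is about 20.45%, whereas the table prints 20.35%.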