Table 1 The summary of datasets (as of 2020-07-17)

From: AutoCoV: tracking the early spread of COVID-19 in terms of the spatial and temporal patterns from embedding space by K-mer based deep learning

 

| Category | NCBI (5,210) | GISAID w/o NCBI (61,210) |
|---|---|---|
| Subclass | S (1,246), L (266), V (130), G (463), GR (418), GH (2,687) | S (4,328), L (3,856), V (4,418), G (14,982), GR (19,316), GH (14,310) |
| Spatial | Asia (454), Oceania (403), Europe (280), North America (4,073) | Asia (3,805), Oceania (2,151), Europe (41,365), North America (13,889) |
| Temporal | Early (178), Middle (2,632), Late (2,400) | Early (1,175), Middle (25,058), Late (34,977) |

  1. Each dataset is labeled by three categories of SARS-CoV-2 characteristics: pathogenic mutations (Subclass), Spatial, and Temporal. The values in parentheses denote the number of sequences. Detailed information about the Subclass labels is given in Additional file 1: Table S1.
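
Since every sequence carries one label per category, the counts within each category should add up to the dataset total given in the column header. The following is a minimal sketch of that consistency check, with the counts transcribed from Table 1; the dictionary layout and the helper name `category_totals` are illustrative assumptions and are not part of the original work.

```python
# Counts transcribed from Table 1 (as of 2020-07-17).
# The nested-dict layout is an illustrative assumption, not the authors' format.
TABLE1 = {
    "NCBI": {
        "total": 5210,
        "Subclass": {"S": 1246, "L": 266, "V": 130, "G": 463, "GR": 418, "GH": 2687},
        "Spatial": {"Asia": 454, "Oceania": 403, "Europe": 280, "North America": 4073},
        "Temporal": {"Early": 178, "Middle": 2632, "Late": 2400},
    },
    "GISAID w/o NCBI": {
        "total": 61210,
        "Subclass": {"S": 4328, "L": 3856, "V": 4418, "G": 14982, "GR": 19316, "GH": 14310},
        "Spatial": {"Asia": 3805, "Oceania": 2151, "Europe": 41365, "North America": 13889},
        "Temporal": {"Early": 1175, "Middle": 25058, "Late": 34977},
    },
}


def category_totals(dataset):
    """Sum the per-label sequence counts within each category (hypothetical helper)."""
    return {
        category: sum(counts.values())
        for category, counts in dataset.items()
        if category != "total"
    }


if __name__ == "__main__":
    for name, dataset in TABLE1.items():
        totals = category_totals(dataset)
        # A category is consistent when its labels sum to the dataset total.
        consistent = all(t == dataset["total"] for t in totals.values())
        print(name, totals, "consistent" if consistent else "MISMATCH")
```

For the counts as transcribed here, each of the three categories sums to its dataset total (5,210 for NCBI and 61,210 for GISAID w/o NCBI).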