Table 1 Number of variants imported from various external resources

From: Integrating 400 million variants from 80,000 human samples with extensive annotations: towards a knowledge base to analyze disease cohorts

| Study | Variant sites | Variants | Unique to study | Variants passed | Samples |
|---|---:|---:|---:|---:|---:|
| 1000 Genomes [1] | 81,195,126 | 81,693,252 | 57,400,612 | all | 2,504 |
| ESP6500 [2] | 1,982,177 | 1,998,204 | 184,225 | all | 6,503 |
| UK10K [47] ALSPAC/TWINS | 37,258,978 | 37,560,436 | 6,155,493 | all | 2,432 |
| UK10K with disease c | 9,391,582 | 11,177,227 | 8,847,466 | 9,969,036 | 4,888 |
| TCGA [4] germline c | 200,691,728 | 219,533,884 | 90,884,769 | n/a | 4,224 |
| TCGA somatic | 876,970 | 890,172 | 696,754 | all | 4,205 |
| Scripps Wellderly [48] | 76,144,271 | 91,947,469 | 63,331,143 | 53,303,437 | 534 |
| ExAC b [3] | 9,579,712 | 10,450,724 | 6,581,946 | 8,811,372 | 63,352 |
| MSSM BioBank genotyping | 849,806 | 849,806 | 0 | all | 11,210 |
| In-house resequencing study | 29,326,393 | 29,671,729 | 10,134,258 | 23,610,572 | 142 |
| Total observed | 358,152,122 | 399,404,510 | 244,216,666 | >217,796,115 | 82,558 b |
| Other resources: | | | | | |
| dbNSFP a [18] | 30,523,109 | 89,617,785 | 73,561,239 | | |
| ClinVar [12] | 101,317 | 104,455 | 31,694 | | |
| OMIM [49] | 10,863 | 10,913 | | | |
| COSMIC [50] | 1,483,983 | 1,525,243 | | | |
| PharmGKB c [51] | 672 | 684 | | | |
| SwissVar d | (77,047) | (84,649) | (34,198) | | |
| HGMD c [13] | 125,744 | 133,464 | 32,178 | | |
| Literature mining | 890,665 | | | | |
| Total observed + other | 388,902,292 | 472,965,749 | 317,841,777 | >217,796,115 | 82,558 |
  1. The first block lists sequencing/genotyping studies; the second lists sample-independent annotation databases. “Unique to study” counts variants that were observed only in that particular study. “Variants passed” counts variants that passed the quality metrics defined by the particular study; a variant counts as passed if it passes in at least one sample. n/a: individual sample quality metrics not available. Totals exclude duplicates seen in different studies. Variants in annotation databases are included only if they can be mapped to precise coordinates and an allele. Since a large proportion of the variants discovered by literature mining are given at the protein level only, they were not compared to other studies.
  2. a dbNSFP contains hypothetical variants, see text
  3. b ExAC includes samples from 1000 Genomes, ESP6500, and TCGA
  4. c Note that data from HGMD, PharmGKB, UK10K diseases and TCGA germline are not visible to external users on the RVS website
  5. d Counts for SwissVar refer to distinct amino acid changes. Further details on individual resources are provided in Additional file 4: Table S3
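
The counting rules in the table notes ("Unique to study" and totals that exclude duplicates) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the study names and variants are hypothetical toy data, the `summarize` helper is invented for illustration, and it assumes that a variant is identified by (chromosome, position, reference allele, alternate allele) and a variant site by (chromosome, position).

```python
from collections import Counter

# Hypothetical toy data: each variant is keyed by (chromosome, position, ref, alt).
studies = {
    "study_A": {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")},
    "study_B": {("1", 100, "A", "G"), ("1", 100, "A", "T")},
    "study_C": {("2", 50, "G", "A"), ("3", 10, "T", "C")},
}


def summarize(studies):
    """Return per-study counts and deduplicated totals (illustrative helper)."""
    # Count in how many studies each distinct variant was observed.
    seen_in = Counter(v for variants in studies.values() for v in variants)

    rows = {}
    for name, variants in studies.items():
        rows[name] = {
            # Assumption: a site is a distinct (chromosome, position) pair.
            "variant_sites": len({(chrom, pos) for chrom, pos, _, _ in variants}),
            "variants": len(variants),
            # Variants observed in this study and in no other study.
            "unique_to_study": sum(1 for v in variants if seen_in[v] == 1),
        }

    # Totals are computed on the union, so duplicates across studies count once.
    all_variants = set().union(*studies.values())
    total = {
        "variant_sites": len({(chrom, pos) for chrom, pos, _, _ in all_variants}),
        "variants": len(all_variants),
        "unique_to_study": sum(1 for n in seen_in.values() if n == 1),
    }
    return rows, total


rows, total = summarize(studies)
print(rows)
print(total)
```

Under these assumptions the total in the "Unique to study" column is the sum of the per-study values, which matches the first block of the table, while the totals for sites and variants are smaller than the column sums because shared entries are counted once.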