
Table 1 Data mart facts

From: ENGINES: exploring single nucleotide variation in entire human genomes

1000 Genomes Phase I

| Population | N genomes | Variant sites*1 | Variant genotypes |
| --- | ---: | ---: | ---: |
| ASW | 24 | 14,037,711 | 336,905,064 |
| CEU | 90 | 10,983,038 | 988,473,420 |
| CHB | 68 | 9,490,259 | 645,337,612 |
| CHS | 25 | 7,588,537 | 189,713,425 |
| FIN | 36 | 8,680,985 | 312,515,460 |
| GBR | 43 | 9,376,836 | 403,203,948 |
| JPT | 84 | 10,071,464 | 846,002,976 |
| LWK | 67 | 17,279,531 | 1,157,728,577 |
| MXL | 17 | 8,513,411 | 144,727,987 |
| PUR | 5 | 6,354,128 | 31,770,640 |
| TSI | 92 | 11,368,655 | 1,045,916,260 |
| YRI | 78 | 16,567,193 | 1,292,241,054 |
| TOTAL | 629 | 28,210,483 | 7,394,536,423 |

HapMap release 28

| Population | N samples | Variant sites | Variant genotypes |
| --- | ---: | ---: | ---: |
| ASW | 53 | 1,543,440 | 81,802,320 |
| CEU | 121 | 2,816,160 | 340,755,360 |
| CHB | 139 | 2,635,473 | 366,330,747 |
| CHD | 109 | 1,312,139 | 143,023,151 |
| GIH | 101 | 1,409,285 | 142,337,785 |
| JPT | 116 | 2,561,639 | 297,150,124 |
| LWK | 110 | 1,527,108 | 167,981,880 |
| MEX | 58 | 1,453,424 | 84,298,592 |
| MKK | 156 | 1,532,287 | 239,036,772 |
| TSI | 102 | 1,420,285 | 144,869,070 |
| YRI | 153 | 3,151,427 | 482,168,331 |
| TOTAL | 1,218 | 4,170,392 | 2,489,754,132 |

  1. Comparing the variability data in 1000 Genomes Phase I with HapMap release 28 shows that, although HapMap has roughly twice as many samples (1,218 vs. 629), 1000 Genomes holds roughly three times as many genotypes (about 7.4 billion vs. 2.5 billion) because of its higher density of variants. This is particularly notable for the YRI population, which is now described even more completely than before. "Variant sites" is the number of non-monomorphic sites.
  2. *1 Variant sites are the number of bi-allelic markers observed in each population group. Phase I contains no information on tri- or tetra-allelic variants, whereas Pilot 1 contains more than 16,000 tri-allelic SNVs plus 12 tetra-allelic SNVs (data not shown).
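
The arithmetic behind the table can be checked with a short sketch. In every row, the reported variant genotype count equals the number of individuals multiplied by the number of variant sites. That relationship is read off the published figures; it is an assumption here that the authors computed the column this way. The names `KG_PHASE1`, `HAPMAP_R28`, `genotype_counts`, and `totals` are illustrative only and are not part of ENGINES.

```python
# (n_individuals, variant_sites) per population, copied from Table 1.
KG_PHASE1 = {
    "ASW": (24, 14_037_711), "CEU": (90, 10_983_038), "CHB": (68, 9_490_259),
    "CHS": (25, 7_588_537),  "FIN": (36, 8_680_985),  "GBR": (43, 9_376_836),
    "JPT": (84, 10_071_464), "LWK": (67, 17_279_531), "MXL": (17, 8_513_411),
    "PUR": (5, 6_354_128),   "TSI": (92, 11_368_655), "YRI": (78, 16_567_193),
}
HAPMAP_R28 = {
    "ASW": (53, 1_543_440),  "CEU": (121, 2_816_160), "CHB": (139, 2_635_473),
    "CHD": (109, 1_312_139), "GIH": (101, 1_409_285), "JPT": (116, 2_561_639),
    "LWK": (110, 1_527_108), "MEX": (58, 1_453_424),  "MKK": (156, 1_532_287),
    "TSI": (102, 1_420_285), "YRI": (153, 3_151_427),
}


def genotype_counts(panel):
    """Return {population: n_individuals * variant_sites}.

    Assumption: one genotype per individual per variant site, which
    reproduces the 'Variant genotypes' column of the table.
    """
    return {pop: n * sites for pop, (n, sites) in panel.items()}


def totals(panel):
    """Return (total individuals, total variant genotypes) for a panel."""
    n_total = sum(n for n, _ in panel.values())
    genotypes_total = sum(genotype_counts(panel).values())
    return n_total, genotypes_total


kg_n, kg_gt = totals(KG_PHASE1)
hm_n, hm_gt = totals(HAPMAP_R28)

print(f"1000 Genomes Phase I: {kg_n} genomes, {kg_gt:,} genotypes")
print(f"HapMap release 28:    {hm_n} samples, {hm_gt:,} genotypes")
print(f"Sample ratio (HapMap / 1000G):   {hm_n / kg_n:.2f}")
print(f"Genotype ratio (1000G / HapMap): {kg_gt / hm_gt:.2f}")
```

The two ratios come out at roughly 1.9 and 3.0, consistent with the "doubles" and "triples" wording in the first footnote. The TOTAL row for variant sites cannot be rebuilt the same way, because sites shared between populations are counted once in the total.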