Table 1 Performance of the SpeedGene algorithm on the simulated datasets

From: Handling the data management needs of high-throughput sequencing data: SpeedGene, a compression algorithm for the efficient storage of genetic data

| SNPs | Size | PLINK | Gzip | SpeedGene | DNAzip (Extrapolated) | Avg MAF |
| --- | --- | --- | --- | --- | --- | --- |
| 1 million | 3.731 GB | 238 MB | 22 MB | 18 MB | 16 MB + ∼ 4.2 GB reference | 0.004944 |
| 30 million | 112 GB | 6.985 GB | 592 MB | 534 MB | 310 MB + ∼ 4.2 GB reference | 0.004228 |

  1. Compressed file sizes of the simulated datasets using PLINK, Gzip, SpeedGene and DNAzip. Each dataset contains 1000 subjects.
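
To make the sizes in Table 1 easier to compare, they can be turned into compression ratios. The sketch below is illustrative only and rests on two assumptions that the table does not state: that the "Size" column is the uncompressed input size, and that 1 GB = 1000 MB. The names `SIZES_MB` and `compression_ratios` are hypothetical helpers, not part of SpeedGene. DNAzip is left out because its figures are extrapolated and depend on a separate reference of roughly 4.2 GB.

```python
# Sizes copied from Table 1, converted to MB.
# Assumption: 1 GB = 1000 MB (the table does not say which convention it uses).
# Assumption: "raw" is the table's "Size" column, read as the uncompressed size.
SIZES_MB = {
    "1 million": {"raw": 3731.0, "PLINK": 238.0, "Gzip": 22.0, "SpeedGene": 18.0},
    "30 million": {"raw": 112000.0, "PLINK": 6985.0, "Gzip": 592.0, "SpeedGene": 534.0},
}


def compression_ratios(sizes):
    """Return raw size divided by compressed size for each tool."""
    raw = sizes["raw"]
    return {tool: raw / size for tool, size in sizes.items() if tool != "raw"}


for dataset, sizes in SIZES_MB.items():
    for tool, ratio in compression_ratios(sizes).items():
        print(f"{dataset:>10} SNPs  {tool:<9} {ratio:8.1f}x")
```

Under these assumptions, a larger ratio means a smaller file relative to the input, so the ratios rank the three tools in the same order as the table's sizes do.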