From: Handling the data management needs of high-throughput sequencing data: SpeedGene, a compression algorithm for the efficient storage of genetic data
| SNPs | Size | PLINK | Gzip | SpeedGene | DNAzip (Extrapolated) | Avg MAF |
|---|---|---|---|---|---|---|
| 1 million | 3.731 GB | 238 MB | 22 MB | 18 MB | 16 MB + ∼ 4.2 GB reference | 0.004944 |
| 30 million | 112 GB | 6.985 GB | 592 MB | 534 MB | 310 MB + ∼ 4.2 GB reference | 0.004228 |
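
The sizes above are easier to compare as compression ratios. The following is a minimal sketch, not part of the original article. It assumes that the "Size" column is the uncompressed baseline and that 1 GB = 1000 MB; neither is stated in the table. The names `sizes_mb` and `compression_ratios` are illustrative. DNAzip is left out because its figure depends on a separate ∼ 4.2 GB reference, so a single ratio would not be comparable.

```python
# Sizes in MB, copied from the table above.
# Assumption: 1 GB = 1000 MB (the table does not say which convention it uses).
GB = 1000.0

sizes_mb = {
    "1 million": {"Size": 3.731 * GB, "PLINK": 238.0, "Gzip": 22.0, "SpeedGene": 18.0},
    "30 million": {"Size": 112.0 * GB, "PLINK": 6.985 * GB, "Gzip": 592.0, "SpeedGene": 534.0},
}


def compression_ratios(row):
    """Return baseline size divided by each method's size for one table row."""
    baseline = row["Size"]  # assumed to be the uncompressed file size
    return {name: baseline / size for name, size in row.items() if name != "Size"}


ratios = {snps: compression_ratios(row) for snps, row in sizes_mb.items()}

for snps, per_method in ratios.items():
    for method, ratio in per_method.items():
        print(f"{snps:>10} SNPs  {method:<10} {ratio:7.1f}x")
```

Under these assumptions, a larger ratio means a smaller file relative to the baseline, so the methods can be ranked row by row without converting units by hand.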