From: Data structures and compression algorithms for high-throughput sequencing technologies
 | Dataset 1 | Dataset 2 | Dataset 3 |
---|---|---|---|
Standalone Methods |  |  |  |
Read Length | 6,439,584 | 1,697,990 | 59,267,219 |
Chromosome | 31,576,860 | 9,997,062 | 31,118,531 |
Strand | 6,439,584 | 1,697,990 | 31,118,531 |
# Mismatches | 12,382,598 | 2,499,664 | 55,624,291 |
Total | 50,399,042 | 14,194,716 | 117,861,353 |
   Start Location |  |  |  |
MOV† | 121,565,953 | 44,200,254 | 787,554,494 |
EG† | 236,691,716 | 86,701,276 | 1,543,990,407 |
REG† | 10,745,562 | 26,180,752 | 76,430,489 |
Huffman | 91,019,189 | 82,444,521 | 1,324,964,740 |
RHuffman | 10,311,095 | 19,066,500 | 65,905,674 |
Best Standalone | 60,710,137 | 33,261,216 | 183,767,027 |
Combined Methods |  |  |  |
(C,S,M) Lookup | 64,424,309 | 33,809,380 | 158,272,463 |
REG Indexed† | 12,133,110 | 32,342,080 | 144,975,985 |
Mismatches |  |  |  |
Nucleotide | 13,917,023 | 1,307,870 | 53,441,350 |
From Start | 30,028,807 | 4,177,576 | 159,433,004 |
From End | 32,671,455 | 2,333,372 | 153,865,294 |
Total Start | 43,945,830 | 5,485,446 | 212,874,354 |
Total End | 46,588,478 | 3,641,242 | 207,306,644 |
Combined† | 44,033,309 | 3,757,400 | 186,298,126 |
Best Compression | 56,078,940 | 35,983,322 | 390,541,330 |
GenCompress | 56,166,419 | 36,099,480 | 390,541,330 |