Probabilistic alignment leads to improved accuracy and read coverage for bisulfite sequencing data

BMC Bioinformatics

Table 2 Simulated bisulfite read experiment

Evaluation metric	GNUMAP-bs	Novoalign	BSMAP	Bismark	Bismark-bt2	LAST
Overall mapping results:
Total reads aligned (%)	156.6M(97.8)	155.8M(97.4)	153.4M(95.9)	149.5M(93.4)	145.4M(90.8)	158.7M(99.2)
Correctly aligned (%)	155.2M(97.0)	154.2M(96.3)	150.2M(93.9)	149.2M(93.2)	145.2M(90.7)	155.1M(96.9)
Incorrectly aligned (%)	1.4M(0.9)	1.7M(1.1)	1.5M(1.0)	0.3M(0.2)	0.2M(0.1)	3.6M(2.3)
With ≥1 sequence variant:
Total reads aligned (%)	69.0M(97.8)	66.0M(93.6)	65.3M(92.6)	63.6M(90.2)	59.6M(84.4)	70.3M(99.7)
Correctly aligned (%)	67.7M(96.0)	65.3M(92.6)	63.9M(90.1)	63.3M(89.8)	59.4M(84.1)	66.7M(94.6)
Incorrectly aligned (%)	1.3M(1.8)	0.7M(1.0)	1.4M(2.0)	0.3M(0.4)	0.2M(0.3)	3.5M(5.1)
Predicted methylation:
Average absolute estimation error	0.11	0.69	0.22	0.11	0.10	-
Standard error	0.056	0.066	0.067	0.064	0.062	-
Computational resource:
Total compute time (16 CPUs)	39 h 50 m	29 h 25 m	4 h 28 m	46 h 16 m	97 h 26 m	58 h 20 m
Peak memory usage (GB)	44.8	14.5	9.4	5.9	7.9	15.9
Reads per second per CPU	68	92	607	448	26	753

Simulation study of 160 million (M) BSRs simulated from the human genome reference sequence. GNUMAP-bs was the most sensitive aligner, especially for reads with ≥1 sequence variant (a sequencing error or mutation). Bismark had the smallest error rate, with up to 1.2 M fewer incorrectly aligned reads than GNUMAP-bs, but GNUMAP-bs correctly aligned 6 to 10 M more reads. BSMAP had the shortest total run time, but it was less sensitive than GNUMAP-bs. LAST aligned nearly all reads, with a sensitivity comparable to that of GNUMAP-bs, but its mapping error rate was much higher than that of GNUMAP-bs.
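
The read-count comparisons in the caption can be rechecked from the rounded values in the table. The following is a minimal sketch, assuming the counts (in millions of reads) are copied from the "Overall mapping results" rows; the dictionary and function names are illustrative and do not come from the paper.

```python
# Counts in millions of reads, copied from the "Overall mapping results" rows of Table 2.
correct = {"GNUMAP-bs": 155.2, "Novoalign": 154.2, "BSMAP": 150.2,
           "Bismark": 149.2, "Bismark-bt2": 145.2, "LAST": 155.1}
incorrect = {"GNUMAP-bs": 1.4, "Novoalign": 1.7, "BSMAP": 1.5,
             "Bismark": 0.3, "Bismark-bt2": 0.2, "LAST": 3.6}


def difference(table, a, b):
    """Return table[a] - table[b], rounded to the table's one-decimal precision."""
    return round(table[a] - table[b], 1)


# Compare GNUMAP-bs with both Bismark variants (hypothetical reporting loop).
for bismark in ("Bismark", "Bismark-bt2"):
    extra_correct = difference(correct, "GNUMAP-bs", bismark)
    extra_errors = difference(incorrect, "GNUMAP-bs", bismark)
    print(f"GNUMAP-bs vs {bismark}: "
          f"{extra_correct} M more correct, {extra_errors} M more incorrect")
```

Because the table reports counts to one decimal place, the differences are rounded to the same precision; they are only as exact as the published, rounded figures.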

ISSN: 1471-2105