Table 2 Sequence alignment statistics for simulated genomes

From: FIGG: Simulating populations of whole genome sequences for heterogeneous data analyses

 

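SAMtools flagstat statistics

| Genome | Mapped | Correctly paired | Singletons |
| --- | --- | --- | --- |
| GRCh37 | 98.22% | 96.34% | 0.85% |
| S1 | 97.89% | 95.52% | 1.00% |
| S2 | 95.46% | 92.95% | 1.09% |
| S3 | 97.89% | 95.54% | 0.99% |
| S4H | 90.09% | 85.11% | 2.89% |
| S5H | 90.35% | 85.45% | 2.84% |
| S6SV | 88.16% | 83.22% | 2.88% |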

A comparison of the 1000 Genomes reads for ERX000272 mapped against each genome. GRCh37 is the current reference genome. S1, S2 and S3 are genomes generated from normal variation data; S4H and S5H were generated from high variation data; S6SV is based on normal variation but with chromosome arm 19q deleted. The table columns are statistics reported by SAMtools flagstat: Mapped is the total percentage of reads that mapped to the given genome; Correctly paired is the percentage of reads that aligned to the genome in their proper pair; and Singletons is the percentage of reads that were orphaned in the alignment. As expected, genomes S1-3 show mapping statistics close to those of the reference genome, while the others show markedly lower mapped and correctly paired percentages and more singletons, owing to the higher frequency and larger bp size of the variations used to generate these genomes.
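To make the comparison with the reference concrete, the percentages reported in the table can be turned into percentage-point differences from GRCh37. The sketch below is illustrative only and is not part of FIGG or SAMtools. The values are transcribed from Table 2, while the names `TABLE2`, `COLUMNS` and `delta_vs_reference` are hypothetical helpers introduced here.

```python
# Percentages transcribed from Table 2 (SAMtools flagstat statistics).
# Tuple order: mapped, correctly paired, singletons.
TABLE2 = {
    "GRCh37": (98.22, 96.34, 0.85),
    "S1": (97.89, 95.52, 1.00),
    "S2": (95.46, 92.95, 1.09),
    "S3": (97.89, 95.54, 0.99),
    "S4H": (90.09, 85.11, 2.89),
    "S5H": (90.35, 85.45, 2.84),
    "S6SV": (88.16, 83.22, 2.88),
}

# Hypothetical column keys, used only for this illustration.
COLUMNS = ("mapped", "correctly_paired", "singletons")


def delta_vs_reference(table, reference="GRCh37"):
    """Return percentage-point differences of each genome from the reference."""
    ref = table[reference]
    return {
        name: {
            col: round(value - ref_value, 2)
            for col, value, ref_value in zip(COLUMNS, values, ref)
        }
        for name, values in table.items()
        if name != reference
    }


# Print one line per simulated genome, with signed differences.
for genome, deltas in delta_vs_reference(TABLE2).items():
    line = ", ".join(f"{col}: {diff:+.2f}" for col, diff in deltas.items())
    print(f"{genome}: {line}")
```

Negative values in the mapped and correctly paired columns, and positive values in the singletons column, indicate a poorer alignment than against GRCh37.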