
Table 1 Average assembly statistics of all 12 data sets using BBAP with multiple approaches

From: De novo assembly of highly polymorphic metagenomic data using in situ generated reference sequences and a novel BLAST-based assembly pipeline

 

|  | PDᵃ | FDᵇ | SRᶜ | PDRᵈ |
|---|---|---|---|---|
| RRs | 214,942 | 21,494,295 | 21,494,295 | 21,494,295 |
| HQRs | 143,912 | 14,388,844 | 14,388,844 | 14,388,844 |
| URs | 27,150 | 860,144 | 860,144 | 860,144 |
| HRURs | 6264 | 60,228 | 60,228 | 60,228 |
| RiHRURs | 116,555 | 13,388,423 | 13,388,423 | 13,388,423 |
| Contigs assembledᵉ | 2.1 | 46.0 | 1.0 | 3.9 |
| Max contig length | 3119 | 1473 | 3207 | 3148 |
| Average contig length | 2319 | 321 | 3207 | 1268 |
| % of Mapped HRURs | 95.9% | 70.3% | 67.4% | 69.9% |
| % of Mapped RiHRURs | 80.4% | 68.7% | 82.7% | 84.5% |

  1. The full data sets were used in the BBAP assembly with FD, SR, and PDR approaches, whereas partial data sets consisting of 1% of randomly selected RRs were used in the BBAP PD assembly approach
  2. ᵃ Partial data set de novo assembly
  3. ᵇ Full data set de novo assembly
  4. ᶜ Sanger reference assembly
  5. ᵈ Partial data set reference assembly of the full data set
  6. ᵉ Only assembled contigs longer than 150 bp are shown
  7. RRs: raw reads; HQRs: high quality reads (quality score threshold = 20, i.e., sequencing error rate = 1%); URs: unique representative reads; HRURs: high redundancy unique representative reads (unique representative reads with redundancy threshold = 5); RiHRURs: reads included in high redundancy unique representative reads
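
The correspondence between the quality score threshold of 20 and the 1% sequencing error rate in the HQR definition follows from the standard Phred relation P = 10^(-Q/10). The minimal Python sketch below illustrates that conversion only. The function name is hypothetical, and the snippet is not part of BBAP; how the pipeline applies the threshold to a read is not stated in the table and is not assumed here.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to an error probability.

    Uses the standard Phred definition P = 10 ** (-Q / 10).
    Hypothetical helper for illustration; not taken from BBAP.
    """
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # Quality score threshold quoted in the table footnote.
    threshold = 20
    p = phred_to_error_probability(threshold)
    print(f"Q{threshold} corresponds to an error probability of {p:.2%}")
```

Under the same relation, a score of 30 would correspond to an error rate of 0.1%, so each 10-point increase in Q reduces the error probability tenfold.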