Table 4 Assembly results for three metagenomic datasets

From: TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets

| Library | Assembly run | # Reads | # Contigs (> 500 bp) | Average contig length (> 500 bp) | Contig N50¹ (bp) | # Concatenated tag sequences allowing 3 mismatches |
|---|---|---|---|---|---|---|
| LIB019 | A | 42,825 | 136 (25) | 329.91 (703.08) | 423 | 10 |
| | B | 34,778 | 73 (25) | 390.04 (694.04) | 605 | 5 |
| | C | 35,426² | 50 (26) | 510.94 (768.92) | 663 | 0 |
| LIB020 | A | 17,129 | 89 (6) | 246.40 (557.33) | 306 | 4 |
| | B | 14,208 | 55 (13) | 292.85 (655.85) | 510 | 3 |
| | C | 14,366² | 52 (12) | 312.54 (726.33) | 547 | 0 |
| LIB021 | A | 49,282 | 305 (15) | 238.54 (682.00) | 276 | 29 |
| | B | 41,126 | 186 (18) | 264.12 (691.67) | 302 | 16 |
| | C | 42,495² | 165 (20) | 282.39 (782.00) | 303 | 0 |
  1. The GS De Novo Assembler Software version 2.3 (Roche, Branford, CT) was used to assemble three metagenomic libraries (LIB019, LIB020 and LIB021) to illustrate how TagCleaner can improve metagenomic and other high-throughput studies. The assembly parameters were set to 95% identity over at least 35 bp. Assemblies were generated for three different parameter sets for each of the metagenomic libraries: (A) raw data; (B) tag sequences trimmed allowing three mismatches; (C) tag sequences trimmed allowing three mismatches with additional splitting of the fragment-to-fragment concatenations and continuous end tag trimming. For B and C, the minimum sequence length was set to 40 bp, sequence duplicates were removed and all other parameters were kept at their default values.
  2. ¹ The N50 contig size is a weighted median that is defined as the length of the smallest contig C in the sorted list of all contigs where the cumulative length from the largest contig to contig C is at least 50% of the total length (sum of contig lengths).
  3. ² Increased number of reads due to splitting of the fragment-to-fragment concatenations.
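
The N50 definition given in the footnotes can be sketched in Python as follows. This is a minimal illustration of the stated definition, not code from the assembler or from TagCleaner; the function name `n50` and the example contig lengths are hypothetical and are not taken from the assemblies in this table.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs are sorted from largest to smallest; the N50 is the length of
    the first contig at which the running total reaches at least 50% of
    the summed length of all contigs.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")

    ordered = sorted(contig_lengths, reverse=True)
    total = sum(ordered)

    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid fractional arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp (illustrative only, not from the table).
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))  # total = 1500; 500 + 400 = 900 >= 750, so N50 = 400
```

Because the value is weighted by contig length, a few long contigs can raise the N50 even when many short contigs remain, which is why the table reports it alongside the average contig length.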