
Table 1 Assembly statistics

From: Combining independent de novo assemblies optimizes the coding transcriptome for nonconventional model eukaryotic organisms

  

| Statistic | Step^a | L. stagnalis | S. cerevisiae | Caenorhabditis sp. | D. simulans | H. rhamnoides | N. benthamiana | D. melanogaster | C. elegans |
|---|---|---|---|---|---|---|---|---|---|
| Dataset type | | Illumina derived | Illumina derived | Illumina derived | Illumina derived | Illumina derived | Illumina derived | Simulated | Simulated |
| Number of concatenated transcripts | 6 | 576,412 | 25,854 | 152,491 | 184,892 | 278,987 | 885,944 | 42,535 | 15,340 |
| CDS number | 7 | 139,727 | 22,180 | 112,813 | 81,598 | 137,601 | 379,596 | 37,920 | 41,103 |
| uniCDS number^b | 8 | 59,178 (58%) | 9,942 (55%) | 40,116 (64%) | 27,735 (66%) | 63,092 (54%) | 131,656 (65%) | 12,118 (68%) | 14,890 (64%) |
| Total transcript number | 9 | 58,185 | 9,744 | 39,022 | 26,968 | 61,798 | 127,526 | 11,582 | 14,283 |
| Total CDS number | 9 | 64,659 | 11,605 | 51,416 | 34,363 | 68,288 | 153,118 | 14,231 | 15,412 |
| Transcripts with multiple CDSs^c | 9 | 5,759 (10%) | 1,529 (15%) | 9,756 (19%) | 5,838 (22%) | 5,999 (10%) | 21,060 (17%) | 2,218 (19%) | 949 (7%) |
| Redundant CDSs^d | 9 | 5,481 (9%) | 1,663 (14%) | 11,300 (22%) | 6,628 (19%) | 5,196 (8%) | 21,462 (14%) | 2,113 (15%) | 522 (3%) |
| Transcriptome size (bp) | 9 | 131,591,076 | 16,164,888 | 69,689,679 | 69,421,322 | 86,181,833 | 206,036,224 | 34,121,269 | 19,765,122 |
| Smallest transcript (bp) | 9 | 300 | 300 | 300 | 300 | 300 | 300 | 300 | 300 |
| Largest transcript (bp) | 9 | 35,470 | 15,061 | 21,466 | 51,362 | 13,117 | 19,833 | 29,220 | 26,756 |
| N50 (bp) | 9 | 3,483 | 2,414 | 2,366 | 3,866 | 1,823 | 2,116 | 4,479 | 1,666 |

  a. Step number in Fig. 1
  b. Proportion of discarded CDSs is indicated in brackets
  c. Proportion of transcripts with >1 CDS is indicated in brackets
  d. Proportion of non-unique CDSs is indicated in brackets
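
The bracketed values described in footnote b can be checked against the table itself. The sketch below assumes that "discarded" means the share of step 7 CDSs that are no longer present among the step 8 uniCDSs; the table does not state the formula, so this reading is an assumption, and the helper name `discarded_percent` is purely illustrative. The counts are copied from the "CDS number" and "uniCDS number" rows.

```python
# CDS counts at step 7 and uniCDS counts at step 8, copied from Table 1.
cds = {
    "L. stagnalis": 139_727,
    "S. cerevisiae": 22_180,
    "Caenorhabditis sp.": 112_813,
    "D. simulans": 81_598,
    "H. rhamnoides": 137_601,
    "N. benthamiana": 379_596,
    "D. melanogaster": 37_920,
    "C. elegans": 41_103,
}
uni_cds = {
    "L. stagnalis": 59_178,
    "S. cerevisiae": 9_942,
    "Caenorhabditis sp.": 40_116,
    "D. simulans": 27_735,
    "H. rhamnoides": 63_092,
    "N. benthamiana": 131_656,
    "D. melanogaster": 12_118,
    "C. elegans": 14_890,
}


def discarded_percent(total: int, kept: int) -> int:
    """Percentage of `total` that was removed, rounded to a whole percent.

    Assumed interpretation of footnote b: (total - kept) / total.
    """
    return round(100 * (total - kept) / total)


# One rounded percentage per dataset, keyed by species name.
discarded = {sp: discarded_percent(cds[sp], uni_cds[sp]) for sp in cds}

for species, pct in discarded.items():
    print(f"{species}: {pct}%")
```

Under this assumption, the computed percentages agree with the bracketed values in the uniCDS row for all eight datasets, for example 58% for L. stagnalis and 68% for D. melanogaster.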