Table 2 Summary of simulated and real contig data

From: EndHiC: assemble large contigs into chromosome-level scaffolds using the Hi-C links from contig ends

| Assembly type | Species | Chromosomes (n) | Genome size (bp) | Total contig number | Total contig length (bp) | Assembled percent (%) | Average contig number | Contig N50 (bp) | Contig N90 (bp) |
|---|---|---|---|---|---|---|---|---|---|
| Simulated assembly | Human | 23 | 3,054,815,472 | 104 | 3,054,815,472 | 100 | 4.52 | 47,909,438 | 14,532,355 |
| Simulated assembly | Rice | 12 | 373,094,580 | 42 | 373,094,580 | 100 | 3.50 | 11,071,427 | 5,000,429 |
| Simulated assembly | Arabidopsis | 5 | 119,146,348 | 11 | 119,146,348 | 100 | 2.20 | 9,660,775 | 5,994,203 |
| Hifiasm-assembly | Human | 23 | 3,054,815,472 | 82 | 3,007,080,905 | 98 | 3.57 | 89,131,734 | 28,203,557 |
| Hifiasm-assembly | Great burdock | 18 | 1,720,000,000 | 30 | 1,709,056,189 | 99 | 1.67 | 74,692,580 | 38,981,084 |
| Hifiasm-assembly | Water spinach | 15 | 485,000,000 | 29 | 480,197,403 | 99 | 1.61 | 23,511,778 | 9,860,712 |

Genome sizes are validated or estimated values. Only contigs larger than 1 Mb are included in the statistics in this table. Assembled percent = total contig length / genome size. Average contig number is the average number of contigs per chromosome. For the simulated data, the reference genomes of human CHM13 v1.1, rice (Nipponbare) ASM386523v1, and Arabidopsis thaliana (Columbia) TAIR10.1 were used to simulate large contigs: each chromosome of the reference genome was randomly split into 1–6 contigs, and contigs smaller than 1 Mb were not allowed, so all simulated contigs are larger than 1 Mb. For the real data, the hifiasm-assembled contigs of human were downloaded from https://zenodo.org/record/4393631/files/CHM13.HiFi.hifiasm-0.12.fa.gz, while the contigs of great burdock and water spinach were assembled by hifiasm with default parameters from HiFi reads downloaded from the NCBI-SRA database (PRJNA764011 and PRJNA764042).
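The definitions in this note can be illustrated with a short Python sketch. It is not the authors' script: the function names (`nx`, `summarize`, `split_chromosome`) are hypothetical, and the splitting scheme is an assumption, since the note only states that each chromosome was randomly split into 1–6 contigs with none smaller than 1 Mb.

```python
import random

MIN_CONTIG = 1_000_000  # 1 Mb cutoff stated in the table note


def nx(lengths, fraction):
    """Length L such that contigs of length >= L cover `fraction` of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length
    raise ValueError("no contigs given")


def summarize(lengths, genome_size, n_chromosomes):
    """Per-assembly statistics, counting only contigs larger than 1 Mb."""
    kept = [x for x in lengths if x > MIN_CONTIG]
    total = sum(kept)
    return {
        "contig_number": len(kept),
        "contig_length": total,
        "assembled_percent": 100 * total / genome_size,
        "average_contig_number": len(kept) / n_chromosomes,
        "n50": nx(kept, 0.5),
        "n90": nx(kept, 0.9),
    }


def split_chromosome(chrom_length, rng, max_pieces=6):
    """Split one chromosome into 1..max_pieces contigs, each larger than 1 Mb.

    Assumed scheme (not from the source): reserve the minimum size for every
    piece, then distribute the remaining bases at random cut points.
    """
    floor = MIN_CONTIG + 1  # smallest allowed contig length
    most = min(max_pieces, chrom_length // floor)
    if most < 1:
        raise ValueError("chromosome shorter than the minimum contig size")
    k = rng.randint(1, most)
    spare = chrom_length - k * floor
    cuts = sorted(rng.randint(0, spare) for _ in range(k - 1))
    edges = [0] + cuts + [spare]
    return [floor + (b - a) for a, b in zip(edges, edges[1:])]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    chromosomes = [30_000_000, 25_000_000, 20_000_000]  # made-up lengths
    contigs = [c for chrom in chromosomes for c in split_chromosome(chrom, rng)]
    print(summarize(contigs, sum(chromosomes), len(chromosomes)))
```

Because the simulated contigs tile each chromosome exactly, the assembled percent of a simulated assembly is always 100, which matches the simulated rows of the table.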