Table 2 More complete mate-pair statistics

From: Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies

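| K-mer | MP Size (nt) | PathLenMatch^a | CrossFork^b | MatchSeq^c | Unique^d | Usable^e | ComplReduc^f |
|---|---|---|---|---|---|---|---|
| 100 | 400 | 99.96 | 0.78 | 91.84 | 83.33 | 0.48 | 57.97 |
| 100 | 2000 | 98.99 | 2.90 | 63.82 | 59.71 | 1.03 | 58.87 |
| 100 | 6000 | 96.86 | 6.05 | 62.21 | 61.42 | 2.20 | 60.50 |
| 100 | 8000 | 95.85 | 7.41 | 62.01 | 61.60 | 2.69 | 60.13 |
| 100 | 35000 | 84.60 | 20.43 | 60.89 | 62.21 | 6.55 | 58.27 |
| 50 | 400 | 99.68 | 1.87 | 72.72 | 64.92 | 0.75 | 47.22 |
| 50 | 2000 | 97.72 | 4.58 | 61.08 | 64.42 | 1.69 | 46.84 |
| 50 | 6000 | 93.72 | 9.18 | 62.42 | 67.93 | 3.63 | 47.09 |
| 50 | 8000 | 91.87 | 11.16 | 62.75 | 68.69 | 4.39 | 46.98 |
| 50 | 35000 | 72.98 | 29.16 | 62.74 | 69.43 | 8.98 | 45.17 |
| 35 | 400 | 99.5 | 2.68 | 68.5 | 65.67 | 1.06 | 35.87 |
| 35 | 2000 | 96.45 | 6.56 | 63.66 | 69.57 | 2.82 | 34.17 |
| 35 | 6000 | 90.51 | 13.30 | 66.05 | 72.61 | 5.77 | 33.90 |
| 35 | 8000 | 87.84 | 16.04 | 66.47 | 72.99 | 6.77 | 33.66 |
| 35 | 35000 | 62.29 | 38.43 | 66.47 | 73.91 | 10.78 | 31.62 |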

  1. For this table, a library of 50,000 mate-pairs of a particular length was applied to each of the graphs for each of the 391 genomes listed in Additional file 1. All values are average percentages. The first four columns give the percentages retained by the successive filtration steps used to identify and remove unusable mate-pairs: (a) % that had a shortest path of the prescribed length between end sequences, (b) % that crossed forks, (c) % whose shortest path matched the insert sequence, and (d) % that had a unique shortest path. The final two columns, (e) % of mate-pairs that were usable and (f) % reduction in finishing complexity, are overall percentages across all 50,000 mate-pairs in each category.
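
The relationship between the four filtration columns and the Usable column can be checked with a little arithmetic. The sketch below assumes that each of columns a to d is a percentage of the mate-pairs that survived the previous step, so the overall usable fraction is roughly their product. That reading is an interpretation of the footnote, not something the table states as a formula, and the function name `chained_percentage` is hypothetical. Because the table reports averages over 391 genomes, the product of the averages is only expected to approximate the reported Usable value, not reproduce it.

```python
from math import prod


def chained_percentage(step_percentages):
    """Multiply successive retention percentages and return the overall percentage.

    Assumption: each step's percentage is taken relative to the survivors
    of the previous step (our reading of the footnote, not a stated formula).
    """
    return 100.0 * prod(p / 100.0 for p in step_percentages)


# Row K-mer = 100, MP Size = 2000 nt, columns a to d as printed in the table.
steps = [98.99, 2.90, 63.82, 59.71]
reported_usable = 1.03  # column e for the same row

estimate = chained_percentage(steps)
print(f"chained estimate: {estimate:.2f}%  reported Usable: {reported_usable:.2f}%")
```

For this row the chained estimate comes out at about 1.09%, close to the reported 1.03%. The small gap is consistent with averaging per-genome results rather than multiplying the averaged columns.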