
Table 2 Estimation of the node multiplicity in de Bruijn graphs (k=21) built from real Illumina data for 5 organisms (2 bacteria, 3 eukaryotes)

From: Accurate determination of node and arc multiplicities in de bruijn graphs using conditional random fields

  

                      10×                     25×                     50×
Organism       s      node acc.  k-mer acc.   node acc.  k-mer acc.   node acc.  k-mer acc.
P. aeruginosa  0      84.31      96.47        95.02      98.72        97.47      99.21
               1      92.85      98.80        98.27      99.49        98.89      99.46
               3      93.91      98.96        98.60      99.50        99.13      99.52
               5      94.11      98.94        98.74      99.51        99.17      99.51
S. enterica    0      84.50      96.46        93.65      98.25        95.96      98.53
               1      88.81      97.18        94.98      98.39        96.44      98.60
               3      89.41      97.18        95.27      98.45        96.53      98.63
               5      89.55      97.22        95.32      98.46        96.57      98.64
C. elegans     0      68.65      93.93        80.69      96.42        87.35      97.10
               1      78.90      96.48        86.47      97.77        90.74      98.05
               3      81.02      97.21        87.16      98.01        91.27      98.24
               5      81.29      97.18        87.32      98.05        91.25      98.25
A. thaliana    0      67.84      89.10        82.20      96.16        89.91      97.05
               1      73.67      94.64        85.45      96.83        91.27      97.54
               3      73.92      95.26        85.83      97.09        91.46      97.71
               5      73.93      95.43        85.68      97.17        91.56      97.70
H. sapiens     0      75.26      92.27        83.29      94.67        88.09      95.51
               1      80.68      93.92        85.66      95.23        89.23      95.83
               3      81.33      94.56        86.12      95.50        89.52      95.95
               5      81.57      94.71        86.26      95.58        89.59      95.97

  1. The datasets were downsampled to coverage depths of 10×, 25× and 50×. For H. sapiens, the multiplicity was inferred for one million randomly sampled nodes; for all other datasets, the multiplicity was inferred for all nodes. The node (resp. k-mer) accuracy is the percentage of nodes (resp. k-mers) in the de Bruijn graph that were assigned the correct multiplicity, reported in percent. The accuracy generally improves when using CRFs with increasing neighbourhood size s, with the largest gain between s = 0 and s = 1.
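
To make the difference between the two accuracy columns concrete, the following is a minimal sketch of how they could be computed. It assumes that each node of the graph represents one or more k-mers, so that the k-mer accuracy weights every node by the number of k-mers it contains. The `Node` class, its field names, and the toy values are hypothetical and are not taken from the paper or its software.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One node of a de Bruijn graph (hypothetical representation)."""
    num_kmers: int   # assumed: number of k-mers represented by this node
    estimated: int   # multiplicity assigned by the model
    true: int        # ground-truth multiplicity


def node_accuracy(nodes):
    """Percentage of nodes whose estimated multiplicity is correct."""
    correct = sum(1 for n in nodes if n.estimated == n.true)
    return 100.0 * correct / len(nodes)


def kmer_accuracy(nodes):
    """Percentage of k-mers that lie in a correctly labelled node."""
    total = sum(n.num_kmers for n in nodes)
    correct = sum(n.num_kmers for n in nodes if n.estimated == n.true)
    return 100.0 * correct / total


# Toy data: two long nodes are labelled correctly, two short ones are not.
nodes = [
    Node(num_kmers=100, estimated=1, true=1),
    Node(num_kmers=50, estimated=2, true=2),
    Node(num_kmers=1, estimated=0, true=1),
    Node(num_kmers=9, estimated=1, true=2),
]

print(node_accuracy(nodes))  # 50.0
print(kmer_accuracy(nodes))  # 93.75
```

Under this assumption, errors concentrated in short nodes lower the node accuracy much more than the k-mer accuracy, which would be consistent with the k-mer accuracy exceeding the node accuracy in every row of the table.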