Table 4 A selection of CSE-causing motifs

From: Discovering motifs that induce sequencing errors

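| (q, n) | Context | Rk. | Occ. | FM = a | RM = c | FMM = b | RMM = d | -log(p) | FER [%] | RER [%] | ERD [%] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Dataset: GAIIx-bs | | | | | | | | | | | |
| (8, 4) | NGGCGGGT | 3 | 264 | 5857 | 6867 | 859 | 40 | 180.0 | 12.8 | 0.6 | 12.2 |
| | CGGNGGGT | 4 | 136 | 3366 | 3930 | 477 | 22 | 121.2 | 12.4 | 0.6 | 11.9 |
| | GGCGGGGT | 5 | 62 | 1318 | 1624 | 180 | 5 | 52.0 | 12.0 | 0.3 | 11.7 |
| | ACGGCGGG | 6 | 84 | 1690 | 2065 | 241 | 17 | 58.3 | 12.5 | 0.8 | 11.7 |
| (4, 1) | GGGT | 1 | 13478 | 374933 | 384732 | 10002 | 2643 | ∞ | 2.6 | 0.7 | 1.9 |
| | CGGT | 2 | 25144 | 716801 | 730328 | 14765 | 5071 | ∞ | 2.0 | 0.7 | 1.3 |
| | AGGT | 3 | 20146 | 581562 | 584578 | 12086 | 4237 | ∞ | 2.0 | 0.7 | 1.3 |
| | NGGT | 4 | 79810 | 2272988 | 2317196 | 46304 | 16224 | ∞ | 2.0 | 0.7 | 1.3 |
| Dataset: GAIIx-hg | | | | | | | | | | | |
| (8, 4) | CGGCGGGT | 1 | 532 | 731 | 1330 | 169 | 7 | 60.7 | 18.8 | 0.5 | 18.3 |
| | TGGCGGGT | 2 | 3232 | 5715 | 6410 | 1128 | 37 | 229.3 | 16.5 | 0.6 | 15.9 |
| | CGGCAGGT | 3 | 1396 | 2788 | 3522 | 409 | 19 | 110.8 | 12.8 | 0.5 | 12.3 |
| | NGGCGGGT | 10 | 13712 | 24040 | 30886 | 3029 | 158 | ∞ | 11.2 | 0.5 | 10.7 |
| (4, 1) | No motifs passed filter | | | | | | | | | | |
| Dataset: HiSeq-hg | | | | | | | | | | | |
| (8, 4) | TGGCGGGT | 1 | 3232 | 3803 | 5547 | 1475 | 53 | ∞ | 27.9 | 0.9 | 27.0 |
| | CGGCGGGT | 2 | 532 | 418 | 777 | 152 | 4 | 56.1 | 26.7 | 0.5 | 26.2 |
| | CGGCAGGT | 4 | 1396 | 1935 | 2820 | 567 | 23 | 167.5 | 22.7 | 0.8 | 21.9 |
| | NGGCGGGT | 10 | 13712 | 17251 | 26924 | 4432 | 177 | ∞ | 20.4 | 0.7 | 19.8 |
| | GTGGCTTG | 17 | 7568 | 12047 | 18583 | 2526 | 67 | ∞ | 17.3 | 0.4 | 17.0 |
| (4, 1) | GGGT | 1 | 1366400 | 3208669 | 3340323 | 82048 | 15104 | ∞ | 2.5 | 0.5 | 2.0 |
| | AGGT | 2 | 1836218 | 4530889 | 4740634 | 87166 | 20448 | ∞ | 1.9 | 0.4 | 1.5 |
| | NGGT | 3 | 5261516 | 13265123 | 13614878 | 239748 | 57694 | ∞ | 1.8 | 0.4 | 1.4 |
| | CGGG | 4 | 460830 | 876560 | 861233 | 16336 | 4710 | ∞ | 1.8 | 0.5 | 1.3 |
| | CGGT | 5 | 232662 | 516547 | 521942 | 9306 | 2544 | ∞ | 1.8 | 0.5 | 1.3 |
| Dataset: MiSeq-ec | | | | | | | | | | | |
| (8, 4) | GGCGGGGT | 1 | 102 | 16780 | 24956 | 5809 | 88 | ∞ | 25.7 | 0.4 | 25.4 |
| | GGCGCCTC | 4 | 4 | 349 | 506 | 84 | 1 | 28.7 | 19.4 | 0.2 | 19.2 |
| | NGGCGGGT | 5 | 762 | 122922 | 171199 | 28401 | 879 | ∞ | 18.8 | 0.5 | 18.3 |
| | CGGNGGGT | 11 | 444 | 74979 | 95226 | 12415 | 568 | ∞ | 14.2 | 0.6 | 13.6 |
| | CGGCGGGN | 12 | 942 | 158741 | 205881 | 25090 | 1187 | ∞ | 13.6 | 0.6 | 13.1 |
| (4, 1) | GGGT | 1 | 24802 | 5324301 | 5495475 | 145090 | 24701 | ∞ | 2.7 | 0.4 | 2.2 |
| | AGGT | 2 | 27414 | 5979767 | 6104684 | 121330 | 29230 | ∞ | 2.0 | 0.5 | 1.5 |
| | NGGT | 3 | 146116 | 32813986 | 33422161 | 604790 | 162298 | ∞ | 1.8 | 0.5 | 1.3 |
| | CGGT | 4 | 49530 | 10934765 | 11081037 | 184200 | 54762 | ∞ | 1.7 | 0.5 | 1.2 |
| | GGGN | 5 | 78504 | 20903313 | 21323544 | 338589 | 114360 | ∞ | 1.6 | 0.5 | 1.1 |
| | CGGG | 6 | 32740 | 7089342 | 7227334 | 115433 | 42523 | ∞ | 1.6 | 0.6 | 1.0 |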

  1. A selection of CSE-causing motifs for each combination of dataset and parameters. For each motif, we give the rank (Rk.) in the original list sorted by ERD; number of occurrences in the respective genome (Occ.); the contingency table entries FM, RM, FMM, and RMM; the forward error rate FER = FMM/(FM + FMM); the reverse error rate RER = RMM/(RM + RMM); and the error rate difference ERD = FER - RER. If a motif's p-value cannot be numerically distinguished from zero within double precision, we report a - log(p) score of ∞.
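
As a quick check of the definitions in the note above, the following minimal sketch recomputes FER, RER, and ERD (in percent) from the four contingency counts of one table row, and illustrates how a p-value that underflows to zero in double precision leads to a score of ∞. The function names `error_rates` and `neg_log_p` are hypothetical, and the use of the natural logarithm is an assumption made only for illustration, since the table does not state the base. The sketch does not compute the p-value itself.

```python
import math


def error_rates(fm, rm, fmm, rmm):
    """Return (FER, RER, ERD) in percent from the contingency counts.

    FER = FMM / (FM + FMM), RER = RMM / (RM + RMM), ERD = FER - RER.
    """
    fer = 100.0 * fmm / (fm + fmm)
    rer = 100.0 * rmm / (rm + rmm)
    return fer, rer, fer - rer


def neg_log_p(p):
    """Return -log(p), or infinity when p is indistinguishable from zero.

    Assumption: natural logarithm; the base is not given in the table.
    """
    return math.inf if p == 0.0 else -math.log(p)


# Counts for NGGCGGGT, (q, n) = (8, 4), dataset GAIIx-bs.
fer, rer, erd = error_rates(fm=5857, rm=6867, fmm=859, rmm=40)
print(f"FER={fer:.1f}%  RER={rer:.1f}%  ERD={erd:.1f}%")

# A value below the smallest representable double parses as exactly 0.0,
# so its score is reported as infinity.
print(neg_log_p(float("1e-400")))
```

For the row used here, the rounded results agree with the tabulated values of 12.8, 0.6, and 12.2.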