Table 2 Amount of sequences extracted by classification and amount of duplicated sequences eliminated during the preprocessing

From: Transductive learning as an alternative to translation initiation site identification

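| Organism | Downstream window | TIS: Non-duplicated | TIS: Duplicated | Upstream out of phase (nTIS): Non-duplicated | Upstream out of phase (nTIS): Duplicated | Upstream in phase: Non-duplicated | Upstream in phase: Duplicated | CDS in phase: Non-duplicated | CDS in phase: Duplicated | CDS out of phase: Non-duplicated | CDS out of phase: Duplicated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Rattus norvegicus | 235 | 113 | 49 | 123 | 61 | 58 | 34 | 11703 | 1373 | 9738 | 1989 |
| Rattus norvegicus | 518 | 100 | 38 | 124 | 47 | 60 | 29 | 8630 | 1120 | 6161 | 1638 |
| Rattus norvegicus | 800 | 81 | 29 | 114 | 41 | 58 | 28 | 2141 | 945 | 2170 | 1451 |
| Rattus norvegicus | 1081 | 66 | 22 | 101 | 39 | 57 | 24 | 546 | 824 | 983 | 1278 |
| Rattus norvegicus | 1365 | 48 | 14 | 86 | 37 | 54 | 15 | 463 | 741 | 822 | 1158 |
| Rattus norvegicus | 1650 | 42 | 11 | 69 | 28 | 40 | 11 | 420 | 675 | 720 | 1056 |
| Mus musculus | 235 | 678 | 358 | 776 | 471 | 308 | 170 | 5154 | 4853 | 8364 | 8067 |
| Mus musculus | 518 | 581 | 272 | 810 | 384 | 315 | 147 | 4323 | 3779 | 7230 | 6316 |
| Mus musculus | 800 | 466 | 203 | 726 | 331 | 288 | 127 | 3612 | 2927 | 6102 | 5000 |
| Mus musculus | 1081 | 398 | 158 | 632 | 293 | 260 | 113 | 2976 | 2311 | 5318 | 3931 |
| Mus musculus | 1365 | 319 | 113 | 568 | 234 | 242 | 92 | 2506 | 1839 | 4440 | 3124 |
| Mus musculus | 1650 | 277 | 79 | 495 | 187 | 208 | 68 | 2104 | 1463 | 3757 | 2454 |
| Homo sapiens | 235 | 13564 | 7271 | 17729 | 9177 | 6972 | 3386 | 109658 | 109137 | 194726 | 192024 |
| Homo sapiens | 518 | 13124 | 5674 | 18760 | 7188 | 7606 | 2492 | 94503 | 83527 | 171463 | 150192 |
| Homo sapiens | 800 | 11579 | 4398 | 17917 | 5914 | 7334 | 1986 | 79368 | 63663 | 148148 | 117440 |
| Homo sapiens | 1081 | 9716 | 3366 | 16085 | 4902 | 6629 | 1677 | 65717 | 48341 | 126260 | 90786 |
| Homo sapiens | 1365 | 7753 | 2469 | 13662 | 3853 | 5649 | 1371 | 54818 | 37422 | 106842 | 71030 |
| Homo sapiens | 1650 | 5877 | 1793 | 10918 | 2871 | 4537 | 1098 | 46233 | 29136 | 91066 | 56808 |
| Drosophila melanogaster | 235 | 15225 | 10455 | 26777 | 18065 | 12378 | 8252 | 142022 | 194250 | 200046 | 285816 |
| Drosophila melanogaster | 518 | 13723 | 9076 | 27548 | 16202 | 12787 | 7432 | 119185 | 162288 | 171884 | 244359 |
| Drosophila melanogaster | 800 | 11942 | 7745 | 26905 | 14748 | 12581 | 6704 | 99615 | 134106 | 146638 | 208787 |
| Drosophila melanogaster | 1081 | 10122 | 6594 | 25725 | 13314 | 12086 | 6092 | 82645 | 110225 | 124842 | 178443 |
| Drosophila melanogaster | 1365 | 8344 | 5400 | 23695 | 11929 | 11233 | 5474 | 69079 | 91113 | 106093 | 153047 |
| Drosophila melanogaster | 1650 | 6657 | 4390 | 21482 | 10472 | 10227 | 4740 | 58253 | 75705 | 90979 | 131754 |
| Arabidopsis thaliana | 235 | 20867 | 5157 | 15869 | 3515 | 6542 | 1319 | 196447 | 56135 | 388223 | 116238 |
| Arabidopsis thaliana | 518 | 18440 | 9200 | 15663 | 2677 | 6555 | 975 | 145585 | 38519 | 299284 | 82624 |
| Arabidopsis thaliana | 800 | 14948 | 3013 | 14112 | 2122 | 5968 | 750 | 105415 | 25929 | 221892 | 56195 |
| Arabidopsis thaliana | 1081 | 11082 | 2046 | 11644 | 1592 | 4942 | 562 | 74236 | 17100 | 160512 | 38259 |
| Arabidopsis thaliana | 1365 | 7683 | 1281 | 8453 | 1112 | 3658 | 399 | 51462 | 11625 | 115329 | 26194 |
| Arabidopsis thaliana | 1650 | 4967 | 839 | 5952 | 808 | 2582 | 283 | 36505 | 8297 | 83812 | 18706 |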
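
The caption says that duplicated sequences were eliminated during preprocessing. As a rough illustration of how a pair of counts like those in the table could arise, the sketch below removes exact repeats from a list of sequence windows and reports how many were kept and how many were dropped. It is not the authors' procedure: the function name `split_duplicates`, the toy windows, and the reading of "Non-duplicated" as distinct sequences kept and "Duplicated" as repeats removed are all assumptions made for this example.

```python
def split_duplicates(sequences):
    """Return (kept, removed).

    kept    : distinct sequences in first-seen order
    removed : number of exact repeats that were dropped

    Hypothetical helper for illustration only; the paper does not
    specify how duplicates were detected or counted.
    """
    # dict.fromkeys keeps the first occurrence of each key, in order
    kept = list(dict.fromkeys(sequences))
    removed = len(sequences) - len(kept)
    return kept, removed


# Toy sequence windows, not taken from the paper's data sets.
windows = ["ATGGCC", "ATGGCC", "ATGAAA", "CTGATG", "ATGAAA", "ATGGCC"]
kept, removed = split_duplicates(windows)
print(len(kept), removed)
```

Under this reading, each Non-duplicated/Duplicated pair in a row would correspond to `len(kept)` and `removed` for one organism, sequence class, and downstream window size.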