Table 2 Number of benchmark proteins in orthologous pairs recovered perfectly or as essentially complete by coding regions assembled by different methods

From: SAUTE: sequence assembly using target enrichment

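| Read set | Target species | Ortho pairs: count | Ortho pairs: median % identity | Perfect: rnaSP | Perfect: Trinity | Perfect: SPAln | Perfect: Clust | Perfect: SP10 | Essentially complete: rnaSP | Essentially complete: Trinity | Essentially complete: SPAln | Essentially complete: Clust | Essentially complete: SP10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Corn | Z. marina | 2710 | 57.78 | 537 | 539 | 213 | 312 | **548** | 1434 | **1450** | 1037 | 916 | 1344 |
| | A. tauschii | 2888 | 77.94 | 550 | 555 | 410 | 510 | **664** | 1491 | 1514 | 1360 | 1395 | **1552** |
| | S. bicolor | 2871 | 92.60 | 549 | 554 | 463 | 572 | **666** | 1488 | 1507 | 1399 | 1464 | **1549** |
| Thale cress | P. axillaris | 2287 | 61.56 | **1616** | 0 | 598 | 1216 | 1598 | 1902 | 0 | 1364 | 1503 | **1933** |
| | B. rapa | 2312 | 85.18 | 1636 | 0 | 1255 | 1405 | **1870** | 1924 | 0 | 1879 | 1697 | **2107** |
| | A. thaliana | 2317 | 100.00 | 1640 | 0 | 1689 | 1375 | **1951** | 1928 | 0 | 1991 | 1668 | **2122** |
| Worm | T. spiralis | 2194 | 40.92 | **1588** | 0 | 156 | 646 | 1046 | **1940** | 0 | 638 | 867 | 1269 |
| | C. briggsae | 3112 | 86.46 | 2230 | 0 | 1216 | 2016 | **2436** | 2707 | 0 | 2336 | 2486 | **2731** |
| | C. elegans | 3130 | 100.00 | 2244 | 0 | 2276 | 2176 | **2640** | 2724 | 0 | 2720 | 2624 | **2851** |
| Mouse | P. cinereus | 9005 | 77.04 | 2824 | 2880 | 1954 | 2185 | **3308** | 3790 | 3743 | 3393 | 3202 | **4274** |
| | H. sapiens | 9109 | 86.73 | 2831 | 2889 | 2212 | 2568 | **3469** | 3803 | 3753 | 3537 | 3640 | **4356** |
| | M. musculus | 9177 | 100.00 | 2858 | 2914 | 2862 | 3001 | **3686** | 3832 | 3781 | 3850 | 3982 | **4460** |
| Human | P. cinereus | 8984 | 78.80 | 2796 | 3107 | 2144 | 2206 | **3340** | 4383 | 4645 | 3995 | 3551 | **4792** |
| | M. musculus | 9109 | 86.73 | 2824 | 3134 | 2372 | 2401 | **3496** | 4425 | 4684 | 4168 | 3784 | **4875** |
| | H. sapiens | 9157 | 100.00 | 2839 | 3152 | 2994 | 2601 | **3760** | 4448 | 4708 | 4630 | 4032 | **5013** |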

  1. SAUTE_PROT low (SP10), with a maximum of 10 variants reported per graph, SPAligner (SPAln), and CLUSTER (Clust) used proteins from the target species to assemble the read set. rnaSPAdes (rnaSP) and Trinity are de novo assemblers. The median percent identity between orthologous protein pairs ranges from 40.92% to 100%. In each row, the count for the method that recovers the largest number of proteins as perfect, and likewise as essentially complete, is in bold.
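
The bolding rule in the footnote (the largest count per row, taken separately for the "perfect" and "essentially complete" groups) can be checked mechanically. The following is a minimal sketch, not part of the original article: the names `METHODS`, `ROWS`, `best_method`, and `summarize` are hypothetical, and the data are three rows copied from the table above. It also reports the best count as a fraction of the ortholog pair count, which is one way to compare rows with different numbers of benchmark proteins.

```python
# Illustrative sketch only; names and structure are assumptions, not the authors' code.
# Method order follows the table's column headers.
METHODS = ["rnaSP", "Trinity", "SPAln", "Clust", "SP10"]

# (read set, target species, ortholog pair count,
#  "perfect" counts, "essentially complete" counts), copied from three table rows.
ROWS = [
    ("Corn", "Z. marina", 2710,
     [537, 539, 213, 312, 548], [1434, 1450, 1037, 916, 1344]),
    ("Worm", "T. spiralis", 2194,
     [1588, 0, 156, 646, 1046], [1940, 0, 638, 867, 1269]),
    ("Human", "H. sapiens", 9157,
     [2839, 3152, 2994, 2601, 3760], [4448, 4708, 4630, 4032, 5013]),
]


def best_method(counts):
    """Return (method, count) for the largest count; ties go to the first method."""
    idx = max(range(len(counts)), key=counts.__getitem__)
    return METHODS[idx], counts[idx]


def summarize(rows):
    """For each row, find the best method per group and its share of ortholog pairs."""
    summary = []
    for read_set, target, n_pairs, perfect, complete in rows:
        p_method, p_count = best_method(perfect)
        c_method, c_count = best_method(complete)
        summary.append({
            "read_set": read_set,
            "target": target,
            "perfect_best": (p_method, p_count),
            "perfect_fraction": p_count / n_pairs,
            "complete_best": (c_method, c_count),
            "complete_fraction": c_count / n_pairs,
        })
    return summary


for entry in summarize(ROWS):
    print(
        f"{entry['read_set']} / {entry['target']}: "
        f"perfect best = {entry['perfect_best'][0]} "
        f"({entry['perfect_fraction']:.1%} of pairs), "
        f"essentially complete best = {entry['complete_best'][0]} "
        f"({entry['complete_fraction']:.1%} of pairs)"
    )
```

For these three rows the sketch selects SP10 and Trinity for Corn / Z. marina, rnaSP in both groups for Worm / T. spiralis, and SP10 in both groups for Human / H. sapiens, matching the largest values in the corresponding table rows.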