
Table 1 Case study of the performance of TAGOPSIN

From: TAGOPSIN: collating taxa-specific gene and protein functional and structural information

| Query | H. sapiens | A. thaliana | S. cerevisiae | E. coli | Streptococcus | Human coronavirus | Human papillomavirus |
|---|---|---|---|---|---|---|---|
| No. of organisms from Taxonomy | 7 | 5 | 351 | 3 365 | 5 039 | 27 | 259 |
| No. of organisms from Nucleotide | 0 | 0 | 0 | 1 142 | 637 | 2 | 33 |
| No. of curated genomes/chromosomes | 24 | 5 | 16 | 1 274 | 796 | 4 | 65 |
| No. of CDSs | 112 702 | 48 147 | 5 989 | 6 066 761 | 1 540 276 | 30 | 442 |
| No. of proteins | 17 261 | 15 383 | 5 876 | 17 777 | 9 037 | 21 | 127 |
| No. of protein isoforms | 16 007 | 2 196 | 29 | 6 | 0 | 0 | 0 |
| No. of GO terms | 17 597 | 6 825 | 6 023 | 2 614 | 815 | 47 | 54 |
| No. of protein domain families | 6 108 | 3 125 | 3 222 | 2 281 | 819 | 36 | 12 |
| No. of protein 3D structures | 39 885 | 1 472 | 4 565 | 1 893 | 232 | 27 | 36 |
| Approx. runtime (hours) | 147.83 | 5.31 | 3.03 | 72.35 | 19.52 | 3.01 | 3.18 |

1. The statistics of the datasets built for Homo sapiens, Escherichia coli and the five other organisms indicated are classified by entity type. Estimated runtimes were measured on a 64-bit Linux Ubuntu system with 4.7 GiB of RAM and an Intel® Core™ i7-6500U CPU @ 2.50 GHz. Actual runtimes may vary with Internet bandwidth, the volume of data to process, and hardware specifications; the runtimes shown were obtained at an average bandwidth of 1.1 MB/s. The times indicated for H. sapiens and E. coli include the time to download and decompress standard data files (updated July/August 2020).
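To put the runtimes in perspective, the sketch below transcribes the CDS and runtime rows of Table 1 and derives a crude throughput (CDSs per hour of total runtime), together with the upper bound on data that can be downloaded per hour at the 1.1 MB/s bandwidth quoted in the footnote. These ratios are illustrative arithmetic only, not figures reported by the TAGOPSIN authors, and the helper names `cds_per_hour` and `max_download_mb` are hypothetical. Because the H. sapiens and E. coli times also include downloading and decompressing standard data files, their ratios are not directly comparable with the other columns.

```python
# Values transcribed from Table 1: approximate runtime (hours) and number of CDSs.
RUNTIME_H = {
    "H. sapiens": 147.83,
    "A. thaliana": 5.31,
    "S. cerevisiae": 3.03,
    "E. coli": 72.35,
    "Streptococcus": 19.52,
    "Human coronavirus": 3.01,
    "Human papillomavirus": 3.18,
}
CDS_COUNT = {
    "H. sapiens": 112_702,
    "A. thaliana": 48_147,
    "S. cerevisiae": 5_989,
    "E. coli": 6_066_761,
    "Streptococcus": 1_540_276,
    "Human coronavirus": 30,
    "Human papillomavirus": 442,
}

# Average bandwidth stated in the table footnote.
BANDWIDTH_MB_PER_S = 1.1


def cds_per_hour(query: str) -> float:
    """Hypothetical helper: CDSs divided by total runtime (a crude, derived ratio)."""
    return CDS_COUNT[query] / RUNTIME_H[query]


def max_download_mb(hours: float, mb_per_s: float = BANDWIDTH_MB_PER_S) -> float:
    """Hypothetical helper: upper bound on MB transferable in `hours` at constant bandwidth."""
    return hours * 3600 * mb_per_s


if __name__ == "__main__":
    for query in RUNTIME_H:
        print(f"{query:22s} {cds_per_hour(query):>12,.0f} CDS/h")
    print(f"1 h at {BANDWIDTH_MB_PER_S} MB/s: at most {max_download_mb(1):,.0f} MB")
```

At 1.1 MB/s, one hour allows at most about 3 960 MB of transfer, which is consistent with the footnote's point that bandwidth and data volume dominate the runtimes of the largest queries.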