Table 1 Case study of the performance of TAGOPSIN

From: TAGOPSIN: collating taxa-specific gene and protein functional and structural information

| Query | H. sapiens | A. thaliana | S. cerevisiae | E. coli | Streptococcus | Human coronavirus | Human papillomavirus |
|---|---|---|---|---|---|---|---|
| No. of organisms (from Taxonomy) | 7 | 5 | 351 | 3 365 | 5 039 | 27 | 259 |
| No. of organisms (from Nucleotide) | 0 | 0 | 0 | 1 142 | 637 | 2 | 33 |
| No. of curated genomes/chromosomes | 24 | 5 | 16 | 1 274 | 796 | 4 | 65 |
| No. of CDSs | 112 702 | 48 147 | 5 989 | 6 066 761 | 1 540 276 | 30 | 442 |
| No. of proteins | 17 261 | 15 383 | 5 876 | 17 777 | 9 037 | 21 | 127 |
| No. of protein isoforms | 16 007 | 2 196 | 29 | 6 | 0 | 0 | 0 |
| No. of GO terms | 17 597 | 6 825 | 6 023 | 2 614 | 815 | 47 | 54 |
| No. of protein domain families | 6 108 | 3 125 | 3 222 | 2 281 | 819 | 36 | 12 |
| No. of protein 3D structures | 39 885 | 1 472 | 4 565 | 1 893 | 232 | 27 | 36 |
| Approx. runtime (hours) | 147.83 | 5.31 | 3.03 | 72.35 | 19.52 | 3.01 | 3.18 |
The statistics of the datasets built for Homo sapiens, Escherichia coli and five other organisms, as indicated, are classified by entity type. Runtimes were estimated on a 64-bit Ubuntu Linux system with 4.7 GiB of RAM and an Intel® Core™ i7-6500U CPU @ 2.50 GHz. Actual runtimes may vary depending on Internet bandwidth, volume of data to process, and hardware specifications; the average bandwidth here was 1.1 MB/s. The times indicated for H. sapiens and E. coli include the time to download and decompress standard data files (updated July/August 2020).