Table 4 Information about datasets

From: KEGG orthology prediction of bacterial proteins using natural language processing

| Data name | BD | RD |
| --- | --- | --- |
| Number of proteins (in k) | 16,874 | 623 |
| Number of amino acids (in m) | 5,121 | 296 |
| Disk space (in MB) | 6,550 | 335 |

  1. Units: proteins are counted in thousands (k), amino acids in millions (m), and disk space in MB (uncompressed, stored as text)
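
As a rough check on the units, the scaled values in the table can be converted back to absolute counts. The sketch below is illustrative only: the `DatasetStats` class and its field names are hypothetical, and the mean protein length is a quantity derived here from the table values, not a figure reported in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetStats:
    """One column of Table 4, stored in the table's own scaled units."""
    name: str
    proteins_k: float      # number of proteins, in thousands (k)
    amino_acids_m: float   # number of amino acids, in millions (m)
    disk_mb: float         # uncompressed text size, in MB

    @property
    def proteins(self) -> int:
        # Convert thousands back to an absolute count.
        return round(self.proteins_k * 1_000)

    @property
    def amino_acids(self) -> int:
        # Convert millions back to an absolute count.
        return round(self.amino_acids_m * 1_000_000)

    @property
    def mean_protein_length(self) -> float:
        # Derived quantity (not in the table): amino acids per protein.
        return self.amino_acids / self.proteins


# Values copied from Table 4.
BD = DatasetStats("BD", proteins_k=16_874, amino_acids_m=5_121, disk_mb=6_550)
RD = DatasetStats("RD", proteins_k=623, amino_acids_m=296, disk_mb=335)

for d in (BD, RD):
    print(
        f"{d.name}: {d.proteins:,} proteins, "
        f"{d.amino_acids:,} amino acids, "
        f"mean length ~{d.mean_protein_length:.0f} residues"
    )
```

Read this way, BD holds roughly 16.9 million proteins and RD roughly 0.6 million. Because the table values are rounded, any derived figure is approximate.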