Table 1 Overview of the UniProt data sets

From: Infrastructure for the life sciences: design and implementation of the UniProt website

| Data set | Description | References | Entries | Path | Formats |
|---|---|---|---|---|---|
| UniProtKB | Protein sequence and annotation data | UniRef, UniParc, Literature citations, Taxonomy, Keywords | 6.4 M | /uniprot/ | Plain text, FASTA, (GFF), XML, RDF |
| UniRef | Clusters of proteins with similar sequences | UniProtKB, UniParc, Taxonomy | 12.3 M | /uniref/ | FASTA, XML, RDF |
| UniParc | Protein sequence archive | UniProtKB, Taxonomy | 17.0 M | /uniparc/ | FASTA, XML, RDF |
| Literature citations | Literature cited in UniProtKB (based on PubMed) | | 0.4 M | /citations/ | RDF |
| Taxonomy | Taxonomy data (based on NCBI taxonomy) | | 0.5 M | /taxonomy/ | RDF, (Tab-delimited) |
| Keywords | Keywords used in UniProtKB | | 1 K | /keywords/ | RDF, (OBO) |
| Subcellular locations | Subcellular location terms used in UniProtKB | | 375 | /locations/ | RDF, (OBO) |
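
The Path and Formats columns of Table 1 can be held in a small lookup structure, which makes it easy to ask which data sets list a given format. The following is only a minimal sketch: the names `DATA_SETS`, `path_for`, and `data_sets_offering` are hypothetical and not part of the UniProt website. Formats that appear in parentheses in the table are stored without the parentheses, since this excerpt does not say what the parentheses mean.

```python
from typing import List

# Values transcribed from Table 1. The dictionary layout and the helper
# names below are illustrative assumptions, not a UniProt API.
DATA_SETS = {
    "UniProtKB": {"path": "/uniprot/",
                  "formats": ["Plain text", "FASTA", "GFF", "XML", "RDF"]},
    "UniRef": {"path": "/uniref/", "formats": ["FASTA", "XML", "RDF"]},
    "UniParc": {"path": "/uniparc/", "formats": ["FASTA", "XML", "RDF"]},
    "Literature citations": {"path": "/citations/", "formats": ["RDF"]},
    "Taxonomy": {"path": "/taxonomy/", "formats": ["RDF", "Tab-delimited"]},
    "Keywords": {"path": "/keywords/", "formats": ["RDF", "OBO"]},
    "Subcellular locations": {"path": "/locations/", "formats": ["RDF", "OBO"]},
}


def path_for(name: str) -> str:
    """Return the Path column entry for a data set named in Table 1."""
    return DATA_SETS[name]["path"]


def data_sets_offering(fmt: str) -> List[str]:
    """Return data sets whose Formats column lists fmt (case-insensitive)."""
    wanted = fmt.lower()
    return [
        name
        for name, info in DATA_SETS.items()
        if wanted in (f.lower() for f in info["formats"])
    ]


if __name__ == "__main__":
    print(path_for("Taxonomy"))
    print(data_sets_offering("FASTA"))
```

Because the dictionary preserves the row order of the table, the results of `data_sets_offering` come back in the same order as the rows above.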