Table 1 Overview of the UniProt data sets

From: Infrastructure for the life sciences: design and implementation of the UniProt website

| Data set | Description | References | Entries | Path | Formats |
| --- | --- | --- | --- | --- | --- |
| UniProtKB | Protein sequence and annotation data | UniRef, UniParc, Literature citations, Taxonomy, Keywords | 6.4 M | /uniprot/ | Plain text, FASTA, (GFF), XML, RDF |
| UniRef | Clusters of proteins with similar sequences | UniProtKB, UniParc, Taxonomy | 12.3 M | /uniref/ | FASTA, XML, RDF |
| UniParc | Protein sequence archive | UniProtKB, Taxonomy | 17.0 M | /uniparc/ | FASTA, XML, RDF |
| Literature citations | Literature cited in UniProtKB (based on PubMed) | | 0.4 M | /citations/ | RDF |
| Taxonomy | Taxonomy data (based on NCBI taxonomy) | | 0.5 M | /taxonomy/ | RDF, (Tab-delimited) |
| Keywords | Keywords used in UniProtKB | | 1 K | /keywords/ | RDF, (OBO) |
| Subcellular locations | Subcellular location terms used in UniProtKB | | 375 | /locations/ | RDF, (OBO) |
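
The Path and Formats columns of Table 1 can be captured in a small lookup structure, which makes it easy to ask which data sets list a given format. The following is a minimal sketch, not part of the UniProt website or its API: the names `DataSet`, `DATA_SETS`, and `data_sets_offering` are hypothetical, the values are transcribed from the table, and the parentheses around some formats (whose meaning the table itself does not state) are dropped for simplicity.

```python
from typing import Dict, List, NamedTuple, Tuple


class DataSet(NamedTuple):
    """One row of Table 1, reduced to its Path and Formats columns."""
    path: str
    formats: Tuple[str, ...]  # parentheses from the table are dropped (assumption)


# Values transcribed from Table 1; the structure itself is illustrative only.
DATA_SETS: Dict[str, DataSet] = {
    "UniProtKB": DataSet("/uniprot/", ("Plain text", "FASTA", "GFF", "XML", "RDF")),
    "UniRef": DataSet("/uniref/", ("FASTA", "XML", "RDF")),
    "UniParc": DataSet("/uniparc/", ("FASTA", "XML", "RDF")),
    "Literature citations": DataSet("/citations/", ("RDF",)),
    "Taxonomy": DataSet("/taxonomy/", ("RDF", "Tab-delimited")),
    "Keywords": DataSet("/keywords/", ("RDF", "OBO")),
    "Subcellular locations": DataSet("/locations/", ("RDF", "OBO")),
}


def data_sets_offering(fmt: str) -> List[str]:
    """Return the names of data sets whose Formats column lists `fmt`.

    The comparison is case-insensitive; results follow the table's row order.
    """
    wanted = fmt.lower()
    return [
        name
        for name, data_set in DATA_SETS.items()
        if wanted in (f.lower() for f in data_set.formats)
    ]


if __name__ == "__main__":
    for fmt in ("FASTA", "OBO", "RDF"):
        print(fmt, "->", ", ".join(data_sets_offering(fmt)))
```

According to the table, RDF is the only format listed for every data set, while FASTA is listed only for the three sequence-bearing sets (UniProtKB, UniRef, UniParc).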