
Table 1 Location-wise distribution of full-length and Pfam-mapped protein sequences

From: Mining for class-specific motifs in protein sequence classification

| Organelle | Code | SL dataset: # of protein sequences | Pfam dataset: # of Pfam sequences |
|---|---|---|---|
| Cytoskeleton | CSK | 259 | 200 |
| Cytoplasm | CYT | 3334 | 2809 |
| Endoplasmic Reticulum | END | 1016 | 884 |
| Extracellular | EXC | 8666 | 6393 |
| Golgi apparatus | GOL | 291 | 248 |
| Lysosome | LYS | 159 | 138 |
| Mitochondria | MIT | 2760 | 2383 |
| Nuclear | NUC | 5104 | 4221 |
| Plasma membrane | PLA | 6852 | 6155 |
| Peroxisome | POX | 212 | 190 |

Note: SL, Subcellular Localization; Pfam, Protein families database.
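As a quick arithmetic check on Table 1, the short Python sketch below totals both columns and derives, for each location, the fraction of full-length sequences with a Pfam mapping, each class's share of its dataset, and the largest-to-smallest class ratio. It assumes each Pfam count is a subset of the corresponding SL count, as the caption's "full-length and Pfam-mapped" wording suggests; the names `TABLE_1`, `summarize`, and `imbalance_ratio` are hypothetical and not from the article.

```python
"""Summarize Table 1: per-location Pfam mapping rate and class shares."""

# Counts transcribed from Table 1: code -> (SL sequences, Pfam-mapped sequences).
TABLE_1 = {
    "CSK": (259, 200),
    "CYT": (3334, 2809),
    "END": (1016, 884),
    "EXC": (8666, 6393),
    "GOL": (291, 248),
    "LYS": (159, 138),
    "MIT": (2760, 2383),
    "NUC": (5104, 4221),
    "PLA": (6852, 6155),
    "POX": (212, 190),
}


def summarize(table):
    """Return (total SL, total Pfam, per-code statistics)."""
    total_sl = sum(sl for sl, _ in table.values())
    total_pfam = sum(pf for _, pf in table.values())
    stats = {
        code: {
            # Assumption: the Pfam count is a subset of the SL count for that location.
            "mapped_fraction": pf / sl,
            "sl_share": sl / total_sl,
            "pfam_share": pf / total_pfam,
        }
        for code, (sl, pf) in table.items()
    }
    return total_sl, total_pfam, stats


def imbalance_ratio(table):
    """Ratio of the largest to the smallest class in the SL dataset."""
    sizes = [sl for sl, _ in table.values()]
    return max(sizes) / min(sizes)


if __name__ == "__main__":
    total_sl, total_pfam, stats = summarize(TABLE_1)
    print(f"Total: {total_sl} SL, {total_pfam} Pfam "
          f"({total_pfam / total_sl:.1%} mapped)")
    for code, s in stats.items():
        print(f"{code}: {s['mapped_fraction']:.1%} mapped, "
              f"{s['sl_share']:.1%} of SL, {s['pfam_share']:.1%} of Pfam")
    print(f"SL class imbalance (largest/smallest): {imbalance_ratio(TABLE_1):.1f}")
```

Run as a script, it reports 28,653 SL sequences and 23,621 Pfam sequences (about 82.4% mapped overall), and an imbalance ratio of about 54.5 between the largest class (EXC) and the smallest (LYS).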