
Table 1 Serine-centered phosphopeptide sequences of length 21, n-grams of varying size (6- to 21-mers), and references for the datasets from each kingdom/phylum and species under study in the training set and the test set. For each species in the training set, two datasets were used; hence, two numbers are given. There are many more n-grams than phospho-sites because each phospho-site window spans 21 residues and n-grams of varying length are taken from within each window

From: Comparison of phosphorylation patterns across eukaryotes by discriminative N-gram analysis

| Training set | Number of phosphopeptides | Number of n-grams | References | Kingdom | Phylum |
|---|---|---|---|---|---|
| Arabidopsis thaliana | 2903 and 4270 | 349724 and 527397 | [40, 41] | Plantae | |
| Homo sapiens | 1972 and 4075 | 200661 and 454563 | [50, 51] | Animalia | Chordata |
| Drosophila melanogaster | 6363 and 6362 | 671933 and 596922 | [52, 53] | Animalia | Arthropoda |
| Saccharomyces cerevisiae | 6343 and 1178 | 712345 and 116095 | [54, 55] | Fungi | Ascomycota |
| Plasmodium falciparum | 744 and 1048 | 93799 and 137899 | [56, 57] | Chromalveolata | Apicomplexa |
| Test set | | | | | |
| Oryza sativa | 447 | 50007 | [58] | Plantae | |
| Mus musculus | 7372 | 811443 | [59, 60] | Animalia | Chordata |
| Caenorhabditis elegans | 4003 | 436055 | [61] | Animalia | Nematoda |
| Schizosaccharomyces pombe | 1362 | 155639 | [62] | Fungi | Ascomycota |
| Toxoplasma gondii | 1388 | 172714 | [56] | Chromalveolata | Apicomplexa |
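
The caption's remark that n-grams far outnumber phospho-sites can be illustrated with a minimal sketch. It assumes that every contiguous substring of length 6 to 21 is taken from a 21-residue window; the function name `enumerate_ngrams` and the example sequence are hypothetical and are not taken from the article. Under that assumption a single window yields 16 + 15 + ... + 1 = 136 n-grams. The counts in the table are lower than this upper bound (for example, 50007 n-grams for 447 Oryza sativa phosphopeptides, roughly 112 per site), which suggests the authors applied further processing that is not described in this caption.

```python
def enumerate_ngrams(window, min_n=6, max_n=21):
    """Return every contiguous n-gram of length min_n..max_n found in window.

    Assumption: all overlapping substrings are kept, with no filtering or
    deduplication. The article's actual procedure may differ.
    """
    return [
        window[start:start + n]
        for n in range(min_n, max_n + 1)
        for start in range(len(window) - n + 1)
    ]


# Hypothetical 21-residue window with serine (S) at the central position (index 10).
window = "ACDEFGHIKLSMNPQRTVWYA"

ngrams = enumerate_ngrams(window)
print(len(ngrams))  # number of n-grams obtained from one 21-residue window
```

Multiplying the per-window count by the number of phosphopeptides gives an upper bound on the n-gram column for each dataset, which is a quick sanity check on the reported figures.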