Table 1 Comparison of search times for standard X!Tandem and Hydra

From: Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework

| Mode | Scans | Nodes (Cores) | DB Name | Proteins (K) | Peptides (M) | Dot product (M) | Time (min) |
|---|---|---|---|---|---|---|---|
| Hadoop | 16000 | 43 (344) | ecoli | 5.4 | 1.3 | 164 | 9.8 |
| Hadoop | 256000 | 43 (344) | ecoli | 5.4 | 1.3 | 23395 | 338 |
| Tandem | 4663 | 1 (4) | human | 222 | 168 | 477 | 29 |
| Hadoop | 4663 | 43 (344) | human | 222 | 168 | 477 | 4.7 |
| Tandem | 184880 | 1 (4) | nr | 4370 | 692 | 3291 | 2280 |
| Hadoop | 184880 | 43 (344) | nr | 4370 | 692 | 3291 | 15.4 |
| Tandem | 184880 | 1 (4) | nr | 16392 | 1248 | 13167 | 8410 |
| Hadoop | 184880 | 43 (344) | nr | 16392 | 1248 | 13167 | 52.7 |

  1. Example comparison of run times for searches of different complexity using the standard X!Tandem implementation and Hydra. The Scans column gives the number of spectra searched; the Nodes (Cores) column gives the resources used (the first number is the number of machines, the second is the total number of cores); DB Name is the species database used; Proteins (K) is the number of proteins in the database; and Dot product (M) is the number of dot-product calculations actually performed. The times show that Hydra, unlike X!Tandem, is able to scale nearly linearly with the size of the problem. However, because of its startup costs, Hydra is not suited to small searches. The PRIDE accession numbers for the spectra used were 10295 and 7962.
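
The comparison in the footnote can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the original article, that computes the ratio of X!Tandem time to Hydra time for the three searches that were run in both modes. The names `ROWS`, `speedup` and `summarize` are hypothetical and chosen only for illustration; the numbers are transcribed from Table 1.

```python
# Rows transcribed from Table 1 for the searches run in both modes:
# (database, dot products in millions,
#  X!Tandem minutes on 1 node / 4 cores,
#  Hydra minutes on 43 nodes / 344 cores).
ROWS = [
    ("human", 477, 29.0, 4.7),
    ("nr", 3291, 2280.0, 15.4),
    ("nr", 13167, 8410.0, 52.7),
]


def speedup(tandem_min, hydra_min):
    """Wall-clock ratio: how many times faster the Hydra run finished."""
    return tandem_min / hydra_min


def summarize(rows):
    """Return (database, dot products in millions, speedup rounded to 0.1)."""
    return [(db, dots, round(speedup(t, h), 1)) for db, dots, t, h in rows]


if __name__ == "__main__":
    for db, dots, ratio in summarize(ROWS):
        print(f"{db:6s} {dots:6d} M dot products: {ratio:6.1f}x faster")
```

On these figures the ratio is about 6 for the small human search and about 148 and 160 for the two nr searches, which is consistent with the footnote's remark that startup costs make Hydra a poor fit for small searches. These are wall-clock ratios between different hardware (4 cores versus 344 cores), not per-core efficiencies.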