Mode | Scans | Nodes (Cores) | DB Name | Proteins (K) | Peptides (M) | Dot products (M) | Time (min) |
---|---|---|---|---|---|---|---|
Hadoop | 16000 | 43 (344) | ecoli | 5.4 | 1.3 | 164 | 9.8 |
Hadoop | 256000 | 43 (344) | ecoli | 5.4 | 1.3 | 23395 | 338 |
Tandem | 4663 | 1 (4) | human | 222 | 168 | 477 | 29 |
Hadoop | 4663 | 43 (344) | human | 222 | 168 | 477 | 4.7 |
Tandem | 184880 | 1 (4) | nr | 4370 | 692 | 3291 | 2280 |
Hadoop | 184880 | 43 (344) | nr | 4370 | 692 | 3291 | 15.4 |
Tandem | 184880 | 1 (4) | nr | 16392 | 1248 | 13167 | 8410 |
Hadoop | 184880 | 43 (344) | nr | 16392 | 1248 | 13167 | 52.7 |
- Comparison of run times for searches of varying complexity using the standard X!Tandem implementation (Mode "Tandem") and Hydra (Mode "Hadoop"). The Scans column gives the number of spectra searched. The Nodes (Cores) column gives the resources used: the first number is the number of machines and the number in parentheses is the total number of cores. DB Name is the species database searched, Proteins (K) is the number of proteins in that database in thousands, Peptides (M) is the number of peptides in millions, and Dot products (M) is the number of dot product calculations actually performed, in millions. Time is the run time in minutes. The times show that Hydra, unlike X!Tandem, scales nearly linearly with the size of the problem. However, because of its startup costs, Hydra is not well suited to small searches. The PRIDE accession numbers for the spectra used were 10295 and 7962.
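The paired Tandem and Hadoop rows in the table can be reduced to a speedup figure for each search. The following is a minimal sketch of that arithmetic, not part of the original analysis: the timings are copied from the table, while the `speedup` and `summarize` helper names are illustrative assumptions and do not come from Hydra or X!Tandem.

```python
# Paired runs from the table: the same search executed with X!Tandem
# (1 node, 4 cores) and with Hydra on Hadoop (43 nodes, 344 cores).
# Fields: (database, scans, proteins in thousands, Tandem min, Hadoop min)
RUNS = [
    ("human", 4663, 222, 29.0, 4.7),
    ("nr", 184880, 4370, 2280.0, 15.4),
    ("nr", 184880, 16392, 8410.0, 52.7),
]


def speedup(tandem_min, hadoop_min):
    """Ratio of X!Tandem run time to Hydra run time (hypothetical helper)."""
    return tandem_min / hadoop_min


def summarize(runs):
    """Return (database, proteins_K, speedup) for each paired run."""
    return [
        (db, proteins_k, speedup(tandem_min, hadoop_min))
        for db, _scans, proteins_k, tandem_min, hadoop_min in runs
    ]


for db, proteins_k, ratio in summarize(RUNS):
    print(f"{db:>5} ({proteins_k}K proteins): {ratio:.1f}x faster on Hadoop")
```

On these numbers the small human search gains roughly a factor of 6, while the two nr searches gain roughly a factor of 150 to 160, which is consistent with the caption's remark that fixed startup costs dominate small searches.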