
Table 2. The performance of RAPSearch compared with BLAST.

From: RAPSearch: a fast protein similarity search tool for short reads

Running times are in CPU hours. The last three columns count reads with homologs found in the IMG protein databaseᵃ.

| Test dataset | Number of reads | Read length (nt) | Reads with homologs (by BLAST) | BLAST running time (CPU hours) | RAPSearch running time (CPU hours) | Overlapᵍ | BLAST-only | RAPSearch-only |
|---|---|---|---|---|---|---|---|---|
| SRR020796 (2%)ᵇ | 1,164,805 | 72 | 19%ᵉ | 1,590ᶠ | 16.8 | 218,134 (98.4%) | 2,832 (1.3%) | 745 (0.3%) |
| 4440037ᶜ | 188,445 | 100 | 5% | 154 | 3.5 | 9,791 (95.3%) | 270 (2.6%) | 213 (2.1%) |
| TS50ᵈ | 622,554 | 200 | 75% | 1,000 | 54.3 | 459,509 (97.9%) | 7,339 (1.5%) | 2,683 (0.6%) |
| TS28ᵈ | 312,665 | 329 | 75% | 900 | 45.7 | 225,953 (96%) | 7,511 (3.2%) | 1,222 (0.5%) |

a: Reads were searched against the 98% non-redundant set of proteins collected in the IMG database (4,054,694 proteins in total). Homologs were defined with a less stringent E-value cutoff of 1e-1 for the Illumina reads (the SRR020796 dataset), because these reads are extremely short, and with an E-value cutoff of 1e-3 for the other datasets.
b: The dataset was downloaded from the NCBI website (from the rumen microbiota response study). Only 2% of the reads were used for testing, because a BLAST search of the entire dataset would require a computer farm.
c: The dataset was from the nine biomes project [7].
d: The TS50 (4440615.3) and TS28 (4440613.3) datasets were from the Twin Study [24]. The 4440037, TS50 and TS28 datasets were downloaded from the MG-RAST server.
e: The percentage of reads that have homologs in the IMG database as identified by BLAST.
f: The BLAST running time was estimated from the running time of a BLAST search of a small fraction of the original dataset, performed for comparison purposes on the same computer (Intel Xeon 2.93 GHz) on which RAPSearch was run. The actual BLAST search of the original datasets was carried out on BigRed, a computer cluster maintained at Indiana University.
g: The Overlap column lists the number of reads with homologs in the IMG database detected by both BLAST and RAPSearch. The BLAST-only and RAPSearch-only columns list the number of reads with homologs detected only by BLAST or only by RAPSearch, respectively. The percentages in these three columns are relative to the total number of reads with homologs detected by either tool.
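The Overlap, BLAST-only and RAPSearch-only columns (footnote g) amount to partitioning the reads that either tool assigns a homolog, after each tool's hits are filtered by the E-value cutoff from footnote a. Below is a minimal Python sketch under stated assumptions: each tool's output is taken to be already reduced to hypothetical (read ID, E-value) pairs, and the cutoff is treated as inclusive. The function names are illustrative and are not part of RAPSearch or BLAST. Feeding the SRR020796 and 4440037 counts into `percent_of_union` reproduces the percentages printed in the table.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical input format: one or more (read_id, e_value) pairs per read
# that received any hit from the search tool.
Hit = Tuple[str, float]


def reads_with_homologs(hits: Iterable[Hit], e_cutoff: float) -> Set[str]:
    """IDs of reads with at least one hit passing the E-value cutoff (assumed inclusive)."""
    return {read_id for read_id, e_value in hits if e_value <= e_cutoff}


def percent_of_union(overlap: int, blast_only: int,
                     rap_only: int) -> Tuple[float, float, float]:
    """Express the three counts as percentages of their sum (the union of both tools)."""
    total = overlap + blast_only + rap_only
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (100.0 * overlap / total,
            100.0 * blast_only / total,
            100.0 * rap_only / total)


def compare_tools(blast: Set[str],
                  rapsearch: Set[str]) -> Dict[str, Tuple[int, float]]:
    """Partition reads into Overlap / BLAST-only / RAPSearch-only groups."""
    overlap = len(blast & rapsearch)       # detected by both tools
    blast_only = len(blast - rapsearch)    # detected by BLAST alone
    rap_only = len(rapsearch - blast)      # detected by RAPSearch alone
    pct = percent_of_union(overlap, blast_only, rap_only)
    return {
        "Overlap": (overlap, pct[0]),
        "BLAST-only": (blast_only, pct[1]),
        "RAPSearch-only": (rap_only, pct[2]),
    }


if __name__ == "__main__":
    # Toy hits; 1e-3 is the cutoff footnote a gives for the longer-read datasets.
    blast = reads_with_homologs([("r1", 1e-5), ("r2", 0.05), ("r3", 2e-4)], 1e-3)
    rapsearch = reads_with_homologs([("r1", 1e-6), ("r3", 0.002), ("r5", 1e-4)], 1e-3)
    for name, (count, pct) in compare_tools(blast, rapsearch).items():
        print(f"{name}: {count} ({pct:.1f}%)")

    # SRR020796 counts from the table; prints [98.4, 1.3, 0.3]
    print([round(p, 1) for p in percent_of_union(218134, 2832, 745)])
```

Keeping the percentage step separate from the set operations lets the same helper check published counts directly, without needing the underlying read IDs.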