Table 4 The number of similar Trinity transcripts between original Inchworm and MapReduce-Inchworm using the mouse RNA-seq data [22]

From: K-mer clustering algorithm using a MapReduce framework: application to the parallelization of the Inchworm module of Trinity

Cutoff for transcript similarity (%)    Number of similar transcripts
100                                     47,816
99                                      57,926
95                                      64,109
90                                      67,178
85                                      69,002
80                                      70,398
75                                      71,390
70                                      72,285
  1. The two sets of transcripts, from original Inchworm and from MapReduce-Inchworm, were compared using BLAT [42], with the transcripts from original Inchworm as the target and the transcripts from MapReduce-Inchworm as the query. The Perl script blat_top_hit_extractor.pl, included in the Trinity pipeline, was used to extract the top hit against the target for each query transcript. The first column gives the cutoff for transcript similarity (in percent), which was quantified using two similarity scores defined as follows: 1) 1 - (query_sequence_size - number_of_matching_bases)/query_sequence_size; 2) 1 - (target_sequence_size - number_of_matching_bases)/target_sequence_size. If both similarity scores for a pair of transcripts were greater than or equal to the cutoff value, the two transcripts were considered similar. The second column gives the number of similar transcripts between original Inchworm and MapReduce-Inchworm at that cutoff. The total number of transcripts from each method can be found in Table 3.
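
The two similarity scores and the cutoff test described in the footnote can be illustrated with a minimal Python sketch. It is not the authors' script. The `TopHit` record and the function names are hypothetical, and the sketch assumes that the query size, target size, and number of matching bases for each top hit have already been extracted from the BLAT output. Exact fractions are used so that a score sitting exactly on a cutoff is not affected by floating-point rounding.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class TopHit:
    """Top BLAT hit for one query transcript (hypothetical record layout)."""
    query_size: int      # length of the MapReduce-Inchworm transcript (query)
    target_size: int     # length of the original Inchworm transcript (target)
    matching_bases: int  # number of matching bases reported for the hit


def similarity_scores(hit):
    """Return (query_score, target_score) as defined in the table footnote."""
    query_score = 1 - Fraction(hit.query_size - hit.matching_bases, hit.query_size)
    target_score = 1 - Fraction(hit.target_size - hit.matching_bases, hit.target_size)
    return query_score, target_score


def is_similar(hit, cutoff_percent):
    """A pair counts as similar only if BOTH scores reach the cutoff."""
    cutoff = Fraction(cutoff_percent, 100)
    query_score, target_score = similarity_scores(hit)
    return query_score >= cutoff and target_score >= cutoff


def count_similar(hits, cutoff_percent):
    """Count the top hits that pass the similarity cutoff."""
    return sum(1 for hit in hits if is_similar(hit, cutoff_percent))


# Made-up example hits, for illustration only (not data from the paper).
example_hits = [
    TopHit(query_size=100, target_size=100, matching_bases=100),
    TopHit(query_size=100, target_size=120, matching_bases=96),
    TopHit(query_size=200, target_size=200, matching_bases=150),
]

for cutoff in (100, 95, 80, 75):
    print(cutoff, count_similar(example_hits, cutoff))
```

Because both scores must reach the cutoff, the second example hit (96% of the query but only 80% of the target) is counted at a cutoff of 80 but not at 85. This is why the counts in the table can only grow as the cutoff is lowered.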