Table 2 Accuracy assessment of MapReduce-Inchworm compared to the original Inchworm using three simulated read datasets for mouse RNA-Seq

From: K-mer clustering algorithm using a MapReduce framework: application to the parallelization of the Inchworm module of Trinity

                                            Number of paired-end reads
Statistic                       Method      100 M     150 M     200 M
REF-EVAL contig recall          Original    0.3098    0.3122    0.3126
                                MapReduce   0.3274    0.3308    0.3314
REF-EVAL contig precision       Original    0.3258    0.3283    0.3280
                                MapReduce   0.3389    0.3419    0.3422
REF-EVAL nucleotide recall      Original    0.9752    0.9764    0.9779
                                MapReduce   0.9763    0.9783    0.9793
REF-EVAL nucleotide precision   Original    0.9847    0.9845    0.9845
                                MapReduce   0.9870    0.9869    0.9869
N1                              Original    32,712    39,273    43,862
                                MapReduce   33,687    40,452    45,344
N2                              Original    16        20        26
                                MapReduce   4         12        8
  1. Statistics from the REF-EVAL component of DETONATE [41], for three simulated read datasets. Recall is the fraction of reference elements that are correctly recovered by an assembly. Precision is the fraction of assembly elements that correctly recover a reference element. At the contig level, a 99% alignment cutoff has been used to identify a recovered transcript (left-hand bars in Fig. 3). Original refers to the results of Trinity run with the original version of Inchworm. MapReduce refers to the results of Trinity run with the MapReduce-Inchworm method presented here. Also shown are the N1 and N2 statistics, as given by the script FL_trans_analysis_pipeline.pl. N1 represents the total number of assembled transcripts that give full-length matches to the reference. N2 represents the number of fused transcripts. For comparison, there are 80,867 reference transcripts.
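
The recall and precision definitions in the note above can be written as simple ratios. The following is a minimal sketch, not the REF-EVAL implementation: the function names `recall`, `precision`, and `full_length_fraction` are hypothetical, and the N1/80,867 ratio it prints is an illustrative derived quantity that the table itself does not report. The N1 values and the reference transcript count are taken from the table and its note.

```python
# Illustrative only: these helpers mirror the definitions in the table note;
# they are not part of DETONATE or Trinity.

REFERENCE_TRANSCRIPTS = 80_867  # reference transcript count quoted in the note


def recall(recovered_reference: int, total_reference: int) -> float:
    """Fraction of reference elements correctly recovered by the assembly."""
    return recovered_reference / total_reference


def precision(correct_assembly: int, total_assembly: int) -> float:
    """Fraction of assembly elements that correctly recover a reference element."""
    return correct_assembly / total_assembly


def full_length_fraction(n1: int, total: int = REFERENCE_TRANSCRIPTS) -> float:
    """N1 expressed as a share of all reference transcripts (derived, not reported)."""
    return n1 / total


# N1 values copied from the table, in dataset order 100 M, 150 M, 200 M.
DATASETS = ["100 M", "150 M", "200 M"]
N1 = {
    "Original": [32_712, 39_273, 43_862],
    "MapReduce": [33_687, 40_452, 45_344],
}

for method, counts in N1.items():
    for dataset, n1 in zip(DATASETS, counts):
        print(f"{method:9s} {dataset}: {full_length_fraction(n1):.4f}")
```

Because the REF-EVAL contig statistics use a 99% alignment cutoff while N1 comes from a separate script, the printed fractions are not expected to match the contig recall rows.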