Table 3 Time and memory usage of different versions of align-families.py, using different multiple sequence alignment algorithms

From: Family reunion via error correction: an efficient analysis of duplex sequencing data

                                        CPUs
version  aligner  metric               1       2       4       8      16      32
0.4      MAFFT    time (seconds)  28,638  15,769    8912    5173    3038    1747
2.15     MAFFT    time (seconds)  28,754  14,282    7079    3463    1686     854
2.15     Kalign2  time (seconds)    4731    1777     945     600     381     246
0.4      MAFFT    memory (MB)     23,704  12,299    6622    3755    2284    1602
2.15     MAFFT    memory (MB)     23,927  12,599    6850    3985    2541    1810
2.15     Kalign2  memory (MB)     24,648  23,220  12,408    6668    3781    2327
  1. At low levels of parallelization, Kalign2 made the process over 8 times faster, while using less than twice as much memory as MAFFT. The new queueing algorithm sped up the tool by a factor of between 1 and 2.05; naturally, the reduction of the job-queue bottleneck made more of a difference at higher levels of parallelization. Memory usage appeared unaffected, as expected given the small size of the job queue relative to the rest of the memory footprint. To disentangle the effect of the job queueing algorithm from the other changes between 0.4 and 2.15, the two versions were compared with all parameters set as similarly as possible: in both cases the number of --processes was 32 and MAFFT was used as the aligner. Crucially, the --queue-size for version 2.15 was set to 32, the same as the number of --processes, which approximates the bottleneck in the pre-2.0 version of Du Novo’s job queueing algorithm. Comparing the median of 3 trials of each, the wallclock time of 2.15 was 27% higher than that of 0.4. This could be due to the higher overhead of the more complicated parallelization algorithm, or to other changes between 0.4 and 2.15.
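The speedup ratios quoted above can be replayed directly from the wallclock times in Table 3. The snippet below is a sanity check only: the values are copied from the table, and the ratio formulas are ordinary speedup arithmetic, not code from the paper.

```python
# Wallclock times (seconds) from Table 3, one entry per CPU count.
cpus = [1, 2, 4, 8, 16, 32]
mafft_v04 = [28638, 15769, 8912, 5173, 3038, 1747]
mafft_v215 = [28754, 14282, 7079, 3463, 1686, 854]
kalign2_v215 = [4731, 1777, 945, 600, 381, 246]

# Speedup of version 2.15 over 0.4, both using MAFFT:
# near 1x at 1 CPU, rising as parallelization increases.
speedup_v215 = [old / new for old, new in zip(mafft_v04, mafft_v215)]
print([round(s, 2) for s in speedup_v215])
# → [1.0, 1.1, 1.26, 1.49, 1.8, 2.05]

# Speedup from swapping MAFFT for Kalign2 within version 2.15:
# largest at low CPU counts, where alignment dominates the runtime.
speedup_kalign = [m / k for m, k in zip(mafft_v215, kalign2_v215)]
print([round(s, 2) for s in speedup_kalign])
# → [6.08, 8.04, 7.49, 5.77, 4.43, 3.47]
```

The last ratio in the first list is where the "between 1 and 2.05x" figure in the note comes from, and the 8x Kalign2 speedup appears at 2 CPUs.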
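The queue-size experiment can be mimicked with a toy producer/worker setup: a bounded queue whose capacity equals the worker count makes the producer block as soon as every worker slot is accounted for, which approximates the bottleneck described above. This is an illustrative sketch, not Du Novo's actual implementation; the function names and sizes are invented for the example.

```python
import queue
import threading

def run(num_workers, queue_size, num_jobs=100):
    """Push num_jobs through a bounded queue and collect results.

    With queue_size == num_workers, the producer stalls whenever the
    workers fall behind; a larger queue_size lets the producer run ahead,
    which is what the post-2.0 queueing change allows.
    """
    jobs = queue.Queue(maxsize=queue_size)  # bounded, like --queue-size
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = jobs.get()
            if job is None:  # sentinel: no more work for this worker
                break
            with lock:
                results.append(job * job)  # stand-in for an alignment task

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in range(num_jobs):
        jobs.put(job)  # blocks while the queue is full
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results

# Queue capacity equal to the worker count, as in the comparison above.
out = run(num_workers=32, queue_size=32)
print(len(out))
```

Threads stand in for the worker processes here purely to keep the sketch self-contained; the throttling effect of the bounded queue is the same either way.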