Detailed memory usage and CPU profiling of metagenome assembly with metaSPAdes
Modern metagenome assemblers take a “multi-k” approach to assemble species of varying abundances in a microbial community. For example, the metaSPAdes pipeline first constructs an initial de Bruijn graph and iteratively simplifies it one k-mer size at a time; it then transforms the result into the assembly graph, which is further simplified and traversed to obtain contigs [13]. To profile its use of computational resources, we ran metaSPAdes on the 233 GB Wastewater Metagenome dataset and recorded its CPU and memory utilization. As expected, each round of de Bruijn graph construction is computationally and memory intensive, as is the final assembly graph phase (Fig. 1). In any phase where the memory consumption of metaSPAdes exceeds the available DRAM, an out-of-memory (OOM) failure occurs; to ensure the pipeline succeeds, a workstation must therefore have more memory than the pipeline's peak consumption.
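To make the profiling setup concrete, the sketch below samples the aggregate CPU and resident memory of an assembler's process tree at fixed intervals. It is a minimal illustration, not the exact instrumentation used in our runs; the metaSPAdes command line, input paths, and sampling interval are assumptions to be replaced as needed.

```python
import csv
import time

import psutil  # third-party: pip install psutil

# Hypothetical command line; substitute the real read files and output dir.
CMD = ["metaspades.py", "-1", "reads_1.fq.gz", "-2", "reads_2.fq.gz", "-o", "asm"]

def profile(cmd, interval_s=5.0, log_path="usage.csv"):
    """Run `cmd`, sampling CPU% and RSS of its whole process tree."""
    proc = psutil.Popen(cmd)
    start = time.time()
    with open(log_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["elapsed_s", "cpu_percent", "rss_bytes"])
        while proc.is_running() and proc.status() != psutil.STATUS_ZOMBIE:
            cpu = rss = 0
            try:
                for p in [proc] + proc.children(recursive=True):
                    cpu += p.cpu_percent(interval=None)
                    rss += p.memory_info().rss
            except psutil.NoSuchProcess:
                pass  # a child exited mid-sample; skip this tick
            writer.writerow([round(time.time() - start, 1), cpu, rss])
            time.sleep(interval_s)
    return proc.wait()

if __name__ == "__main__":
    profile(CMD)
```

Peak memory is then the maximum of the rss_bytes column, and plotting the two columns against elapsed time reproduces the phase structure seen in Fig. 1.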
To test whether enabling PMem changes CPU or memory utilization, we used the Memory Machine to configure 32 GB of DRAM and supplied the rest of the memory as PMem. Peak memory consumption increased from 250 GB to 370 GB, likely due to overhead incurred by the Memory Machine software, which raises the page size from 4 KB to 2 MB. The overall pattern of memory utilization was unchanged, while CPU utilization increased because of the additional operations needed to swap pages between DRAM and PMem (Fig. 1).
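The host-side page configuration can be inspected through standard Linux interfaces; the snippet below is a generic check (not Memory Machine tooling, which is configured separately) that reports the 4 KB base page size and how much memory is currently backed by 2 MB huge pages, assuming a Linux host.

```python
import resource
from pathlib import Path

# Base page size reported by the OS (typically 4 KiB on x86-64).
print("base page size:", resource.getpagesize(), "bytes")

# Transparent huge pages (2 MiB on x86-64); the active mode is in [brackets].
thp = Path("/sys/kernel/mm/transparent_hugepage/enabled")
if thp.exists():
    print("THP mode:", thp.read_text().strip())

# AnonHugePages shows how much anonymous memory is backed by huge pages.
for line in Path("/proc/meminfo").read_text().splitlines():
    if line.startswith(("AnonHugePages", "Hugepagesize")):
        print(line)
```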
PMem can effectively satisfy metaSPAdes' requirement for a large amount of DRAM at a small cost in running speed
Running metaSPAdes on the Wastewater Metagenome dataset with only DRAM consumed a maximum of approximately 250 GB and finished in 26.3 hours. To test whether PMem can substitute for DRAM in metaSPAdes, we used the Memory Machine to configure decreasing amounts of DRAM and measured PMem usage and running time. PMem substituted for DRAM in all tested memory configurations, yielding DRAM savings of up to 100% (Fig. 2A, top). Larger DRAM savings, however, also lengthened the execution time of metaSPAdes, owing to the slower performance of PMem relative to DRAM. As shown in Fig. 2A (top), substituting up to 30% of total memory with PMem caused no appreciable slowdown, beyond which the slowdown became apparent. With 100% PMem replacing DRAM, the run time of metaSPAdes on the Wastewater Metagenome dataset was approximately twice as long (\(2.17\times\)). In all configurations, total memory consumption (DRAM + PMem) was largely the same at 372 GB, except in the DRAM-only case (250 GB).
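These headline numbers follow directly from the measurements; a short back-of-the-envelope computation reproduces them (the 32 GB point corresponds to the configuration described in the previous section):

```python
# Slowdown at 100% PMem, from the measured DRAM-only baseline.
dram_only_hours = 26.3
pmem_only_hours = dram_only_hours * 2.17
print(f"100% PMem run time: {pmem_only_hours:.1f} h")  # ~57.1 h

# DRAM savings for a configuration = 1 - DRAM / total memory.
total_gb = 372  # peak DRAM + PMem under Memory Machine
for dram_gb in (total_gb * 0.7, 32, 0):  # 30% PMem, 32 GB DRAM, 100% PMem
    print(f"DRAM = {dram_gb:6.1f} GB -> savings = {1 - dram_gb / total_gb:.0%}")
```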
The longer running time with PMem is probably caused by a combination of slower PMem write performance and the cost of swapping data between DRAM and PMem. Memory Machine does provide a software development kit (SDK) that lets each application allocate memory precisely on DRAM or PMem to reduce the data-migration cost, but doing so would require additional software engineering effort.
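We did not use the SDK, but the general idea of explicit tier placement can be sketched without it: keep hot structures in ordinary DRAM allocations and back large, cold structures with a file on a DAX-mounted PMem namespace. The mount point and array sizes below are hypothetical.

```python
import numpy as np

# Hot, frequently updated data lives in ordinary DRAM.
hot_counts = np.zeros(10_000_000, dtype=np.uint32)

# Cold, large data is backed by a file on a DAX-mounted PMem filesystem
# (hypothetical mount point), so loads and stores go to PMem directly.
cold_kmers = np.memmap("/mnt/pmem/kmer_table.bin", dtype=np.uint64,
                       mode="w+", shape=(500_000_000,))

cold_kmers[0] = 0xFFFF
cold_kmers.flush()  # write back to the PMem-backed file
```

Pinning placement this way avoids migration entirely, at the cost of exactly the engineering effort noted above.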
Running metaSPAdes on the larger 1.2 TB Antarctic Lake Metagenome Dataset with only DRAM led to an OOM error, as the required memory exceeds the total amount of DRAM (768 GB). When run under Memory Machine with 550 GB of DRAM in the tier, peak memory reached 1.8 TB and the run completed successfully in 11.15 days.
PMem supports other metagenome assemblers, including MetaHipMer2 and MEGAHIT
In addition to metaSPAdes, we also tested MetaHipMer2 [6] and MEGAHIT [9] on the Wastewater Metagenome dataset. As shown in Fig. 2A (bottom), with 100% PMem the run time of MEGAHIT was \(1.75\times\) that on DRAM, and the run time increased as the PMem-to-total-memory ratio increased. In contrast, the run times of MetaHipMer2 at different PMem ratios were roughly the same: 807.6, 792.97, 788.45, and 675.85 minutes at 97%, 98%, 99%, and 100% PMem, respectively. The slightly better performance with more PMem is probably due to a slightly lower data-migration cost: when DRAM is only a small fraction of total memory (MetaHipMer2 used 4.16 TB of memory), more DRAM leads to more data migration.
MetaHipMer2 consumed the most memory (peaking at 4.2 TB) but took the least time (647 min). MEGAHIT used the least memory (peaking at 124 GB) but took almost twice as long, finishing the assembly in 1191 min. metaSPAdes used a peak memory of 372 GB and took the longest (3423 min). The comparison of the three assemblers is shown in Fig. 2B.
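The trade-off in Fig. 2B can be summarized numerically from the reported peaks and run times, using metaSPAdes as the baseline:

```python
# Peak memory and wall-clock time on the Wastewater Metagenome dataset.
runs = {
    "metaSPAdes":  {"peak_gb": 372,  "minutes": 3423},
    "MEGAHIT":     {"peak_gb": 124,  "minutes": 1191},
    "MetaHipMer2": {"peak_gb": 4200, "minutes": 647},
}

base = runs["metaSPAdes"]
for name, r in runs.items():
    print(f"{name:12s}  memory x{r['peak_gb'] / base['peak_gb']:5.2f}"
          f"  time x{r['minutes'] / base['minutes']:.2f}")
```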
We did not attempt to compare the three assemblers on the Antarctic Lake Metagenome Dataset because MetaHipMer2 would likely run into an OOM error: in a previous experiment, it required 13 TB of RAM distributed across multiple nodes of a supercomputer to complete this dataset.