Fig. 7 | BMC Bioinformatics

Fig. 7

From: Reverse engineering environmental metatranscriptomes clarifies best practices for eukaryotic assembly

mmseqs2 clustering in eukrhythmic collapses redundant sequences and highlights between-assembler differences in the fidelity of recovered proteins to designer proteins. Panel A: The total number of contigs per cluster, separated by the assemblers from which they were recovered. rnaSPAdes independently produced the highest number of contigs overall, more than the number of contigs produced by all four assemblers (far right boxplot in panel A). Panel B: The proportion of mmseqs2 protein clusters that did not cluster with proteins from the designer assembly, as a function of the number of assemblers represented within the cluster. Protein products supported by all four assemblers were least likely to be “spurious”, that is, not recoverable from the designer assembly. Panel C: The number of contigs that had no protein ORF assigned to them by TransDecoder (black), compared with contigs whose proteins had BLAST matches at a given percentage identity. The first stacked bar corresponds to contigs that had both a detected ORF and a BLAST match with percentage identity >75% at an e-value threshold of \(10^{-2}\). Additional file 1: Fig. S11.4 shows the contigs from the designer assembly that originally did not have an identified ORF.
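
The quantity plotted in Panel B can be illustrated with a minimal sketch. This is not the authors' code or the eukrhythmic implementation. It assumes cluster membership is available as two-column (representative, member) rows, as in an mmseqs2 cluster TSV, and that each sequence ID carries a source prefix before a `|` character. That ID convention, the `designer` label, the assembler names `asmA` and `asmB`, and the function names are all hypothetical.

```python
from collections import defaultdict

DESIGNER = "designer"  # hypothetical label for designer-assembly proteins


def source_of(seq_id):
    # Assumed ID convention (hypothetical): "<source>|<sequence name>"
    return seq_id.split("|", 1)[0]


def spurious_fraction_by_support(pairs):
    """Return {number of assemblers in cluster: fraction of clusters
    that contain no designer protein}.

    pairs: iterable of (representative_id, member_id) rows.
    """
    # Collect the set of sources represented in each cluster.
    sources = defaultdict(set)
    for representative, member in pairs:
        sources[representative].add(source_of(member))

    total = defaultdict(int)
    spurious = defaultdict(int)
    for cluster_sources in sources.values():
        assemblers = cluster_sources - {DESIGNER}
        if not assemblers:
            continue  # cluster holds designer proteins only; not counted
        n = len(assemblers)
        total[n] += 1
        if DESIGNER not in cluster_sources:
            spurious[n] += 1  # no designer protein in this cluster

    return {n: spurious[n] / total[n] for n in sorted(total)}


# Toy input with two hypothetical assemblers, asmA and asmB.
rows = [
    ("asmA|c1", "asmA|c1"),
    ("asmA|c1", "designer|p1"),
    ("asmB|c2", "asmB|c2"),
    ("asmB|c3", "asmB|c3"),
    ("asmB|c3", "designer|p9"),
    ("asmA|c4", "asmA|c4"),
    ("asmA|c4", "asmB|c5"),
]
result = spurious_fraction_by_support(rows)
```

In this toy input, one of the three single-assembler clusters lacks a designer protein, and the only two-assembler cluster lacks one as well. The values are illustrative only and bear no relation to the results in the figure.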
