Figure 1 | BMC Bioinformatics

Figure 1

From: Interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets

Basic computational pipeline for sequence clustering. Sequence clustering begins with a sample of raw sequence reads from which duplicates have been removed. Pairwise sequence alignments and genetic distances are then calculated over all pairs in the sample; this study used the Needleman-Wunsch global alignment algorithm. Next, the calculated distances are passed to multidimensional scaling and pairwise clustering algorithms, which produce Cartesian coordinates and cluster assignments that can be used to visualize the sequence space. Both the distance calculation and multidimensional scaling are of order O(N²), where N is the number of sequences, so the pipeline becomes computationally expensive as the sample grows very large.
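The pipeline in the caption can be sketched end to end in plain Python. This is a minimal, stdlib-only sketch under stated assumptions, not the study's implementation. The scoring values (match=1, mismatch=-1, gap=-2), the p-distance used as the "genetic distance", and every function name are hypothetical. Classical (Torgerson) MDS is swapped in as a simple stand-in for whatever MDS method the study used, and the pairwise clustering step is omitted. The O(N²) cost shows up in `distance_matrix`, which aligns all N(N-1)/2 pairs, and in `classical_mds`, which works on the full N×N matrix.

```python
import random
from itertools import combinations


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment; the scores are illustrative assumptions."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    cols, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            i, j = i - 1, j - 1
            cols.append((a[i], b[j]))
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
            cols.append((a[i], "-"))
        else:
            j -= 1
            cols.append(("-", b[j]))
    cols.reverse()
    return "".join(x for x, _ in cols), "".join(y for _, y in cols)


def p_distance(x, y):
    """Hypothetical genetic distance: share of aligned columns that differ (gaps count)."""
    return sum(c != d for c, d in zip(x, y)) / len(x) if x else 0.0


def distance_matrix(seqs):
    """Aligns all N(N-1)/2 pairs, the O(N^2) step described in the caption."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = p_distance(*needleman_wunsch(seqs[i], seqs[j]))
    return d


def classical_mds(d, dims=2, iters=2000, seed=0):
    """Classical (Torgerson) MDS via shifted power iteration with deflation."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    mean = [sum(row) / n for row in sq]  # row means = column means (d is symmetric)
    grand = sum(mean) / n
    # Double centering: B = -1/2 * J D^2 J, where J is the centering matrix.
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Shifting by a Gershgorin bound makes B + shift*I positive semidefinite, so
        # power iteration converges to the largest (most positive) eigenvalue of B.
        shift = max(sum(abs(x) for x in row) for row in b)
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = max(lam, 0.0) ** 0.5  # negative eigenvalues (non-Euclidean parts) are dropped
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def pipeline(reads, dims=2):
    unique = list(dict.fromkeys(reads))  # strip duplicate reads, keeping first-seen order
    d = distance_matrix(unique)
    return unique, d, classical_mds(d, dims)


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "TTGCACGT", "ACGAACGT"]
    unique, d, xy = pipeline(reads)
    print(len(unique))  # 4: the repeated read is removed before alignment
    for seq, (x, y) in zip(unique, xy):
        print(seq, round(x, 3), round(y, 3))
```

The quadratic growth is easy to see from the pair count alone: 1,000 unique reads need 499,500 alignments, while 2,000 need 1,999,000, and each Needleman-Wunsch alignment itself costs time proportional to the product of the two sequence lengths.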
