
Table 4 Average frequency of gaps in the multiple sequence alignments

From: DNACLUST: accurate and efficient clustering of phylogenetic marker genes

 

| Method   | 0.99  | 0.97  | 0.95  |
|----------|-------|-------|-------|
| DNACLUST | 0.016 | 0.071 | 0.103 |
| UCLUST   | 0.071 | 0.117 | 0.146 |

  1. The average frequency of gaps in multiple sequence alignments (MSAs) for sampled clusters at various similarity thresholds. For each MSA, the frequency of gaps is the number of gaps divided by the total number of characters in the MSA. Gaps before the beginning and after the end of each sequence are excluded. Note that because an insertion in one sequence produces a gap in every other sequence in the MSA, the gap frequency may exceed the dissimilarity allowed by the clustering threshold (for example, more than 0.01 at the 0.99 threshold). Because the sequence identity measure used by UCLUST does not take gaps into account, the number of gaps in UCLUST MSAs is higher than in DNACLUST MSAs, especially at the more stringent thresholds.
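The gap frequency described in the footnote can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `gap_frequency`, the use of `-` as the gap character, and the choice to leave leading and trailing gaps out of the denominator as well as the numerator are all assumptions, since the footnote does not specify them.

```python
def gap_frequency(msa, gap="-"):
    """Fraction of internal gap characters in an MSA given as a list of aligned rows.

    Assumption: leading and trailing gaps of each row are dropped from both
    the gap count and the character count.
    """
    gaps = 0
    total = 0
    for row in msa:
        core = row.strip(gap)    # ignore gaps before the first and after the last residue
        gaps += core.count(gap)  # remaining gaps are internal
        total += len(core)
    return gaps / total if total else 0.0


# Hypothetical three-sequence alignment; the third row has two leading gaps.
example = ["ACG-TA", "AC-GTA", "--CGTA"]
print(gap_frequency(example))  # 2 internal gaps / 16 counted characters = 0.125
```

Averaging this value over the sampled clusters at each threshold would give one number per method and threshold, as reported in the table.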