Table 2 Basic statistics on simulated (i.e., true) MSAs and reconstructed MSAs

From: Characterization of multiple sequence alignment errors using complete-likelihood score and position-shift map

 

| Subjects counted | Aligner (a) | Dataset: 12 primates | Dataset: 15 mammals | Dataset: 9 fast-evolving mammals |
|---|---|---|---|---|
| [True MSAs] | | | | |
| 1. MSAs | | 10,000 | 10,000 | 3000 |
| 2. Gapped segments | | 404,394 | 966,017 | 309,425 |
| 3. Gapped segments with long indels (b) (%) (c) | | 446 (0.1%) | 9503 (1.0%) | 5784 (1.9%) |
| [Reconstructed MSAs] | | | | |
| 4. All erroneous segments | MAFFT-1 (a) | 145,002 | 320,455 | 29,781 |
| | MAFFT-i (a) | 139,701 | 352,482 | 58,372 |
| | Prank | 135,602 | 374,087 | 39,315 |
| 5. True gapped segments in item 4 (%) (c) | MAFFT-1 | 182,712 (45.2%) | 836,766 (86.6%) | 305,756 (98.8%) |
| | MAFFT-i | 173,701 (43.0%) | 813,591 (84.2%) | 300,879 (97.2%) |
| | Prank | 150,618 (37.2%) | 722,767 (74.8%) | 300,359 (97.1%) |
| 6. Erroneous segments without long indels (d) | MAFFT-1 | 144,422 | 308,923 | 24,239 |
| | MAFFT-i | 139,144 | 340,865 | 51,912 |
| | Prank | 135,097 | 363,961 | 34,907 |
| 7. True gapped segments in item 6 (%) (c) | MAFFT-1 | 181,868 (45.0%) | 781,784 (80.9%) | 171,008 (55.3%) |
| | MAFFT-i | 172,915 (42.8%) | 772,818 (80.0%) | 234,311 (75.7%) |
| | Prank | 150,010 (37.1%) | 676,950 (70.1%) | 175,967 (56.9%) |

  1. (a) The aligner labels “MAFFT-1” and “MAFFT-i” stand for E-INS-1 (a progressive mode) and E-INS-i (an accuracy-oriented iterative mode) of MAFFT, respectively
  2. (b) Each of these segments involves at least one apparent indel longer than 100 bases
  3. (c) These percentages are relative to the number of all true gapped segments (item 2) in the same column
  4. (d) In these segments, neither the true MSAs nor the reconstructed MSAs involve any apparent indel longer than 100 bases
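
As a quick check of how the percentages in the table are formed, the following minimal sketch recomputes the MAFFT-1 percentages of item 5 by dividing each count by the number of true gapped segments (item 2) in the same column. The variable and function names are illustrative assumptions, not part of the original study, and rounding to one decimal place is assumed from the way the table reports its values.

```python
# Counts copied from the table; the dictionary and function names are
# hypothetical and serve only to illustrate the footnote on percentages.
true_gapped_segments = {  # item 2
    "12 primates": 404_394,
    "15 mammals": 966_017,
    "9 fast-evolving mammals": 309_425,
}
item5_mafft1 = {  # item 5, MAFFT-1 row
    "12 primates": 182_712,
    "15 mammals": 836_766,
    "9 fast-evolving mammals": 305_756,
}


def percent_of_true(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


percentages = {
    dataset: percent_of_true(item5_mafft1[dataset], true_gapped_segments[dataset])
    for dataset in true_gapped_segments
}

for dataset, pct in percentages.items():
    print(f"{dataset}: {pct}%")
# 12 primates: 45.2%
# 15 mammals: 86.6%
# 9 fast-evolving mammals: 98.8%
```

The same division by the item 2 counts reproduces the percentages reported for items 3 and 7.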