AliGROOVE – visualization of heterogeneous sequence divergence within multiple sequence alignments and detection of inflated branch support
© Kück et al.; licensee BioMed Central Ltd. 2014
Received: 8 November 2013
Accepted: 14 August 2014
Published: 30 August 2014
Masking of multiple sequence alignment blocks has become a powerful method to enhance the tree-likeness of the underlying data. However, existing masking approaches are insensitive to heterogeneous sequence divergence, which can mislead tree reconstructions. We present AliGROOVE, a new method based on a sliding window and a Monte Carlo resampling approach that visualizes heterogeneous sequence divergence or alignment ambiguity related to single taxa or subsets of taxa within a multiple sequence alignment and tags suspicious branches on a given tree.
We used simulated multiple sequence alignments to show that the extent of alignment ambiguity in pairwise sequence comparisons is correlated with the frequency of misplaced taxa in tree reconstructions. The approach implemented in AliGROOVE makes it possible to detect nodes within a tree that are supported despite the absence of phylogenetic signal in the underlying multiple sequence alignment. In a case study based on an empirical data set of mitochondrial DNA sequences of chelicerates, we show that AliGROOVE detects heterogeneous sequence divergence equally well.
The AliGROOVE approach has the potential to identify single taxa or subsets of taxa that show predominantly randomized sequence similarity in comparison with other taxa in a multiple sequence alignment. It further allows the reliability of node support to be evaluated in a novel way.
Alignment masking is regularly applied in phylogenetics as a means of reducing noise in sequence alignments. The idea behind masking blocks of sequence alignments is to reduce the unpredictable influence of substitution saturation and/or ambiguously aligned blocks on subsequent tree reconstructions [1–8] by increasing the tree-likeness of the data. Simulations and analyses of alignment masking of empirical data corroborate this idea. Current software packages mask complete blocks of multiple sequence alignments by applying either arbitrarily chosen thresholds of sequence variability within alignment columns (e.g. Gblocks [1, 2] and REAP), or thresholds adjusted automatically to the input alignment (e.g. trimAl and BMGE), or a sliding window approach that identifies blocks of predominantly high alignment ambiguity (ALISCORE [5, 7]). All of these methods exclude complete alignment blocks rather than sequence subsets, and thus also mask data that are potentially valuable for subsets of taxa.
By design, all masking methods are relatively insensitive to heterogeneous sequence divergence of single taxa. This is an important deficiency, because heterogeneous sequence divergence can strongly bias tree reconstructions, for example through long branch effects or the misplacement of rogue taxa. A method that visualizes heterogeneous sequence divergence or alignment ambiguity related to single taxa or subsets of taxa would therefore be a useful complement to current masking approaches. Such a method makes it possible to identify taxa that are potentially misplaced in trees and that reduce the tree-likeness of the data.
For this purpose, we developed AliGROOVE, a new tool to visualize the extent of sequence similarity and alignment ambiguity in pairwise sequence comparisons derived from a multiple sequence alignment. AliGROOVE can help to detect strongly derived sequences that have the potential to bias tree reconstructions and node support. We implemented an adaptation of the recently published ALISCORE masking algorithm [5, 7], which has been successfully tested in simulations and on empirical data [5, 7, 8]. Using simple match/mismatch scoring for nucleotide data and a BLOSUM62 scoring matrix for amino acid data, ALISCORE applies Monte Carlo resampling within a sliding window to generate profiles of sequence similarity for all pairwise sequence comparisons. For each pairwise comparison, AliGROOVE summarizes the site scores of this profile, normalized over the whole alignment length. The resulting scores are arranged in a similarity matrix that shows the extent of taxonomically heterogeneous alignment ambiguity or sequence similarity within a multiple sequence alignment.
We used simulated data to investigate whether our application of the algorithm is able to detect ambiguously aligned taxa or groups of taxa, and whether the resulting sequence similarity scores can be used to tag unreliable nodes. For that purpose we tested AliGROOVE on data sets with and without indel events; data sets with indel events were tested both in their correct alignment and after realignment, which deviates from the true alignment. Additionally, we applied AliGROOVE to an empirical data set comprising five mitochondrial genes of 53 chelicerate ingroup taxa and eight myriapod outgroup taxa. With both the simulated and the empirical data sets we also tested the potential of the approach to illustrate heterogeneous tree-likeness among data blocks within an alignment.
Identification of sequence similarity/scoring
The algorithm of AliGROOVE is based on the scoring scheme of ALISCORE [5, 7], which compares pairs of amino acid/DNA sequences for random similarity within a sliding window. In short, the observed mismatch within the sliding window is scored first. This mismatch score is then compared with mismatch scores for the same window size generated by permuting character states within the sliding window and a predefined sequence neighborhood. If the observed score is better than 95% of the scores of all generated permutations, the similarity is considered non-random; otherwise it is considered indistinguishable from random similarity. Each position within the sliding window receives a positive sign if the observed score was significantly better than the scores of random sequence similarity, and a negative sign otherwise. Because the window is moved one position at a time, the number of signs collected for each alignment position corresponds to the size of the sliding window. For each position, the signs are summed and normalized by the sliding window size. A similarity profile of two sequences thus shows sections in which the two sequences may share non-random similarity, indicated by a positive sum of signs, and sections of random similarity, expressed by a negative sum of signs. Finally, for each profile the AliGROOVE algorithm calculates the arithmetic mean of the profile values over all sites, excluding sites that are invariant across the whole alignment, and records these values in a matrix for the given set of sequences. The entries of this similarity matrix express the average amount of non-random versus random similarity in pairwise comparisons and can thus illustrate heterogeneous signal in the data.
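The scoring steps above can be sketched in Python. This is a simplified illustration under stated assumptions, not the actual ALISCORE or AliGROOVE code: the window size (w = 4), the 100 randomizations, the neighborhood of 20 sites on each side of the window, and drawing random windows by sampling characters without replacement from that neighborhood are all assumptions chosen for the example. Only nucleotide match/mismatch scoring is shown; amino acid scoring with BLOSUM62 is not implemented here.

```python
import random
from itertools import combinations


def window_mismatch(a, b):
    """Match/mismatch scoring: count positions at which two windows differ."""
    return sum(x != y for x, y in zip(a, b))


def pair_profile(s1, s2, w=4, n_perm=100, neighborhood=20, alpha=0.95, seed=0):
    """Per-site sign profile for one pair of aligned sequences (simplified)."""
    rng = random.Random(seed)
    length = len(s1)
    sums = [0] * length
    for start in range(length - w + 1):
        observed = window_mismatch(s1[start:start + w], s2[start:start + w])
        # Randomized windows: characters drawn from a local neighborhood (assumption)
        lo = max(0, start - neighborhood)
        hi = min(length, start + w + neighborhood)
        pool1, pool2 = list(s1[lo:hi]), list(s2[lo:hi])
        # How many randomized windows score worse (more mismatches) than observed?
        better = sum(
            observed < window_mismatch(rng.sample(pool1, w), rng.sample(pool2, w))
            for _ in range(n_perm)
        )
        sign = 1 if better >= alpha * n_perm else -1
        # Every site covered by this window receives the window's sign
        for pos in range(start, start + w):
            sums[pos] += sign
    # Normalize the summed signs by the window size
    return [total / w for total in sums]


def similarity_matrix(alignment, **kwargs):
    """Mean profile value for every sequence pair, skipping invariant columns."""
    variable = [i for i in range(len(alignment[0]))
                if len({seq[i] for seq in alignment}) > 1]
    if not variable:
        raise ValueError("alignment has no variable columns")
    scores = {}
    for i, j in combinations(range(len(alignment)), 2):
        profile = pair_profile(alignment[i], alignment[j], **kwargs)
        scores[(i, j)] = sum(profile[k] for k in variable) / len(variable)
    return scores
```

`similarity_matrix([seq_a, seq_b, seq_c])` returns a dictionary keyed by index pairs; values close to +1 indicate predominantly non-random similarity, negative values predominantly random similarity.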
The algorithm uses either match/mismatch scores for nucleotide sequences or amino acid substitution matrices (BLOSUM62, PAM250, PAM500) to score amino acid matches/mismatches. This scoring regime has proven efficient in alignment masking [5, 7, 8, 10–18].
Identification of suspicious branches
AliGROOVE pairwise similarity scores can be used to tag potentially unreliable relationships in a predefined tree. Such relationships can result from extensive substitution saturation or extensive alignment ambiguity, both of which produce long branches that may occur as internal as well as terminal branches of a tree.
The calculation of the mean pairwise similarity score treats all pairwise comparisons as independent replicates. This assumption is not justified in every case. For example, taxa C and E might be closely related, in which case $S_{AC}$ and $S_{AE}$ do not represent fully independent replicates.
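The tagging rule can be illustrated with a short sketch. How the pairwise scores are combined per branch is only outlined here, so the sketch assumes, as a hypothesis consistent with the setup A results (a branch is tagged when its score is negative), that a branch splitting the taxa into two sides receives the arithmetic mean of the scores of all taxon pairs across the split. Like the calculation described above, it treats every pair as an independent replicate. `branch_score` and `tag_branch` are hypothetical names, not AliGROOVE functions.

```python
from itertools import product


def branch_score(scores, side_a, side_b):
    """Mean pairwise similarity across a split; pairs treated as independent."""
    values = []
    for a, b in product(side_a, side_b):
        # Scores may be stored under either ordering of the pair
        key = (a, b) if (a, b) in scores else (b, a)
        values.append(scores[key])
    return sum(values) / len(values)


def tag_branch(scores, side_a, side_b):
    """Tag a branch as suspicious when its mean similarity score is negative."""
    return "suspicious" if branch_score(scores, side_a, side_b) < 0 else "ok"


# Hypothetical 4-taxon scores
scores = {("A", "B"): 0.6, ("A", "C"): -0.4, ("A", "D"): -0.1,
          ("B", "C"): -0.3, ("B", "D"): -0.5, ("C", "D"): 0.7}
print(tag_branch(scores, ["A", "B"], ["C", "D"]))  # suspicious
```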
Testing the performance with simulated data (Setup A & B)
In setup A (Figure 1a), we simulated data with increasing terminal branch lengths of two unrelated taxa. Across the branch length conditions, the similarity scores between sister taxa correlate with tree reconstruction success ((L1,S1) & (L2,S2) in Figure 2). The mean similarity scores for internal branches are also correlated with tree reconstruction success, and negative mean similarity scores are directly correlated with tree reconstruction errors. When AliGROOVE is used with the tree tagging option to project the observed pairwise sequence similarity scores onto a provided guide tree, the internal branch connecting two groups of taxa is tagged as suspicious (colored red) whenever the similarity score of this branch is negative. A complete overview of all results is given in Additional files 1 and 2.
In setup B, we simulated multiple sequence alignments with two internal nodes using 6-taxon trees (Figure 1b). The results again indicate a correlation between the similarity score of the two long internal branches and tree reconstruction success; reconstructions were predominantly incorrect when scores were negative (BL2 ≥ 1.1) (Figure 3). In setup B, taxa L1 and L2 are connected to the remaining taxa via two long internal branches. With increasing internal branch lengths, taxa L1 and L2 more often appear as a sister group instead of being paraphyletic with respect to the remaining taxa. In this case, taxa L1 and L2 share character states that have been lost in the other taxa, inducing a wrong sister group relationship based on plesiomorphies. Using the AliGROOVE approach with the tree tagging option, correctly reconstructed short internal branches that render taxa L1 and L2 paraphyletic were tagged as non-suspicious, whereas incorrectly resolved short internal branches were identified as suspicious. Whenever branch lengths were balanced, tree reconstructions were consistently successful, which is also reflected by the similarity scores obtained for the alignments of these topologies (see Figure 3). All AliGROOVE results of the 6-taxon setup are shown in Additional file 3.
Testing the performance on simulated data setup C
Testing the performance on simulated data setup D
With the AliGROOVE algorithm, the seven highly divergent nucleotide sequences did not consistently receive negative scores in all pairwise sequence comparisons when BL2 was set to 0.5, but almost always received negative scores when BL2 was set to ≥ 0.9 and data blocks were longer than 1000 sites (Additional file 5). With amino acid data sets, the seven highly divergent sequences received only positive scores in all pairwise sequence comparisons, independently of tree reconstruction success (Additional file 6).
The tree tagging algorithm tagged all highly divergent nucleotide sequences and their associated long branches as unreliable for BL2 ≥ 0.9. It also tagged all incorrectly placed nucleotide sequences and their associated long branches as unreliable when nucleotide data blocks were 2500 sites long and BL2 = 0.5. For shorter data blocks with BL2 = 0.5, tagging was less consistently correct (Additional file 5). For amino acid data sets, none of the seven highly divergent sequences or their associated long branches were tagged as unreliable.
These results also apply to the concatenated nucleotide and amino acid supermatrices, which comprise all data blocks. In the AliGROOVE pairwise similarity matrix of the concatenated nucleotide supermatrix, the seven highly divergent sequences are mostly colored red; however, although these sequences are misplaced in the tree, their associated branches are not consistently tagged as suspicious (Figure 8). With the amino acid supermatrix, the highly divergent sequences are not highlighted in the similarity matrix, and the branches associated with these sequences are not tagged as suspicious, although they are incorrectly placed. For both nucleotide and amino acid supermatrices, excluding the seven divergent sequences led to correct topologies (Additional file 7).
In general, the AliGROOVE tagging algorithm is optimistic about the reliability of branching patterns: in our simulations it never tagged a correct branch as unreliable.
Testing the performance with empirical mitochondrial data
It has been shown that traditional masking of entire sequence alignment blocks can improve the signal-to-noise ratio or tree-likeness of sequence alignments. Here, we show that the sliding window approach used in ALISCORE [5, 7] can be modified to identify single taxa or subsets of taxa that show predominantly randomized sequence similarity in comparison with other taxa (Figure 9). Masking these taxa can also improve the signal-to-noise ratio in sequence alignments. The approach implemented in AliGROOVE can be used to test the reliability of reconstructed topologies and to identify unreliable node support in a user-specified tree (Figures 2, 3, 5, 6, 8, 9, Additional files 1, 2, 3, 4, 5, 6). This offers a convenient way of studying node support for a given tree and multiple sequence alignment, complementary to conventional bootstrap analyses. Identifying such taxonomic subsets also makes it possible to mask only those taxonomic sub-blocks of a multiple sequence alignment that clearly contain the least signal due to alignment ambiguity, sequence saturation or excessive divergence.
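Masking a taxonomic sub-block, rather than a complete alignment block, can be sketched as follows. `mask_subblock` is a hypothetical helper, not part of AliGROOVE; it assumes zero-based, end-exclusive column coordinates and uses `?` as the missing-data symbol, so that only the flagged taxa lose data in the masked region.

```python
def mask_subblock(alignment, taxa, start, end, mask_char="?"):
    """Return a copy of `alignment` with columns [start, end) masked for `taxa` only.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Sequences of all other taxa are left untouched.
    """
    masked = {}
    for name, seq in alignment.items():
        if name in taxa:
            stop = min(end, len(seq))  # do not extend past the alignment end
            seq = seq[:start] + mask_char * (stop - start) + seq[stop:]
        masked[name] = seq
    return masked


aln = {"t1": "ACGTACGT", "t2": "ACGTTCGT", "t3": "AGGTACCT"}
print(mask_subblock(aln, {"t3"}, 2, 6)["t3"])  # AG????CT
```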
Analyses of simulated nucleotide data sets with indel events and/or missing data (coded as gaps) in their correct alignment showed that the AliGROOVE approach correctly identified excessively divergent sequences when indels were treated as a fifth character state (Figure 5, Additional file 4). After these data were realigned, the difference between treating indels as a fifth or as an ambiguous character state vanished. This may be explained by indels misplaced during realignment, which are better treated as ambiguous character states. For empirical data, in particular indel-rich data in which misplaced and correctly placed indels cannot be distinguished, this result implies that indels should be treated as ambiguous character states or removed completely from phylogenetic analyses [2, 4, 21].
The results of simulation setup D merit additional discussion. In these analyses, branch length differences between clades were pushed to the extreme. With nucleotide sequences, the AliGROOVE algorithm correctly tagged misplaced branches if BL2 ≥ 0.9. With amino acid data, even these long branches were never tagged as unreliable despite being incorrectly placed. Apparently, detectable substitutional saturation accumulated only if BL2 ≥ 0.9, whereas the extremely short internal branches (BL1 = 0.01) were insufficient to accumulate any signal. This phenomenon was particularly pronounced for amino acid data. The extremely short internal branches of BL1 = 0.01 can be interpreted as hard polytomies, for which tree reconstructions cannot deliver correct results. However, how frequently such hard polytomies occur in empirical data, and thus limit the application of the AliGROOVE algorithm, is currently unknown.
The mitochondrial DNA sequence data set of chelicerates shows strong heterogeneity of sequence divergence, as indicated in the similarity matrix (Figure 9). Specimens of Acariformes display mostly random similarity to all other sequences. This observation implies that Acariformes cannot be placed robustly in the tree, or are potentially misplaced despite high bootstrap support. This is exactly what we see in the tree reconstructed from the concatenated supermatrix: Acariformes are the sister group of Ricinulei and, together with Parasitiformes, form the sister group of Pycnogonidae. This grouping, which many specialists consider implausible [19, 20, 22, 23], receives high bootstrap support. AliGROOVE identified the questionable sister group relationship between Ricinulei and Acariformes and tagged it as suspicious in the topology inferred from the supermatrix. The AliGROOVE algorithm clearly identified the most problematic sequences and gene partitions in the data set, demonstrating its usefulness for these data.
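A taxon with mostly random similarity to all other sequences corresponds to a row of the similarity matrix dominated by negative values. The sketch below shows one way such rows could be flagged automatically, assuming a dictionary of pairwise scores keyed by taxon pairs. The criterion "negative row mean" is an illustrative assumption, not a rule described in the text, and `taxon_means` and `mostly_random` are hypothetical helpers.

```python
def taxon_means(scores, taxa):
    """Mean similarity of each taxon to all other taxa (one matrix row each)."""
    means = {}
    for t in taxa:
        row = [v for pair, v in scores.items() if t in pair]
        means[t] = sum(row) / len(row)
    return means


def mostly_random(scores, taxa):
    """Taxa whose average similarity to all other taxa is negative."""
    return sorted(t for t, m in taxon_means(scores, taxa).items() if m < 0)


# Hypothetical scores: taxon D is randomly similar to everything else
scores = {("A", "B"): 0.5, ("A", "C"): 0.4, ("B", "C"): 0.6,
          ("A", "D"): -0.3, ("B", "D"): -0.4, ("C", "D"): -0.2}
print(mostly_random(scores, ["A", "B", "C", "D"]))  # ['D']
```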
Material and methods
Simulated data setup A & B
To test the efficiency of AliGROOVE, we designed two sets of nucleotide and amino acid sequence data using 4-taxon and 6-taxon trees (Figure 1). The topology of the 4-taxon setup (setup A, Figure 1a) contained two long branches of unrelated taxa (BL2 = 0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5) under three different branch length conditions for the other two short terminal branches (BL3 = 0.1, 0.12, 0.14 and RB = 0.1) and two different lengths of the short internal branch (BL1 = 0.01, 0.02). The 6-taxon setup (setup B, Figure 1b) contained two long internal branches (BL2 = 0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5), separated by a short internal branch (BL1 = 0.01), while the lengths of terminal branches were kept constant (BL3 = 0.01 and RB = 0.1). For both test setups, 100 alignments were generated for each step of BL2 branch elongation. The sequence length of each alignment was set to 250,000 character state positions for setup A and to 50,000 character state positions for setup B to reduce the calculation time. All alignments were generated with INDELible v.1.03. To simulate nucleotide sequence data we used the Jukes-Cantor (JC) model of sequence evolution, and for amino acid sequence data the BLOSUM62 substitution model. All data were simulated with among-site rate variation (ASRV), using a mixed-distribution model with a shape parameter α = 1.0 and a proportion of invariant sites $\rho_{inv}$ = 0.3. ASRV was modelled using a continuous Γ-rate distribution; indel events were not simulated.
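The branch length conditions of setups A and B can be enumerated to check the size of the simulation design. This sketch assumes that 100 alignments were generated for every combination of BL1, BL2 and BL3 (as stated per branch length combination in the tree reconstruction paragraph) and counts alignments per data type (nucleotide or amino acid).

```python
from itertools import product

BL2_STEPS = (0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5)
REPLICATES = 100  # alignments per branch length combination

# Setup A (4 taxa): BL2 x BL3 x BL1, with RB fixed at 0.1
SETUP_A = [
    {"BL1": bl1, "BL2": bl2, "BL3": bl3, "RB": 0.1}
    for bl2, bl3, bl1 in product(BL2_STEPS, (0.1, 0.12, 0.14), (0.01, 0.02))
]
# Setup B (6 taxa): only BL2 varies; BL1 = BL3 = 0.01 and RB = 0.1
SETUP_B = [{"BL1": 0.01, "BL2": bl2, "BL3": 0.01, "RB": 0.1} for bl2 in BL2_STEPS]

print(len(SETUP_A), len(SETUP_A) * REPLICATES)  # 48 4800
print(len(SETUP_B), len(SETUP_B) * REPLICATES)  # 8 800
```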
Trees of simulated data were inferred with PhyML_3.0_linux64 [25, 26]. We analyzed the data with a mixed-distribution model (JC+Γ+I) and the correct parameter values (α = 1.0, $\rho_{inv}$ = 0.3), except for the discretization of the gamma distribution, for which the number of relative substitution rate categories was set to four (c = 4). Tree topologies and branch lengths were optimized. Maximum likelihood analyses were performed and evaluated with a Perl pipeline. For each branch length combination, we generated 100 data replicates and recorded the frequencies of correct and incorrect tree reconstructions using correct alignments and nearly correct substitution models (Figures 2, 3, Additional files 1, 2, 3).
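A single PhyML run of the kind described above could be assembled as sketched below. The authors' Perl pipeline is not shown in the text, so this is not their pipeline: the helper `phyml_args` and the file name `rep001.phy` are hypothetical, while the options used (`-i`, `-d`, `-m`, `-c`, `-a`, `-v`, `-o`, `-b`) are standard PhyML 3.0 command-line options. Passing `None` for α or the invariant-site proportion requests estimation (`e`), as in setup C.

```python
def phyml_args(alignment, model="JC69", alpha=1.0, pinv=0.3, categories=4,
               datatype="nt", executable="PhyML_3.0_linux64"):
    """Build a PhyML 3.0 command line (hypothetical helper).

    alpha/pinv given as numbers are fixed; None means 'estimate' ("e").
    With both fixed, only topology and branch lengths are optimized ("tl").
    """
    fixed = alpha is not None and pinv is not None
    return [
        executable,
        "-i", alignment,                              # PHYLIP input alignment
        "-d", datatype,                               # nt or aa
        "-m", model,                                  # substitution model
        "-c", str(categories),                        # discrete gamma categories
        "-a", "e" if alpha is None else str(alpha),   # gamma shape parameter
        "-v", "e" if pinv is None else str(pinv),     # proportion of invariant sites
        "-o", "tl" if fixed else "tlr",               # parameters to optimize
        "-b", "0",                                    # no bootstrap
    ]


print(" ".join(phyml_args("rep001.phy")))
```

The list can be passed to `subprocess.run(args, check=True)`; keeping it as a list avoids shell quoting issues with file names.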
Simulated data setup C
To test the efficiency of AliGROOVE when sequences contain gaps and missing data, we simulated nucleotide sequence data sets for four different 15-taxon topologies (Figure 4). The -N option of AliGROOVE toggles between scoring gaps as a fifth character state or as ambiguities. The efficiency of AliGROOVE with and without the -N option was tested on correct alignments (Figure 5) and on data sets realigned with MAFFT [27, 28] under default settings (Figure 6). Additionally, alignments were simulated without indel events under otherwise identical parameter settings. The topologies differed only in branch lengths. Topology C1 (Figure 4a) had more or less balanced branch lengths, whereas in topology C2 (Figure 4b) three terminal branches (taxa T3, T7, T9) were strongly lengthened. In topology C3 (Figure 4c), one internal branch separating taxa T1 to T10 from the remaining taxa was strongly lengthened, and in topology C4 (Figure 4d) both this internal branch and an additional terminal branch (taxon T10) were strongly lengthened. Alignment lengths in simulation setup C were set to 50,000 sites. All data were simulated with ASRV, using a mixed-distribution model with a shape parameter α = 0.5 and a proportion of invariant sites $\rho_{inv}$ = 0.1. ASRV was modeled using a continuous Γ-rate distribution, and indel events were simulated using a Lavalette distribution with a maximum indel length of 20. Insertion and deletion rates were both set to 0.2. State frequencies for GTR simulations were set to T = 0.35, C = 0.15, A = 0.35, G = 0.15.
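The difference between the two gap codings toggled by the -N option can be illustrated for a single window. How AliGROOVE scores an ambiguity internally is not described here; the sketch assumes that a gap coded as ambiguity is compatible with any state and therefore never counts as a mismatch. `mismatches` is a hypothetical helper, and its flag only mirrors the -N switch, it is not its implementation.

```python
def mismatches(a, b, gaps_as_ambiguity=False, gap="-"):
    """Count mismatches between two aligned windows.

    gaps_as_ambiguity=False: '-' is a fifth character state, so a gap opposite
    a base is a mismatch and two gaps match.
    gaps_as_ambiguity=True: a position with a gap in either sequence is treated
    as compatible with any state and never counts as a mismatch (assumption).
    """
    count = 0
    for x, y in zip(a, b):
        if gaps_as_ambiguity and gap in (x, y):
            continue
        count += x != y
    return count


print(mismatches("AC--GT", "ACTAG-"))                          # 3
print(mismatches("AC--GT", "ACTAG-", gaps_as_ambiguity=True))  # 0
```

Under the ambiguity coding, misplaced gaps introduced by realignment cannot inflate the mismatch score, which is consistent with the reasoning in the discussion on realigned data.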
Trees of simulated data were inferred with PhyML_3.0_linux64 [25, 26] using either the JC or the GTR model of sequence evolution (matching the substitution model used for the data simulations) with a mixed-distribution model, estimating the α shape parameter and the proportion of invariant sites. The number of discrete gamma rate categories was set to four (c = 4), and tree topologies and branch lengths were optimized. Maximum likelihood analyses were performed and evaluated with a Perl pipeline. For each topology and AliGROOVE setting, we generated 20 data replicates and recorded the frequencies of correct and incorrect tree reconstructions (Figures 5, 6, Additional file 4).
Simulated data setup D
To test the efficiency of AliGROOVE on large data sets with more realistic data block lengths, we simulated nucleotide and amino acid sequence data with five different data block lengths for a 61-taxon topology under four different internal and terminal branch length conditions (Figure 7). Alignment lengths of single data blocks were set to 500, 1000, 1500, 2000, and 2500 sites. To simulate different substitution rates for specific branches, we increased selected internal and terminal branches stepwise from 0.1 to 1.3 for each data block length (BL2 = 0.1, 0.5, 0.9, 1.3). To increase rate heterogeneity between long branches and their nearest-neighbour branches, we kept internal branches very short (BL1 = 0.01). All remaining branches were kept at RB = 0.1. This simulation setup led to a total of 20 gene partitions (five block lengths × four BL2 values), with each data block length represented four times, each time with a different substitution rate for specific taxa due to the increased branch lengths of the underlying topology.
As in simulation setups A and B, we simulated all data with ASRV, using a mixed-distribution model with a shape parameter α = 1.0 and a proportion of invariant sites $\rho_{inv}$ = 0.3. ASRV was modeled using a continuous Γ-rate distribution. Indel events were not simulated. To simulate nucleotide sequence data we used the Jukes-Cantor (JC) model of sequence evolution, and for amino acid sequence evolution the BLOSUM62 substitution model. For sequence concatenation we used FASconCAT v1.0.
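Concatenating the setup D data blocks into a supermatrix amounts to joining each taxon's sequences block by block and recording partition coordinates. The sketch below is a minimal illustration, not FASconCAT (whose input and output formats differ); `concatenate` is a hypothetical helper, the two-taxon placeholder blocks are invented for the example, and taxa missing from a block are filled with `?`.

```python
def concatenate(blocks, missing="?"):
    """Concatenate aligned blocks into a supermatrix.

    blocks: list of (partition_name, {taxon: aligned_sequence}) tuples.
    Returns the supermatrix and 1-based, inclusive partition coordinates.
    """
    taxa = list(dict.fromkeys(t for _, block in blocks for t in block))
    parts = {t: [] for t in taxa}
    partitions, start = [], 1
    for name, block in blocks:
        length = len(next(iter(block.values())))
        for t in taxa:
            parts[t].append(block.get(t, missing * length))
        partitions.append((name, start, start + length - 1))
        start += length
    return {t: "".join(seqs) for t, seqs in parts.items()}, partitions


# Setup D layout: 5 block lengths x 4 BL2 values = 20 partitions
# (placeholder sequences for two hypothetical taxa)
LENGTHS = (500, 1000, 1500, 2000, 2500)
BL2_STEPS = (0.1, 0.5, 0.9, 1.3)
blocks = [(f"len{n}_BL2_{bl2}", {"T1": "A" * n, "T2": "C" * n})
          for bl2 in BL2_STEPS for n in LENGTHS]
matrix, partitions = concatenate(blocks)
print(len(partitions), len(matrix["T1"]))  # 20 30000
```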
Trees of simulated data were again reconstructed with PhyML_3.0_linux64 [25, 26] using the JC model of sequence evolution (JC+Γ+I) with the correct rate heterogeneity and invariant site parameters (α = 1.0, $\rho_{inv}$ = 0.3). The number of discrete gamma rate categories was set to four (c = 4). All maximum likelihood analyses were performed and evaluated with a Perl pipeline.
AliGROOVE was tested on complete as well as reduced data blocks and supermatrices. The reduced data blocks and supermatrices were used to test the overall quality improvement of the data and the associated trees after removing sequences that had been identified as potentially unreliable in the majority of the AliGROOVE analyses (Additional files 5, 6, 7, 8).
We used AliGROOVE without the -N option (indels coded as a fifth character state) on a concatenated supermatrix (5082 character state positions) as well as on the corresponding single-gene data sets of five mitochondrial genes (Atp6: 696, COI: 1575, COII: 783, COIII: 861, and Cytb: 1167 character state positions), downloaded from the NCBI genome database for 53 chelicerate ingroup taxa and eight myriapod outgroup taxa. Single mitochondrial genes were aligned with ClustalW and concatenated with FASconCAT. The best ML topology of the mitochondrial data set was estimated using RAxML_7.2.2 and the GTR+Γ model. Node support was evaluated with 1000 bootstrap replicates (Figure 9).
M denotes the sequence length of a given alignment and N the total number of aligned taxon sequences. For example, the AliGROOVE computation time for a single 4-taxon alignment with a sequence length of 250,000 character states was 809 seconds on an Intel Core i7 processor at 2.60 GHz. The computation time for a 61-taxon data set with an alignment length of 2500 characters, involving 1830 pairwise sequence comparisons, was 2578 seconds.
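The number of pairwise comparisons grows as N(N−1)/2, which gives 6 comparisons for 4 taxa and 1830 for 61 taxa. The sketch below assumes, as a rough model rather than a measured complexity, that run time is proportional to M × N(N−1)/2. Under this assumption the 809 s measured for the 4-taxon alignment extrapolates to roughly 2,470 s for the 61-taxon data set, in the same range as the 2578 s reported. `pairwise_comparisons` and `scaled_runtime` are hypothetical helpers.

```python
def pairwise_comparisons(n_taxa):
    """Number of pairwise sequence comparisons among N taxa: N(N-1)/2."""
    return n_taxa * (n_taxa - 1) // 2


def scaled_runtime(ref_seconds, ref_m, ref_n, m, n):
    """Rough run-time estimate assuming time grows linearly with M * N(N-1)/2."""
    work = m * pairwise_comparisons(n)
    ref_work = ref_m * pairwise_comparisons(ref_n)
    return ref_seconds * work / ref_work


print(pairwise_comparisons(4), pairwise_comparisons(61))  # 6 1830
print(round(scaled_runtime(809, 250_000, 4, 2_500, 61)))  # 2467
```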
Implementation of AliGROOVE
AliGROOVE is implemented in Perl and runs on Linux, Mac OS, and Windows operating systems. It can be used via the command line or a graphical user interface (GUI). The GUI of AliGROOVE (Figure 10) is based on Qt, a cross-platform application and GUI framework written in C++.
Availability of supporting data and requirements
Project name: AliGROOVE – visualization of heterogeneous sequence divergence within multiple sequence alignments and detection of inflated branch support
Operating system(s): Platform independent
Programming language: Perl
Other requirements: Perl 5.0 or higher
License: GNU GPL version 2
Any restrictions to use by non-academics: No restrictions
We would like to thank Birthe Thormann for proofreading, Christoph Mayer for helping to determine the computational time complexity of AliGROOVE, all members of the ZFMK for inspiring discussions, and two anonymous reviewers for helpful comments.
- Castresana J: Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000, 17 (4): 540-552. 10.1093/oxfordjournals.molbev.a026334.
- Talavera G, Castresana J: Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol. 2007, 56 (4): 564-577. 10.1080/10635150701472164.
- Dress AWM, Flamm C, Fritzsch G, Grünewald S, Kruspe M, Prohaska SJ, Stadler PF: Noisy: identification of problematic columns in multiple sequence alignments. Algorithms Mol Biol. 2008, 3: 7. 10.1186/1748-7188-3-7.
- Capella-Gutiérrez S, Silla-Martinez JM, Gabaldón T: trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009, 25 (15): 1972-1973. 10.1093/bioinformatics/btp348.
- Misof B, Misof K: A Monte Carlo approach successfully identifies randomness in multiple sequence alignments: a more objective means of data exclusion. Syst Biol. 2009, 58: 21-34. 10.1093/sysbio/syp006.
- Criscuolo A, Gribaldo S: BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol. 2010, 10: 210. 10.1186/1471-2148-10-210.
- Kück P, Meusemann K, Dambach J, Thormann B, von Reumont B, Wägele JW, Misof B: Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees. Front Zool. 2010, 7: 10. 10.1186/1742-9994-7-10.
- Wu M, Chatterji S, Eisen JA: Accounting for alignment uncertainty in phylogenomics. PLoS ONE. 2012, 7: e30288. 10.1371/journal.pone.0030288.
- Hartmann S, Vision TJ: Using ESTs for phylogenomics: Can one accurately infer a phylogenetic tree from a gappy alignment? BMC Evol Biol. 2008, 8: S13. 10.1186/1471-2148-8-13.
- Schwarzer J, Misof B, Tautz D, Schliewen UK: The root of the East African cichlid radiations. BMC Evol Biol. 2009, 9: 186. 10.1186/1471-2148-9-186.
- Simon S, Strauss S, von Haeseler A, Hadrys H: A phylogenomic approach to resolve the basal pterygote divergence. Mol Biol Evol. 2009, 26 (12): 2719-2730. 10.1093/molbev/msp191.
- Meusemann K, von Reumont BM, Simon S, Roeding F, Kück P, Strauss S, Ebersberger I, Walzl M, Pass G, Breuers S, Achter V, von Haeseler A, Burmester T, Hadrys H, Wägele JW, Misof B: A phylogenomic approach to resolve the arthropod tree of life. Mol Biol Evol. 2010, 27 (11): 2451-2464. 10.1093/molbev/msq130.
- Murienne J, Edgecombe G, Giribet G: Including secondary structure, fossils and molecular dating in the centipede tree of life. Mol Phylogenet Evol. 2010, 57: 301-313. 10.1016/j.ympev.2010.06.022.
- Dinapoli A, Zinssmeister C, Klussmann-Kolb A: New insights into the phylogeny of the Pyramidellidae (Gastropoda). J Mollus Stud. 2011, 77: 1-7. 10.1093/mollus/eyq027.
- Kück P, Hita-Garcia F, Misof B, Meusemann K: Improved phylogenetic analyses corroborate a plausible position of Martialis heureka in the ant tree of life. PLoS ONE. 2011, 6 (6): e21031. 10.1371/journal.pone.0021031.
- Nesnidal MP, Helmkampf M, Bruchhaus I, Hausdorf B: The complete mitochondrial genome of Flustra foliacea (Ectoprocta, Cheilostomata) - compositional bias affects phylogenetic analyses of lophotrochozoan relationships. BMC Genomics. 2011, 12: 572. 10.1186/1471-2164-12-572.
- Privman E, Penn O, Pupko T: Improving the performance of positive selection inference by filtering unreliable alignment regions. Mol Biol Evol. 2012, 29: 1-5. 10.1093/molbev/msr177.
- von Reumont BM, Jenner RA, Wills MA, Dell’Ampio E, Pass G, Ebersberger I, Meyer B, Koenemann S, Iliffe TM, Stamatakis A, Niehus O, Meusemann K, Misof B: Pancrustacean phylogeny in the light of new phylogenomic data: support for Remipedia as the possible sister group of Hexapoda. Mol Biol Evol. 2012, 29 (3): 1031-1045. 10.1093/molbev/msr270.
- Dabert M, Witalinski W, Kazmierski A, Olszanowski Z, Dabert J: Molecular phylogeny of acariform mites (Acari, Arachnida): Strong conflict between phylogenetic signal and long-branch attraction artifacts. Mol Phylogenet Evol. 2010, 56: 222-241. 10.1016/j.ympev.2009.12.020.
- Pepato AR, da Rocha CEF, Dunlop JA: Phylogenetic position of the acariform mites: sensitivity to homology assessment under total evidence. BMC Evol Biol. 2010, 10: 235. 10.1186/1471-2148-10-235.
- Capella-Gutiérrez S, Gabaldón T: Measuring guide-tree dependency of inferred gaps in progressive aligners. Bioinformatics. 2013, 29 (8): 1011-1017. 10.1093/bioinformatics/btt095.
- Dunlop J, Alberti G: The affinities of mites and ticks: a review. J Zool Syst Evol Res. 2008, 46: 1-18.
- Talarico G, Michalik P: Spermatozoa of an Old World Ricinulei (Ricinoides karschii, Ricinoidae) with notes about the relationships of Ricinulei within the Arachnida. Tissue Cell. 2010, 42 (6): 383-390. 10.1016/j.tice.2010.10.003.
- Fletcher W, Yang Z: INDELible: A flexible simulator of biological sequence evolution. Mol Biol Evol. 2009, 26 (8): 1879-1888. 10.1093/molbev/msp098.
- Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003, 52 (5): 696-704. 10.1080/10635150390235520.
- Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O: PhyML 3.0: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010, 59 (3): 307-321. 10.1093/sysbio/syq010.
- Katoh K, Kuma K, Toh H, Miyata T: MAFFT version 5: Improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005, 33 (2): 511-518. 10.1093/nar/gki198.
- Katoh K, Toh H: Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. 2008, 9 (4): 286-298. 10.1093/bib/bbn013.
- Kück P, Meusemann K: FASconCAT: Convenient handling of data matrices. Mol Phylogenet Evol. 2010, 56: 1115-1118. 10.1016/j.ympev.2010.04.024.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.
- Stamatakis A: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006, 22 (21): 2688-2690. 10.1093/bioinformatics/btl446.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.