Phylogeny inference allows, among others, detecting orthology/paralogy relationships among gene family members (*e.g*., [1, 2]), estimating divergence times and evolutionary rates (*e.g*., [3–5]), reconstructing ancestral sequences (*e.g*., [6–9]), identifying molecular characters constrained by purifying selection or prone to positive selection (*e.g*., [10]), uncovering hidden biodiversity (*e.g*., [11]), and mapping the evolution of morphological, physiological, epidemiological, biogeographical and even behavioral characters [12, 13]. Molecular phylogeny inference is now a mature science, and an important part of the maturation process pertained to the realization (in the last 10 years) that the quest for the Holy Grail of absolute best tree should be abandoned for a much more meaningful goal: the inference of clades and trees robustness. Still, that objective remained intractable in practice because of *(a)* the NP-hard nature of optimality-criterion-based phylogeny inference (*i.e*., no algorithm can solve it in polynomial time; [14, 15]) and *(b)* the computing-time requirements of using complex substitution models (and rate heterogeneity across sites) in the framework of what had been identified as the probable most robust optimality criterion: Maximum Likelihood (ML; [16–18]). Today large phylogeny inference is incorporated, across biological disciplines, as an essential step in most comparative studies involving nucleotide or protein sequences. This has been made possible thanks to both theoretical and practical developments.

First, one key advance that made large phylogeny inference tractable is the implementation in this field of stochastic heuristics with inter-step optimization, *i.e*., a family of approaches that existed for decades in physics and computer science and explore multidimensional solution spaces in a much more efficient manner than the older intra-step optimization hill-climbing methods. Indeed, in the latter, one prime parameter (typically, the topology of the tree) is modified and all other parameters are optimized before the new solution is evaluated whereas, in stochastic heuristics, all free parameters are optimized while the search proceeds. Inter-step optimization methods include MCMC approximations of the Bayesian approach [19, 20], stochastic simulated annealing [21], and genetic algorithms [22–26]. The efficiency of stochastic heuristics is quite counter-intuitive but can be explained by several factors: *(a)* poorer solutions are accepted with a non-null probability (contrary to hill-climbing that strictly restricts moves toward better likelihood values) such that valleys in likelihood space can eventually be crossed; *(b)*, parameters are not over-optimized (*e.g*., starting and intermediate trees are generally largely sub-optimal, hence, optimizing model parameters on these topologies is a clear example of over-fitting). We think that avoiding over-optimization at every topology evaluation generates a flatter likelihood space shape, such that valleys are more easily crossed and local optima more easily escaped. This suggestion however requires further investigation.

Second, several stochastic methods have been incorporated into robust application softwares. The importance of that point should not be underestimated. For example, the recent success of Bayesian methods is probably due as much to its incorporation into a robust and efficient software (MrBayes; [27]) as to the theoretical appeal of generating marginal posterior probabilities [19]. The software RaxML [28], enjoys well-deserved popularity because it is one of the fastest ML phylogeny inference programs available to date (despite that it does not incorporate stochastic methods) thanks to the implementation of approximations to rate heterogeneity across sites and of smart computer science tricks speeding up likelihood computation: optimized parallel code and 'Subtree Equality Vectors' (*i.e*., the extension of character compression to the subtree level). Similarly, highly efficient parallel code has recently been implemented for the evaluation of phylogenies on graphics processing units (GPUs), resulting in about 100-fold speed increase over an optimized CPU-based computation [29]. This efficient use of new hardware, existing stochastic heuristics (in this case, an MCMC approach in a Bayesian framework), and smart code parallelization for efficient harnessing of the hundreds of GPU processing cores, allowed the authors to successfully use a 60-state codon model on a dataset of 62 complete mitochondrial genomes [29].

The availability of multiple excellent softwares implementing different robust heuristics is clearly an asset for the end user: reliable results might be identified because they remain stable across softwares and methods. However, most users chose one single main software for their analyses, and this choice is sometimes dictated by availability of functionalities of importance but that do not pertain to the performances of the specific heuristic implemented (*e.g*., ability to perform batch analyses, availability of GTR nucleotide substitution model [30] or of rate heterogeneity [31–33], possibility to partition data). Finally, given that the need to infer large trees is critical in multiple biological disciplines, the non-specialist can be baffled by the large number of available heuristics, parameters, and softwares, such that the most user-friendly tools are often preferred even if more robust or more efficient (but less user-friendly) softwares are available. There is therefore a challenge to supply softwares that are easy to use for the non-specialist, provide flexibility for the specialist, and allow fast and robust inference for both.

The Metapopulation Genetic Algorithm (MetaGA; [23]) is an evolutionary computation heuristic in which several populations of trees exchange topological information which is used to guide the GA operators for much faster convergence. Despite the fact that the metaGA had been implemented in a simple and unoptimized software (metaPIGA v1) together with simple nucleotide substitution models, an approximate rate heterogeneity method, and only a low number of functionalities, is has been shown as one of the most efficient heuristics under the ML criterion [23, 34, 35]. Furthermore, it has been suggested that multiple metaGA searches provide an estimate of the posterior probability distribution of possible trees [23] although this proposition clearly warrants much further investigation. Here, we present MetaPIGA-2.0 the first phase of a robust implementation of the MetaGA (and other stochastic methods such as a classical Genetic Algorithm and Simulated Annealing) together with complex substitution models, rate heterogeneity, and high parameterization for the phylogeneticist, as well as an ergonomic interface and easy-to-use functionalities for the non-specialist.