**Appendix 1 – PROLOG code for the fusion of trees built with 3 different methods**

We modeled the biologists' interpretation quite naturally in PROLOG with the following rules:

*fusion(npl_A) :- full_congruence(templeton, _), full_congruence(kishino-hasegawa, _)*.

*fusion(npl_A) :- full_congruence(templeton, _), partial_congruence(kishino-hasegawa, _)*.

*fusion(npl_A) :- full_congruence(kishino-hasegawa, _), partial_congruence(templeton, _)*.

*fusion(npl_T) :- full_congruence(templeton, _), no_congruence(kishino-hasegawa, _)*.

*fusion(npl_K) :- full_congruence(kishino-hasegawa, _), no_congruence(templeton, _)*.

*fusion(no_fusion) :- partial_congruence(kishino-hasegawa, _), no_congruence(templeton, _)*.

*fusion(no_fusion) :- no_congruence(kishino-hasegawa, _), partial_congruence(templeton, _)*.

*fusion(no_fusion) :- no_congruence(kishino-hasegawa, _), no_congruence(templeton, _)*.

*fusion(Label) :- partial_congruence(kishino-hasegawa, Label), partial_congruence(templeton, Label)*.


These rules are easy to maintain. For example, we could decide to perform the fusion on the "best" tree rather than always on the NJ tree, as is done today by default in the first 5 cases. The rules would then look like this:

*fusion(FusionOnTheBestLabel) :-*

*full_congruence(templeton, Best)*,

*full_congruence(kishino-hasegawa, Best)*,

*get_fusion_label(Best, FusionOnTheBestLabel)*.

The information supplied by an EK unit during pipeline execution takes the following form:

*topology(NameOfTest, Best, [Label1, Val1], [Label2, Val2])*.

*(e.g. topology(templeton, n, [p, 0.15], [l, 0.01]). means that, for the Templeton test, the tree with the best topology is the one built with Neighbor Joining, that the tree built with Maximum Parsimony is congruent with it at a rate of 0.15, and that the tree built with Maximum Likelihood is congruent with it at a rate of 0.01.)*
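The fact format above can be explored with a small query helper. The following is a minimal sketch, not part of the original code: the predicate `congruent_with_best/4` is a hypothetical name introduced here for illustration, and the only fact used is the Templeton example from the text.

```prolog
:- use_module(library(lists)).  % member/2

% The example fact given in the text.
topology(templeton, n, [p, 0.15], [l, 0.01]).

% Hypothetical helper (an assumption, not in the original appendix):
% Labels is the list of trees whose rate against the best tree
% reaches Threshold for the given test.
congruent_with_best(Test, Threshold, Best, Labels) :-
    topology(Test, Best, Pair1, Pair2),
    findall(Label,
            ( member([Label, Val], [Pair1, Pair2]), Val >= Threshold ),
            Labels).
```

With this fact, the query `?- congruent_with_best(templeton, 0.05, Best, Labels).` binds `Best = n` and `Labels = [p]`: only the Maximum Parsimony tree reaches the 0.05 threshold.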

Here are the rules for congruence tests:

*% congruence is full when both comparison rates are greater than or equal to the chosen threshold*

*full_congruence(Test, Best) :-*

*topology(Test, Best, [_, Val1], [_, Val2])*,

*Val1 >= 0.05*,

*Val2 >= 0.05*.

*% there is no congruence when both comparison rates are lower than the chosen threshold*

*no_congruence(Test, Best) :-*

*topology(Test, Best, [_, Val1], [_, Val2])*,

*Val1 < 0.05*,

*Val2 < 0.05*.

*% congruence is partial when exactly one of the comparison rates is lower than the chosen threshold*

*% the label associated with the fusion type is simply the concatenation of the label of the "best" tree (see above) and the label of its congruent tree*

*partial_congruence(Test, Label) :-*

*topology(Test, Best, [Label1, Val1], [Label2, Val2])*,

*Val1 >= 0.05*,

*Val2 < 0.05*,

*concat_labels(Best, Label1, Label)*.

*partial_congruence(Test, Label) :-*

*topology(Test, Best, [Label1, Val1], [Label2, Val2])*,

*Val1 < 0.05*,

*Val2 >= 0.05*,

*concat_labels(Best, Label2, Label)*.
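The predicate `concat_labels/3` is used by the partial congruence rules but is not defined in this appendix. One possible definition, assuming labels are plain atoms such as `n`, `p` and `l`, could be sketched as follows (the argument name `Congruent` is ours):

```prolog
% Assumed definition (concat_labels/3 is not shown in the appendix):
% labels are atoms, joined in the order best tree, then congruent tree.
concat_labels(Best, Congruent, Label) :-
    atom_concat(Best, Congruent, Label).
```

Under this assumption, `?- concat_labels(n, p, Label).` binds `Label = np`, which is the label the Templeton example fact would receive, since only the Maximum Parsimony tree reaches the threshold there.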

**Appendix 2 – Commented PROLOG code for the detection of paralogy groups**

Each node of a domain's phylogenetic tree, given to the "expert system" by an EK unit, can have many children, but for implementation reasons we encode the tree as a binary tree. Each node is a term of the form:

*node(TheSpecies, LeftChild, RightChild)*

In the annotated tree, each node knows how many sequences it contains and has the full list of the different species it includes:

*node(NumberOfSequences, AllSpecies, LeftChild, RightChild)*
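To make the unannotated representation concrete, here is a small illustrative tree for three sequences, written as `((human, mouse), yeast)`. This is a sketch under assumptions: the species names, the atom `internal` used as a placeholder in internal nodes, and the helper `leaf_species/2` are all hypothetical and do not come from the original code. The atom `no` marks a missing child, as in the rules of this appendix.

```prolog
:- use_module(library(lists)).  % append/3

% Illustrative tree: internal nodes carry the placeholder atom internal,
% leaves carry a species name and have no children.
example_tree(node(internal,
                  node(internal,
                       node(human, no, no),
                       node(mouse, no, no)),
                  node(yeast, no, no))).

% Hypothetical helper: collect the species found at the leaves,
% from left to right.
leaf_species(node(Species, no, no), [Species]) :- !.
leaf_species(node(_, Child, no), Species) :- !,
    leaf_species(Child, Species).
leaf_species(node(_, Left, Right), Species) :-
    leaf_species(Left, SpeciesL),
    leaf_species(Right, SpeciesR),
    append(SpeciesL, SpeciesR, Species).
```

The query `?- example_tree(T), leaf_species(T, S).` binds `S = [human, mouse, yeast]`. In the annotated form, the root of this tree would carry the sequence count 3 and that species list, provided none of the three species is the outgroup.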

The main PROLOG rule for group detection is:

*% detecting paralogy groups in a phylogenetic tree means annotating the tree nodes with species information, then searching for the biggest groups made of different species*

*paralogy_groups(PhylogeneticTree, ParalogyGroups) :-*

*subtree_species(PhylogeneticTree, AnnotedPhylogeneticTree)*,

*biggest_groups(AnnotedPhylogeneticTree, ParalogyGroups)*.

*(Rules with the same signature express a "logical OR" between them)*

*% a leaf node whose species differs from the one chosen as outgroup can belong to a paralogy group*

*% (*) the ! character in a PROLOG rule means that, once the rule has succeeded up to that point, the PROLOG engine does not try the other rules with the same signature*

*subtree_species(node(Species, no, no), node(1, [Species], no, no)) :-*

*outgroup_species(OutgroupSpecies)*,

*Species \== OutgroupSpecies, !*.

*% a leaf node whose species is the same as the one chosen as outgroup cannot belong to a paralogy group*

*subtree_species(node(_, no, no), node(1, no, no, no)) :- !*.

*% annotating a node that has only one child is equivalent to annotating this child*

*% (we have pseudo nodes to force binary structure)*

*subtree_species(node(_, Child, no), AnnotatedNode) :- subtree_species(Child, AnnotatedNode), !*.

*% annotating a sub-tree with two children is equivalent to annotating the children and compiling the species found*

*subtree_species(node(Species, LeftChild, RightChild), node(N, SpeciesList, Left, Right)) :-*

*subtree_species(LeftChild, Left)*,

*Left = node(NL, SpeciesListL, _, _)*,

*subtree_species(RightChild, Right)*,

*Right = node(NR, SpeciesListR, _, _)*,

*compile_annotations(NL, SpeciesListL, NR, SpeciesListR, N, SpeciesList)*.

*% two sub-trees with the same unique species merge into a leaf of this species*

*compile_annotations(_, [Species], _, [Species], 1, [Species])*.

*% if one of the two sub-trees is invalidated for merging, the compilation is a tree invalidated for merging*

*% however we compute the total number of sequences in the sub-tree*

*compile_annotations(NL, no, NR, _, N, no) :- is(N, NL + NR)*.

*compile_annotations(NL, _, NR, no, N, no) :- is(N, NL + NR)*.

*% if no species is common between the two sub-trees, we can merge all species*

*compile_annotations(NL, SpeciesListL, NR, SpeciesListR, N, SpeciesList) :-*

*is(N, NL + NR)*,

*intersection(SpeciesListL, SpeciesListR, CommonSpecies)*,

*CommonSpecies = []*,

*concat(SpeciesListL, SpeciesListR, SpeciesList)*.

*% search biggest paralogy groups*

*biggest_groups(node(N, no, Child1, Child2), Groups) :-*

*biggest_groups(Child1, Groups1)*,

*biggest_groups(Child2, Groups2)*,

*concat(Groups1, Groups2, Groups), !*.

*% accept the group if it contains at least 4 different species*

*biggest_groups(Group, [Group]) :-*

*Group = node(N, TaxeIds, _, _)*,

*diff(TaxeIds, no)*,

*N >= 4, !*.

*% reject subtree as a group*

*biggest_groups(_, [])*.
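The listing above relies on the helper predicates `intersection/3`, `concat/3` and `diff/2` without defining them. The following minimal sketch shows one way they could be supplied in SWI-Prolog. The definitions of `concat/3` and `diff/2` are assumptions about their intended meaning (list concatenation and non-identity), and `mergeable/3` is a hypothetical predicate that only mirrors the species-list part of the last `compile_annotations` clause.

```prolog
:- use_module(library(lists)).  % append/3 and intersection/3

% Assumed definition: concat/3 is plain list concatenation.
concat(List1, List2, List) :- append(List1, List2, List).

% Assumed definition: diff/2 succeeds when the two terms are not identical.
diff(X, Y) :- X \== Y.

% Hypothetical check: two species lists can be merged only if they
% share no species.
mergeable(SpeciesL, SpeciesR, Species) :-
    intersection(SpeciesL, SpeciesR, Common),
    Common == [],
    concat(SpeciesL, SpeciesR, Species).
```

For example, `?- mergeable([human, mouse], [yeast], S).` binds `S = [human, mouse, yeast]`, whereas `?- mergeable([human], [human, mouse], S).` fails because `human` appears on both sides.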