Robustness and similarity
We study a random sample \(X_i\) of \(10^3\) GRNs that produce the same phenotype A. We refer to the fraction of mutations that do not alter the phenotype of a GRN as that network’s mutational robustness (\(R_\mu\)). The GRNs in our random sample \(X_i\) were highly robust against mutations (mean \(R_\mu \pm\)SD: \(0.81\pm 0.06\)) (Fig. 1a). We also evaluated similarity after mutations (\(S_\mu\)), which refers to the average resemblance of phenotype A, produced by all networks in sample \(X_i\), to the phenotypes that each network in \(X_i\) produces after subjecting it to mutations (see details in Methods and Additional file 1). Mean \(S_\mu\) of the networks in sample \(X_i\) was \(0.94\pm 0.028\) (Fig. 1b). As expected, there is a strong positive correlation between \(R_\mu\) and \(S_\mu\) (Spearman’s \(\rho =0.82\), \(p=5.2\times 10^{-240}\)). Throughout this paper, this correlation is also found every time we evaluate \(R_\mu\) and \(S_\mu\).
We also inquired about the phenotypic effects that gene duplication has on GRNs in our sample \(X_i\). Analogously as for mutations, we define \(R_D\) as the fraction of all the \(N_r\) different duplications of a single regulation gene that preserves phenotype A. We also define \(S_D\) as the average resemblance between a network’s original phenotype A and the ones that duplication of transcription factors produces (see Methods and Additional file 1). Among the networks in sample \(X_i\), mean \(R_D\) equals \(0.60\pm 0.17\) and mean \(S_D\) equals \(0.88\pm 0.08\). Again, as expected, \(R_D\) and \(S_D\) strongly correlate (\(\rho =0.82\), \(p=9.0\times 10^{-241}\)).
To discern between effects exclusively due to duplication and effects that result from merely adding a new regulator, we considered two kinds of non-duplicate gene addition. In one of them, we added a new regulator gene with as many interactions as the average for the whole network. On the other kind of non-duplicate gene addition, the new regulator gene had the same number of connections as a randomly picked regulator gene in the network. In both cases, we wired randomly the interactions of the new gene to the rest of the network genes. Here we only delve into the first kind of gene addition. Notwithstanding, our observations are quite similar in both cases. We found that, in networks in sample \(X_i\), the fraction of independent single-gene additions that did not change the phenotype, \(R_a\), was \(0.43\pm 0.19\). The mean similarity of the phenotypes after non-duplicate gene addition to a network’s original phenotype, \(S_a\), was \(0.81\pm 0.09\). Again, there was a strong positive correlation between \(R_{a}\) and \(S_{a}\) (\(\rho =0.75\), \(p=1.6\times 10^{-181}\)). This data shows that the GRNs were more capable to buffer the addition of a duplicate gene than the addition of a non-duplicate gene. \(R_D\) was significantly greater than \(R_a\) (Mann-Whitney U test; \(U=348,252\), \(p=1.1\times 10^{-88}\)) and \(S_D\) than \(S_a\) (\(U=439,061\), \(p=4.1\times 10^{-99}\)).
Previous research suggests that networks robust to one kind of disturbance (i.e., against mutations) are often also robust to others such as recombination [82] or to transient perturbations due to developmental noise or environmental fluctuations [46]. Is this observation also valid for gene duplications? This is indeed the case. We found a strong positive correlation between \(R_\mu\) and \(R_D\) (\(\rho =0.66\), \(p=4.7\times 10^{-127}\)) and between \(S_\mu\) and \(S_D\) (\(\rho =0.72\), \(p=2.5\times 10^{-158}\)). The observation is also true for the addition of new non-duplicate genes, albeit the correlation is not as strong in this case. \(R_\mu\) and \(R_a\) had a positive correlation (\(\rho =0.44\), \(p=8.9\times 10^{-48}\)) just as similar as \(S_\mu\) and \(S_a\) (\(\rho =0.44\), \(p=7.8\times 10^{-49}\)).
Next, we analyzed the effect of mutations after adding a gene through duplication. For each network in \(X_i\) we picked at random one transcription factor gene and duplicated it. We thus created a new set of GRNs, \(X_D\), that we can compare to GRNs in \(X_i\) to assess the effect of gene duplication. Next, we perturbed GRNs in \(X_{D}\) with mutations to detect changes in their \(R_\mu\) and \(S_\mu\) behavior after duplication with respect to the phenotype A that GRNs in \(X_i\) produced before duplication. After duplication, \(R_\mu\) decreased significantly to \(0.51\pm 0.4\) (Wilcoxon signed-rank test; \(W=354,908\), \(p=7.5\times 10^{-37}\)), and \(S_\mu\) to \(0.85\pm 0.17\) (\(W=366,543\), \(p=2.0\times 10^{-37}\)).
The distribution of \(R_\mu\) after gene duplication, in \(X_{D}\), is bimodal (Fig. 1a). This suggests the existence of two different classes of networks that respond differently to gene duplication. We thus inquired whether the ability to produce phenotype A after gene duplication explained the two different kinds of response to mutations. Most (\(58.8\%\)) of the GRNs in \(X_D\) still produced phenotype A after duplication of a regulator gene picked at random. Hereafter we refer to such networks as duplication-resistant GRNs. Within duplication-resistant GRNs, the fraction of mutations that did not alter the phenotype of the GRNs in \(X_D\) was \(R_\mu =0.841\pm 0.059\), and the similarity of their phenotypes to the original phenotype A was \(S_\mu =0.957\pm 0.024\) (Fig. 1b). That is, duplication produced in these networks a significant increase in \(R_\mu\) from \(0.823\pm 0.057\) to \(0.841\pm 0.059\) (\(W=129,712\), \(p=6.6\times 10^{-34}\); Fig. 1a) and \(S_\mu\) from \(0.949\pm 0.026\) to \(0.957\pm 0.024\) (\(W=133,589\), \(p=2.0\times 10^{-30}\), Fig. 1b). Such increases may seem small, however, they imply that, on average, duplication reduces the effects of mutation in around \(10\%\) for \(R_\mu\) and \(16\%\) for \(S_\mu\). In contrast, mutations after a gene duplication that does not yield the original phenotype A are only rarely able to recover such original phenotype (Fig. 1a). In such duplication-susceptible GRNs we found a significant decrement in \(R_\mu\) from \(0.79\pm 0.06\) to \(0.04\pm 0.03\) (\(W=85,078\), \(p=1.5\times 10^{-69}\)) and in \(S_\mu\) from \(0.94\pm 0.03\) to \(0.69\pm 0.17\) (\(W=85,078\), \(p=1.5\times 10^{-69}\)).
Yet, duplication-susceptible GRNs displayed high similarity to the new phenotype that duplication created (\(0.95\pm 0.03\)). Nonetheless, our focus is on a GRN’s ability to preserve the original pre-duplication phenotype.
We also studied the effect of adding a non-duplicate gene on \(R_\mu\) and \(S_\mu\). Thus, for each GRN in sample \(X_i\) we added a new random transcription factor gene, as detailed in Methods, to obtain a new set of GRNs that we call \(X_a\). In this case, \(R_\mu\) equals \(0.35\pm 0.4\) and \(S_\mu\) equals \(0.76\pm 0.2\), which means a decrease in \(R_\mu\) (\(W=453,778\), \(p=4.6\times 10^{-119}\)) and \(S_\mu\) (\(W=461,009\), \(p=4.6\times 10^{-118}\)) after non-duplicate gene addition. The decrement in \(R_\mu\) and \(S_\mu\) after non-duplicate gene addition was more pronounced than after gene duplication (\(U=650,574\), \(p=9.3\times 10^{-32}\) and \(U=634,346\), \(p=1.2\times 10^{-25}\), respectively). While almost \(60\%\) of the GRNs produced the original phenotype after duplication, only \(41.3\%\) of the GRNs conserved the original phenotype A after adding a new non-duplicate gene. We refer to such networks as ‘addition-resistant GRNs’.
We also found a bimodal distribution in \(R_\mu\) after non-duplicate addition in networks in \(X_{a}\). Among addition-resistant GRNs, we did not find a significant difference in \(R_\mu\) after non-duplicate gene addition (\(R_\mu =0.82\pm 0.06\); \(W=36,767\), \(p=0.09\)), nor in \(S_\mu\) (\(0.95\pm 0.03\); \(W=46,155\), \(p=0.16\); Fig. 1b). In contrast, in addition-susceptible GRNs, those networks in which addition of a non-duplicate gene produced a phenotype other than A, \(R_\mu\) significantly decreased from \(0.80\pm 0.06\) to \(0.02\pm 0.03\) (\(W=172,578\), \(p=4.0\times 10^{-98}\)) and \(S_\mu\) from \(0.94\pm 0.03\) to \(0.64\pm 0.18\) (\(W=172,578\), \(p=4.0\times 10^{-98}\)). Withal, while mutations in addition-susceptible networks yielded phenotypes that were very different to phenotype A, mutations after gene-addition produced a phenotype similar to the one that gene addition alone produced in those networks (\(0.94\pm 0.03\)).
Are there any structural traits in a duplicate that are associated to the distinction between duplication-resistant and duplication-susceptible GRNs? We measured three properties of the duplicate gene in both kinds of networks: (i) the path length from the duplicate to the nearest structural gene; (ii) the number of the duplicate’s incoming interactions (Fig. 2a); (iii) the number of the duplicate’s outgoing interactions (Fig. 2b). Neither path length (duplication-resistant: \(1.17\pm 0.37\); duplication-susceptible: \(1.17\pm 0.38\); \(U=120,426\); Bonferroni-corrected \(p>0.95\)) nor the number of incoming interactions (duplication-resistant: \(3.26\pm 1.74\); duplication-susceptible: \(3.26\pm 1.63\); \(W=121,102\); Bonferroni-corrected \(p>0.95\)) yielded significant differences. In contrast, we found a statistically significant difference (\(U=90,842\), Bonferroni-corrected \(p=2.7\times 10^{-11}\)) in the number of the duplicate’s outgoing interactions between duplication-resistant (\(4.45 \pm 5.29\)) and duplication-susceptible (\(5.2 \pm 1.8\)) GRNs. Our results thus suggest that duplication is more likely to preserve the original phenotype when the duplicate gene regulates few other genes.
Phenotypic similarity after different kinds of mutations
So far, we have analyzed the effects of single-interaction mutations without distinguishing between distinct kinds of such genetic perturbations. However, separate classes of mutations may produce different kinds of phenotypic effects. Thus, we now consider separately the two kinds of mutations that can generate the most contrasting effects, namely, those mutations that (i) delete an interaction or (ii) yield a new interaction. The first experiment consists in the deletion of every interaction in the network, one at a time. In the second experiment, we try 10 positive and 10 negative interactions for every possible missing interaction, one at a time. We extract interaction weights from a standard normal distribution upon which we force the sign that we require. We also consider whether, in the affected interaction, the regulator gene is a duplicate or a singleton and if the target gene is a duplicate or singleton transcription factor or a structural gene (descriptive statistics appear in Additional file 2). We found that the effect of mutations after duplication depends on both the kind of mutation and genes involved.
For each of the two different mutation assays, we used a type II two-way ANOVA to study the effect that the kind of target and regulator have on phenotypic similarity after the two different kinds of mutations (Additional file 3).
When we deleted an interaction, we found statistical differences for regulator and target genes and for their combined effects (Fig. 3a). This means that the kind of gene at the beginning and at the ending point of the perturbed interaction influences the outcome. Notwithstanding, we also found that the effect size for the regulator (\(\eta ^2 = 0.13\)) was much greater than that for the target gene or for their combined effect (\(\eta ^2 = 0.017\) and 0.012, respectively). Moreover, Fig. 3a shows that the kind of regulator gene in the deleted interaction is more relevant than the kind of target gene. Specifically, we found that those interactions with a duplicate as a regulator are better avoiding the effects of mutations that remove interactions.
The pattern was different when we subjected the GRNs to mutations that added a regulatory interaction to the network (Additional file 3; Fig. 3b). In this case, we only found significant differences for the target gene factor, along with a strong effect size (\(\eta ^2 = 0.163\)). Thus, when we add regulatory interactions, target genes are much more important than regulator genes. What kind of target gene favors a greater \(S_\mu\)? Clearly, we appreciate in Fig. 3b that those networks in which the target gene in the new interaction is a duplicate tend to produce phenotypes more similar to the original one.
Gene duplication and access to new phenotypes through mutation
Next we turned to study how transcription factor duplication affects a GRN’s capacity to generate new distinct phenotypes after mutations or, in other words, the capacity to uncover phenotypes other than A. We thus evaluate mutational access to new phenotypes as the number of distinct phenotypes that a GRN yields after perturbing it with \(2N_r(N_r+N_s)\) independent random mutations, one at a time. Even when such mutational access to new phenotypes is strongly negatively correlated to the GRNs’ capacity to keep the original phenotype A (\(S_\mu\), \(\rho =-0.59\), \(p=1.2\times 10^{-93}\)), variation on \(S_\mu\) only explains \(34.8\%\) of the variation on the number of mutation-accessible phenotypes (Fig. 4).
Originally, GRNs in our initial sample \(X_i\) had mutational access to a mean number of \(18.33\pm 8.56\) different phenotypes. After duplication, the duplication-resistant GRNs in \(X_D\) significantly decreased their number of mutational-accessible phenotypes from \(17.78\pm 8.21\) to \(15.68\pm 7.90\) (\(W=101,092\), \(p=2\times 10^{-16}\)). In contrast, duplication-susceptible GRNs significantly increased their accessible phenotypes from \(19.12\pm 9.47\) to \(22.76\pm 15.27\) (\(W=30,350\), \(p=0.0002\)).
GRNs in \(X_a\), those with a non-duplicate additional gene, had a different response: they always increased their number of accessible phenotypes. Addition-resistant GRNs increased their access to new distinct phenotypes from \(17.06\pm 8.40\) to \(21.15\pm 8.58\) (\(W=47,464\), \(p=0.0002\)), and addition-susceptible GRNs from \(19.22\pm 8.92\) to \(34.55\pm 20.74\) (\(W=153,199\), \(p=7.5\times 10^{-70}\)).
We detected that some phenotypes that were accessible through mutations before adding a gene were also accessible after adding a duplicate or non-duplicate regulator gene. We called this kind of phenotype ‘recurrently accessible’ (Fig. 5a). We called the new phenotypes that were no longer accessible through mutation after adding a gene the ‘lost accessible’ phenotypes; and those phenotypes only accessible after adding a gene the ‘newcomer accessible’ ones (Fig. 5a).
Most of the new phenotypes produced by mutation in duplication-resistant GRNs in set \(X_D\) were recurrent phenotypes (\(61.67\%\pm 22.4\%\)). In comparison, the proportion of recurrently accessible phenotypes in duplication-susceptible GRNs represented no more than the \(14.04\%\pm 12.1\%\) of their whole repertory of mutation-accessible phenotypes.
Otherwise, GRNs in \(X_a\), those with a newly added non-duplicate gene, also expressed recurrent phenotypes. Here, this kind of phenotype represented less than the half of the phenotypes produced by the addition-resistant GRNs (\(46.29\%\pm 19.92\%\)); for addition-susceptible GRNs, the amount decreased further to \(8.32\%\pm 7.14\%\).
As we observe in Fig. 5b, duplication-resistant GRNs in \(X_D\) lost their access to a mean of \(8.95\pm 6.86\) phenotypes after duplication, yet they got \(6.86\pm 5.95\) newcomer accessible phenotypes; while, \(8.83\pm 4.41\) phenotypes remained accessible on average. In contrast, the average number of recurrently mutation-accessible phenotypes in duplication-susceptible GRNs was \(2.60\pm 2.18\). Duplication-resistant GRNs produce a number of recurrently accessible phenotypes that is statistically indistinguishable from the number of lost accessible phenotypes (\(W=82,902\), \(p=0.36\)), but significantly higher than the number of newcomer accessible phenotypes (\(W=109,742\), \(p=7\times 10^{-15}\)). Moreover, they produce significantly more recurrent phenotypes than duplication-susceptible GRNs (\(U=225,540\), \(p=5\times 10^{-120}\)).
In contradistinction to duplication-resistant GRNs, addition-resistant GRNs had more lost accessible phenotypes on average (\(16.87\pm 8.35\)) and produced more newcomer phenotypes through mutations (\(11.96\pm 7.49\); \(W=7,877.5\), \(p=7\times 10^{-44}\)) than recurrent phenotypes (\(9.18\pm 4.58\); \(W=26,528\), \(p=9\times 10^{-8}\)).
What kind of phenotypes are more likely to remain accessible through mutation after duplication? We speculated that phenotypes more similar to the original phenotype were more likely to be recurrently-accessible. Across all networks in our sample, the value of \(S_\mu\) after duplication for a typical recurrently accessible phenotype averages \(0.71\pm 0.12\), that of lost accessible phenotypes \(0.69\pm 0.1\) and the one of newcomer accessible phenotypes \(0.64\pm 0.12\) (Fig. 5d). Indeed, recurrently accessible phenotypes are significantly more similar to the original phenotype A than the other classes of phenotypes (lost: \(W=258,458\), \(p=2\times 10^{-7}\) and newcomer: \(W=301,904\), \(p=1\times 10^{-36}\)). We notice the same trend, although slightly more pronounced, when we focus exclusively on duplication-resistant GRNs. In such duplication-resistant GRNs, average \(S_\mu\) is higher (\(0.75\pm 0.08\)) for recurrently accessible phenotypes than for lost accessible phenotypes (\(0.67\pm 0.11\); \(W=130,705\), \(p=1\times 10^{-39}\)) and for newcomer accessible phenotypes (\(0.67\pm 0.12\); \(W=117,014\), \(p=9\times 10^{-37}\)).
We also revised the different kinds of phenotypes in GRNs after addition of a non-duplicate gene. In this case, we observe a similar pattern as for GRNs after duplication. For all the networks in our sample, the mean value of \(S_\mu\) after addition of a non-duplicate is \(0.71\pm 0.14\) for recurrently accessible phenotypes, \(0.69\pm 0.1\) (\(W=239,074\), \(p=3\times 10^{-8}\)) for lost accessible phenotypes and \(0.61\pm 0.11\) (\(W=334,184\), \(p=3\times 10^{-61}\)) for newcomer accessible phenotypes.
We also hypothesized that perhaps those phenotypes that had easier access through mutation before duplication were more likely to remain mutation-accessible after duplication. To assess this possibility we counted the number of different mutations leading to each distinct new phenotype before duplication. Figure 5c shows that, indeed, the average number of mutations leading to each recurrently-accessible phenotype (\(7.03\pm 3.93\)) is greater (\(W= 379,310\), \(p=5\times 10^{-91}\)) than the number of mutations leading to lost accessible phenotypes (\(3.95\pm 2.33\)). The difference holds when we only take into account duplication-resistant GRNs (\(6.73\pm 2.42\) and \(3.07\pm 1.94\); \(W=153,032\), \(p=4\times 10^{-82}\)). Even though there are fewer recurrently-accessible phenotypes in duplication-susceptible GRNs than in duplication-resistant GRNs (Fig. 5b), the number of mutations leading to each such recurrently-accessible phenotypes in both classes of networks is statistically indistinguishable (duplication-susceptible GRNs: \(7.52\pm 5.52\); \(U=113,052\), \(p=0.19\); Fig. 5c). Our observations support that phenotypes with easier mutational access before duplication are more likely to remain accessible through mutations after duplication. It is noteworthy that this effect also appears in addition-resistant GRNs. In this case, there are more mutations leading to each recurrently accessible phenotype (\(6.47\pm 2.23\)) than to each lost accessible phenotype (\(3.03\pm 2.10\); \(W=73,816\), \(p=1\times 10^{-58}\)).