Different networks by different distance cut-offs
Before discussing the signaling effects predicted by our model, it would be worth discussing the effect of different distance cutoff values on the network analysis because different network connectivity often produces significantly different results . This type of ambiguity due to network rewiring often arises in the field of network analysis and makes it difficult to interpret the biological meaning. For example, the relationship between degree centrality and gene essentiality in the yeast protein interaction network has been dramatically changed as new connections have been detected and added . Therefore, it is important to select a robust cutoff value that is insensitive within a certain reasonable range and to select the most meaningful value that shows higher performance in appropriate test cases. After considering these aspects, a cutoff value of 8 Å was used in this study because it had longer coverage (than previous studies [26, 33, 34] that used cutoffs of around 5 Å), high robustness, and high predictability of binding hot spots (see later sections).
The long-range cutoff is more suitable if the network is connected by weighted edges as in the present study because the transition matrix from a weighted network would not be changed significantly as the cutoff values are changed. In fact, the EVT values of different cutoff values in our examples were highly correlated. For example, the correlation coefficients between 6 Å and 12 Å were higher than 0.8 (Additional file 1). The performances of the predicted binding hot spots also showed similar results to the performances in the range of 6 Å and 12 Å (Figure 3). In addition, by using the long cutoff value, effects due to the long-range interactions caused by dynamic movements could be included in our model. Accordingly, the cutoff value of 8Å was consistently used for all proteins analyzed in this study.
EVT profiles of two protein structures
Perturbation effects induced by a signal initiation site were estimated by calculating the EVT profiles for two protein structures: G protein-coupled receptor (GPCR) and Per/Arnt/Sim (PAS) domain. The EVT profile contains information on all possible signaling routes by a stochastic process, and the resulting effects are displayed in Figure 1 and Figure 2. In Additional file 2, EVT values for the protein structures are listed. We also gave special attention to the residues that were located far away from the signal initiation sites and simultaneously had high EVT values because they are potential allosteric regulatory sites. To clearly represent the distant but high EVT residues with colors, EVT value of each residue was multiplied by the shortest path length from the signal initiation site. The protein structures with this scaled EVT value are shown in Figure 1b and Figure 2b (also after z-score normalization), where red residues have high EVT values and are located long distances from the signal initiation site.
G protein-coupled receptor (GPCR)
GPCRs comprise a large protein family of membrane-bound receptors. Because they are often involved in transferring cellular signals and inducing signaling cascades, they have been regarded as primary drug targets. As a result, many studies have attempted to elucidate their structural features and signaling mechanisms [19, 35]. Furthermore, because the external signals (e.g., ligand binding or light absorption) that cross the plasma membrane are thought to trigger structural changes at distant regions and activate cellular signaling , these long range interactions of the amino acid network have also been studied previously .
In this study, we used the rhodopsin receptor as a representative GPCR (PDB id: 2j4y, A chain) . Its structure consists of intradiscal (corresponding to the extracellular domain of related GPCRs), membrane-embedded and cytoplasmic surface domains. The intradiscal domain is mainly composed of loops that connect seven transmembrane helices (H1-H7) of the membrane-embedded domains, and the interhelical networks of the transmembrane helices are known to play a central role in activating rhodopsin. The cytoplasmic domain also consists of many loops and a short α-helix (H8) that lies nearly perpendicular to H7 and is thought to be involved in G-protein binding . Moreover, the retinal binds covalently to Lys296 on the intradiscal portion of the transmembrane helix through a Schiff base linkage, and its light-dependent isomerization induces cellular signaling. Thus, the EVT profile for LYR296 (retinal conjugated Lys296 in H7 helix) was calculated in this case.
The EVT values (represented by different colors in Figure 1a) suggest that the residues near LYR296 tend to have higher signal transmission. For example, the residues with high EVT values were Tyr178, Tyr268, Phe103, Trp126 and Trp265 (2.71, 2.29, 2.25, 2.23, and 2.21, respectively), and most of them participate in hydrophobic packing with LYR296 near the intradiscal domain. The side chains of Tyr178, Tyr268 and Trp265 constrain the position of LYR296, and Trp126 and Trp265 were shown to reorient to more polar environments during receptor activation. Specifically, photoactivation of rhodopsin involves a change in the relative positions of H3 and H6, which contain Trp126 and Trp265, within the α-helical bundle of the receptor .
In addition, some residues in an interhelical packing region distant from LYR296 also had high EVT values. Among them, Asp83 (1.35) formed hydrogen bond interactions with Asn55 (1.32), which is known to play a central role in the hydrogen bond network between the helices. Tyr306 (1.39) is in the highly conserved Asn/Pro/X/X/Tyr motif (Asn302/Pro303/Val304/Ile305/Tyr306)  that directly interacts with another highly conserved residue, Phe313 (1.32). The two residues form a hydrophobic interaction in the amphipathic H8 helix, which implies that they are functionally and allosterically important (Figure 1b). For example, structural changes were detected at positions Tyr306, Phe313, and Cys316, which is consistent with movements of the nearby helix H6, and a light-dependent interaction was observed with Cys316 [21, 38].
Arg135 is located on the opposite side of the protein from helix H8 and had a high EVT value (1.07), even though it is far from LYR296. Arg135 belongs to the highly conserved E/DRY motif in H6 that is involved in the major structure perturbations associated with the transition to the active state . The hydrogen-bonding network that links H3 and H6 involves the conserved residue Arg135, which interacts with Glu134. Glu134 forms a salt bridge with Arg135, and this Glu134/Arg135 di-peptide likely forms a functional domain that is responsible for inducing the release of GDP . In addition, Val139 also had a relatively high EVT value (0.44), even though it is far away from LYR296 (Figure 1b). Interestingly, three consecutive Val residues (Val137, Val138, and Val139) are known to form a cytoplasmic cap on helix H3. The Val tri-peptide might stabilize the Glu134/Arg135 salt bridge, which in turn maintains the receptor in its off state in the dark; this implies that these residues are highly connected to LYR296.
Per/Arnt/Sim domain (PAS)
Phototropins are flavin-based photoreceptors that regulate phototropism (the ability of plants to bend toward sunlight). They are composed of two N-terminal light, oxygen, or voltage (LOV) domains, denoted LOV1 and LOV2, and a C-terminal serine/threonine kinase domain. LOV2 is known to be the predominant photoreceptor domain that modulates light-dependent autophosphorylation of the kinase domain. In addition, LOV2 is a typical PAS domain and binds to the flavin mononucleotide (FMN) chromophore, which absorbs light and then transmits the signal through the structure. That is, the formation of the covalent bond between Cys450 and FMN and the light-induced rearrangement of the FMN binding pocket have been proposed to modulate its enzyme activity. This signaling event is also known to propagate across the β-sheet to the hydrophobic interface formed by the core LOV2 domain, the N-terminal turn-helix-turn motif (A’ α helix), and the Jα helix, resulting in the conformational changes at the site near the Jα helix . The two helices (A’ α helix and Jα helix) pack against the surface of the β-sheet of the core LOV domain and are stabilized by amphipathic interactions and a conserved hydrogen-bonding network. Therefore, there may be some functional relationship between these helices and the hydrophobic patch on the surface of the β-sheet of the core LOV domain. Harper et al. showed that the structural change caused by light-induced rearrangement of hydrogen bonds near FMN propagates to both the N-terminal motif (A’ α helix) and C-terminal flanking regions (Jα helix). Thus, in this example, FMN was used as a signal initiation site and the cryo-trapped dark structure (2v0u) was used for constructing the protein structure network. In addition, the EVT profiles between the cryo-trapped dark structure (CDS, PDB ID: 2v0u) and light structure (CLS, PDB ID: 2v0w) were compared to observe the differences induced by light-dependent signaling.
Generally, the hydrophobic residues near FMN and the residues between the Jα helix and central β-sheets had high EVT values (Figure 2a). The result suggests that the internal hydrophobic packing region of the amphipathic Jα helix has an important effect on signal transmission from FMN to the Jα helix, which is consistent with previous studies . Among them, Phe415, Phe494, Phe434, Phe509 and His495 had high EVT values (2.02, 1.95, 1.80, 1.73, and 1.64, respectively). Phe434, Phe494 and Phe509 near the bound FMN form the ligand-binding pocket. They also provide the hydrophobic docking sites for the Jα helix (Ile510 and Val512) or A’ α helix (Phe415).
However, Phe415 and His495 are located in opposite directions compared to the above three residues. Among them, Phe415 directly interacts with the N-terminal A’α helix (Thr407-Arg410). Interestingly, Glu409 (EVT: 0.87), which is far from FMN (Figure 2b), is highly conserved in the PAS domain and stabilizes the N-terminal A’ α helix through the hydrogen-bonding network with neighboring residues (e.g., Asn432 and Arg442), which prevents further displacement of the A’ α helix . Because it is known that the interactions between the LOV2 core domain, A’ α helix and J α helix (which flanks the LOV2 core domain) play an important role in light-mediated signal propagation, the stabilizing interaction of Glu409 appears to be critical for signaling. On the other hand, His495 interacts with the middle portion of the Jα helix and forms a hydrogen-bonding network between the LOV2 domain and helix Jα (with Glu475, Thr495, Gln497 and Lys533). Interestingly, the importance of this middle part of Jα has been demonstrated by previous studies [25, 39].
The comparison of EVT profiles between the CLS and CDS shows a more intuitive view (Figure 2c). Although the overall structures of CLS and CDS are almost identical (rmsd: 0.20Å), the dark minus light difference Fourier maps constructed by Halavaty et al. showed significant structural changes in the FMN binding pocket, the middle part of the Jα helix and the N-terminal turn-helix-turn motif (A’α helix) . In addition, they also found that the displacements are caused by light-dependent disruption of the Asn414-Asp515 hydrogen bond (present in CDS) after Cys450-FMN interaction formation. Similar to the dark minus light difference Fourier maps, the ratio of the EVT values between CLS and CDS, , was used to investigate the different signaling effects of each residue (Figure 2c). The result shows that the EVT values of the Jα helix (Gln502, Lys534, Glu537, Lys545 and Glu546) and the loop (Glu412, Lys413, and Arg421) adjacent to the A’α helix were significantly altered. On the contrary, the hydrogen-bonding network of Glu409, Asp432 and Arg442 (not shown) showed little difference in EVT values, which is consistent with their unchanged positions in both CDS and CLS. Accordingly, our results showed overall consistency with the dark minus light difference Fourier maps.
Binding hot spots of the protein complex and their higher signal traffic
Protein-protein interactions have been analyzed in terms of network properties, such as hubs and clusters of amino acid residues in the protein complex, with particular focus on the protein-protein interface and binding hot spots [40–42]. This approach gives a global perspective of the interactions across the interface, which is difficult to obtain from pairwise interaction or loss of accessible surface area analyses. For example, Morra et al. tried to elucidate the mechanisms of signal propagation and determine the hot spots involved in interdomain communication pathways . In addition, Fenton discussed allosteric interactions, such as binding hot spots, that do not elicit an allosteric effect on the binding of a second ligand .
Similarly, the binding hot spots of the protein complex were examined in terms of signal transmission in a protein structure network. We hypothesized that the binding hot spots act as a signal transmitter between interacting protein domains and thus will have higher signal traffic than others. Because ligand binding to one protein domain often induces allosteric changes in the interacting domain, such as in the caspase-1 dimer, a perturbing signal should be efficiently propagated across the binding interface. Furthermore, the interface residues that have higher signal traffic may have much greater effects on protein binding.
To test this hypothesis, experimental binding data representing the binding free energy change (ΔΔG value) were collected, and the relationship between the ΔΔG value and the average EVT value (average EVT across all initiation sites; for details see Methods) was investigated by constructing a linear regression model. The results show that all linear regression models had positive correlations that ranged from 0.31 to 0.89 (Figure 3 and Additional file 3), which suggests that the nodes that have higher effects on protein binding (higher ΔΔG) tended to have high EVT values. In addition, among the 13 linear regression models, 8 models had significant p-values (F-statistic for linear regression model) below 0.1 (Additional file 3). Accordingly, the results not only validate our model but also support our hypothesis that binding hot spots have higher signal traffic between interacting domains.
Effect of distance cut-off values on predicting binding hot spots
The models based on EVT were affected by the distance cutoff values in short cutoff ranges. That is, the long distance cutoffs from 6Å to 12Å usually showed high robustness (little variation) and high correlation with the ΔΔG values, but cutoffs below 6Å showed lower correlation with dramatic performance changes (Figure 3). Accordingly, the long-range cutoff values would contain more information about dynamic movements and appear to describe the signaling events within a protein structure better than shorter ones.