
A dual controllability analysis of influenza virus-host protein-protein interaction networks for antiviral drug target discovery



Host factors of influenza virus replication are often found in key topological positions within protein-protein interaction networks. This work explores how protein states can be manipulated through controllability analysis: the determination of the minimum manipulation needed to drive the cell system to any desired state. Here, we complete a two-part controllability analysis of two protein networks: a host network representing the healthy cell state and an influenza A virus-host network representing the infected cell state. In this context, controllability analyses aim to identify key regulating host factors of the infected cell’s progression. This knowledge can be utilized in further biological analysis to understand disease dynamics and isolate proteins for study as drug target candidates.


Both topological and controllability analyses provide evidence of wide-reaching network effects stemming from the addition of viral-host protein interactions. Virus interacting and driver host proteins are significant both topologically and in controllability, therefore playing important roles in cell behavior during infection. Functional analysis finds overlap of results with previous siRNA studies of host factors involved in influenza replication, NF-kB pathway and infection relevance, and roles as interferon regulating genes. 24 proteins are identified as holding regulatory roles specific to the infected cell by measures of topology, controllability, and functional role. These proteins are recommended for further study as potential antiviral drug targets.


Seasonal outbreaks of influenza A virus are a major cause of illness and death around the world each year with a constant threat of pandemic infection. This research aims to increase the efficiency of antiviral drug target discovery using existing protein-protein interaction data and network analysis methods. These results are beneficial to future studies of influenza virus, both experimental and computational, and provide evidence that the combination of topology and controllability analyses may be valuable for future efforts in drug target discovery.


The development of computational methods to identify key host factors that allow viruses to interrupt and control healthy cell functions will greatly aid in the prediction of novel antiviral drug targets [1]. Traditional systems biology approaches to understanding cell dynamics during infection include the creation of detailed kinetic models for intercellular signaling pathways. While these models are advantageous in understanding the disease state in a quantitative way, they require experimentally derived or estimated parameters and training data [2,3,4], without which complications can arise and an accurate model can quickly become unattainable. Further, modeling studies are often limited to specific pathways and therefore fail to consider the total cellular environment as an interdependent system.

Alternatively, network analysis methods applied to protein-protein interaction (PPI) data have been used to model cell-wide systemic changes associated with disease, changes in cell function, or cell fate [5]. This strategy provides a holistic understanding of system behavior by viewing proteins as interdependent states, regardless of specific interaction mechanisms, and allows for the exploration of cell level relationships. The field of network theory is well established. Several basic network metrics like degree (the number of interactions a protein is involved in) and betweenness (the importance of a protein to information flow through a network, or, how much of a bottleneck a protein is to system behavior) [6] are commonly used to describe the significance of network components in a wide range of applications [7,8,9]. These analyses have repeatedly revealed the importance of specific proteins within biological processes that cannot be found from traditional modeling approaches [10,11,12,13,14]. Disease networks have identified genes involved with cancer [15,16,17,18], demonstrated that the genes responsible for similar diseases are likely to interact with each other [19, 20], and predicted novel drug targets [21, 22].

There is precedent for network studies of many common viruses including hepatitis C [23, 24], severe acute respiratory syndrome (SARS) [19, 25], human immunodeficiency virus (HIV) [25,26,27,28,29], and influenza virus [19, 30,31,32,33]. Past work studying the effects of influenza virus in PPI networks has focused on identifying host factors involved in virus replication and improving the prediction of drug targets, but ends with an analysis of basic topological measurements. While this provides a general overview of the state of the network, it is a static snapshot and fails to capture the dynamic nature of the cell. The next logical step in analyzing biological networks therefore lies in understanding how these dynamic systems can be manipulated and exploited to manage biological properties.

In classic control theory, controllability is the idea that a deterministic system can be driven to any final state in finite time given an external input [34]. This is commonly applied to linear, time invariant dynamic systems,

$$ \frac{dx(t)}{dt}= Ax(t)+ Bu(t) $$

where A is an NxN matrix of state coefficients that describes how N molecule states, x(t), interact within the system and B is a matrix of input weights describing how external influences, u(t), impact the system. In general, a system is controllable if the controllability matrix,

$$ C=\left[B, AB,{A}^2B,\dots, {A}^{N-1}B\right] $$

has full rank, N. This means that the system can be manipulated to reach any desired combination of states within all of state space through the inputs defined by B. In total, a controllability analysis identifies the key components of a system that must be manipulated to drive desired system outcomes [35].

An example PPI network in Fig. 1a is transformed into its state space matrix representation. With the inclusion of two independent inputs (u1 and u2), the controllability matrix is full rank. Therefore, the system is fully controllable and it is possible to drive the protein concentrations to any desired state. Applying the idea of controllability to a cell at the onset of viral infection, a virus aims to control cellular functions (the system of proteins), promote virus replication tasks, and reach a final infected cell state. While it would be advantageous to interpret the infection from this control perspective, mathematical limits due to large system dimensions prevent the direct application of traditional controllability methods to PPI networks.
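The rank condition above can be checked directly for a small system. The following is a minimal sketch only: the three-protein matrix `A` (protein 1 influencing proteins 2 and 3) and the two input matrices are hypothetical and are not the network of Fig. 1a, and all function names are introduced here for illustration. Exact fractions are used so that the rank is not affected by rounding.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank by Gauss-Jordan elimination over exact fractions."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def controllability_matrix(A, B):
    """Build C = [B, AB, A^2 B, ..., A^(N-1) B] by horizontal concatenation."""
    n = len(A)
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(mat_mul(A, blocks[-1]))
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]

def is_controllable(A, B):
    return rank(controllability_matrix(A, B)) == len(A)

# Hypothetical system: dx2/dt and dx3/dt both depend on x1.
A = [[0, 0, 0],
     [1, 0, 0],
     [1, 0, 0]]
B_one_input = [[1], [0], [0]]             # u1 acts on protein 1 only
B_two_inputs = [[1, 0], [0, 1], [0, 0]]   # u1 on protein 1, u2 on protein 2

print(is_controllable(A, B_one_input))    # False: rank 2 < 3
print(is_controllable(A, B_two_inputs))   # True: rank 3
```

In this toy case a single input on protein 1 cannot steer proteins 2 and 3 independently, so a second input is needed. Building and ranking this matrix for a network of several thousand proteins is what becomes impractical.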

Fig. 1

a An example protein-protein interaction network with three proteins and two protein translation process inputs. The state space representation of the same network demonstrates that the change in state of a protein’s concentration is a function of its current state and an input process. A classic controllability analysis demonstrates that this system is fully controllable and could, therefore, be driven to any possible state change in every protein. b Example application of robust controllability, which determines the robustness of the network after the removal of a protein. c Example application of global controllability which assesses the importance of a protein to all methods of network control

Advances in network theory have created alternative methods of network controllability evaluation which survey each node’s (protein’s) importance in the ability of an external set of inputs to fully control the network. Controllability classification is founded in “driver node” calculations: identifying the network components which must be manipulated for the system to be fully controlled (analogous to determining the non-zero elements of the B matrix in classic controllability). Without manipulation, driver nodes will remain unaffected by changes to the rest of the system, rendering the total system uncontrollable. Driver nodes are identified using the Hopcroft-Karp algorithm [36], which can be applied to any directed graph in bipartite form. This method calculates the maximum matching of the graph, that is, the largest set of edges in which no two edges share a start node or an end node; the matched edges link into the paths along which control propagates. Because each node can only influence one of its interactors, the identification of these paths dictates the way in which control can propagate through the network. The nodes that are not included in these paths or are at the start of these paths are not receiving control from a neighboring node and, therefore, require “driving”. A set of driver nodes (size ND) that is capable of controlling the total network is called a minimum input set (MIS). The MIS is not unique and the number of possible MISs scales exponentially with the size of the network [37]. After a primary MIS is calculated, two methods of controllability node classification can be used.
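The driver node calculation can be illustrated on a toy graph. The sketch below is an assumption-laden illustration rather than the implementation used in this study: it swaps in a simple augmenting-path matching (Kuhn's algorithm) for Hopcroft-Karp, which finds a matching of the same size but is slower on large networks, and the node names and function names are hypothetical. Each directed edge u → v is read as "u may control v", and a node is a driver if no matched edge points into it.

```python
def maximum_matching(nodes, edges):
    """Return {target: source} for one maximum matching of a directed graph.

    Every node may be the source of at most one matched edge and the
    target of at most one matched edge (bipartite out-copy / in-copy view).
    """
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    matched_by = {}  # target node -> source node that controls it

    def try_augment(u, seen):
        # Try to give u a target, re-routing earlier choices if needed.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_by or try_augment(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    for u in nodes:
        try_augment(u, set())
    return matched_by

def driver_nodes(nodes, edges):
    """Nodes with no matched incoming edge: one possible minimum input set."""
    matched_by = maximum_matching(nodes, edges)
    return [v for v in nodes if v not in matched_by]

# Chain a -> b -> c: control propagates along the path, one driver suffices.
print(driver_nodes(["a", "b", "c"], [("a", "b"), ("b", "c")]))  # ['a']

# Star a -> b, a -> c: a can control only one target, so two drivers.
print(driver_nodes(["a", "b", "c"], [("a", "b"), ("a", "c")]))  # ['a', 'c']
```

In the star example the set {a, b} would be an equally valid MIS, which is the non-uniqueness noted above. When every node is matched (for example, a directed cycle) the list is empty, and the usual convention is that at least one input is still required.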

In robust controllability (by Liu et al. [38], pictured in Fig. 1b), the MIS is re-calculated (size ND′) after removing each node from the network. The node is then classified by its effect on the manipulation required to control the network, where an increase in the size of the MIS makes it more difficult to control the network and a decrease in the size of the MIS makes it easier to control the network. The removal of an indispensable node increases the number of driver nodes (ND′ > ND), the removal of a dispensable node decreases the number of driver nodes (ND′ < ND), and the removal of a neutral node has no effect on the number of driver nodes (ND′ = ND). This method has previously been applied to many network types such as gene regulatory networks, food webs, citation networks, and PPI networks to better understand what drives the dynamics of each system [29, 38]. While it is useful to observe the structural changes to the network after the removal of singular nodes, this method only considers one possible MIS. A second method, global controllability, by Jia et al. [39] (pictured in Fig. 1c) classifies a node by its role across all possible MISs. A critical node is included in all possible MISs, an intermittent node is included in some possible MISs, and a redundant node is not included in any possible MISs. This method places each node in the broader context of all possible control configurations.
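Both classifications can be sketched on a small directed graph without enumerating every MIS. The code below is a hedged illustration, not the pipeline used in this study. It assumes a simple augmenting-path matching in place of Hopcroft-Karp, the convention ND = max(N − |matching|, 1), and two shortcuts for the global types: a node with no incoming edge is always a driver, and a node is never a driver exactly when forbidding matches into it shrinks the maximum matching. All names and the four-node example are hypothetical.

```python
def matching_size(nodes, edges):
    """Size of a maximum matching (augmenting-path search)."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    matched_by = {}

    def try_augment(u, seen):
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_by or try_augment(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    # Each successful augmentation adds exactly one matched edge.
    return sum(try_augment(u, set()) for u in nodes)

def n_drivers(nodes, edges):
    # Assumed convention: at least one input is always required.
    return max(len(nodes) - matching_size(nodes, edges), 1)

def robust_type(node, nodes, edges):
    """Compare ND before and after deleting `node` and its edges."""
    rest = [n for n in nodes if n != node]
    kept = [(u, v) for u, v in edges if node not in (u, v)]
    before, after = n_drivers(nodes, edges), n_drivers(rest, kept)
    if after > before:
        return "indispensable"
    if after < before:
        return "dispensable"
    return "neutral"

def global_type(node, nodes, edges):
    """Role of `node` across all minimum input sets."""
    if not any(v == node for _, v in edges):
        return "critical"        # nothing can control it: always a driver
    without_incoming = [(u, v) for u, v in edges if v != node]
    if matching_size(nodes, without_incoming) < matching_size(nodes, edges):
        return "redundant"       # matched in every maximum matching
    return "intermittent"        # a driver in some MISs but not all

# Hypothetical network: a -> b, then b -> c and b -> d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("b", "d")]
print({n: robust_type(n, nodes, edges) for n in nodes})
# {'a': 'neutral', 'b': 'indispensable', 'c': 'dispensable', 'd': 'dispensable'}
print({n: global_type(n, nodes, edges) for n in nodes})
# {'a': 'critical', 'b': 'redundant', 'c': 'intermittent', 'd': 'intermittent'}
```

Here b relays control from a to one of its targets, so deleting it raises the driver count, while c and d take turns being the extra driver. For networks with thousands of proteins the recursive search would need to be replaced by Hopcroft-Karp or an iterative equivalent.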

In total, this study aims to determine key host factors with regulatory roles specific to the influenza virus-infected cell state for the prediction of novel antiviral targets. We have completed a two-part controllability analysis of a host PPI network (HIN) and a hybrid network of human host PPI data combined with influenza A virus-host protein interaction data (VIN). The controllability characteristics of influenza virus interacting host proteins and driver proteins are compared to the characteristics of the total network. A set of 24 host factors that hold value topologically, in controllability, and functionally are identified as candidates for further study in drug development based on their specialized behavior during influenza infection.


Topology of the host interaction network and virus integrated network

The directed PPI network from Vinayagam et al. [40] was restricted to confident interactions (see Methods for network construction details), creating a network containing 6281 proteins and 31,079 interactions. This network is referred to as the “Host Interaction Network” (HIN). Influenza A virus-host interactions from Watanabe et al. [41] were narrowed to 2592 directed interactions between 11 influenza A virus (IAV) proteins (HA, M1, M2, NA, NP, NS1, NS2, PA, PB1, PB2, and PB1-F2 proteins) and 752 “IAV interacting proteins” preexisting in the HIN. After integration into the HIN, the network contains 6292 proteins and 33,671 interactions. This network is referred to as the “Virus Integrated Network” (VIN).

Degree and betweenness calculations were completed for the HIN and VIN. As expected, the only proteins with altered degree after the addition of virus interactions to the network are the 752 IAV interacting proteins (marked in blue in Fig. 2a). This shift is significant for the group of IAV interacting proteins as compared to all proteins in both the VIN (log scaled median of IAV interacting proteins: 1.04; log scaled median of all proteins: 0.70; Student t-test of log scaled data p < 2.20 × 10⁻¹⁶) and the HIN (log scaled median of IAV interacting proteins: 0.85; log scaled median of all proteins: 0.70; Student t-test of log scaled data p: 5.97 × 10⁻¹²). The degree distributions of both networks are scale free (Additional file 1: Figure S1).

Fig. 2

a Degree of the VIN vs degree of the HIN where the IAV interacting proteins are marked in blue. The degree distributions of the networks are scale free. b Difference in betweenness between the VIN and HIN for proteins which exhibit a difference greater than one

Because betweenness is sensitive to the information flow through all proteins instead of only neighboring proteins, 2735 proteins exhibit an increase in betweenness after the addition of IAV interactions. Of these proteins, 207 proteins’ log betweenness exhibits an increase of 2 or more in the VIN compared to the HIN (Fig. 2b). This suggests that the addition of IAV interactions has an effect on network topology that reaches over 3.5 times the number of host proteins that are directly interacting with IAV proteins. The betweenness shift in the group of IAV interacting host proteins is significant as compared to all proteins in both the VIN (log scaled median of IAV interacting proteins: 3.23; log scaled median of all proteins: 2.82; Student t-test of log scaled data p < 2.20 × 10⁻¹⁶) and the HIN (log scaled median of IAV interacting proteins: 3.22; log scaled median of all proteins: 2.82; Student t-test of log scaled data p: 2.13 × 10⁻¹⁵). This is a result of being the limited protein set responsible for information flow from the viral proteins to the rest of the network.
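The contrast between the two measures can be reproduced on a toy network. The sketch below is illustrative only: it assumes unnormalized, directed shortest-path betweenness computed with Brandes' algorithm (the exact variant used in this study is not restated here), and the protein names `p1` to `p3` and the viral node `V` are hypothetical. Adding one virus-host edge changes the degree of only the directly contacted protein but raises the betweenness of a downstream protein as well.

```python
from collections import deque

def degree(nodes, edges):
    """Number of interactions (incoming plus outgoing) per node."""
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def betweenness(nodes, edges):
    """Unnormalized directed betweenness via Brandes' algorithm."""
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
    score = {n: 0.0 for n in nodes}
    for s in nodes:
        order, preds = [], {n: [] for n in nodes}
        sigma = {n: 0 for n in nodes}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in succ[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):       # accumulate dependencies backwards
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

host_nodes = ["p1", "p2", "p3"]
host_edges = [("p1", "p2"), ("p2", "p3")]
vin_nodes = ["V"] + host_nodes
vin_edges = [("V", "p1")] + host_edges

print(degree(host_nodes, host_edges))    # {'p1': 1, 'p2': 2, 'p3': 1}
print(degree(vin_nodes, vin_edges))      # {'V': 1, 'p1': 2, 'p2': 2, 'p3': 1}
print(betweenness(host_nodes, host_edges))  # {'p1': 0.0, 'p2': 1.0, 'p3': 0.0}
print(betweenness(vin_nodes, vin_edges))    # {'V': 0.0, 'p1': 2.0, 'p2': 2.0, 'p3': 0.0}
```

Only `p1` gains an interaction, yet `p2`, which never touches the viral node, also carries more shortest paths once `V` is present.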

Driver proteins

Driver proteins (nodes) are the foundation of both types of controllability calculations, representing the protein set which must be manipulated for the system to be fully controlled. The proteins are identified through maximum matching algorithms [36]. The HIN and VIN both require ND = 2463 driver proteins to achieve controllability, suggesting that the magnitude of network control is unchanged by the influence of the IAV interactions. However, the identity of driver proteins shifts slightly as the 11 viral proteins replace 11 host proteins within the primary MIS as drivers in the VIN. Table 1 lists the identities of the 11 host proteins along with the shortest distance to an IAV protein in the network, degree, and betweenness. Of these 11 proteins, only five are directly interacting with IAV proteins. One of the remaining proteins is two steps (two interactions and one connecting protein) from any IAV protein, and the remaining five proteins are three steps from any IAV protein. The number of paths between viral proteins and these proteins is reflective of the number of paths between viral proteins and all host proteins (Fisher test p: 0.99). This supports the idea that viral interactions have lasting effects on the system’s control structure, affecting proteins that are multiple steps away.

Table 1 Identities of the proteins that are drivers in the HIN but not the VIN, with the shortest path length to an influenza A viral protein. Degree and betweenness of the proteins in the VIN are provided (with the values from the HIN in parentheses). Only 45% of these proteins are directly interacting with the viral proteins, demonstrating the cascade effect caused by the inclusion of viral interactions
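The "shortest distance to an IAV protein" reported in Table 1 can be computed with a breadth-first search started from all viral proteins at once. The sketch below is a minimal illustration under stated assumptions: distances follow edge direction outward from the viral proteins (the study may have measured distance differently), and the node names `V1`, `V2`, and `h1` to `h4` are hypothetical.

```python
from collections import deque

def steps_from_virus(edges, viral):
    """Fewest directed interactions from any viral protein to each reachable node."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    dist = {v: 0 for v in viral}   # viral proteins are the search origins
    queue = deque(viral)
    while queue:
        u = queue.popleft()
        for w in succ.get(u, []):
            if w not in dist:      # first visit is the shortest distance
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

edges = [("V1", "h1"), ("h1", "h2"), ("h2", "h3"), ("V2", "h4")]
dist = steps_from_virus(edges, ["V1", "V2"])
print({n: d for n, d in dist.items() if d > 0})
# {'h1': 1, 'h4': 1, 'h2': 2, 'h3': 3}
```

A distance of 1 corresponds to a directly interacting host protein, and a distance of 2 to "two interactions and one connecting protein" as described above.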

Lastly, analysis finds that 8.9% of all driver proteins are also IAV interacting proteins, meaning the intersection of the two protein groups of interest comprises only 3.5% of the total network. There is a significant increase in the betweenness of driver proteins depending on their status as IAV interacting or IAV non-interacting proteins (Fisher test p < 2.2 × 10⁻¹⁶), whereas there is no significant difference in degree between the same groups (Fisher test p: 0.7161). This is further evidence that the addition of virus interactions to the network magnifies information flow through the proteins most involved in controlling network behavior.

Robust controllability

Robust controllability was calculated (see Methods) for all proteins of the HIN and VIN (as shown in Table 2 with and without parentheses, respectively). The addition of IAV interactions to the network has no effect on the distribution of classifications of host proteins and, consequently, of the IAV interacting proteins. Upon entry to the VIN, the 11 IAV proteins are classified as neutral, meaning that removing these proteins does not alter the number of driver proteins required to control the VIN (ND′ = ND). This reveals that the removal of singular proteins from the system is not enough to disturb the existing control structure under robust controllability.

Table 2 Robust controllability types of all proteins, driver proteins, and virus interacting proteins in the VIN (HIN in parentheses)

While none of the proteins change robust classification between networks, the aforementioned replacement of 11 host driver proteins with viral proteins after the addition of virus interactions creates a small change in robust type distribution for driver proteins. Of the displaced host proteins (deemed “robust proteins”, found in Table 1), seven are neutral and four are dispensable in the HIN, meaning that their removal from the network leaves the number of driver proteins unchanged or reduces the number of driver proteins needed, respectively. All IAV proteins are classified as neutral in the VIN. Of the five robust proteins that are both driver and IAV interacting proteins, four are neutral and one is dispensable. The most notable change in degree and betweenness between the HIN and VIN is PRMT5, with an increase of 9 and 2250, respectively. Overall, robust controllability results suggest that the HIN is stable against potential changes in the control structure that could be caused by the addition of IAV interactions.

We developed an analysis to test if IAV is selectively targeting host proteins based on controllability characteristics. 10,000 random sets of 752 proteins (the number of IAV interacting proteins) were pulled from the host proteins of the VIN. Their robust type distributions were plotted against the classification results of IAV interacting proteins, driver proteins, and all proteins in the VIN (Fig. 3a-c). The randomly sampled sets closely resemble all proteins of the network, not the true interacting protein set, suggesting that robust controllability behavior of interacting proteins is not a coincidence of network construction (one-sided p = 0.51, 0.49, and 0.50 for indispensable, neutral, and dispensable, respectively). IAV interacting proteins tend to be indispensable compared to the percentage of all proteins that are indispensable (Fig. 3a). This suggests that viruses prefer to interact with proteins that are vital to cellular control. Driver proteins are very likely to be dispensable proteins compared to the percentage of all proteins that are dispensable (Fig. 3c). Further, the mean and median log degree and betweenness of the randomly sampled protein sets are significantly lower than the same measurements of the true IAV interacting set (p < 2.2 × 10⁻¹⁶, 2.2 × 10⁻¹⁶, Fig. 4), signifying that virus interacting proteins are in positions of network significance. Overall, the robust controllability results of IAV interacting proteins suggest that the virus may be selectively targeting host proteins based on controllability characteristics.
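The random-set comparison can be sketched as a resampling test. The code below is a hypothetical illustration, not the statistic reported above: it assumes the quantity of interest is the fraction of a protein set carrying one classification, draws random sets of the same size from all host proteins, and reports the share of draws whose fraction is at least as large as the observed one. The labels, set sizes, and function name are invented for the example.

```python
import random

def empirical_p(labels, target_set, category, n_draws=10_000, seed=0):
    """One-sided resampling p-value for enrichment of `category` in `target_set`.

    labels: dict mapping every host protein to its classification.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    pool = list(labels)
    k = len(target_set)
    observed = sum(labels[p] == category for p in target_set) / k
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(pool, k)   # random set of the same size
        frac = sum(labels[p] == category for p in sample) / k
        hits += frac >= observed
    return observed, hits / n_draws

# Toy data: 100 hypothetical proteins, 20 of them labeled indispensable.
labels = {f"P{i}": ("indispensable" if i < 20 else "neutral") for i in range(100)}
# Hypothetical "interacting" set of 10 proteins, 8 of them indispensable.
interacting = [f"P{i}" for i in range(8)] + ["P50", "P51"]

observed, p = empirical_p(labels, interacting, "indispensable")
print(observed, p)
```

In the study's setting the pool would be the host proteins of the VIN and each draw would contain 752 proteins; the same loop applies to the global types by changing `category`.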

Fig. 3

a-c Density plots of distribution of robust controllability type for 10,000 random pulls of 752 proteins (number of virus interacting proteins in network). d-f Density plots of distribution of global controllability type for 10,000 random pulls of 752 proteins (number of virus interacting proteins in network). Values for IAV interacting proteins (blue), driver proteins (green), and all proteins (gold) are pictured for all figures

Fig. 4

Density plots of a) mean (blue) and median (green) log degree of randomly sampled protein sets and b) mean (blue) and median (green) log betweenness of randomly sampled protein sets. Values for the true IAV interacting set are shown as vertical lines, providing evidence that host proteins that directly interact with viral proteins are in positions of network significance

Global controllability

Global controllability was calculated (see Methods) for all proteins of the HIN and VIN (as shown in Table 3 with and without parentheses, respectively). Unlike in robust controllability, there is a small disturbance to global type distributions of host proteins after the addition of virus interactions. 24 host proteins shift from being classified as critical (a member of all MISs) to intermittent (a member of some MISs) proteins. Identities of these proteins (deemed “global proteins”) can be found in Table 4 along with the shortest distance to an IAV protein in the network and protein degree and betweenness. The two most notable changes in degree and betweenness between the HIN and VIN are EPH receptor A2 (EPHA2) with an increase of 1 and 93, respectively, and transferrin receptor (TFRC), with an increase of 3 and 164, respectively. All 24 global proteins are both driver and IAV interacting proteins, a group that, as mentioned, comprises only 3.5% of the total network. There are only two proteins (EPHA2 and HNRNPA0) that are also members of the robust protein set. 45% of IAV interacting proteins are never drivers, suggesting that they are always manipulated by neighboring host proteins within any possible control configuration. IAV interacting proteins are not enriched for driver proteins (Fisher test p: 0.14).

Table 3 Global types of all proteins, driver proteins, and virus interacting proteins in the VIN (HIN in parentheses)
Table 4 Identities of global proteins (proteins that shift global classification between the HIN and VIN). All identified proteins are directly interacting with viral proteins. Degree and betweenness of the proteins in the VIN are provided (with the values from the HIN in parentheses)

Again, a randomized protein set was created to test if IAV may be selectively interacting with host proteins based on their controllability characteristics. 10,000 random sets of 752 proteins (the number of IAV interacting proteins) were sampled from the host proteins of the VIN. Their global type distributions were plotted against the classification results of IAV interacting proteins, driver proteins, and all proteins in the VIN (Fig. 3d-f). As with the robust classification, the random sets closely resemble the total network (one-sided p = 0.50, 0.51, and 0.50 for critical, intermittent, and redundant, respectively). While there are no redundant driver proteins by definition, driver proteins are more likely to be intermittent proteins than critical proteins (Fig. 3d-e), where more than 75% of all driver proteins are missing from at least one possible MIS. This means the majority of possible driver proteins are able to be controlled by a neighboring protein in at least one MIS. IAV interacting proteins tend to be redundant compared to the percentage of all proteins that are redundant (Fig. 3f). This suggests that viruses prefer to interact with proteins that are part of existing control structures and receive input from neighboring proteins.

Overall, global calculations identify a set of proteins for consideration that are more important within the VIN than the HIN. This is demonstrated through a comparison of degree and betweenness for the identified robust and global driver sets in Fig. 5. Proteins identified in the robust analysis show little deviation in both degree (Fig. 5a) and betweenness (Fig. 5b) measures after the addition of virus-host interactions to the network. In contrast, proteins identified in the global analysis show much larger deviations in degree (Fig. 5a) and betweenness (Fig. 5b) with all proteins having a betweenness of 0 in the HIN with an up to two log unit increase in the VIN (Table 4). Because the identified proteins were not responsible for information flow until the addition of virus-host interactions to the network, this suggests that the global protein set may identify key regulators of host immune response to infection.

Fig. 5

a) Degree and b) betweenness of robust (blue) and global (green) protein sets between the HIN and VIN. While proteins identified in the robust controllability analysis do not show significant deviation in degree or betweenness, proteins identified in the global controllability analysis show a shift in both measures after the addition of viral interactions

Validation of controllability significant host factors

All proteins were checked against 6 siRNA screens for host factors involved in influenza replication (Brass et al. [42], Hao et al. [43], Karlas et al. [44], König et al. [45], Shapira et al. [46], and Watanabe et al. [41]), grouped by both robust and global controllability classifications. Less than 5% of the proteins in each classification of either type are validated by any of the 6 screens (Fig. 6), suggesting that no controllability classification is more enriched for host factors than another. This is likely due to the low agreement observed across siRNA studies [47]. However, the driver proteins that change robust and global classification have higher hit rates in siRNA screens, with 2 of 11 changing MIS proteins (SF3B4, SRPK2, 18% validation) and 5 of 24 global-identified proteins (OSMR, PPA1, PSMA5, POLE4, GDI2, 21% validation), though neither result is statistically significant (Fisher p-values of 0.685 and 0.252, respectively).

Fig. 6

Percent of each a) robust classification type and b) global classification type confirmed in 6 siRNA screens (Brass, Karlas, Shapira, Hao, König, Watanabe). None of the 6 possible classifications are more than 5% validated in the screens, suggesting that experimental findings do not favor certain protein controllability types

An analysis of both protein sets of interest was performed using Ingenuity Pathway Analysis (IPA) [48]. The network created for the robust protein set identified cellular compromise, cell death, and cell cycle functions. The network created for the global protein set identified protein synthesis functions, all centered around NF-kB. The global network notably recognizes six proteins (EPHA2, FBL, PFKM, PSMA5, SSR1, and TFRC) for their involvement in the infection of cells (p: 9.58 × 10⁻⁴). Four proteins in the robust network (CELF1, SF3B4, SRPK2, and HNRNPA0, the last of which appears in both protein sets) were identified for their involvement in mRNA processing (p-value: 3.33 × 10⁻⁶).

Lastly, Interferome v2.01 [49] was used to determine if the 11 robust proteins and 24 global proteins are interferon regulated genes (IRGs). All 11 robust proteins are identified as IRGs and exhibit a 2-fold change in expression when treated with interferon in at least one experimental dataset. 20 of 24 global proteins are identified as IRGs and exhibit a 2-fold change in expression in at least one experimental dataset. 6 global proteins are identified in more than 10 studies. In particular, HNRNPA0 and PPA1 are significantly downregulated in 20 and 63 datasets, respectively. These results point toward the involvement of the predicted protein subsets in immune response events.


A network representation of the cellular environment demonstrates that the effects of infection (represented by the addition of virus-host interactions) cascade through the system, demonstrated by the alteration of basic topology measures. The betweenness shift between the two networks, particularly in IAV interacting proteins, supplies evidence that the topological effect of viral infection is wide reaching (Tables 1 and 4). Further, a comparison of driver protein betweenness for those that are also IAV interacting proteins in comparison to those that are not shows a significant difference. Driver proteins that are IAV interacting are not receiving control influence from viral proteins (dictated by the maximum matching requirement that each protein only control a single protein) and require additional external influence to achieve network control. However, the increased betweenness of proteins that are both driver and IAV interacting proteins suggests that this group is still of great importance to information flow through the network. This is one example where differences in network topology measures can emphasize the importance of select proteins that are overlooked by controllability principles.

Controllability analyses confirm that IAV interacting proteins are in positions of significance for both types of classification. The increased population of indispensable IAV interacting proteins (robust controllability: ND′ > ND, Fig. 3a) compared to what would be expected by random chance suggests that it would be more difficult for an outside influence (such as viral infection) to control the network after removing an IAV interacting protein as opposed to a randomly selected protein. This is logical as IAV interacting proteins act as the connection between viral proteins and the host network where control is initiated. The increased population of redundant IAV interacting proteins (global controllability: never a driver protein, Fig. 3f) when compared to the random expectation indicates that more IAV interacting proteins are always being manipulated internally than would be expected by chance. This means that they are fully incorporated into the control structure of the VIN. From these two results, one can conclude that IAV interacting proteins contribute to both the “gate” (the ease of entering the system) and the “heart” (the proteins responsible for propagating control through the system) of the network control structure during infection. These findings support the idea that viruses are likely to interact with proteins which offer an advantage to total network control.

Similarly, both sets of controllability results demonstrate that driver proteins play interesting roles in the network control structure. The large population of dispensable driver proteins (robust controllability: ND′ < ND, Table 2) signifies that the majority of driver proteins are making it more difficult to control the network by requiring more external inputs to control system behavior. In their absence, the number of driver proteins would decrease and it would theoretically be easier for a viral attack to compromise the network control structure. As such, a possible strategy for drug development could be to protect these proteins from repression effects during infection. Over 75% of driver proteins are classified as intermittent (global controllability: sometimes a driver protein, Table 3), meaning there is at least one MIS where these driver proteins are not drivers, and receive control influence through internal propagation. This lends itself to the idea of viral escape routes: under pressure, virus proteins could utilize alternative pathways to maintain system control and reach the goal of hijacking cellular function.

The method of controllability implementation used identifies protein sets of interest through changes to classification between the HIN and VIN. Unfortunately, robust classification methods do not detect a change between the two networks in this study. As it is a measure of the robustness of the network to structural changes in the absence of each protein, this suggests that the HIN upholds its typical control structure during IAV infection. This could be a consequence of the interaction data used or it may be that the strategy applied here cannot distinguish between the behavior of healthy and diseased states. Knowing the extent of changes to cell behavior within immune response pathways [50,51,52], apoptosis signaling [53, 54], and transcriptional processes [55,56,57] during infection, the IAV infected cell can be interpreted as a different system. The failure to see this distinction may be a shortcoming of the robust controllability calculation, especially knowing that the 11 robust proteins are not unique due to the method’s use of a single MIS. Overall, the robust analysis should be applied to additional virus-host networks in the fashion described within this study to further evaluate the method.

The 24 proteins identified by the global controllability analysis show promise as indicators of regulatory roles specific to the infected state. All global proteins are both IAV interacting and driver proteins, a combination marked by significantly higher betweenness in the VIN than even driver proteins that are not IAV interacting, which demonstrates their importance to network information flow. Additionally, all global proteins have no importance to network flow in the HIN (betweenness = 0) (Table 4), suggesting their role in network structure “turns on” after the onset of infection. It is noteworthy that PRDX1 has been implicated in respiratory syncytial virus (RSV) infection [59], a lower respiratory tract infection that is often associated with influenza virus [58]. Though the number of global proteins identified in existing siRNA screening data is not statistically significant, it should be noted that siRNA screens cover only part of the genome. As such, this type of analysis could be used to direct future experimental studies to save time, money, and effort. IPA analysis reveals that some of the identified proteins hold roles in mRNA processing, an integral part of the influenza virus’s ability to spread by processing its own RNA using host machinery [60]. The global protein network is centered around NF-kB, which is implicated in host immunity with evidence that the virus directly inhibits NF-kB activity [61, 62]. The interferon regulating roles of a high number of proteins in both identified sets (all 11 changing MIS proteins and 20 of 24 global-identified proteins) speak to their responsibility in controlling infection. PPA1 appears as downregulated in 63 studies and HNRNPA0 appears as downregulated in 20 studies when treated with interferon compared to a control, supporting their involvement in the host immune response. In total, this evidence suggests that controllability analyses hold power as predictors of important regulators of the host response to influenza infection and, therefore, hold power for drug target prediction.
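
The betweenness shift described above can be illustrated on a toy network. The following is a minimal Python sketch, not the R/igraph code used in this study. It assumes unweighted, directed, unnormalized shortest-path betweenness (accumulated in the style of Brandes' algorithm), and the node labels and edges are invented for illustration. A host protein with no incoming edges lies on no shortest path between other proteins, so its betweenness is 0 until a virus-to-host edge is added.

```python
from collections import deque


def betweenness(nodes, edges):
    """Unnormalized shortest-path betweenness for a directed, unweighted graph."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        sigma = {v: 0 for v in nodes}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Toy host network (hypothetical): A has no incoming edges.
host_edges = [("A", "B"), ("B", "C")]
hin_scores = betweenness(["A", "B", "C"], host_edges)

# Adding one hypothetical viral protein V that targets A.
vin_scores = betweenness(["V", "A", "B", "C"], host_edges + [("V", "A")])

print(hin_scores["A"], vin_scores["A"])
```

In this toy case the betweenness of A rises from 0 to 2 because the paths from V to B and from V to C now pass through A.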

Existing influenza virus studies using PPI networks require additional data such as differentially expressed gene information [63] or protein context [30] to construct host response networks. Alternative methods such as DeltaNet [64, 65] and ProTINA [66] utilize gene transcription profiles to infer protein drug targets, but rely on the accurate deduction of gene regulatory networks. More recent PPI studies have used network growing functions such as GeneMANIA, STRING, and IPA [67] to predict IAV host factors and studied infected cell systems through the integration of screening data with network methods [33, 68]. Approaches incorporating time course data into network analysis have also been explored [69]. While these methods (which include basic network metrics such as degree and betweenness of PPI networks) have been successful at identifying disease host factors and in drug target development in the existing body of work, this dual controllability study offers a novel, in-depth analysis of the role of individual proteins in the context of total system function and how possible changes to the system can be interpreted.

Lastly, though this study has used experimental data from influenza A studies, the generality of the method means this analysis can be used to improve the prediction of drug targets for any pathogen-host interaction for which protein interaction data are available. The main limitation of these methods is the scarce availability of large-scale, dependable databases of protein-protein interactions. Foundational maximum matching algorithms for the calculation of driver proteins must be performed with directed networks. While larger directed networks than the network from Vinayagam et al. [40] are available [70], the network used here contains only experimentally derived data as opposed to computationally predicted interactions, which supports biological confidence in the results of this study. A robust controllability analysis of the computationally predicted network presented in Uhart et al. [70] finds that 29% of proteins are categorized as indispensable, whereas approximately 20% of proteins in the Vinayagam network receive the same classification, though there is 89% overlap in directed edges between the two networks. This suggests that methods for predicting protein interactions may overrepresent these key proteins within the analysis, even in combination with experimental results. However, larger networks will move towards a more complete analysis of infected cell behavior and possibly reveal further proteins of interest. Therefore, the future of this field depends on continued establishment of large, confident, directed PPI networks.


In total, this two-part network controllability analysis of a host protein-protein interaction network (HIN) and an integrated influenza virus-host protein-protein interaction network (VIN) aims to enhance the prediction of antiviral drug targets for influenza A virus. While robust controllability methods have previously been applied to study PPI networks [29], past analyses focus only on the classification of virus interacting proteins and do not compare the network before and after the addition of virus-host interactions. A global controllability analysis has never been applied to PPI networks. The unique construction of the VIN includes experimentally derived virus-host interaction data [41], which represent opportunities for the virus to manipulate host intracellular machinery using protein-protein interactions. Here, analysis of the transition between the healthy and infected network states and further investigation of virus interacting and driver proteins has identified 24 proteins as regulatory markers of the infected state. This protein set is noted for its characteristics in topology, controllability, and functional roles within the infected cell; these results are summarized in Table 5. Our workflow observes both the effect of structural changes to the network in the case of potential protein knockouts and each protein’s role in all MISs, representing all possible ways of controlling the system. In combination, the network approach and its results provide a deeper understanding of how changes to cell behavior at the onset of infection can occur through the work of a small set of viral proteins. Through understanding the system in this way, we present the possibility to “outsmart” viral attack by dismantling the control structure that allows the viral infection to take hold.

Table 5 Summary of results for proteins identified in the global controllability analysis


Protein-protein interaction network

The host protein-protein interaction network (from Vinayagam et al. [40]) is the combination of interactions identified in two or more repetitions of Y2H screens within the study and known, experimentally identified interactions from literature where interactions had been given direction using a naïve Bayesian predictor. After retrieving the network, a confidence level cutoff of 0.7 was used based on the correlation between confidence scores and biological relevance reported in Yu et al. [71]. This network is the HIN. Influenza A virus-host interactions detected by Co-IP RNAi assay in Watanabe et al. [41] were narrowed to interactions which contained host proteins already found within the HIN to avoid skewing degree and betweenness network metrics. All virus-host interactions are directed viral to host protein. These interactions were directly integrated into the host network, creating the VIN. All analysis was completed in R 3.4.3 using the igraph package.
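
The construction steps above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the R/igraph code used in this study: the function name `build_networks`, the tuple layout of the inputs, and the use of `>=` for the 0.7 cutoff are all hypothetical choices, and the toy edges are invented.

```python
def build_networks(host_edges, virus_host_edges, cutoff=0.7):
    """Return (hin, vin) as sets of directed (source, target) edges.

    host_edges: iterable of (source, target, confidence) host interactions.
    virus_host_edges: iterable of (viral_protein, host_protein) interactions.
    """
    # Keep host interactions at or above the cutoff (">=" is an assumption).
    hin = {(u, v) for u, v, conf in host_edges if conf >= cutoff}
    host_proteins = {p for edge in hin for p in edge}
    # Keep only virus-host edges whose host protein is already in the HIN,
    # directed from the viral protein to the host protein.
    added = {(vp, hp) for vp, hp in virus_host_edges if hp in host_proteins}
    return hin, hin | added


# Toy data: labels and confidence scores are invented for illustration.
host_edges = [("A", "B", 0.9), ("B", "C", 0.75), ("C", "D", 0.4)]
virus_host_edges = [("V1", "A"), ("V2", "C"), ("V3", "D")]
hin, vin = build_networks(host_edges, virus_host_edges)
print(sorted(vin - hin))
```

Here the edge from V3 to D is dropped because D no longer appears in the HIN once the low-confidence C to D interaction is filtered out, which mirrors the restriction to host proteins already found within the HIN.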

Robust controllability classification

Calculations for robust classification were adopted from Liu et al. [38]. For a network of n nodes, a set of driver nodes of size ND is found for the bipartite representation of the network using a maximum matching algorithm such as Hopcroft-Karp [36]. Each node of the network is then removed in turn (n → n − 1) and the maximum matching is reevaluated to give the new number of driver nodes, ND′. Nodes are classified as indispensable (ND′ > ND), neutral (ND′ = ND), or dispensable (ND′ < ND).
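
The removal-and-rematch procedure can be sketched as follows. This is a minimal Python illustration, not the implementation used in this study: it swaps in a simple augmenting-path matching in place of Hopcroft-Karp (same matching size, slower on large networks), assumes the convention that at least one driver node is always required, and uses hypothetical function names and an invented toy network. Here `base` plays the role of ND and `nd` the role of ND′.

```python
def max_matching_size(nodes, edges):
    """Maximum matching size in the bipartite (out-copy, in-copy) representation."""
    adj = {u: [] for u in nodes}
    for u, v in edges:
        adj[u].append(v)
    match_in = {}  # in-copy node -> out-copy node matched to it

    def augment(u, seen):
        # Try to match the out-copy of u, re-routing earlier matches if needed.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_in or augment(match_in[v], seen):
                match_in[v] = u
                return True
        return False

    return sum(augment(u, set()) for u in nodes)


def n_drivers(nodes, edges):
    # Unmatched nodes need an external input; at least one driver is assumed.
    return max(len(nodes) - max_matching_size(nodes, edges), 1)


def robust_classes(nodes, edges):
    base = n_drivers(nodes, edges)
    classes = {}
    for x in nodes:
        rest = [n for n in nodes if n != x]
        rest_edges = [(u, v) for u, v in edges if x not in (u, v)]
        nd = n_drivers(rest, rest_edges)  # driver count after removing x
        if nd > base:
            classes[x] = "indispensable"
        elif nd < base:
            classes[x] = "dispensable"
        else:
            classes[x] = "neutral"
    return base, classes


# Toy directed network (hypothetical): A -> B, B -> C, B -> D.
base, classes = robust_classes(["A", "B", "C", "D"],
                               [("A", "B"), ("B", "C"), ("B", "D")])
print(base, classes)
```

In this toy network two driver nodes are needed. Removing B isolates the remaining nodes and raises the count to three, while removing C or D leaves a simple chain that needs only one.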

Global controllability classification

Calculations for global classification were adopted from Jia et al. [39]. For a network of n nodes, a set of driver nodes of size ND is found for the bipartite representation of the network using a maximum matching algorithm such as Hopcroft-Karp [36]. For each driver node, control adjacent nodes were identified iteratively and an input graph was created as described in Zhang et al. [72]. The input graph was used to classify nodes as critical (in all minimum input sets), intermittent (in some minimum input sets), or redundant (in no minimum input sets).
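
For small networks, the same three classes can be checked without building an input graph. The sketch below is a hedged Python illustration that swaps in a different technique from the input graph method of Zhang et al. [72]: a node with no incoming edges is unmatched in every maximum matching, and a node is never a driver if forbidding it from being matched shrinks the maximum matching. The function names and toy network are hypothetical, and the sketch assumes the network has no perfect matching (where a single arbitrary driver is needed by convention).

```python
def max_matching_size(nodes, edges):
    """Maximum matching size in the bipartite (out-copy, in-copy) representation."""
    adj = {u: [] for u in nodes}
    for u, v in edges:
        adj[u].append(v)
    match_in = {}  # in-copy node -> out-copy node matched to it

    def augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_in or augment(match_in[v], seen):
                match_in[v] = u
                return True
        return False

    return sum(augment(u, set()) for u in nodes)


def global_classes(nodes, edges):
    full = max_matching_size(nodes, edges)
    in_degree = {n: 0 for n in nodes}
    for _, v in edges:
        in_degree[v] += 1
    classes = {}
    for x in nodes:
        if in_degree[x] == 0:
            # Can never be matched, so it is a driver in every MIS.
            classes[x] = "critical"
            continue
        # Forbid x from being matched by dropping its incoming edges.
        without_in = [(u, v) for u, v in edges if v != x]
        if max_matching_size(nodes, without_in) < full:
            classes[x] = "redundant"      # matched in every maximum matching
        else:
            classes[x] = "intermittent"   # a driver in some MISs only
    return classes


# Toy directed network (hypothetical): A -> B, B -> C, B -> D.
classes = global_classes(["A", "B", "C", "D"],
                         [("A", "B"), ("B", "C"), ("B", "D")])
print(classes)
```

The two minimum input sets of this toy network are {A, C} and {A, D}, so A is always a driver, B never is, and C and D each appear in one of the two sets. This brute-force check reruns the matching once per node, so it is meant only to make the definitions concrete rather than to scale to a full PPI network.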

Availability of data and materials

The datasets generated and/or analyzed during the current study are available at DOI: [40] and DOI: [41].



Abbreviations

HIN: Host Interaction Network
HIV: Human immunodeficiency virus
IAV: Influenza A virus
IPA: Ingenuity Pathway Analysis
IRG: Interferon regulated gene
MIS: Minimum input set
PPI: Protein-protein interaction
SARS: Severe acute respiratory syndrome
VIN: Virus Integrated Network


  1. Rask-Andersen M, Almén MS, Schiöth HB. Trends in the exploitation of novel drug targets. Nat Rev Drug Discov. 2011.

  2. Klipp E, Liebermeister W. Mathematical modeling of intracellular signaling pathways. BMC Neurosci. 2006;7.

  3. Schoeberl B, Eichler-Jonsson C, Gilles ED, Müller G. Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors. Nat Biotechnol. 2002;20(4):370–5.

  4. Aldridge BB, Burke JM, Lauffenburger DA, Sorger PK. Physicochemical modelling of cell signalling pathways. Nat Cell Biol. 2006;8:1195–203.

  5. Cho D-Y, Kim Y-A, Przytycka TM. Chapter 5: network biology approach to complex diseases. PLoS Comput Biol. 2012;8(12):e1002820.

  6. Freeman LC. A set of measures of centrality based on betweenness. Sociometry. 1977;40(1):35–41.

  7. Borgatti SP. Centrality and network flow. Soc Networks. 2005.

  8. Everett MG, Borgatti SP. The centrality of groups and classes. J Math Sociol. 1999.

  9. del Sol A, Fujihashi H, O’Meara P. Topology of small-world networks of protein-protein complex structures. Bioinformatics. 2005;21(8):1311–5.

  10. Zhu M, Gao L, Li X, Liu Z, Xu C, Yan Y, et al. The analysis of the drug-targets based on the topological properties in the human protein-protein interaction network. J Drug Target. 2009;17(7):524–32.

  11. Vinayagam A, Zirin J, Roesel C, Hu Y, Yilmazel B, Samsonova AA, et al. Integrating protein-protein interaction networks with phenotypes reveals signs of interactions. Nat Methods. 2014;11(1):94–9.

  12. He X, Zhang J. Why do hubs tend to be essential in protein networks? PLoS Genet. 2006.

  13. Lopes TJS, Shoemaker JE, Matsuoka Y, Kawaoka Y, Kitano H. Identifying problematic drugs based on the characteristics of their targets. Front Pharmacol. 2015;6.

  14. Barabasi A-L, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011;12(1):56–68.

  15. Jonsson PF, Bates PA. Global topological features of cancer proteins in the human interactome. Bioinformatics. 2006;22(18):2291–7.

  16. Hase T, Tanaka H, Suzuki Y, Nakagawa S, Kitano H. Structure of protein interaction networks and their implications on drug design. PLoS Comput Biol. 2009;5(10).

  17. Mani KM, Lefebvre C, Wang K, Lim WK, Basso K, Dalla-Favera R, et al. A systems biology approach to prediction of oncogenes and molecular perturbation targets in B-cell lymphomas. Mol Syst Biol. 2008.

  18. Mine KL, Shulzhenko N, Yambartsev A, Rochman M, Sanson GFO, Lando M, et al. Gene network reconstruction reveals cell cycle and antiviral genes as major drivers of cervical cancer. Nat Commun. 2013.

  19. Mitchell HD, Eisfeld AJ, Sims AC, McDermott JE, Matzke MM, Webb-Robertson BJM, et al. A network integration approach to predict conserved regulators related to pathogenicity of influenza and SARS-CoV respiratory viruses. PLoS One. 2013.

  20. Gandhi TKB, Zhong J, Mathivanan S, Karthick L, Chandrika KN, Mohan SS, et al. Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets. Nat Genet. 2006.

  21. Arrell DK, Terzic A. Network systems biology for drug discovery. Clin Pharmacol Ther. 2010;88:120–5.

  22. Pujol A, Mosca R, Farrés J, Aloy P. Unveiling the role of network and systems biology in drug discovery. Trends Pharmacol Sci. 2010;31:115–23.

  23. Germain M-A, Chatel-Chaix L, Gagné B, Bonneil É, Thibault P, Pradezynski F, et al. Elucidating novel hepatitis C virus-host interactions using combined mass spectrometry and functional genomics approaches. Mol Cell Proteomics. 2014 Jan;13(1):184–203.

  24. de Chassey B, Navratil V, Tafforeau L, Hiet MS, Aublin-Gex A, Agaugué S, et al. Hepatitis C virus infection protein network. Mol Syst Biol. 2008;4(230):1–12.

  25. Moni MA, Liò P. Network-based analysis of comorbidities risk during an infection: SARS and HIV case studies. BMC Bioinformatics. 2014.

  26. Murali TM, Dyer MD, Badger D, Tyler BM, Katze MG. Network-based prediction and analysis of HIV dependency factors. PLoS Comput Biol. 2011.

  27. Ptak RG, Fu W, Sanders-Beer BE, Dickerson JE, Pinney JW, Robertson DL, et al. Short Communication: cataloguing the HIV type 1 human protein interaction network. AIDS Res Hum Retrovir. 2008.

  28. Shityakov S, Dandekar T, Förster C. Gene expression profiles and protein-protein interaction network analysis in AIDS patients with HIV-associated encephalitis and dementia. HIV/AIDS - Res Palliat Care. 2015.

  29. Vinayagam A, Gibson TE, Lee H-J, Yilmazel B, Roesel C, Hu Y, et al. Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets. Proc Natl Acad Sci. 2016;113(18):4976–81.

  30. Schaefer MH, Lopes TJS, Mah N, Shoemaker JE, Matsuoka Y, Fontaine J-F, et al. Adding protein context to the human protein-protein interaction network to reveal meaningful interactions. PLoS Comput Biol. 2013 Jan;9(1):e1002860.

  31. Shoemaker JE, Fukuyama S, Eisfeld AJ, Muramoto Y, Watanabe S, Watanabe T, et al. Integrated network analysis reveals a novel role for the cell cycle in 2009 pandemic influenza virus-induced inflammation in macaque lungs. BMC Syst Biol. 2012 Jan;6(1):117.

  32. Korth MJ, Tchitchek N, Benecke AG, Katze MG. Systems approaches to influenza-virus host interactions and the pathogenesis of highly virulent and pandemic viruses. Semin Immunol. 2012 Dec:1–12.

  33. Tripathi S, Pohl MO, Zhou Y, Rodriguez-Frandsen A, Wang G, Stein DA, et al. Meta- and orthogonal integration of influenza “oMICs” data defines a role for UBR4 in virus budding. Cell Host Microbe. 2015;18(6):723–35.

  34. Lin CT. Structural controllability. IEEE Trans Automat Contr. 1974;19(3):201–8.

  35. Wuchty S. Controllability in protein interaction networks. Proc Natl Acad Sci. 2014.

  36. Hopcroft JE, Karp RM. An $n^{5/2}$ algorithm for maximum matchings in bipartite graphs. SIAM J Comput. 1973.

  37. Jia T, Barabási AL. Control capacity and a random sampling method in exploring controllability of complex networks. Sci Rep. 2013;3.

  38. Liu YY, Slotine JJ, Barabási AL. Controllability of complex networks. Nature. 2011;473(7346):167–73.

  39. Jia T, Liu Y-Y, Csóka E, Pósfai M, Slotine J-J, Barabási A-L. Emergence of bimodality in controlling complex networks. Nat Commun. 2013.

  40. Vinayagam A, Stelzl U, Foulle R, Plassmann S, Zenkner M, Timm J, et al. A directed protein interaction network for investigating intracellular signal transduction. Sci Signal. 2011.

  41. Watanabe T, Kawakami E, Shoemaker JE, Lopes TJS, Matsuoka Y, Tomita Y, et al. Influenza virus-host interactome screen as a platform for antiviral drug development. Cell Host Microbe. 2014;16(6):795–805.

  42. Brass AL, Huang IC, Benita Y, John SP, Krishnan MN, Feeley EM, et al. The IFITM proteins mediate cellular resistance to influenza A H1N1 virus, West Nile virus, and dengue virus. Cell. 2009 Dec;139(7):1243–54.

  43. Hao L, Sakurai A, Watanabe T, Sorensen E, Nidom CA, Newton MA, et al. Drosophila RNAi screen identifies host genes important for influenza virus replication. Nature. 2008 Aug;454(7206):890–3.

  44. Karlas A, Machuy N, Shin Y, Pleissner K-P, Artarini A, Heuer D, et al. Genome-wide RNAi screen identifies human host factors crucial for influenza virus replication. Nature. 2010 Feb;463(7282):818–22.

  45. König R, Stertz S, Zhou Y, Inoue A, Hoffmann H-H, Bhattacharyya S, et al. Human host factors required for influenza virus replication. Nature. 2010 Feb;463(7282):813–7.

  46. Shapira SD, Gat-Viks I, Shum BO, Dricot A, de Grace MM, Wu L, et al. A physical and regulatory map of host-influenza interactions reveals pathways in H1N1 infection. Cell. 2009 Dec;139(7):1255–67.

  47. Hao L, He Q, Wang Z, Craven M, Newton MA, Ahlquist P. Limited agreement of independent RNAi screens for virus-required host genes owes more to false-negative than false-positive factors. PLoS Comput Biol. 2013 Jan;9(9):e1003235.

  48. Krämer A, Green J, Pollard J, Tugendreich S. Causal analysis approaches in ingenuity pathway analysis. Bioinformatics. 2014;30(4):523–30.

  49. Samarajiwa SA, Forster S, Auchettl K, Hertzog PJ. INTERFEROME: the database of interferon regulated genes. Nucleic Acids Res. 2009.

  50. Koyama S, Ishii KJ, Coban C, Akira S. Innate immune response to viral infection. Cytokine. 2008;43(3):336–41.

  51. Thompson MR, Kaminski JJ, Kurt-Jones EA, Fitzgerald KA. Pattern recognition receptors and the innate immune response to viral infection. Viruses. 2011.

  52. Iwasaki A, Medzhitov R. Toll-like receptor control of the adaptive immune responses. Nat Immunol. 2004.

  53. Barber GN. Host defense, viruses and apoptosis. Cell Death Differ. 2001.

  54. Thomson BJ. Viruses and apoptosis. Int J Exp Pathol. 2001.

  55. Gale M Jr, Tan S-L, Katze MG. Translational control of viral gene expression in eukaryotes. Microbiol Mol Biol Rev. 2000.

  56. Sonenberg N, Hinnebusch AG. Regulation of translation initiation in eukaryotes: mechanisms and biological targets. Cell. 2009.

  57. Walsh D, Mathews MB, Mohr I. Tinkering with translation: protein synthesis in virus-infected cells. Cold Spring Harb Perspect Biol. 2013.

  58. Pavia AT. Viral infections of the lower respiratory tract: old viruses, new viruses, and the role of diagnosis. Clin Infect Dis. 2011.

  59. Jamaluddin M, Wiktorowicz JE, Soman KV, Boldogh I, Forbus JD, Spratt H, et al. Role of Peroxiredoxin 1 and Peroxiredoxin 4 in protection of respiratory syncytial virus-induced Cysteinyl oxidation of nuclear cytoskeletal proteins. J Virol. 2010.

  60. Dubois J, Terrier O, Rosa-Calatrava M. Influenza viruses and mRNA splicing: doing more with less. mBio. 2014.

  61. Kumar N, Xin Z-T, Liang Y, Ly H, Liang Y. NF-kappaB signaling differentially regulates influenza virus RNA synthesis. J Virol. 2008;82(20):9880–9.

  62. Ludwig S, Planz O. Influenza viruses and the NF-κB signaling pathway - Towards a novel concept of antiviral therapy. Biol Chem. 2008;389:1307–12.

  63. Shoemaker JE, Fukuyama S, Eisfeld AJ, Muramoto Y, Watanabe S, Watanabe T, et al. Integrated network analysis reveals a novel role for the cell cycle in 2009 pandemic influenza virus-induced inflammation in macaque lungs. BMC Syst Biol. 2012;6.

  64. Noh H, Gunawan R. Inferring gene targets of drugs and chemical compounds from gene expression profiles. Bioinformatics. 2016.

  65. Noh H, Ziyi H, Gunawan R. Inferring causal gene targets from time course expression data. IFAC-PapersOnLine. 2016.

  66. Noh H, Shoemaker JE, Gunawan R. Network perturbation analysis of gene transcriptional profiles reveals protein targets and mechanism of action of drugs and influenza A viral infection. Nucleic Acids Res. 2018.

  67. Taye B, Vaz C, Tanavde V, Kuznetsov VA, Eisenhaber F, Sugrue RJ, et al. Benchmarking selected computational gene network growing tools in context of virus-host interactions. Sci Rep. 2017;7(1):5805.

  68. Heaton NS, Moshkina N, Fenouil R, Gardner TJ, Aguirre S, Shah PS, et al. Targeting viral Proteostasis limits influenza virus, HIV, and dengue virus infection. Immunity. 2016;44(1):46–58.

  69. Jain S, Arrais J, Venkatachari NJ, Ayyavoo V, Bar-Joseph Z. Reconstructing the temporal progression of HIV-1 immune response pathways. Bioinformatics. 2016.

  70. Uhart M, Flores G, Bustos DM. Controllability of protein-protein interaction phosphorylation-based networks: participation of the hub 14-3-3 protein family. Sci Rep. 2016.

  71. Yu J, Finley RL. Combining multiple positive training sets to generate confidence scores for protein-protein interactions. Bioinformatics. 2009.

  72. Zhang X, Lv T, Pu Y. Input graph: the hidden geometry in controlling complex networks. Sci Rep. 2016.


Thank you to the Department of Chemical and Petroleum Engineering at the University of Pittsburgh for funding this research. Thank you to the Systems Biology Institute, Tokyo, for expertise and computational training.


Funding for this study comes from the University of Pittsburgh’s Central Research Development Fund (Shoemaker) and the Howard Hughes Medical Institute’s James H. Gilliam Fellowships for Advanced Study program (Ackerman). Funding bodies played no role in study design, data analysis, or manuscript development.

Author information



EEA assisted in conceptualization of the study, designed the study, performed all computational experiments, and wrote the manuscript. JES conceptualized and funded the study. JFA assisted in conceptualization of the study and advised on relevant virology and immunology. TH provided computational training. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Jason E. Shoemaker.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors have no competing interests.

Additional information

Additional file

Additional file 1:

Figure S1. Degree distributions of the network with IAV interactions (blue solid) and without IAV interactions (dotted black) show that both networks demonstrate scale-free topology (PDF 17 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Ackerman, E., Alcorn, J., Hase, T. et al. A dual controllability analysis of influenza virus-host protein-protein interaction networks for antiviral drug target discovery. BMC Bioinformatics 20, 297 (2019).
