The advent of high-throughput technologies allows the large-scale identification of cellular components and their interactions. This wealth of experimental data is assembled in biological networks (transcriptional regulatory, protein-protein interaction, metabolic, signaling, phosphorylation networks etc.). A central question of systems biology arises: can the structural and architectural principles of biological networks reveal functional properties of cellular systems? The analysis of biological networks has therefore become an emergent topic. It was suggested that biological networks are *scale-free*, with the majority of nodes having a small number of connections and relatively few *hubs* possessing a high degree of connectivity [1, 2]. Starting from the work of Jeong *et al* [3], which stated that knocking out highly connected proteins is more likely to be lethal, many studies showed that hubs represent vulnerable points in the cell. Previous approaches, however, used different, rather subjective definitions of hubs. Some defined hubs as the top 20% of high-degree nodes [4], while others defined them as nodes interacting with ≥ 10 partners [5].
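To illustrate how much the two hub definitions mentioned above can disagree, the following minimal sketch applies both to a toy undirected network. The edge list, the node names and the helper functions (`degrees`, `hubs_by_top_fraction`, `hubs_by_threshold`) are hypothetical and serve only as an illustration; they are not taken from the cited studies.

```python
from collections import Counter


def degrees(edges):
    """Count the number of interaction partners of each node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def hubs_by_top_fraction(degree, fraction=0.2):
    """Hubs as the top `fraction` of nodes ranked by degree (ties broken by node name)."""
    ranked = sorted(degree, key=lambda n: (-degree[n], str(n)))
    k = max(1, round(fraction * len(ranked)))
    return set(ranked[:k])


def hubs_by_threshold(degree, min_partners=10):
    """Hubs as nodes with at least `min_partners` interaction partners."""
    return {n for n, d in degree.items() if d >= min_partners}


# Hypothetical toy network: node "A" has 13 partners, node "B" has 5,
# and all remaining nodes have a single partner.
edges = (
    [("A", f"a{i}") for i in range(12)]
    + [("B", f"b{i}") for i in range(4)]
    + [("A", "B")]
)
deg = degrees(edges)

print(sorted(hubs_by_threshold(deg)))     # nodes with >= 10 partners
print(sorted(hubs_by_top_fraction(deg)))  # top 20% of nodes by degree
```

In this toy network the threshold rule selects only node "A", whereas the top-20% rule is forced to include nodes with a single partner, which shows how strongly the resulting hub set depends on the chosen definition.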

Beyond a purely local notion of centrality such as the node degree, more global centrality measures incorporating network-wide properties have attracted much interest. One example is *betweenness centrality*, which measures the total number of shortest paths going through a node [6, 7]. Further centrality measures such as *eccentricity*, *centroid* etc. are described in [8, 9]. Recently, the *pairwise disconnectivity index* of a network element was proposed, defined as the fraction of initially connected pairs of nodes that become disconnected if the element is removed from the network [10]. This measure quantifies the importance of an individual element in sustaining communication in the network. Different measures thus result in different scorings of nodes in a network and lead to different hypotheses about the importance of biological entities.
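The definition of the pairwise disconnectivity index can be made concrete with a short sketch. It rests on several simplifying assumptions that are not taken from [10]: the graph is undirected, pairs are unordered, and pairs that involve the removed node itself are counted as disconnected. The function names and the toy graph are hypothetical.

```python
from collections import deque


def connected_pairs(adj, removed=None):
    """Count unordered node pairs joined by a path, optionally ignoring one node."""
    seen = set() if removed is None else {removed}
    pairs = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search to measure the size of this connected component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        # Every pair of nodes inside one component is connected.
        pairs += size * (size - 1) // 2
    return pairs


def pairwise_disconnectivity(adj, node):
    """Fraction of initially connected pairs that are lost when `node` is removed."""
    before = connected_pairs(adj)
    after = connected_pairs(adj, removed=node)
    return (before - after) / before if before else 0.0


# Hypothetical toy graph: the path 1 - 2 - 3 - 4.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
for node in adj:
    print(node, pairwise_disconnectivity(adj, node))
```

On this path, removing an inner node destroys more connected pairs than removing an end node, although the degrees of the nodes differ by only one.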

Previous approaches to the analysis of biological networks were descriptive and graph-based. In this paper, a statistical approach is employed that originates from the field of social network analysis [11]. It is important to recognize that the biological processes underlying the network data, as well as the experimental measurements, are stochastic in nature: some of the interactions might be unreliable, and some real relations might be missed or not measured. This concern motivates a statistical approach based on probability distributions. A probabilistic model is defined that predicts edge probabilities and captures network properties, summarizing them in the form of statistical parameters. By learning the model from the available network data, estimates of the parameters are obtained; that is, the empirical network is treated as the outcome of a statistical process that generated the network data. Thus, generic questions can be addressed: How did the biological network emerge? Are there local biological processes that generated the network? In the statistical approach, the global network structure emerges as an agglomeration of local structural configurations, which rely on local interdependencies of edges. In fact, it is this dependency that may explain the deviation of real biological networks from random ones.

At the heart of the statistical modelling of networks is a general family of graph probability distributions called *exponential random graph models* (ERGMs) or the *p\*-model* [12]. Each possible edge in a network is represented by a random variable, and the network model expresses the value of each variable as a stochastic function of structural properties of the network. Various dependencies among the network variables are considered, reflecting the interactive nature of the processes from which a given network emerges. The dependencies give rise to different local structural configurations and correspond to different classes of models. The simplest dependency structure assumes no dependencies among the network variables (all edges are independent). If, in addition, all edges are assumed to have equal probability, the class of *Bernoulli graphs* or *Erdős-Rényi graphs* (p0-model) is obtained. The structural configuration considered in this model is just a single edge. Next, the p1-model (*dyadic independence* model) assumes a dependency between the reciprocated edges connecting two nodes in a directed graph [13]. The *Markov random graph* model considers Markov dependencies, whereby two possible edges are assumed to be dependent if they share a node. (This type of dependency resembles a Markov spatial process.) This assumption leads the network model to consider small local structures such as *stars* (2-path, 2-star, 3-star etc.) and *triangles*. The Hammersley-Clifford theorem [14] yields an expression for the graph probability distribution in terms of *network statistics*, which are counts of local network configurations. In the formulation of the model, each statistic comes with its own parameter reflecting the importance of the corresponding configuration in the network under study. These parameters act as a kind of 'regression coefficients' which, when positive, indicate that a particular network configuration is relevant for the emergence of edges in the network. 
The choice of statistics present in the model embodies different assumptions about relevant local processes that might generate the network. Recently, new statistics were proposed including higher-order triangulations, *alternating k-stars, geometrically weighted degree* (GWD) [15, 16].
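For concreteness, the general form of the model family described above is commonly written as follows. The notation is an assumption of this sketch and is not taken from the text: $y$ denotes an observed graph, $z_A(y)$ the count of configuration $A$ in $y$, $\theta_A$ the corresponding parameter, and $\kappa(\theta)$ the normalizing constant obtained by summing over all graphs $y'$ on the same node set.

$$
P_\theta(Y = y) = \frac{1}{\kappa(\theta)} \exp\Big( \sum_{A} \theta_A \, z_A(y) \Big),
\qquad
\kappa(\theta) = \sum_{y'} \exp\Big( \sum_{A} \theta_A \, z_A(y') \Big)
$$

Under these assumptions, a Markov random graph with edge, 2-star and triangle statistics would have the exponent $\theta_L L(y) + \theta_S S_2(y) + \theta_T T(y)$, where $L$, $S_2$ and $T$ count edges, 2-stars and triangles. Keeping only the edge term recovers the Bernoulli graph, in which each edge is present independently with probability $e^{\theta_L} / (1 + e^{\theta_L})$.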

Exponential random graph models for biological networks were first explored in [17]. There, several networks (RegulonDB *E. coli*, ChIP-chip *S. cerevisiae* etc.) were fitted using 2-5-degree statistics, 2-star, edges and GWD, while the complexity of the models was increased iteratively. The results showed that only the basic statistics, namely edges and 2-star, have a positive effect and thus determine the network.

In this paper, we set aside complex structural configurations and turn instead to a node-oriented model, namely the *p2-model*, which is a generalization of the *p1-model* [18]. The p1-model considers *dyads* (pairs of directed edges between two nodes) and represents the probability of any dyad as a function of global features of the graph (density and the tendency towards reciprocity) and of individual nodal features (the tendencies to send and to receive edges). The model specification thus includes two global parameters, *density* (also called *edges*) and *reciprocity*, as well as local, node-specific parameters, *expansiveness* and *attractiveness*. The density determines the probability of an edge between a pair of nodes irrespective of the individual characteristics of the nodes and captures the overall formation of edges in the network. The attractiveness and expansiveness describe the ability of a node to attract and to produce edges beyond the overall density, and hence the contribution of the individual node to the emergence of the network. Estimates of the expansiveness and attractiveness parameters can be used to rank the nodes and to reveal the most important nodes in the network from the perspective of statistical modelling.
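The verbal description of the p1-model corresponds to the following commonly used parameterization. The symbols are assumptions of this sketch rather than notation from the text: $\mu$ is the density, $\rho$ the reciprocity, $\alpha_i$ the expansiveness and $\beta_i$ the attractiveness of node $i$, and $y_1, y_2 \in \{0, 1\}$ indicate the presence of the edges $i \to j$ and $j \to i$.

$$
P(Y_{ij} = y_1,\, Y_{ji} = y_2)
= \frac{\exp\big\{ y_1 (\mu + \alpha_i + \beta_j) + y_2 (\mu + \alpha_j + \beta_i) + y_1 y_2 \, \rho \big\}}{k_{ij}}
$$

$$
k_{ij} = 1 + e^{\mu + \alpha_i + \beta_j} + e^{\mu + \alpha_j + \beta_i} + e^{2\mu + \alpha_i + \beta_j + \alpha_j + \beta_i + \rho}
$$

The four terms of $k_{ij}$ correspond to the four possible states of a dyad (no edge, $i \to j$ only, $j \to i$ only, both edges), so the probabilities sum to one. A large $\alpha_i$ raises the probability of every edge sent by node $i$, and a large $\beta_j$ raises the probability of every edge received by node $j$.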

Given the node-specific parameters, the p1-model implies conditional independence between dyads. The p2-model extends the p1-model in that the nodal parameters are modelled as *random effects*, a formulation that makes it possible to include node- and dyad-specific covariates in the model, as will be shown below.
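One common way to write the random-effects formulation is sketched here. The notation is assumed for illustration and is not taken from the text: $x_i$ is a vector of covariates of node $i$, $z_{ij}$ a vector of covariates of the dyad $(i, j)$, $\gamma$ and $\delta$ are regression coefficients, and $A_i$, $B_i$ are node-level random effects.

$$
\alpha_i = x_i^{\top} \gamma_1 + A_i, \qquad
\beta_i = x_i^{\top} \gamma_2 + B_i, \qquad
(A_i, B_i) \sim \mathcal{N}(0, \Sigma)
$$

$$
\mu_{ij} = \mu + z_{ij}^{\top} \delta_1, \qquad
\rho_{ij} = \rho + z_{ij}^{\top} \delta_2
$$

In this form, a node-level property enters through $x_i$ and shifts the expected expansiveness or attractiveness of the node, while the random effects absorb the node-to-node variation that the covariates do not explain.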

Although the p1-model was defined for directed graphs, it can be modified to apply to undirected graphs. In that case, the local, node-specific parameters are called *sociality* and reflect the propensity of an individual node to be connected to other nodes in the network. We term *social* those nodes that have positive sociality parameters, i.e. that positively influence the formation of edges in the network.
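The role of the sociality parameters can be sketched in a few lines of code. The sketch assumes one common undirected parameterization, in which the log-odds of an edge between nodes $i$ and $j$ equal the density plus the two sociality parameters; this form, the function names and all numerical values are hypothetical and are not estimates from the paper.

```python
import math


def edge_probability(mu, alpha_i, alpha_j):
    """Edge probability for an undirected pair, assuming logit(p) = mu + alpha_i + alpha_j."""
    return 1.0 / (1.0 + math.exp(-(mu + alpha_i + alpha_j)))


def social_nodes(sociality):
    """Return the nodes with positive sociality, the most social first."""
    positive = [node for node, alpha in sociality.items() if alpha > 0]
    return sorted(positive, key=lambda node: -sociality[node])


# Hypothetical parameter values, chosen for illustration only.
mu = -2.0
sociality = {"P1": 1.5, "P2": 0.3, "P3": -0.4, "P4": -1.4}

print(social_nodes(sociality))
print(edge_probability(mu, sociality["P1"], sociality["P2"]))
print(edge_probability(mu, sociality["P3"], sociality["P4"]))
```

Under this assumed form, a pair of social nodes has a higher edge probability than the density alone would give, and a pair of nodes with negative sociality has a lower one.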

In this paper, we apply the undirected-graph p2-model to a human protein interaction network. We assess the node-specific parameters of the model and use them to infer *social* nodes, which are important for the emergence of the network. Thus, our view of essential nodes in biological networks is based not on descriptive measures but on the parameters of a statistical model learned from the network data.

An apparent advantage of the present approach is the possibility of introducing exogenous biological knowledge into the network model and studying it in the light of the structural properties of the network. This makes it possible to investigate important biological hypotheses and to examine which biological properties affect the connectivity of proteins in the network. In this paper, we introduce information on protein disorder into the protein interaction network model and study whether this property influences protein interactivity. A further benefit of the node-specific sociality parameters is that they provide a basis for partitioning the nodes into groups of structural similarity. One important question is: How many such structural groups are contained in the network? The connectivity pattern between the groups displays the organizational properties of the network, abstracted from the level of individual proteins. We analyze the protein interaction network to reveal its organizational principles.

It was well established in biology that proteins fold to their unique native conformations as determined by their amino acid sequences, and that each protein's function originates from a specific three-dimensional (3D) structure. In the last decade, however, numerous biologically active proteins were found that fail to maintain stable ordered 3D structures under physiological conditions [19–22]. These proteins are called natively unfolded or *intrinsically disordered*. Bioinformatics analysis has revealed that about 25-30% of eukaryotic proteins are mostly disordered, and that more than half of eukaryotic proteins have long regions of disorder [23, 24]. It was predicted that 70% of signaling proteins and the vast majority of cancer- and cardiovascular disease-associated proteins have long disordered regions [25, 26]. It was shown that disordered proteins play a number of crucial roles in regulation, signaling and control processes, and that many post-translational modifications (ubiquitination, methylation, phosphorylation etc.) occur within regions of intrinsic disorder [27]. In [28–30], the authors revealed 238 Swiss-Prot functional keywords positively correlated with long disordered regions in proteins. The common view now is that disordered regions carry out a large variety of functions [31, 32].

It is postulated that intrinsic disorder plays an important role in the interactivity of proteins [33]. Intrinsically disordered proteins can bind to large numbers of diverse targets or can facilitate the binding of other proteins to many targets. Haynes *et al* demonstrate that intrinsic disorder is a distinctive and common characteristic of eukaryotic hub proteins [5]. Dosztanyi *et al* [34] and Patil and Nakamura [35] investigate the structural properties of hubs that enable them to interact with multiple proteins and conclude that the global flexibility and extended interaction surface provided by disordered regions play a significant role in the binding ability of hubs. In this paper, we study the effect of disorder on the connectivity of proteins.