 Proceedings
 Open access
Global multiple protein-protein interaction network alignment by combining pairwise network alignments
BMC Bioinformatics volume 16, Article number: S11 (2015)
Abstract
Background
A wealth of protein interaction data has become available in recent years, creating an urgent need for powerful analysis techniques. In this context, the problem of finding biologically meaningful correspondences between different protein-protein interaction networks (PPINs) is of particular interest. The PPIN of a species can be compared with that of other species through the process of PPIN alignment. Such an alignment can provide insight into basic problems like species evolution and network component function determination, as well as translational problems such as target identification and elucidation of mechanisms of disease spread. Furthermore, multiple PPINs can be aligned simultaneously, expanding the analytical implications of the result. While there are several pairwise network alignment algorithms, few methods are capable of multiple network alignment.
Results
We propose SMAL, a multiple network alignment (MNA) algorithm based on the philosophy of scaffold-based alignment. SMAL is capable of converting results from any global pairwise alignment algorithm into a MNA in time linear in the number of networks. Using this method, we have built multiple network alignments based on combining pairwise alignments from a number of publicly available (pairwise) network aligners. We tested SMAL using PPINs of eight species derived from the IntAct repository and employed a number of measures to evaluate performance. Additionally, as part of our experimental investigations, we compared the effectiveness of SMAL while aligning up to eight input PPINs, and examined the effect of scaffold network choice on the alignments.
Conclusions
A key advantage of SMAL lies in its ability to create MNAs through the use of pairwise network aligners for which native MNA implementations do not exist. Experiments indicate that the performance of SMAL was comparable to that of the native MNA implementation of established methods such as IsoRankN and SMETANA. However, in terms of computational time, SMAL was significantly faster. SMAL was also able to retain many important characteristics of the native pairwise alignments, such as the number of aligned nodes and edges, as well as the functional and homologene similarity of aligned nodes. The speed, flexibility and ability to retain prior correspondences as new networks are aligned make SMAL a compelling choice for alignment of multiple large networks.
Introduction
With the advent of high-throughput experimental techniques such as yeast two-hybrid screening [1–3] and co-immunoprecipitation coupled mass spectrometry [4, 5] there has been a substantial increase in the data available on protein-protein interactions (PPIs). The experimental data is supplemented by computationally predicted PPIs [6–9]. Put together, a vast amount of PPI data is now accessible through multiple databases [10–13]. Comparative network analysis of PPINs complements traditional sequence- and structure-based methods, providing insights into species evolution [14], conserved functional components [15, 16], and protein function prediction [17, 18]. In addition to their role in elucidating a mechanistic understanding of the fundamental biological processes from the molecular to the evolutionary scales [19], PPI data can also be invaluable in translational contexts, for instance, by explaining mechanisms of infection spread [13, 20–23] and through discovery of novel targets, such as dependency factors [24].
The complexity of protein-protein interactions, coupled with the volume and noisy nature of PPI data, underlines the acute need for automated analysis of PPIs. For computational analysis, the standard way of representing PPI data is through a protein-protein interaction network (PPIN), which is a (possibly disconnected) graph G = (V, E), where each node represents a protein and each edge denotes an experimentally or computationally determined interaction between the corresponding two proteins. Depending on the detection/prediction method, the edge weights may be binary or real-valued. An important problem in PPIN analysis, much like with traditional sequence-based genomics, is the establishment of correspondences between proteins and interactions across different species. This can be accomplished through PPI network alignment, where, by incorporating network topology, notions of protein similarity and other related data, members of one PPIN are matched with their closest analogues in another PPIN.
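As a minimal sketch of this representation, a PPIN can be held as a node set plus a dictionary of weighted, undirected edges. The class name `PPIN`, its methods and the protein identifiers below are illustrative assumptions and are not part of SMAL.

```python
from dataclasses import dataclass, field


@dataclass
class PPIN:
    """A protein-protein interaction network as an undirected graph G = (V, E)."""

    nodes: set = field(default_factory=set)
    # Each edge is keyed by a frozenset so that (u, v) and (v, u) coincide.
    weights: dict = field(default_factory=dict)

    def add_interaction(self, u, v, weight=1.0):
        """Add an interaction; weight 1.0 models a binary (present) edge."""
        self.nodes.update((u, v))
        self.weights[frozenset((u, v))] = weight

    def has_interaction(self, u, v):
        return frozenset((u, v)) in self.weights


# Hypothetical toy network with one isolated protein (disconnected graph).
g = PPIN()
g.add_interaction("P1", "P2")              # binary edge
g.add_interaction("P2", "P3", weight=0.7)  # real-valued edge
g.nodes.add("P4")
```

Storing edges as frozensets is only one possible design choice; it keeps the graph undirected without storing each interaction twice.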
In the following, for simplicity, we introduce the basic notions and notations related to network alignment using the pairwise network alignment formulation; the extension of these concepts to the multiple network alignment setting is facile. Formally, given two PPI networks, G_{ 1 } = (V_{ 1 }, E_{ 1 }) and G_{ 2 } = (V_{ 2 }, E_{ 2 }), with {\vartheta}_{1}\subseteq {V}_{1} and {\vartheta}_{2}\subseteq {V}_{2}, solving the alignment problem requires finding a correspondence C:{\vartheta}_{1}\to {\vartheta}_{2}. Intuitively, the objective of any such mapping is to establish correspondences between similar proteins (nodes) and similar intermolecular interactions across the networks. The problem of PPIN alignment was initially tackled as a local alignment problem (that is, the setting considered was with {\vartheta}_{1}\subset {V}_{1} and {\vartheta}_{2}\subset {V}_{2}), where subnetworks with similar topology and/or sequence similarity were identified within the networks being aligned. Later methods have tried to solve the global alignment problem, that is, aligning two PPINs in their entirety ({\vartheta}_{1}={V}_{1} and {\vartheta}_{2}={V}_{2}). Both the local and the global alignment problems are known to be NP-hard [25, 26], and remain active areas of research. Another perspective takes into account the number of networks that need to be aligned, leading to two problem settings: pairwise network alignment (PNA), involving alignment of two networks at a time, and multiple network alignment (MNA), where three or more PPINs have to be aligned to each other. In Additional File 1 (Overview of PPIN alignment algorithms), we classify and summarize the existing methods based on the Cartesian product of the aforementioned formulations and tabulate the results. As can be seen from this table, at the state of the art, the number of pairwise aligners significantly exceeds the number of multiple network alignment algorithms.
Furthermore, there are few global multiple network aligners and those that are available tend to rapidly degrade in performance as the number of networks being aligned increases.
The research presented in this paper seeks to address the aforementioned lacunae through the design of a global multiple network aligner called SMAL (Scaffold-Based Multiple Network Aligner, pronounced "small"), which is based on the notion of combining pairwise alignments using a star-like alignment topology with a central "scaffold" PPIN. SMAL allows the use of pairwise network aligners without native MNA implementations (like PINALOG [27] and NETAL [28] for instance), to create MNAs. The star-alignment heuristic used in SMAL is well known and has been applied to other NP-hard problems in bioinformatics, including multiple sequence alignment and, more recently, the alignment of RNA-seq data [29]. The key features and contributions of SMAL include:

Generality: the star-alignment-like methodology proposed by us can be employed to convert results from any number of global pairwise alignments into a single multiple network alignment. Furthermore, the proposed approach does not restrict the specific pairwise aligner that a biologist may seek to employ.

Alignment Persistence: as networks are added to an already obtained MNA, previously identified alignments are retained.

Measure consistency: For pairwise alignments, a number of statistics have been proposed to quantify the alignment quality. As a corollary to alignment persistence, in the MNAs obtained with the proposed method, the statistics characterizing any constituent pairwise alignment do not change in the multiple alignment.

Invariance to alignment order: It is desirable that a MNA be invariant to the order in which the individual networks are considered. The proposed approach guarantees this property.

Conceptual simplicity: The multiple network alignments obtained with the proposed method can be related to pairwise alignments in a conceptually straightforward manner, thereby reducing the cognitive load required for data interpretation by a domain specialist.

Low complexity: The proposed approach has linear-time complexity with respect to the number of networks being aligned. Consequently, as the number of networks that need to be aligned increases, the proposed approach, when compared to competitive methods, yields considerable advantages in terms of time required to obtain a MNA.

Alignment quality: SMAL allows creation of MNAs based on any existing pairwise alignment algorithm. In many cases, this leads to MNAs yielding better results on a given set of measures compared to alignments created by existing native MNA algorithms.
As part of the investigations presented in this paper, we demonstrate the multiple network alignments obtained with the proposed approach by utilizing prior (pairwise) alignments from SMETANA [30], IsoRankN [31], PINALOG [27] and NETAL [28] as inputs. The four methods selected by us are well known or recent and have publicly available implementations. We compare the MNA obtained using our method with those produced by the native multiple network alignment implementations present as part of some of these algorithms.
Past Work
The problem of PPIN alignment has received significant recent attention. The first PPIN network aligners were primarily designed to identify closely matching subnetworks, rather than solve the global PPIN alignment problem. In and of itself, this is a very challenging problem, as matching two graphs by determining the largest common subgraph is known to be NP-hard [25]. Early algorithms, such as PathBLAST [16] and NetworkBLAST [32], used a BLAST-based search methodology. PathBLAST searched for high-scoring pathway alignments involving linear chains of linked proteins. Proteins in a linear chain from the first input network were paired with their putative homologs in a linear chain in the second input network. Similarity was determined by sequence similarity as computed by BLAST. NetworkBLAST further expanded on this approach by including dense clusters of proteins in the search for matching subgraphs. These were followed by MaWISH [33], which adopted an evolutionary model that extended the concepts of match, mismatch, and gap in sequence alignment to that of match, mismatch, and duplication in network alignment, and evaluated similarity between graph structures through a scoring function that accounted for evolutionary events. By contrast, in [34] a statistical model was used to compare the link pattern of each node in the PPIN. Nodes were aligned only if both the sequence and the link pattern were sufficiently similar. The match-and-split algorithm in [35] is notable for being one of the first to have provable criteria for correctness and efficiency in the context of network alignment. The method Phunkee [36] used the surrounding context of each subgraph within the adjacent network in conjunction with subgraph topology and BLAST data to obtain alignments. Finally, one of the most recent entries into the field is AlignNemo [37], which combined data from PPIN topology and protein homology to iteratively grow local alignments from a seed.
While a local network alignment algorithm seeks to find a set of homologous regions within the two PPINs, a global network alignment seeks to find the best overall alignment between them. That is, a global network alignment algorithm must define a single mapping across all parts of the input. These two problems are, in some sense, analogous to global and local sequence alignment; much like local sequence alignment is used to find conserved functional motifs, local network alignment can be used to find conserved functional components in PPINs (such as pathways, protein complexes etc.). Global sequence alignments, on the other hand, are used to compare whole genomes to understand variations between species; similarly, global PPIN alignment algorithms can be used to compare interactomes across species. However, the global network alignment problem has been shown to be NP-hard [26].
While some of the above local network alignment methods can be, and have been, expanded to produce global alignments, one of the earliest methods to address the global network alignment problem was the eigendecomposition-based method IsoRank [18]. IsoRank conducts its analysis in two steps: it first constructs an eigenvalue problem using PPIN and protein sequence data and solves it to produce a vector R, which contains the similarity scores for all protein pairs between the two input networks. In the second step, IsoRank extracts from R high-scoring, pairwise, mutually consistent matches and constructs the alignment. Other notable global network alignment algorithms include Graemlin 2.0 [38], which is a hill-climbing algorithm that can be trained on a data set to optimize its scoring function, and a relatively large number of algorithms utilizing greedy heuristics, such as PISwap [39], GRAAL [40], MI-GRAAL [14] and variants [41, 42]. This problem has also been formulated as a relaxation of a cost function by PATH and GA [43]. In both of these algorithms, the global network alignment problem is expressed as a balance between matching similar protein pairs and having many conserved interactions. The resulting cost function is optimized through two relaxations, one concave and one convex, over doubly stochastic matrices by PATH; and through permutation in the direction of the gradient starting from an initial solution by GA. Finally, one of the most recent efforts, SPINAL [26], is a polynomial time heuristic algorithm that constructs a global alignment in two stages. First, SPINAL constructs pairwise similarity scores through local pairwise neighborhood matching. It then iteratively grows a locally improved solution set to produce the final one-to-one mapping. In both stages SPINAL takes advantage of neighborhood bipartite graphs and the contributors as a common primitive.
More complex than the formulations described above is the problem of multiple network alignment (MNA), where more than two PPINs have to be aligned. The computational complexity of MNA grows exponentially as the number of networks increases. MNA algorithms remain relatively rare. Of the few that exist, prominent ones include IsoRankN [31], which is based on spectral clustering on the induced graph of pairwise alignment scores, Submap [44], which utilizes subnetwork mapping followed by a vertex selection strategy to extract the mappings from a maximum weight independent set (MWIS), and SMETANA [30], which uses a combination of probabilistic similarity measures to score the nodes and a greedy approach to construct the final alignment.
Data
In the experiments presented in this paper, we use PPINs from eight different species. These are listed in the following along with the abbreviations we use to refer to them: Arabidopsis thaliana (Arabi), Caenorhabditis elegans (Celeg), Drosophila melanogaster (Droso), Escherichia coli (Ecoli), Homo sapiens (Human), Mus musculus (Mouse), Rattus norvegicus (Rat), and Saccharomyces cerevisiae (Yeast). The PPINs and corresponding BLAST bit scores are identical to those reported in PINALOG [29], compiled from IntAct [45]. We note that BLAST bit scores were used only for pairs of proteins with a BLAST E-value < 10^{-5}.
Methods
The proposed approach begins by determining which of the participating networks can be used as an alignment scaffold (denoted hereafter simply as scaffold or center PPIN), that is, the network relative to which the entire multiple network alignment is subsequently constructed. The remaining networks are aligned in a pairwise manner with the scaffold PPIN using a pairwise alignment algorithm of choice. In the final step, the pairwise alignments are related to each other. Conceptually, the proposed method is related to the general methodology of star-based methods employed in multiple sequence alignment.
Definitions and notations
Let G_{ 1 } ... G_{ n } denote n protein-protein interaction networks, where G_{ i }= (V_{ i }, E_{ i }). A global multiple network alignment of n graphs can be expressed as a mapping, Ψ: G^{n} → G, that projects the original graphs onto a structure called the alignment graph A′ = (V′, E′), such that a cost function for the mapping is optimized. The vertices in the alignment graph represent sets of aligned proteins and its edges correspond to conserved interactions. In the following, variables superscripted with a prime will refer to alignment graphs and unprimed variables will represent elements (graphs, edges and vertices) of specific PPINs. Given a vertex v′ ∈ V′ in the alignment graph, the vertex alignment cluster of v′, denoted C(v′), is the set of all nodes mapped to it. Formally, for an alignment involving a set of m networks \mathcal{N}=\left\{{G}_{i},{G}_{j},\dots {G}_{m}\right\}, the vertex alignment cluster is defined as:
That is, for a node in the alignment graph, its vertex alignment cluster consists of a set of proteins from the networks being mapped to it. It follows that, all nodes mapped to a specific node in an alignment graph may be considered to be aligned to each other. Similarly, given a node v ∈ V from any of the original networks, we define the vertex coalignment cluster of v as the set of all nodes aligned to node v in a multiple network alignment and denote it as \mathfrak{B}\left(v\right). A vertex coalignment cluster can be accessed using any of its nodes as a key (e.g. \left\{a,b\right\}=\mathfrak{B}\left(a\right)=\mathfrak{B}\left(b\right)). A vertex coalignment cluster \mathfrak{B}\left(v\right) of a node v will at minimum always contain v itself. The reader may note that the notion of vertex coalignment clusters is defined on vertices of PPINs while its dual notion of vertex alignment clusters is defined for vertices of the alignment graph.
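The relation between the two notions can be illustrated with a short sketch. It assumes that the alignment clusters are given as a dictionary from alignment-graph vertices to sets of protein identifiers, and that each protein occurs in at most one cluster; the function names and identifiers are hypothetical.

```python
def coalignment_clusters(alignment_clusters):
    """Derive the coalignment cluster of every aligned protein.

    alignment_clusters: dict mapping an alignment-graph vertex v' to C(v'),
    the set of PPIN nodes mapped to v'. Assumes each protein appears in at
    most one alignment cluster.
    """
    coaligned = {}
    for members in alignment_clusters.values():
        cluster = frozenset(members)
        for protein in cluster:
            # Every member of a cluster acts as a key for the same cluster.
            coaligned[protein] = cluster
    return coaligned


def coalignment_cluster(coaligned, v):
    """Return the coalignment cluster of v; an unaligned node maps to {v}."""
    return coaligned.get(v, frozenset({v}))


# Hypothetical alignment graph with two vertices.
clusters = {"x1": {"a", "b"}, "x2": {"c"}}
B = coalignment_clusters(clusters)
```

Here `B["a"]` and `B["b"]` refer to the same set {a, b}, and a node that was never aligned still yields a cluster that contains itself.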
The notions of alignment cluster and coalignment cluster can be extended to edges leading to edge alignment clusters and edge coalignment clusters (we omit the formal definitions as they are analogous to the ones for vertices). Edges in the alignment graph are induced by the vertex alignment and represent conserved interactions. For a pairwise alignment, for example, a given edge (u, v) in a network G_{ i } is said to be conserved in another network G_{ j } if there is an interaction (s, t) ∈ E_{ j } such that s\in \mathfrak{B}\left(u\right) and t\in \mathfrak{B}\left(v\right). For the edge (u, v) ∈ E_{ i } , its edge coalignment cluster {\mathfrak{E}}_{ij}\left(u,v\right), can be computed as in Eq. (2):
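Following the verbal definition above, an edge coalignment cluster can be sketched as a filter over the Cartesian product of the two vertex coalignment clusters. This is one reading of the text rather than the authors' implementation; the function name, the dictionary `coaligned` and the frozenset edge encoding are assumptions.

```python
from itertools import product


def edge_coalignment_cluster(u, v, coaligned, edges_j):
    """Edges (s, t) of network G_j with s in B(u) and t in B(v).

    coaligned: dict mapping a node to its vertex coalignment cluster B.
    edges_j: set of undirected edges of G_j, each stored as a frozenset.
    """
    b_u = coaligned.get(u, {u})
    b_v = coaligned.get(v, {v})
    return {
        frozenset((s, t))
        for s, t in product(b_u, b_v)
        if frozenset((s, t)) in edges_j
    }


# Hypothetical example: u is aligned with s, v is aligned with t.
coaligned = {"u": {"u", "s"}, "v": {"v", "t"}}
edges_j = {frozenset(("s", "t")), frozenset(("s", "x"))}
conserved = edge_coalignment_cluster("u", "v", coaligned, edges_j)
```

In this toy case only the interaction between s and t is reported, since x is not aligned with v.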
In Eq. (2), j can denote the index of any of the networks included in the alignment including the network that contains the interaction (u, v). In multiple network alignment involving n PPINs, generally only very few nodes have correspondences across all n species and consequently few edges are conserved across all the n species. To model this situation, we use the parameter k to consider sets of edges at different levels of conservation. That is, we specifically refer to the set of edges conserved in k species when evaluating the alignments.
A given interaction (u, v) ∈ E_{ i } is conserved in k ≤ n species, when there are k-1 distinct species, such that there exist pairs of nodes (s, t) ∈ E_{ j } such that s\in \mathfrak{B}\left(u\right), t\in \mathfrak{B}\left(v\right), with the variable j indexing these species.
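The conservation level k of an interaction can be computed directly from this definition. The sketch below assumes node identifiers that are unique across species and per-species edge sets stored as frozensets; all names and the toy data are hypothetical.

```python
def conservation_level(u, v, coaligned, edges_by_species, home_species):
    """Number of species k in which the interaction (u, v) is conserved.

    The species that contains (u, v) counts as one of the k species, so an
    interaction without any conserved counterpart has k = 1.
    """
    b_u = coaligned.get(u, {u})
    b_v = coaligned.get(v, {v})
    k = 1  # the species that contains (u, v) itself
    for species, edges in edges_by_species.items():
        if species == home_species:
            continue
        # One matching interaction is enough for this species to count.
        if any(frozenset((s, t)) in edges for s in b_u for t in b_v):
            k += 1
    return k


coaligned = {"u": {"u", "a1", "b1"}, "v": {"v", "a2", "b2"}}
edges_by_species = {
    "S": {frozenset(("u", "v"))},
    "A": {frozenset(("a1", "a2"))},  # conserved counterpart
    "B": {frozenset(("b1", "b3"))},  # b3 is not aligned with v
}
k = conservation_level("u", "v", coaligned, edges_by_species, "S")
```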
Overview of SMAL
The proposed method comprises four major steps: (1) Selection of a network as the scaffold for MNA, (2) Computing pairwise alignments between the scaffold and all other networks, (3) Combining pairwise node alignments with respect to the scaffold, and (4) Computing conserved edges.
Selection of the scaffolding network as the center of the star-based MNA
Since the choice of scaffold has a significant influence on the quality of the MNA, the intuitive choice for the center of the star is the network that is most complete, best annotated and evolutionarily most similar to the rest (Figure 1). This can be determined based on characteristics such as the maximum number of nodes or edges or the highest count of significant pairwise protein similarities between the networks (e.g. established by BLAST bit scores). By contrast, in certain cases, the specific biological question motivating the MNA, or a researcher's domain knowledge, might dictate which PPIN needs to be chosen as the scaffold.
The proposed algorithm for selecting the scaffold can be described as follows: first, a measure of similarity S_{ ij } defined for a pair of networks is selected. We then pick as the scaffold that specific PPIN for which the sum of S_{ ij } is maximized over all pairs of networks. That is, the network G_{ s } is chosen as the scaffold, if:
In Eq. (3) s is the index of the identified scaffold PPIN. Similarity between a pair of networks can be directly computed, using for example a measure like the Graphlet Degree Distribution agreement [45]. Alternatively, a pairwise alignment can be constructed and a measure of the alignment quality can be used. Such measures derived from pairwise alignments are described in some detail and further investigated in the "Results" section.
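The selection rule described above can be sketched as follows, assuming a symmetric similarity measure supplied as a dictionary over network pairs. The function name and the similarity values are made up for illustration and are not results from the paper.

```python
def select_scaffold(similarity):
    """Pick the network whose summed similarity to all others is largest.

    similarity: dict mapping a pair (i, j), i != j, to S_ij. The measure is
    assumed to be symmetric, so each pair is listed only once.
    """
    totals = {}
    for (i, j), s_ij in similarity.items():
        totals[i] = totals.get(i, 0.0) + s_ij
        totals[j] = totals.get(j, 0.0) + s_ij
    # Ties are resolved in favor of the network encountered first.
    return max(totals, key=totals.get)


# Hypothetical similarity scores between three PPINs.
similarity = {
    ("Human", "Mouse"): 0.9,
    ("Human", "Yeast"): 0.4,
    ("Mouse", "Yeast"): 0.3,
}
scaffold = select_scaffold(similarity)
```

With these illustrative values the summed similarities are 1.3 for Human, 1.2 for Mouse and 0.7 for Yeast, so Human is selected.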
Pairwise alignments
Given a pairwise network alignment algorithm of choice, the n-1 pairwise alignments G_{ sj } between the center and the remaining networks can be computed independently. That is, computation of one alignment has no influence on the results of another alignment. As we will show next, due to this property, the order of alignments in our approach can be arbitrary. Factors that may influence the choice of the alignment algorithm include: characteristics of the obtained alignments such as whether they map proteins in a one-to-one or many-to-many manner, optimization criteria such as maximizing the number of aligned proteins, maximizing conserved interactions or maximizing the size of connected components, computational efficiency, and ease of use. For more details on the characteristics of different pairwise alignment algorithms and implementations, we refer the reader to [47].
Combining pairwise node alignments to form the MNA node mappings
From this point onwards, we refer to the coalignment cluster of a node v ∈ V_{ s } in the pairwise alignment between networks G_{ i } and G_{ j } as {\mathfrak{B}}_{ij}\left(v\right). Let G_{ s } = (V_{ s }, E_{ s }) denote the scaffold network. Given the terminology introduced above, for each node v ∈ V_{ s }, \mathfrak{B}\left(v\right) denotes its vertex coalignment cluster. It is constructed as the union of all coalignment clusters from the pairwise alignments between the networks {G}_{j}\in \mathcal{N} and the scaffold G_{ s }.
The node alignment obtained with the proposed method can be described as a set of sets containing the alignments for all vertices (proteins) in the scaffold PPIN:
Due to the commutative and associative nature of the union operation over multiple sets, the order in which aligned proteins from the pairwise network alignments are combined can be arbitrary. While the resulting node alignment V^{*} is clearly dependent on the choice of the scaffolding network, the order in which pairwise alignments are themselves computed, or the order in which they are combined, does not matter.
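The union-based combination and its order invariance can be checked with a small sketch. It assumes each pairwise alignment is available as a dictionary from scaffold nodes to their pairwise coalignment clusters; the function name and toy identifiers are hypothetical.

```python
from itertools import permutations


def combine_node_alignments(scaffold_nodes, pairwise):
    """Build V*: for each scaffold node v, the union of its pairwise clusters.

    pairwise: list of dicts, one per aligned network G_j, each mapping a
    scaffold node v to the set of nodes aligned with it in that alignment.
    """
    v_star = {}
    for v in scaffold_nodes:
        cluster = {v}
        for b_sj in pairwise:
            cluster |= b_sj.get(v, set())
        v_star[v] = cluster
    return v_star


scaffold_nodes = ["u", "v"]
to_a = {"u": {"u", "a1"}, "v": {"v", "a2"}}  # alignment scaffold vs. G_a
to_b = {"u": {"u", "b1"}}                    # alignment scaffold vs. G_b
v_star = combine_node_alignments(scaffold_nodes, [to_a, to_b])

# Combining the pairwise alignments in any order gives the same result.
order_invariant = all(
    combine_node_alignments(scaffold_nodes, list(p)) == v_star
    for p in permutations([to_a, to_b])
)
```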
We distinguish two types of pairwise alignments: one-to-one and many-to-many. Methods of the first type aim to find a single correspondence for a given node while methods of the second type can create clusters containing multiple nodes from each of the species that are all related to one another and thus account for phenomena like gene duplication. The aforementioned distinction, which might inform the choice of the pairwise network alignment algorithm, is preserved in SMAL. If \mathfrak{B}\left(v\right) contains at most one node from PPIN G_{ j } for any node v ∈ V_{ s }, as would be the case for a one-to-one alignment algorithm, the resulting alignment cluster \mathfrak{B}\left(v\right)\in {V}^{*} generated by Eq. (4) will also contain at most one node from each of the aligned species. In this case, each node, including those from the scaffold, will be present in at most one alignment cluster. On the other hand, when multiple nodes of a given species are aligned to a given node v ∈ V_{ s } in \mathfrak{B}\left(v\right), Eq. (4) ensures that same multiple node alignment is also present in V^{*}. Further, if multiple nodes from the scaffold are aligned to one another, this leads to node duplication, vide infra.
The combination of aligned nodes, as described above, induces a relationship, which we term as weak correspondence transitivity. As an explanation, consider two networks G_{ a } and G_{ b } being aligned to a scaffold G_{ s }. Further, let node a ∈ V_{ a } and b ∈ V_{ b } correspond to the node u ∈ V_{ s } based on their respective pairwise alignments. Then \mathfrak{B}\left(a\right)=\left\{u,a\right\}, \mathfrak{B}\left(b\right)=\left\{u,b\right\}, and \mathfrak{B}\left(u\right)=\left\{u,a,b\right\}. Such a grouping implies a putative correspondence between nodes a and b. However, not all of these putative alignments may be found in a multiple network alignment. This is either due to noise in the data or because strict transitivity of the correspondences does not hold. We present results of our studies of this effect in detail in the "Results" section.
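To make the notion concrete, the putative correspondences implied by a cluster can be enumerated as the pairs of nodes that come from two different non-scaffold networks. This is a sketch under the assumption that a lookup from node to network is available; the function name `putative_pairs` and the labels are hypothetical.

```python
from itertools import combinations


def putative_pairs(cluster, network_of, scaffold):
    """Pairs of nodes from different non-scaffold networks in one cluster.

    Such pairs are related only through their shared scaffold node, so
    their correspondence is implied rather than directly computed.
    """
    others = sorted(n for n in cluster if network_of[n] != scaffold)
    return [
        (p, q)
        for p, q in combinations(others, 2)
        if network_of[p] != network_of[q]
    ]


# The example from the text: a and b both correspond to scaffold node u.
network_of = {"u": "S", "a": "A", "b": "B"}
pairs = putative_pairs({"u", "a", "b"}, network_of, scaffold="S")
```

Whether such a pair reflects a real correspondence could then be checked against a direct pairwise alignment of the two networks involved, where one is available.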
Computing conserved edges
For each edge (u, v) in the scaffold G_{ s } of a MNA, the set of associated conserved edges is given by its edge coalignment cluster defined by Eq. (2). The following equation can be formulated alternatively as shown in Eq. (5), or implemented directly.
That is, the conserved edges relative to a given edge in the scaffold PPIN in the MNA can be directly computed from the node alignment set V^{*} defined in Eq. (5). Analogous to the node alignment, the set of induced edges as derived by the proposed method then can be described as:
As with the node alignment, the conserved edges will depend on the choice of the center PPIN but will otherwise be independent from the order in which networks are aligned pairwise or combined in our star-based approach.
Differences to established MNA algorithms
In network alignments in general, a given vertex from any of the original networks is either dropped (not aligned to any other node) or included in the alignment graph V' exactly once.
Since SMAL maps alignment clusters from pairwise alignments onto a central PPIN, proteins can be duplicated. To elucidate, let's assume a scenario where the scaffold PPIN G_{ s } is aligned relative to two networks G_{ a } and G_{ b }. Consider the following two alignment clusters from pairwise alignments for given nodes u, v, w ∈ V_{ s }, a ∈ V_{ a } and b ∈ V_{ b }:
This will result in the following three alignment clusters in a star-based MNA as proposed here:
On the other hand, since the alignment graph of SMAL V^{*} contains only alignment clusters for the nodes of the center PPIN, some correspondences established by native multiple network alignments are not considered. Let there be nodes a ∈ V_{ a }, b ∈ V_{ b } that correspond when aligning G_{ s }, G_{ a } and G_{ b } with a native multiple network alignment algorithm but neither corresponds to any vertex in the scaffolding PPIN. That is, there exists an alignment cluster \mathfrak{B}\left(a\right)=\left\{a,b,X\right\}=\mathfrak{B}\left(b\right), where X is a set of nodes that are not part of the center PPIN or the empty set. Such correspondences would not be included by SMAL. Expanding SMAL to such correspondences could be achieved by considering all pairwise alignments (as opposed to only alignments between a center PPIN and the remaining networks) and merging resulting alignment clusters with V^{*}.
Implementation and complexity
Pseudocode 1: Method outline
1 Designate scaffold PPIN G_{ s }
# Obtain pairwise alignments with the scaffold PPIN using a method of choice.
2 For all remaining networks G_{ j }:
3     G_{ sj } ← pairwise_alignment(G_{ s }, G_{ j })
# Create node alignment
4 Initialize V^{*} = Ø
5 For each node of G_{ s }, v ∈ V_{ s }:
6     Initialize \mathfrak{B}\left(v\right)=\left\{v\right\}
7     For each pairwise alignment G_{ s }, G_{ j }:
8         \mathfrak{B}\left(v\right)\leftarrow \mathfrak{B}\left(v\right)\cup {\mathfrak{B}}_{sj}\left(v\right)
9     {V}^{*}\leftarrow {V}^{*}\cdot \mathfrak{B}\left(v\right) # concatenate sets
# Compute induced edges
10 Initialize E^{*} = Ø
11 For each edge of G_{ s }, (u, v) ∈ E_{ s }:
12     Initialize \mathfrak{E}\left(u,v\right)=\left\{\left(u,v\right)\right\}
13     For each pair \left(k,l\right)\in \mathfrak{B}\left(u\right)\times \mathfrak{B}\left(v\right):
14         if (k, l) form an edge, i.e. ∃t : (k, l) ∈ E_{ t }:
15             \mathfrak{E}\left(u,v\right)\leftarrow \mathfrak{E}\left(u,v\right)\cup \left\{\left(k,l\right)\right\}
16     {E}^{*}\leftarrow {E}^{*}\cdot \mathfrak{E}\left(u,v\right) # concatenate sets
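The pseudocode can be turned into a short runnable sketch. It assumes that the scaffold has already been designated (line 1), that each network is a (nodes, edges) tuple with edges stored as frozensets, that node identifiers are unique across networks, and that the aligner is passed in as a callable. The name `smal`, the helper `toy_aligner` and the toy networks are hypothetical stand-ins, not the authors' implementation.

```python
from itertools import product


def smal(scaffold, others, pairwise_alignment):
    """Star-based combination of pairwise alignments (after Pseudocode 1).

    pairwise_alignment(scaffold, other) must return a dict that maps each
    aligned scaffold node v to its pairwise coalignment cluster.
    """
    s_nodes, s_edges = scaffold
    # Lines 2-3: one independent pairwise alignment per remaining network.
    alignments = [pairwise_alignment(scaffold, g) for g in others]

    # Lines 4-9: node alignment V*, stored as a dictionary of sets.
    v_star = {}
    for v in s_nodes:
        cluster = {v}
        for b_sj in alignments:
            cluster |= b_sj.get(v, set())
        v_star[v] = cluster

    # Lines 10-16: edges induced by the node alignment, E*.
    all_edges = set(s_edges).union(*(edges for _, edges in others))
    e_star = {}
    for edge in s_edges:
        members = tuple(edge)
        u, w = members[0], members[-1]  # also covers self-interactions
        found = {edge}
        for k, l in product(v_star[u], v_star[w]):
            if frozenset((k, l)) in all_edges:
                found.add(frozenset((k, l)))
        e_star[edge] = found
    return v_star, e_star


def toy_aligner(scaffold, other):
    """Hypothetical stand-in aligner: match nodes with the same numeric suffix."""
    s_nodes, _ = scaffold
    o_nodes, _ = other
    return {v: {v, o} for v in s_nodes for o in o_nodes if o[1:] == v[1:]}


def e(x, y):
    return frozenset((x, y))


S = ({"s1", "s2", "s3"}, {e("s1", "s2"), e("s2", "s3")})
A = ({"a1", "a2"}, {e("a1", "a2")})
B = ({"b1", "b2", "b3"}, {e("b2", "b3")})
v_star, e_star = smal(S, [A, B], toy_aligner)
```

In this toy run the scaffold edge between s1 and s2 is conserved in network A, and the edge between s2 and s3 is conserved in network B.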
In the pseudocode, selection of the scaffold is summarized in line 1. Different approaches of varying complexities have been mentioned and will be evaluated in the "Results" section. In terms of computational complexity, scaffold selection based on domain expertise does not incur a computational cost. A simple heuristic like the number of associated BLAST bit scores above a certain EValue for a given PPIN is also extremely fast (O\left(n\right), where n is the number of networks). Selection based on a similarity measure between all pairs of networks has complexity O\left({n}^{2}\times O\left(\varphi \right)\right), where O\left(\varphi \right) is the complexity of the applied similarity measure. The approach using measures over pairwise alignments outlined in the Methods section can be further broken down to O\left({n}^{2}\times \left(O\left(\phi \right)+O\left(\mu \right)\right)\right), where O\left(\phi \right) is the complexity of the pairwise alignment algorithm and O\left(\mu \right) the complexity of the measure over the alignment. For our nodebased measures, O\left(\mu \right)=O\left(\varrho \left{V}_{s}\right\right), where \varrho =max\left(\left\mathfrak{B}\left(v\right)\right\right); v ∈ V_{ s }, the maximum number of nodes in an alignment cluster in V^{*}. The actual size of \varrho depends on the alignment algorithm. For onetoone alignment algorithms, we know that \varrho \le \mathsf{\text{n}}. For manytomany algorithms, no nontrivial boundary can be established.
Once a scaffolding PPIN is selected, \left(n1\right) pairwise alignments are computed (lines 2 and 3). This step has complexity O\left(n\times O\left(\phi \right)\right) though no computation might be necessary if pairwise alignments have already been created during the scaffoldselection process.
Creation of the node alignments (lines 4 to 9) has complexity O\left(n\left{V}_{s}\right\right). The alignment clusters \mathfrak{B}\left(v\right) are sets of distinct nodes that get extended in each iteration of line 8. V^{*} consists of a list of such sets of elemental nodes. The structure is implemented as a dictionary of sets where each key is a node v ∈ V_{ s } and the corresponding value represents \mathfrak{B}\left(v\right).
The last step (lines 10 to 16) is not specific to our approach and most of the established alignment algorithms just omit it. It can be applied to any kind of node alignment. We include it in our algorithm since providing insights into conserved interactions is essential for many of the research questions that motivate MNAs in the first place. It also provides insights into the quality of the alignment via various measures as described later. Complexity of this last step has an upper bound of O\left({\varrho}^{2}\left{E}_{s}\right\right).
Overall, the complexity of SMAL with selection of the center PPIN via measures over pairwise alignments is $O(n^{2} \times (O(\phi) + O(\mu)) + n |V_s| + \varrho^{2} |E_s|)$. When the center is selected manually based on domain knowledge or any other accessible proxy as outlined above, complexity is reduced to $O(n \times O(\phi) + n |V_s| + \varrho^{2} |E_s|)$. By far the most expensive step is computation of the pairwise alignments, that is, $O(\phi) \gg O(|V_s|)$ and $O(\phi) \gg O(|E_s|)$.
Comparison between SMAL and native MNAs
To compare a native MNA generated by a given MNA algorithm to a SMAL MNA, where the pairwise alignments have been generated by the same algorithm, we first have to relate the native MNA to our chosen scaffold PPIN. This can be achieved by retaining only those node clusters that contain a protein from the designated scaffold PPIN and by duplicating clusters containing more than one scaffolding node (see pseudocode 2 in Additional file 2).
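The transformation described above can be illustrated with a short sketch. It follows the prose description only and is not a transcription of pseudocode 2. The function name and the representation of a native MNA as a list of node clusters are assumptions.

```python
def relate_to_scaffold(native_clusters, scaffold_nodes):
    """Re-key a native MNA by the nodes of a chosen scaffold network.

    native_clusters: iterable of node collections produced by a native
                     MNA algorithm (hypothetical representation)
    scaffold_nodes:  iterable of node identifiers of the scaffold PPIN
    Returns a dict of sets: scaffold node -> cluster containing it.
    """
    scaffold_nodes = set(scaffold_nodes)
    related = {}
    for cluster in native_clusters:
        cluster = set(cluster)
        # Clusters without any scaffold node are dropped; clusters with
        # several scaffold nodes are duplicated, once per scaffold node.
        for v in cluster & scaffold_nodes:
            related.setdefault(v, set()).update(cluster)
    return related
```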
Measures for assessment
Since there is no single gold standard for evaluating biological network alignments, we use a number of different measures in our analysis. In addition to evaluating the overall quality of the alignments, we investigate the extent to which correspondences implied by combining pairwise alignments are valid biologically. For this purpose, we define two types of measures. Measures designated with the subscript $s$ evaluate only correspondences with the scaffold. In other words, for each node $v \in V_s$ only the pairs $(v, u)$ with $u \in \mathfrak{B}(v)$, and for each edge $e \in E_s$ only the pairs $(e, f)$ with $f \in \mathfrak{E}(e)$, are taken into account in these measures. Thus, these measures represent a baseline as to how the pairwise alignments perform on the given data. The measures without subscript, on the other hand, evaluate all correspondences: for all $\mathfrak{B}(v) \in V^{*}$ they consider all pairs $s, t \in \mathfrak{B}(v)$, and for all $\mathfrak{E}(e) \in E^{*}$ all pairs $f, g \in \mathfrak{E}(e)$. These measures also capture putative alignments (Figure 2). Since alignment clusters containing more than one node or edge from the scaffold PPIN are associated with each contained scaffolding node or edge, such clusters are counted multiple times. We investigated this effect and also computed the measures for distinct clusters (without double counting). The key findings of this investigation are the same for both approaches.
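The difference between the two types of measures comes down to which pairs are enumerated. The following sketch shows both enumerations for node clusters. The function names are hypothetical, and treating pairs as unordered in the second function is an assumption.

```python
from itertools import combinations


def scaffold_pairs(clusters):
    """Pairs (v, u) with u in B(v) and u != v: the subscript-s view."""
    return [(v, u) for v, members in clusters.items()
            for u in members if u != v]


def all_pairs(clusters):
    """All unordered pairs inside each cluster, including the implied
    (putative) pairs between two non-scaffold nodes."""
    return [pair for members in clusters.values()
            for pair in combinations(sorted(members), 2)]
```

For a cluster holding a scaffold node and two nodes from other networks, the first function yields two pairs and the second yields three, the additional one being the putative correspondence between the two non-scaffold nodes.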
Aligned nodes with high functional similarity (NF) or homology (NH)
To measure how well the biological functionality of the proteins is reflected in the alignment graph, we define an auxiliary function.
Functional similarity scores for each pair of aligned proteins are taken from the funSim score in FunSimMat [48]. The funSim score combines similarity scores with respect to both involvement in biological processes and molecular function for a pair of proteins. Scores range from 0 (no similarity) to 1 (maximum similarity) and are computed based on semantic similarity of the GO terms of the two proteins and their respective probabilities. A threshold of 0.5 has been used in the literature to evaluate alignments [27] and is thus used here for easier comparison. Manual review suggests that this threshold could be lowered further to capture more relevant protein-protein correspondences without significantly increasing the number of false positives (Table 1, Figure 3).
To capture how well the alignment recovers the evolutionary relationship of the nodes in the input networks, we define a set of related measures accounting for pairwise homologous proteins based on another auxiliary function.
In Eq. (9), the homologene group identifiers for each protein are retrieved from the NCBI homologene repository. Often when the NH and NF measures disagree, the reason is either incomplete (missing) data or, in the case of NF, the specification of a threshold value that is overly restrictive for identifying biologically relevant mappings. We introduce a combined measure that counts all nodes that are either functionally similar or homologous. As outlined above, we define two variations of each measure.
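The two auxiliary functions and the combined criterion can be sketched as simple predicates. The lookup tables `funsim` and `homologene` are hypothetical stand-ins for data retrieved from FunSimMat and the NCBI homologene repository. Whether the threshold comparison is strict, and how missing data is handled, are assumptions of this sketch.

```python
FUNSIM_THRESHOLD = 0.5  # threshold value stated in the text


def functionally_similar(p, q, funsim):
    """True if the funSim score of the pair reaches the threshold.

    funsim: dict mapping frozenset({p, q}) -> score in [0, 1].
    Assumption: a missing score counts as "not similar" and the
    comparison is inclusive (>=).
    """
    score = funsim.get(frozenset((p, q)))
    return score is not None and score >= FUNSIM_THRESHOLD


def homologous(p, q, homologene):
    """True if both proteins carry the same homologene group identifier.

    homologene: dict mapping protein -> group identifier.
    """
    gp, gq = homologene.get(p), homologene.get(q)
    return gp is not None and gp == gq


def similar_or_homologous(p, q, funsim, homologene):
    """Combined criterion: functionally similar or homologous."""
    return functionally_similar(p, q, funsim) or homologous(p, q, homologene)
```

The combined predicate returns true when either source has evidence, which is how it compensates for missing entries in one of the two data sets.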
Number of aligned nodes (NA) and derived measures of precision
The number of aligned nodes is considered only to normalize other measures. Dividing by NA allows for establishing a measure of precision since NA captures all aligned nodes (i.e. true positives and false positives) while other metrics like NF or NH can be considered the true positives according to their specific biological perspective. For normalization of the two different varieties for each metric we define
Eq. (16) specifies the number of nodes aligned to the nodes of the center PPIN. The scaffolding nodes themselves are excluded. This gives $NA_s$ the same range as $NF_s$, $NH_s$ and $NForH_s$ and allows for normalization of those measures in the range [0, 1].
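A normalized, scaffold-based measure such as $NF_s / NA_s$ can be computed in one pass over the clusters. The sketch below is an illustration under assumptions: the function name is hypothetical, clusters are stored as a dictionary of sets that include the scaffold node, and `is_hit` is any pairwise predicate (for example functional similarity or homology).

```python
def precision_s(clusters, is_hit):
    """Return (precision, hits, aligned) for a scaffold-based measure.

    clusters: dict of sets, scaffold node v -> cluster B(v)
    is_hit:   callable (v, u) -> bool, the biological criterion
    aligned counts every non-scaffold member (NA_s in the text);
    hits counts the members that satisfy the criterion.
    """
    aligned = 0
    hits = 0
    for v, members in clusters.items():
        for u in members:
            if u == v:
                continue  # the scaffold node itself is excluded
            aligned += 1
            hits += bool(is_hit(v, u))
    precision = hits / aligned if aligned else 0.0
    return precision, hits, aligned
```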
Conserved interactions with functionally similar (EF) or homologous (EH) endpoint proteins
Conserved interactions in general are a relevant measure of alignment quality (see number of aligned edges, EA, below). EF is a biologically motivated variation of this measure where the two pairs of interacting proteins, the endpoints of the edges, are considered as well. Only interactions where the aligned endpoint proteins are pairwise functionally similar as defined in Eq. (8) are counted towards this measure.
Analogously, based on Eq. (9), we define
With the same reasoning we presented for the combined node-based measure NForH, we define corresponding interaction-based measures as follows
where $q, s \in \mathfrak{B}(u)$; $r, t \in \mathfrak{B}(v)$, $(s, t) \ne (u, v)$ and $(q, r) \ne (s, t)$.
Number of conserved edges (EA)
The number of conserved edges in the alignment graph reflects how well the aligned proteins capture the topology and biological processes expressed in the input networks and allows evaluation of the quality of the alignment independent of biological measures like functional similarity or homology.
Analogous to NA, EA can also be used to derive biologically inspired precision measures on edges.
Number of interactions conserved in at least k distinct species (EAk)
In addition to the total number of conserved interactions EA, we define EAk, the number of interactions that are conserved in at least $k$ species, as the number of edges $(u, v) \in E_s$ that have induced edges from at least $k-1$ non-scaffold species associated with them. An edge with one induced edge from a different species would count towards EA2. An edge with induced edges from two additional distinct species would contribute to EA2 but also count towards EA3, and so forth. The tautological EA1 = $|E_s|$ does not provide information for characterizing an alignment.
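The counting rule can be made concrete with a small sketch. The function name and the input format (each scaffold edge mapped to the set of non-scaffold species that contribute at least one induced edge) are assumptions for illustration.

```python
def conserved_in_at_least_k(induced_species, k):
    """Count scaffold edges conserved in at least k species (EAk).

    induced_species: dict mapping a scaffold edge to the set of
                     non-scaffold species with an induced edge
    The scaffold species itself counts as one, so k - 1 further
    species are required.
    """
    return sum(1 for species in induced_species.values()
               if len(species) >= k - 1)
```

With three scaffold edges that have induced edges from two, one and zero other species respectively, the function returns 3 for k = 1, 2 for k = 2 and 1 for k = 3.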
Results
Effect of the scaffold selection on the SMAL MNA
To demonstrate measure consistency, we compared the performance of SMAL to that of pairwise network aligners. To estimate pairwise performance, for each algorithm, we computed all pairwise alignments and took the sum of each measure across all alignments involving a given algorithm and species. The highest and second highest scoring species for each algorithm and measure are presented in Table 2. To generate a comparable table for SMAL, we produced a SMAL alignment for each algorithm in turn using each of the eight species PPINs as the scaffold, and computed the same measures for each of these MNAs. The scores of the highest and second highest scoring MNAs together with the corresponding scaffold species for each algorithm are presented in Table 3.
We observe that the choice of algorithm and scaffold network greatly affects the alignment results. For instance, the Human, Yeast and Drosophila networks, which contain a large number of proteins and interactions (Table 4), receive maximum scores when summing up over their pairwise alignments in almost all of the measures (Table 2). Arabidopsis, which is a small but highly clustered network, scores high on edge-based measures for the alignment algorithms that can compute many-to-many node alignments (IsoRankN and SMETANA) (Table 2). This is in line with the heuristic suggested in the "Methods" section of using simple network statistics like the number of nodes and edges as a proxy for selecting the scaffold.
Comparing Table 2 and Table 3 we observe that a given choice of algorithm and measure will yield a similar species ranking. We term this effect measure consistency, whereby knowledge of an algorithm's pairwise performance on a given dataset can be extrapolated to estimate the expected performance of said algorithm in a SMAL alignment.
Precision of implied SMAL mappings compared to native MNAs
As mentioned in the "Methods" section, correspondences between nodes mapped to the same vertex in pairwise alignments with the scaffold are implied when creating the SMAL MNA. To evaluate this transitivity assumption, we measure the biological significance of the putative alignments made by SMAL. This is achieved by calculating the following measure of precision:
Eq. (26) represents the ratio of biologically significant implied node alignments and the total number of implied node alignments. The same equation can be applied to other measures, such as NF, NH, EF, EH or EForH to obtain corresponding measures of precision. We compare and present the relative change in precision between SMAL and native MNAs.
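For reading the percentages reported in this section, the relative change in precision can be thought of as the difference between the two precision values divided by the native value. This is the standard definition of relative change and is an assumption here, as is the function name.

```python
def relative_change(p_smal, p_native):
    """Relative change of SMAL precision with respect to the native MNA.

    Negative values mean SMAL is less precise than the native MNA,
    positive values mean it is more precise.
    """
    if p_native == 0:
        raise ValueError("native precision must be non-zero")
    return (p_smal - p_native) / p_native
```

Under this definition, a native precision of 0.200 and a SMAL precision of 0.187 (illustrative numbers, not taken from the study) correspond to a relative change of -6.5%.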
While there is a great deal of variability in the precision of MNAs produced by different algorithms as computed by Eq. (27) (see Figure 4), the precision of SMAL is on average 6.5% lower than that of the native MNA implementation for a given algorithm (Table 5). This average excludes the 4-species MNAs, which are missing Yeast, and the 8-species MNAs, where SMETANA performs exceptionally poorly. Including all MNAs, SMAL is on average more precise than native MNA implementations, with a relative change in precision of 143% on this data set. In the worst case, SMAL can have up to 24.5% lower precision than the native MNA. Thus, for situations when a native MNA implementation performs poorly (e.g. SMETANA with 8 species), or when native MNA implementations do not exist (see "Case studies"), SMAL becomes a particularly useful alternative. Also, for certain measures and scaffolds, SMAL outperforms existing algorithms by significant margins (Figure 4). Finally, we find that the simple transitivity assumption made by SMAL holds up reasonably well (6.5% loss of precision on this dataset as outlined above) considering the largely reduced complexity compared with the native MNA implementations investigated here.
Case studies
PINALOG: MNAs with a high ratio of aligned homologous proteins
Pairwise alignment algorithms outnumber native multiple network aligners. SMAL allows any pairwise alignment algorithm to be used to produce MNAs. As outlined above, the characteristics of pairwise alignments are largely conserved in a SMAL MNA. Thus, if the characteristics of a pairwise aligner are favorable in a given research context, it becomes possible to create MNAs with similar characteristics with SMAL. PINALOG, for example, outperforms the other network alignment algorithms considered in this study in aligning homologous proteins. The average of the NH/NA measure over pairwise alignments of all eight species considered in this study was <0.01% for NETAL, 7.2% for IsoRankN, 8.1% for SMETANA and 19.6% for PINALOG. A SMAL MNA based on PINALOG outperforms existing native MNAs on the same measure with NH/NA=19.1% versus 14.2% for native IsoRankN, followed by 9.2% for SMAL based on SMETANA, 8.4% for native SMETANA, 5.9% for SMAL based on SMETANA and finally <0.1% for SMAL based on NETAL (Figure 5).
NETAL: MNAs with high numbers of conserved interactions
NETAL is the only algorithm in this study that does not use biological information for its alignments (e.g. BLAST bit scores for pairs of proteins) and consequently, NETAL alignments score lower on the biologically inspired measures. Yet NETAL is by far the fastest algorithm and identifies the highest number of conserved interactions in the pairwise alignments considered by us. Using NETAL with SMAL creates MNAs that maintain these valuable characteristics as shown in Figure 6.
Speed of alignments
In this study we worked with two native multiple network aligners (SMETANA and IsoRankN) and two pairwise aligners (PINALOG and NETAL) to illustrate the efficiency aspect of several very dissimilar approaches to network alignment. Table 6 gives an overview of the key parameters and characteristics that are relevant to this study.
Since the pairwise alignments are independent of each other, their computation can be parallelized and distributed across multiple cores or machines. Even without parallelization, SMAL outperformed native MNA alignment algorithms by large margins in our experiments, as shown in Figure 7. We note that the most computationally expensive part in this process, by far, was the creation of the pairwise alignments. This step took from a few minutes to many hours depending on the pairwise aligner employed. By contrast, combining PNAs into a SMAL MNA, including computation of the conserved edges, took less than 10 seconds even for the largest alignments conducted as part of this study. All the time measurements reported in this paper were from computations conducted on a machine with dual six-core 32 nm Xeon processors at 3.47 GHz (hyperthreaded for 24-fold parallelism) and 86 GB of registered ECC DDR3 RAM at 1066 MHz.
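Because each pairwise alignment depends only on the scaffold and one other network, the alignments can be dispatched concurrently. The sketch below uses Python's standard `concurrent.futures` module. The function name `align_all` and the `align` callable are hypothetical. A thread pool is used on the assumption that the aligner runs as an external program, so that the threads mostly wait. A process pool may be preferable for aligners implemented in pure Python.

```python
from concurrent.futures import ThreadPoolExecutor


def align_all(scaffold, others, align, max_workers=4):
    """Compute the n - 1 pairwise alignments with the scaffold concurrently.

    scaffold: the scaffold network object
    others:   dict mapping a network name to a non-scaffold network
    align:    callable (scaffold, network) -> alignment (hypothetical)
    Returns a dict mapping each network name to its alignment.
    """
    names = list(others)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order, so results line up with names.
        results = pool.map(lambda name: align(scaffold, others[name]), names)
        return dict(zip(names, results))
```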
Conclusions
In this paper we introduced SMAL, a method for combining pairwise network alignments into a multiple network alignment. In contrast with other established methods, SMAL alignments are persistent in that established node correspondences do not change as additional networks are added. As the MNAs are also invariant to the order in which pairwise alignments are computed, SMAL can be enriched with additional PPINs at any point in time. This property makes the alignments suitable for iterative exploration of PPI data. SMAL is also significantly faster than other MNA algorithms and can be easily parallelized, allowing for the computation of very large MNAs covering many species. Our experiments indicate that native MNA algorithms, which are significantly slower than SMAL, may produce alignments that, on average, score better than SMAL-based alignments produced using the pairwise versions of the same algorithms. However, SMAL allows scientists to use any of the (much larger number of) specialized pairwise alignment algorithms available today to obtain MNAs. In many cases, this leads to superior MNAs as compared to those created with native MNA algorithms.
Abbreviations
MNA: Multiple Network Alignment
PNA: Pairwise Network Alignment
PPI: Protein-Protein Interaction
PPIN: Protein-Protein Interaction Network
SMAL: Scaffold-Based Multiple Network Aligner
References
Finley RL, Brent R: Interaction mating reveals binary and ternary connections between Drosophila cell cycle regulators. Proc Natl Acad Sci U S A. 1994, 91 (26): 12980-12984. 10.1073/pnas.91.26.12980.
Bader GD, Hogue CW: Analyzing yeast protein-protein interaction data obtained from different sources. Nat Biotechnol. 2002, 20 (10): 991-997. 10.1038/nbt1002-991.
Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, et al: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci U S A. 2001, 98 (8): 4569-4574. 10.1073/pnas.061034498.
Mena F, Li J, Bray J, Collins S, Guo X, Ignatchenko A, et al: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440 (7084): 637-643. 10.1038/nature04670.
Mann M, Aebersold R: Mass spectrometry-based proteomics. Nature. 2003, 422 (6928): 198-207. 10.1038/nature01511.
Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, et al: Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature. 2004, 430 (6995): 88-93. 10.1038/nature02555.
Nabieva E, Jim K, Agarwal A, Chazelle B, Singh M: Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics. 2005, 21 (suppl 1): i302-i310. 10.1093/bioinformatics/bti1054.
Yook SH, Oltvai ZN, Barabasi AL: Functional and topological characterization of protein interaction networks. Proteomics. 2004, 4 (4): 928-942. 10.1002/pmic.200300636.
Goh CS, Cohen FE: Co-evolutionary analysis reveals insights into protein-protein interactions. J Mol Biol. 2002, 324 (1): 177-192. 10.1016/S0022-2836(02)01038-0.
Klingström T, Plewczynski D: Protein-protein interaction and pathway databases, a graphical review. Briefings in Bioinformatics. 2011, 12 (6): 702-713. 10.1093/bib/bbq064.
Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, et al: The IntAct molecular interaction database in 2012. Nucleic Acids Res. 2012, 40 (Database issue): D841-D846.
Stark C, Breitkreutz B, Reguly T, Boucher L, Breitkreutz A, Tyers M: Biogrid: A general repository for interaction datasets. Nucleic Acids Res. 2006, 34 (Database issue): D535-D539.
Jager S, Cimermancic P, Gulbahce N, Johnson J, McGovern K, Clarke SC, et al: Global landscape of HIV-human protein complexes. Nature. 2012, 481 (7381): 365-370.
Kuchaiev O, Pržulj N: Integrative network alignment reveals large regions of global network similarity in yeast and human. Bioinformatics. 2011, 27 (10): 1390-1396. 10.1093/bioinformatics/btr127.
Flannick J, Novak A, Srinivasan BS, McAdams HH, Batzoglou S: Graemlin: General and robust alignment of multiple large interaction networks. Genome Res. 2006, 16 (9): 1169-1181. 10.1101/gr.5235706.
Kelley B, Sharan R, Karp R, Sittler T, Root D, Stockwell BR, et al: Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci U S A. 2003, 100 (20): 11394-11399. 10.1073/pnas.1534710100.
Dutkowski J, Tiuryn J: Identification of functional modules from conserved ancestral protein-protein interactions. Bioinformatics. 2007, 23 (13): i149-i158. 10.1093/bioinformatics/btm194.
Singh R, Xu J, Berger B: Global alignment of multiple protein interaction networks with application to functional orthology detection. Proc Natl Acad Sci U S A. 2008, 105 (35): 12763-12768. 10.1073/pnas.0806627105.
Kacar B, Gaucher EA: Experimental evolution of protein-protein interaction networks. Biochem J. 2013, 453 (3): 311-319. 10.1042/BJ20130205.
Franzosa EA, Xia Y: Structural principles within the human-virus protein-protein interaction network. Proc Natl Acad Sci U S A. 2011, 108 (26): 10538-10543. 10.1073/pnas.1101440108.
Ideker T, Sharan R: Protein networks in disease. Genome Research. 2008, 18 (4): 644-652. 10.1101/gr.071852.107.
LaCount D, Schoenfeld L, Ota I, Kurschner C, Bell R, Hesselberth JR, et al: A protein interaction network of the malaria parasite Plasmodium falciparum. Nature. 2005, 438 (7064): 103-107. 10.1038/nature04104.
Tekir SD, Ulgen KO: Systems biology of pathogen-host interaction: Networks of protein-protein interaction within pathogens and pathogen-human interactions in the post-genomic era. Biotechnology Journal. 2013, 8 (1): 85-96. 10.1002/biot.201200110.
Murali TM, Dyer MD, Badger D, Tyler BM, Katze MG: Network-Based Prediction and Analysis of HIV Dependency Factors. PLoS Comput Biol. 2011, 7 (9): e1002164. 10.1371/journal.pcbi.1002164.
Papadimitriou C: Computational Complexity. 1995, Reading, MA: Addison-Wesley
Aladag AE, Erten C: SPINAL: Scalable protein interaction network alignment. Bioinformatics. 2013, 29 (7): 917-924. 10.1093/bioinformatics/btt071.
Phan HT, Sternberg MJ: PINALOG: a novel approach to align protein interaction networks - implications for complex detection and function prediction. Bioinformatics. 2012, 28 (9): 1239-1245. 10.1093/bioinformatics/bts119.
Neyshabur B, Khadem A, Hashemifar S, Arab S: NETAL: A new graph-based method for global alignment of protein-protein interaction networks. Bioinformatics. 2013, 29 (13): 1654-1662. 10.1093/bioinformatics/btt202.
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al: STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013, 29 (1): 15-21. 10.1093/bioinformatics/bts635.
Sahraeian SM, Yoon BJ: SMETANA: Accurate and scalable algorithm for probabilistic alignment of large-scale biological networks. PLoS One. 2013, 8 (7): e67995-e67911. 10.1371/journal.pone.0067995.
Liao C, Lu K, Baym M, Singh R, Berger B: IsoRankN: Spectral methods for global alignment of multiple protein networks. Bioinformatics. 2009, 25 (12): i253-i258. 10.1093/bioinformatics/btp203.
Sharan R, Suthram S, Kelley R, Kuhn T, McCuine S, Uetz P, et al: Conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci U S A. 2005, 102 (6): 1974-1979. 10.1073/pnas.0409522102.
Koyuturk M, Kim Y, Topkara U, Subramaniam S, Szpankowski W, et al: Pairwise alignment of protein interaction networks. J Comput Biol. 2006, 13 (2): 182-199. 10.1089/cmb.2006.13.182.
Berg J, Lassig M: Cross-species analysis of biological networks by Bayesian alignment. PNAS. 2006, 103 (39): 10967-10972.
Narayanan M, Karp RM: Comparing protein interaction networks via a graph match-and-split algorithm. J Comput Biol. 2007, 14 (7): 892-907. 10.1089/cmb.2007.0025.
Cootes AP, Muggleton SH, Sternberg MJ: The identification of similarities between biological networks: Application to the metabolome and interactome. J Mol Biol. 2007, 369 (4): 1126-1139. 10.1016/j.jmb.2007.03.013.
Ciriello G, Mina M, Guzzi P, Cannataro M, Guerra C: AlignNemo: A local network alignment method to integrate homology and topology. PLoS One. 2012, 7 (6): e38107-e38113. 10.1371/journal.pone.0038107.
Flannick J, Novak A, Do C, Srinivasan BS, Batzoglou S: Automatic parameter learning for multiple local network alignment. J Comput Biol. 2009, 16 (8): 1001-1022. 10.1089/cmb.2009.0099.
Chindelevitch L, Liao CS, Berger B: Local optimization for global alignment of protein interaction networks. Pac Symp Biocomput. 2010, 123
Kuchaiev O, Milenkovic T, Memisevic V, Hayes W, Przulj N: Topological network alignment uncovers biological function and phylogeny. J R Soc Interface. 2010, 7 (50): 1341-1354. 10.1098/rsif.2010.0063.
Memišević V, Pržulj N: C-GRAAL: common-neighbors-based global GRAph ALignment of biological networks. Integr Biol. 2012, 4 (7): 734-743. 10.1039/c2ib00140c.
Milenković T, Ng WL, Hayes W, Pržulj N: Optimal network alignment with graphlet degree vectors. Cancer Inform. 2010, 9: 121-137.
Zaslavskiy M, Bach F, Vert J: Global alignment of protein-protein interaction networks by graph matching methods. Bioinformatics. 2009, 25 (12): i259-i267. 10.1093/bioinformatics/btp196.
Ay F, Kellis M, Kahveci T: SubMAP: Aligning metabolic pathways with subnetwork mappings. J Comput Biol. 2011, 18 (3): 219-235. 10.1089/cmb.2010.0280.
Orchard S, Ammari M, Aranda B, Breuza L, Briganti L, et al: The MIntAct project - IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2014, 42 (Database issue): D358
Pržulj N: Biological Network Comparison using graphlet degree distribution. Bioinformatics. 2007, 23: e177-e183. 10.1093/bioinformatics/btl301.
Clark C, Kalita J: A comparison of algorithms for the pairwise alignment of biological networks. Bioinformatics. 2014, 30 (16): 2351-2359. 10.1093/bioinformatics/btu307.
Schlicker A, Albrecht M: FunSimMat: a comprehensive functional similarity database. Nucleic Acids Res. 2008, 36 (Database issue): D434-D439.
Declarations
This research was supported by funding from National Science Foundation grant IIS-0644418 (CAREER). Publication costs were covered in part through a grant from the Center for Computing in Life Sciences at San Francisco State University.
This article has been published as part of BMC Bioinformatics Volume 16 Supplement 13, 2015: Proceedings of the 12th Annual MCBIOS Conference. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S13.
Additional information
Competing interests
There are no competing interests.
Authors' contributions
RS proposed the formulation and the properties which constitute the key characteristics of the method, provided research guidance, and (ironically) christened the method SMAL. JD developed the SMAL algorithm, implemented it, and did the theoretical analysis. JP conducted biological analysis. The manuscript was written by RS, JD, and JP.
Electronic supplementary material
12859_2015_7162_MOESM1_ESM.html
Additional File 1: Overview of PPIN alignment algorithms. Table in landscape format; HTML, viewable in any browser; filename: 1471210516S14S11S1.html. Abbreviations used in the table: LP - local pairwise aligner, GP - global pairwise aligner, LM - local multiple aligner, GM - global multiple aligner, FC - functional coherence, EC - edge correctness, GOC - Gene Ontology consistency, Sp - specificity, NS - number of solutions, HP - homologene pairs, NH - number of homologene pairs, CN - correct nodes, NC - number of correct solutions. Footnotes to the table: * n1 = |V1|, n2 = |V2|, m1 = |E1|, m2 = |E2|; ** n = max{|V1|, |V2|}, m = max{|E1|, |E2|} (HTML 29 KB)
12859_2015_7162_MOESM2_ESM.pdf
Additional File 2: Pseudocode 2 - transforming a native MNA for comparison with SMAL. The pseudocode outlines a method to transform an MNA obtained from a native MNA algorithm into a SMAL-like MNA. Protein alignments that are not relevant to a given scaffold are stripped and alignment clusters containing multiple scaffold proteins are duplicated. This process allows for comparison between SMAL and other MNA algorithms. (PDF 87 KB)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Dohrmann, J., Puchin, J. & Singh, R. Global multiple protein-protein interaction network alignment by combining pairwise network alignments. BMC Bioinformatics 16 (Suppl 13), S11 (2015). https://doi.org/10.1186/1471-2105-16-S13-S11