
ConnectedAlign: a PPI network alignment method for identifying conserved protein complexes across multiple species



In bioinformatics, network alignment algorithms have been applied to protein-protein interaction (PPI) networks to discover evolutionarily conserved substructures at the system level. However, most previous methods aim to maximize the similarity of aligned proteins in pairwise networks and pay little attention to the connectivity of these substructures, such as protein complexes.


In this paper, we identify the problem of finding conserved protein complexes, which requires the aligned proteins in a PPI network to form a connected subnetwork. By taking connectivity into consideration, we propose ConnectedAlign, an efficient method to find conserved protein complexes from multiple PPI networks. The proposed method improves coverage significantly without compromising the consistency of the aligned results. In this way, knowledge of protein complexes in well-studied species can be extended to poorly studied species.


We conducted extensive experiments on real PPI networks of four species: human, yeast, fruit fly and worm. The experimental results demonstrate the clear benefits of the proposed method in finding protein complexes across multiple species.


A protein complex is a biomolecule that contains a number of proteins interacting with each other to perform different cellular functions, as described in many prior works such as that of Hu et al. [1]. The identification of protein complexes in a protein-protein interaction (PPI) network [2] can therefore lead to a better understanding of the roles of such a network in different cellular systems. For this reason, the protein complex identification problem has received a lot of attention, and a considerable number of techniques and algorithms have been proposed to address it. Graph structure is widely adopted in many applications [3, 4]. By representing a PPI network as a graph [5], whose vertices represent proteins and whose edges represent interactions between proteins, these algorithms are able to identify clusters in a single PPI network based on different graph properties [6]. For example, a method based on an uncertain graph model has been proposed to detect protein complexes from a PPI network [7]. To identify protein complexes, previous works have also proposed considering not just topological but also biological information in the network [1]. However, they all focus on finding protein complexes in a single PPI network, and finding conserved protein complexes from multiple PPI networks remains challenging.

Network alignment provides a possible way to identify protein complexes from multiple PPI networks [8]. Conserving functional and topological features are the two goals of network alignment. A functional module represents a collection of molecular interactions that work together to achieve a particular functional objective in a biological process, while a topological module represents a locally dense neighborhood in a PPI network [9]. Network alignment can be categorized into two classes: global alignment and local alignment. Global alignment [10] finds the overall best functional orthologs among entire PPI networks, while local alignment identifies smaller conserved subnetworks in parts of the networks [11]. In the context of local alignment, when a given small network is aligned with large networks, the problem can be cast as a network query problem. In this paper, we focus on local alignment, which is more closely related to our problem.

Traditional pairwise network alignment detects functional orthologs of proteins in PPI networks by maximizing the similarity between proteins, while ignoring the subnetwork structure of a protein complex. Applying those methods to identify conserved protein complexes can therefore produce disconnected subnetworks. For example, in Fig. 1, there are two PPI networks, Net x and Net y. When aligning complex (x1,x2,x3) in Net x to Net y, proteins x1 and x2 are aligned with y1 and y2. But maximizing only the pairwise similarity of proteins might lead x3 to be aligned with y6, which results in a disconnected subnetwork in the alignment and does not meet the requirement of a protein complex.

Fig. 1

Disconnected subnetwork problem. Proteins are represented by vertices, PPIs by solid lines, and links between bipartite graphs by dashed lines. Traditional pairwise local alignment might miss the desired protein complex. For example, x1 and x2 are aligned to y1 and y2, but x3 might be aligned to y6 when maximizing the vertex similarity score, which results in a disconnected substructure

Aligning multiple networks promises additional insights into protein complexes as well as knowledge transfer across multiple species. However, the alignment of multiple PPI networks poses additional challenges. In particular, directly applying pairwise network alignment methods to multiple network alignment can cause an inconsistency problem. For example, as shown in Fig. 2, the substructure (x1,x2,x3) in Net x is aligned with (y1,y2,y3) in Net y. For consistency, (z2,z3,z4) in Net z is expected to be aligned with the same substructure, but a pairwise alignment between Net y and Net z might instead select (y4,y5,y6) as the best match. Since the goal of multiple network alignment is to find conserved protein complexes across all PPI networks, (y1,y2,y3) is the better result.

Fig. 2

Inconsistency problem. Applying traditional pairwise local alignment in multiple alignment might miss the desired protein complex. For example, when (x1,x2,x3) is aligned to (y1,y2,y3), (z2,z3,z4) might be aligned to (y4,y5,y6) while (y1,y2,y3) is the more consistent alignment. Then, inconsistency arises in Net y

In this paper, we propose a new approach to find conserved protein complexes by network alignment. The main contributions are as follows:

  • We identify the problem of finding conserved protein complexes by aligning multiple PPI networks. In this way, knowledge of protein complexes in well-studied species can be extended to many poorly studied species.

  • We propose an efficient method to find conserved protein complexes from multiple PPI networks. In this method, we take the feature of subnetwork connections into consideration, which improves the coverage significantly without compromising the consistency of aligned results.

  • Extensive experiments are conducted on the PPI networks of four species including human, yeast, fruit fly and worm. These results in terms of coverage and consistency illustrate the dominant benefit of the proposed method in finding protein complexes across species.


Problem definition

Definition 1.

Target network: A PPI network Gt=(Vt,Et) is called the target network if the given protein complexes to be aligned belong to Gt, where Vt is the set of proteins and Et is the set of interactions between them.

The knowledge such as protein complexes of a target network can be extended to other PPI networks via network alignment. We define the other PPI networks as aligned networks.

Definition 2.

Aligned networks: Let G={Gi} (1≤i≤ξ) be the set of aligned networks, where ξ is the number of PPI networks to be aligned with the target network. Gi=(Vi,Ei) (1≤i≤ξ) is the ith PPI network to be aligned, where Vi and Ei are the sets of its proteins and their interactions.

Given target network, aligned networks and protein complexes of target network, we define the input of the problem as follows.

Input: (1) The set of aligned networks G={Gi, 1≤i≤ξ}, where ξ is the number of aligned networks. (2) The set of well-studied protein complexes in the target network Gt: S={S1,S2,…,Sζ}, where ζ is the number of protein complexes to be aligned.

Then the alignment result as the output is defined as follows.

Output: Without loss of generality, for any protein complex M0∈S, the alignment result is a matchset M={M1,M2,…,Mξ} consisting of ξ subnetworks, where Mk⊆Gk, 1≤k≤ξ, Gk∈G, which satisfies: (1) every Mk⊆Gk is a connected subnetwork of Gk; (2) the similarity score of {M0,M1,M2,…,Mξ} is maximized.
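Condition (1) of the output can be checked with a breadth-first search over the subgraph induced by the selected proteins. The following is a minimal sketch, not the authors' implementation; the function name `is_connected` and the toy edge list (loosely modeled on Net y of Fig. 1) are assumptions made for illustration.

```python
from collections import deque

def is_connected(nodes, edges):
    """Return True if `nodes` induce a connected subgraph under `edges`."""
    nodes = set(nodes)
    if not nodes:
        return False
    # Build adjacency lists restricted to the selected proteins.
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    # Breadth-first search from an arbitrary selected protein.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == nodes

# Hypothetical interactions, not taken from the paper's data.
net_y_edges = [("y1", "y2"), ("y2", "y3"), ("y1", "y3"), ("y4", "y5"), ("y5", "y6")]
ok = is_connected({"y1", "y2", "y3"}, net_y_edges)      # candidate complex
broken = is_connected({"y1", "y2", "y6"}, net_y_edges)  # the Fig. 1 failure case
```

In this toy network the set (y1,y2,y6) fails the check because y6 has no interaction with y1 or y2, which is the situation described for Fig. 1.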

With the definitions and notations above, our algorithm of finding protein complexes across multiple PPI networks via network alignment mainly follows two procedures: assigning scores to proteins according to both biological and structural features, and then heuristically selecting proteins that form connected subnetwork in each PPI network which finally achieves optimized total score for multiple PPI networks.

Scoring strategy of network alignment

Overall, we use both the biological similarity between proteins and the topological structure to assign scores to subnetworks for the subsequent heuristic selection of proteins. Formally, given a protein complex of the target network M0⊆Gt, its match result {M1,M2,…,Mξ} in the aligned networks, where Mk⊆Gk, is assigned a real-valued score Φ:

$$ \Phi = \sum\limits_{k \in \{1,\ldots,\xi \}} \sum\limits_{v_{j} \in V_{M_{k}}} \left(\alpha * \delta_{bio}(v_{j}) + (1-\alpha) * \delta_{topo}(v_{j}) \right) $$

where ξ is the number of PPI networks, \(V_{M_{k}}\) is the set of proteins in Mk, α is a coefficient to trade off biological and topological scores, δbio and δtopo are the biological and topological scores respectively. In the following, we will describe the details of determining the δbio and δtopo.
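As a concrete reading of the formula for Φ, the sketch below sums the weighted per-protein scores over all aligned subnetworks. It assumes the per-protein scores have already been computed and are passed in as dictionaries; the function name `total_score` and the numeric values are hypothetical.

```python
def total_score(matchset, delta_bio, delta_topo, alpha=0.5):
    """Sum alpha * delta_bio(v) + (1 - alpha) * delta_topo(v) over every M_k."""
    total = 0.0
    for proteins in matchset:  # one collection of proteins per aligned network
        for v in proteins:
            total += alpha * delta_bio[v] + (1 - alpha) * delta_topo[v]
    return total

# Hypothetical per-protein scores for two aligned subnetworks.
matchset = [{"y1", "y2"}, {"z2"}]
delta_bio = {"y1": 3.0, "y2": 1.0, "z2": 2.0}
delta_topo = {"y1": 2.0, "y2": 1.0, "z2": 0.0}
phi = total_score(matchset, delta_bio, delta_topo, alpha=0.5)
```

With α=0.5 the two score types contribute equally, so each protein adds the mean of its biological and topological scores.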

Assume Mk⊆Gk is the current subnetwork to be assigned a score, where Gk, 1≤k≤ξ, is the current aligned network. Each time, we choose another PPI network, denoted Gh (h≠k, 1≤h≤ξ), so that Gt, Gk and Gh form a triple of networks. Denote by Mh⊆Gh the subnetwork of Gh to align with M0. For every h, we calculate the scores of the proteins in Gk within the triple.

We use Fig. 3 as an example to show how scores are assigned. M0 is the target subnetwork in the target network Net x, consisting of (x1,x2,x3), and Mk is the subnetwork in the aligned network Net y to be assigned scores, consisting of (y1,y2,y3). The subnetwork (z2,z3,z4) in the aligned network Net z is to be aligned with M0.

Fig. 3

Illustration of assigning scores. Net x is the target network, and M0 is the given protein complex. Net y is an aligned network. Taking y1 as example, its scores \((\delta _{bio}^{1}, \delta _{bio}^{2}, \delta _{bio}^{3}, \delta _{topo}^{1}, \delta _{topo}^{2})\) are 1, 1, 1, 2, 3, respectively

Definition 3.

Link: If a pair of proteins (u,v) comes from different PPI networks, and u and v have similar sequences, then (u,v) is called a link.

Sequence similarity [12] can be obtained with the BLASTP method [13]. We connect a dashed line to denote a link in this paper.

Definition 4.

Thread: If three proteins (u,v,t) come from three different PPI networks, and links exist between (u,v), (u,t) and (v,t) at the same time, then they form a thread.

The biological score of a protein consists of: (1) the number of links with the subnetwork M0, (2) the number of links with the subnetwork Mh, and (3) the number of threads among these three subnetworks that contain the current protein. We denote these three scores as \(\delta _{bio}^{1}\), \(\delta _{bio}^{2}\), \(\delta _{bio}^{3}\). Taking y1 in Fig. 3 as an example, there are links (y1,x1) and (y1,z2) and a thread (y1,x1,z2). Therefore, \(\delta _{bio}^{1}\), \(\delta _{bio}^{2}\), \(\delta _{bio}^{3}\) of y1 are all 1. To avoid excessive influence from any one factor, we apply a transform that raises each count to the power 1/λ. The biological score of a protein u is:

$$ \delta_{bio}(u) = {\left(\delta_{bio}^{1}\right)}^{\frac{1}{\lambda}} + {\left(\delta_{bio}^{2}\right)}^{\frac{1}{\lambda}} + {\left(\delta_{bio}^{3}\right)}^{\frac{1}{\lambda}} $$

where \(\delta _{bio}^{1}\), \(\delta _{bio}^{2}\), \(\delta _{bio}^{3}\) are the numbers of links with M0, Mh and the number of threads respectively. λ(λ>1) is the parameter of transform.
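The biological score can be sketched directly from Definitions 3 and 4. In the sketch below, links are stored as a set of unordered pairs, which is an assumed representation; the function name `bio_score` is hypothetical, and the link (x1,z2) is included because the thread (y1,x1,z2) in the Fig. 3 example requires it.

```python
def bio_score(u, m0, mh, links, lam=4.5):
    """delta_bio(u): link counts to M0 and Mh plus thread count, each raised to 1/lam."""
    def linked(a, b):
        return frozenset((a, b)) in links

    d1 = sum(1 for x in m0 if linked(u, x))   # links between u and M0
    d2 = sum(1 for z in mh if linked(u, z))   # links between u and Mh
    # A thread needs all three pairwise links among u, x and z.
    d3 = sum(
        1
        for x in m0
        for z in mh
        if linked(u, x) and linked(u, z) and linked(x, z)
    )
    return d1 ** (1 / lam) + d2 ** (1 / lam) + d3 ** (1 / lam)

# Links for y1 as described for Fig. 3.
links = {frozenset(p) for p in [("y1", "x1"), ("y1", "z2"), ("x1", "z2")]}
score_y1 = bio_score("y1", {"x1", "x2", "x3"}, {"z2", "z3", "z4"}, links)
```

Because all three counts equal 1 for y1, the transform leaves them unchanged and the score is 3 regardless of λ. For larger counts, an exponent below one dampens the contribution of any single factor.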

Definition 5.

Component: a connected graph Gc=(Vc,Ec) is a component of subnetwork Mk if Gc⊆Mk.

The topological score of a vertex consists of (1) the degree of the current vertex and (2) the size of the maximal component that includes the current vertex. As with the biological score, we apply a transform that raises each term to the power 1/ω. The topological score of a vertex u is:

$$ \delta_{topo}(u) = {\left(\delta_{topo}^{1}\right)}^{\frac{1}{\omega}} + {\left(\delta_{topo}^{2}\right)}^{\frac{1}{\omega}} $$

where \(\delta _{topo}^{1}\) is u’s degree in its subnetwork, and \(\delta _{topo}^{2}\) is the size of the maximal component that includes u. ω is a parameter of transform. In our method, ω>1.
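A minimal sketch of the topological score follows, assuming the subnetwork is given as a set of proteins plus an edge list and that degree and component size are both measured inside that subnetwork. The function name `topo_score` and the edges are hypothetical; they are chosen so that y1 has degree 2 in a component of size 3, matching the values quoted for Fig. 3.

```python
def topo_score(u, nodes, edges, omega=3.0):
    """delta_topo(u): degree of u and size of u's component, each raised to 1/omega."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    degree = len(adj[u])
    # Depth-first search to collect the component that contains u.
    seen = {u}
    stack = [u]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return degree ** (1 / omega) + len(seen) ** (1 / omega)

# Hypothetical interactions inside Mk = (y1, y2, y3).
mk_edges = [("y1", "y2"), ("y1", "y3")]
score_y1 = topo_score("y1", {"y1", "y2", "y3"}, mk_edges)
```

An isolated protein gets degree 0 and component size 1, so its score is 1, the lowest possible value for any selected protein.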

Alignment algorithm

Given the multiple PPI networks and a target protein complex from the target PPI network, the alignment process is shown in Algorithm 1. It mainly includes the following steps:

(1) Generate initial candidate pools.

Only proteins that have links with the given protein complex can be selected as candidate proteins, since links represent the biological similarity between proteins across PPI networks according to Definition 3. For each aligned network Gi (1≤i≤ξ), we construct a pool for a given protein complex M0, where M0⊆Gt. All vertices in Gi are put into the pool of Gi if they have links with any vertex in M0, as shown in Line 5 of Algorithm 1. Then, the initial subnetworks M are selected randomly from the pools.
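The pool construction can be sketched as a filter over the proteins of one aligned network. This is an illustration under assumptions: links are stored as unordered pairs, and the size of the random initial subnetwork is a free parameter here because the text does not specify it. The names `build_pool` and `initial_subnetwork` are hypothetical.

```python
import random

def build_pool(network_proteins, m0, links):
    """Collect proteins of one aligned network that have a link to any protein in M0."""
    return {
        v
        for v in network_proteins
        if any(frozenset((v, x)) in links for x in m0)
    }

def initial_subnetwork(pool, size, rng):
    """Pick a random starting subnetwork from the pool (size is an assumed parameter)."""
    k = min(size, len(pool))
    return set(rng.sample(sorted(pool), k))

# Hypothetical links; (y5, z1) does not touch M0, so y5 stays out of the pool.
links = {frozenset(p) for p in [("y1", "x1"), ("y2", "x2"), ("y6", "x3"), ("y5", "z1")]}
m0 = {"x1", "x2", "x3"}
pool_y = build_pool({"y1", "y2", "y3", "y4", "y5", "y6"}, m0, links)
start = initial_subnetwork(pool_y, size=2, rng=random.Random(0))
```

Sorting the pool before sampling keeps the random choice reproducible for a fixed seed.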

(2) Simulated annealing process.

Simulated annealing is an iterative method that searches for a globally optimal solution. In each iteration, a protein from the candidate pool is chosen at random to be added as an aligned protein in the corresponding PPI network (Line 14 of Algorithm 1). Conversely, two kinds of proteins can be moved out of the current alignment solution (Line 13 of Algorithm 1). The first is the protein whose score is the lowest in the current solution: \( \{ v | v \in V_{M_{\varepsilon }} \wedge {\text{argmin}}_{v}score(\textit {v}) \} \). The other is a protein whose corresponding vertex in the current subnetwork is not connected to any other vertex, i.e., its degree is zero. As shown in Lines 16–19 of Algorithm 1, if the new candidate solution achieves a higher score, it replaces the previous solution. If not, it still has a chance to replace the previous solution, which happens when \(\left (rand(0,1)<e^{\frac {\Delta \Phi }{sT_{i}}}\right)\), where \(e^{\frac {\Delta \Phi }{sT_{i}}}\) is the acceptance threshold of the simulated annealing process. Finally, the algorithm returns the best solution as the alignment of protein complexes M={M1,M2,…,Mξ}.
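The acceptance rule can be sketched as a small function. This covers only the accept-or-reject decision; the cooling schedule and the way candidates are generated follow Algorithm 1 and are not reproduced here. The function name `accept` and the default value of s are assumptions.

```python
import math
import random

def accept(delta_phi, temperature, s=1.0, rng=random):
    """Decide whether a candidate solution replaces the current one.

    delta_phi is the candidate score minus the current score. Improvements are
    always accepted; otherwise acceptance happens when a uniform random draw
    falls below exp(delta_phi / (s * temperature)).
    """
    if delta_phi > 0:
        return True
    return rng.random() < math.exp(delta_phi / (s * temperature))

rng = random.Random(42)
always = accept(1.5, temperature=10.0, rng=rng)    # better score: accepted
never = accept(-1e9, temperature=1.0, rng=rng)     # threshold underflows to 0
trials = sum(accept(-1.0, temperature=100.0, rng=rng) for _ in range(1000))
```

At a high temperature the threshold is close to 1, so most worse candidates are still accepted, which lets the search escape local optima early on. As the temperature falls, the threshold shrinks and the search becomes greedy.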

Results and discussion

In this section, we evaluate the performance of our method through extensive experiments. We compare our method to LocalAli [14] since LocalAli is the most recent local alignment method for PPI networks. We measure the coverage and consistency of the alignment networks.

Dataset and experimental setup

Real-world PPI networks of four species are used in our experiments: Homo sapiens (human), Drosophila melanogaster (fruit fly), Caenorhabditis elegans (worm) and Saccharomyces cerevisiae (yeast) [15]. The numbers of proteins and interactions for each species are listed in Table 1.

Table 1 Proteins and interactions of four species

We also obtained the sequences of all proteins from the manually annotated and reviewed database UniProtKB/Swiss-Prot [16] to calculate pairwise protein similarity, i.e., the e-value. We ran BLASTP 2.3.0 (downloaded from NCBI BLAST [17]) with e−7 as the e-value cutoff to select potentially homologous proteins across different species. The corresponding Gene Ontology (GO) annotations of the proteins were collected from the UniProt-GOA database for the alignment evaluations.

As human and yeast are the two best-studied species [18], we build datasets by assigning each of them in turn as the target PPI network for the alignment, and choose two of the remaining PPI networks as aligned networks. A total of six datasets are generated, each of which is a group of multiple PPI networks on which to perform alignment. The composition of the six datasets is listed in Table 2.

Table 2 Datasets composition

Since most local alignment algorithms are pairwise, LocalAli [14] is one of the few recent local alignment approaches for multiple networks. LocalAli proposes a framework to reconstruct the evolutionary history of conserved modules based on a maximum-parsimony evolutionary model. It aims to identify functionally conserved modules from multiple biological networks, which makes it a suitable comparison method for our proposed algorithm. We run LocalAli with its default parameters on the six datasets in Table 2 to obtain target protein complexes, by retrieving every matchset in its results and keeping those whose proteins form a component in the target network. The components from the target network are used as the input of our algorithm. In the experiment, we set the parameters α=0.5, θ=1.1, K=20, N=100, Tmax=100, λ=4.5, ω=3. The results are compared with LocalAli in terms of coverage and consistency.


A larger and denser connected component can give more insight into the common topology of the network and could be more biologically significant. Coverage measures the number of proteins in the aligned subnetworks from each aligned PPI network for the given motifs in the target network.

As shown in Table 3, we compare our algorithm with LocalAli [14] on the six datasets, where D1–D3 use the human PPI network as the target network and D4–D6 use the yeast PPI network as the target network. For each dataset, since we use the largest component in the corresponding target PPI network from LocalAli as our target protein complex for alignment, the average number of proteins in every target network is the same as that of LocalAli, i.e., the ratio is 100% for the target network. The ratio is obtained by dividing the average size of the protein complexes of our proposed method by that of LocalAli. In the aligned networks, our method generates larger aligned protein complexes than LocalAli across the datasets. One exception is dataset D3, where the two methods obtain equal coverage in one of the aligned networks, while our method obtains much higher coverage in the other aligned network. A similar situation exists in dataset D6. In datasets D1, D2 and D4, our algorithm achieves significantly higher coverage in all aligned networks, with the largest ratio reaching nearly 248% of the coverage of LocalAli.
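The coverage ratio described above is a quotient of two averages. The sketch below shows the arithmetic on made-up complex sizes; the numbers are purely illustrative and are not taken from Table 3, and the function name `coverage_ratio` is hypothetical.

```python
def coverage_ratio(ours_sizes, baseline_sizes):
    """Average aligned-complex size of one method divided by that of a baseline."""
    mean_ours = sum(ours_sizes) / len(ours_sizes)
    mean_base = sum(baseline_sizes) / len(baseline_sizes)
    return mean_ours / mean_base

# Hypothetical complex sizes in one aligned network.
ratio = coverage_ratio([6, 4], [3, 1])  # 5.0 / 2.0
```

A ratio of 1.0 (100%) means both methods produce aligned complexes of the same average size, as is the case by construction for the target network.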

Table 3 Comparison of coverage


The calculation of consistency uses the Gene Ontology (GO) annotations associated with each of the proteins, with three basic types of ontologies describing biological properties: biological process (BP), molecular function (MF) and cellular component (CC) [19]. It is assumed that proteins with more similar GO annotations are more functionally coherent [20]. We calculate and analyze such functional similarity by the fraction of aligned proteins that share the same GO annotations. The larger the fraction, the more biologically significant the alignment is.

Consistency, measured by the mean entropy (ME) and mean normalized entropy (MNE), serves as a specificity metric for the quality of an alignment. To calculate ME, we first obtain the entropy E(M) of a matchset M, i.e., the protein complexes aligned to one protein complex in the target species among all participating PPI networks, using the following formula:

$$ E(M)=E(v_{1},v_{2},\ldots v_{n}) = - \sum\limits_{i=1}^{d}p_{i} \times log (p_{i}) $$

where pi is the fraction of all proteins in the matchset M with the annotation GOi, and d represents the total number of different GO terms in M. Thus an aligned matchset with more consistency has lower entropy. The ME is then calculated by averaging the entropies of all matchsets generated by aligning all the protein complexes in the target species. The lower the ME of the alignment results, the higher the consistency of the method, indicating better biological quality.
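A minimal sketch of E(M) and ME follows. It assumes each protein maps to a set of GO terms and that the logarithm is natural, since the text does not state the base; the function names and GO identifiers are placeholders.

```python
import math

def matchset_entropy(annotations):
    """E(M) for one matchset; `annotations` maps protein -> set of GO terms."""
    n = len(annotations)
    if n == 0:
        return 0.0
    terms = set().union(*annotations.values())
    entropy = 0.0
    for term in terms:
        # p_i: fraction of proteins in the matchset annotated with this term.
        p = sum(1 for gos in annotations.values() if term in gos) / n
        entropy -= p * math.log(p)
    return entropy

def mean_entropy(matchsets):
    """ME: plain average of E(M) over all matchsets."""
    return sum(matchset_entropy(m) for m in matchsets) / len(matchsets)

# Hypothetical annotations: one coherent matchset and one mixed matchset.
coherent = {"x1": {"GO:A"}, "y1": {"GO:A"}, "z2": {"GO:A"}}
mixed = {"x1": {"GO:A"}, "y6": {"GO:B"}}
me = mean_entropy([coherent, mixed])
```

The coherent matchset has entropy 0 because every protein carries the same term, while the mixed one has entropy ln 2, so lower values indicate more functionally consistent alignments.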

Similar to ME, for the MNE, we first calculate the normalized entropy NE(M) for a matchset as:

$$ NE(M)=NE(v_{1},v_{2},\ldots v_{n}) = -\frac{1}{\log d} \sum\limits_{i=1}^{d}p_{i} \times \log (p_{i}) $$

where pi and d have the same interpretation as in E(M). The MNE of the alignment results is then computed by calculating the average of the normalized entropy of all matchsets with their size. The lower the MNE, the better the functional consistency an alignment method achieves.
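The normalized variant divides E(M) by log d. The sketch below makes two assumptions that the text leaves open: a matchset with a single GO term (d=1, so log d=0) is assigned a normalized entropy of 0, and MNE is taken as a plain unweighted average over matchsets. All names and annotations are hypothetical.

```python
import math

def entropy(annotations):
    """E(M): `annotations` maps protein -> set of GO terms."""
    n = len(annotations)
    terms = set().union(*annotations.values()) if annotations else set()
    total = 0.0
    for term in terms:
        p = sum(1 for gos in annotations.values() if term in gos) / n
        total -= p * math.log(p)
    return total

def normalized_entropy(annotations):
    """NE(M) = E(M) / log d, with the d <= 1 case defined as 0 (an assumption)."""
    terms = set().union(*annotations.values()) if annotations else set()
    d = len(terms)
    if d <= 1:
        return 0.0
    return entropy(annotations) / math.log(d)

def mean_normalized_entropy(matchsets):
    """MNE as an unweighted average (an assumption about the averaging)."""
    return sum(normalized_entropy(m) for m in matchsets) / len(matchsets)

coherent = {"x1": {"GO:A"}, "y1": {"GO:A"}}
mixed = {"x1": {"GO:A"}, "y6": {"GO:B"}}
mne = mean_normalized_entropy([coherent, mixed])
```

When every protein carries exactly one term, normalization maps the entropy into the range 0 to 1, which makes matchsets with different numbers of GO terms comparable.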

The comparison of consistency between the results of LocalAli and our algorithm is shown in Table 4. The ratio is obtained by dividing the ME or MNE of our proposed method by that of LocalAli and then subtracting one. We observe that in D1, D4, D5 and D6, our method generates aligned protein complexes with slightly higher ME and MNE than LocalAli, with consistency lower than that of LocalAli by 0.76% to 6.48%. Meanwhile, we achieve lower ME and MNE than LocalAli in D2 and D3, with at most 8.12% better consistency.
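The ratio reported for consistency is a relative change, so its sign carries the interpretation. A small sketch with made-up entropy values (not taken from Table 4) shows the convention; the function name `relative_change` is hypothetical.

```python
def relative_change(ours, baseline):
    """Entropy of one method divided by that of a baseline, minus one."""
    return ours / baseline - 1.0

# Hypothetical mean entropies.
worse = relative_change(1.05, 1.00)   # positive: higher entropy, lower consistency
better = relative_change(0.90, 1.00)  # negative: lower entropy, higher consistency
```

Because lower entropy means higher consistency, a negative ratio indicates that the method is more consistent than the baseline.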

Table 4 Comparison of consistency

For PPI network alignment, it is more important to achieve the alignment of functional modules than the alignment of proteins alone. The proposed ConnectedAlign achieves this goal without losing consistency or coverage. In the future, genome information could be used for biological network alignment [21].


In this paper, we proposed a novel approach to identify conserved protein complexes across different species. Given target protein complexes in the target network, the proposed method can find conserved protein complexes in multiple aligned PPI networks. Since we take both biological and topological features into consideration, including subnetwork connectivity, our method achieves significantly higher coverage while keeping consistency stable compared with the previous network alignment method. The experimental results demonstrate the significant benefits of our proposed alignment method.


  1. Hu AL, Chan KCC. Utilizing both topological and attribute information for protein complex identification in ppi networks. IEEE/ACM Trans Comput Biol Bioinforma. 2013; 10(3):780–92.

  2. Li M, Niu Z, Chen X, Zhong P, Wu F, Pan Y. A reliable neighbor-based method for identifying essential proteins by integrating gene expressions, orthology, and subcellular localization information. Tsinghua Sci Technol. 2016; 21(6):668–77.

  3. Feng Q, Huang N, Jiang X, Wang J. Dealing with several parameterized problems by random methods. Theor Comput Sci. 2018; 734(22):94–104.

  4. Gao J, Ping Q, Wang J. Resisting re-identification mining on social graph data. World Wide Web-internet Web Inf Syst. 2018.

  5. Li M, Yang J, Wu FX, Pan Y, Wang J. Dynetviewer: a cytoscape app for dynamic network construction, analysis and visualization. Bioinformatics. 2018; 34(9):1597–9.

  6. Malod-Dognin N, Pržulj N. L-graal: Lagrangian graphlet-based network aligner. Bioinformatics. 2015; 31(13):2182–9.

  7. Zhao B, Wang J, Li M, Wu FX, Pan Y. Detecting protein complexes based on uncertain graph model. IEEE/ACM Trans Comput Biol Bioinforma. 2014; 11(3):486–97.

  8. Faisal FE, Lei M, Crawford J, Milenković T. The post-genomic era of biological network alignment. EURASIP J Bioinforma Syst Biol. 2015; 2015(1):1–19.

  9. Bhowmick SS, Seah BS. Clustering and summarizing protein-protein interaction networks: A survey. IEEE Trans Knowl Data Eng. 2016; 28(3):638–58.

  10. Elmsallati A, Clark C, Kalita J. Global alignment of protein-protein interaction networks: A survey. IEEE/ACM Trans Comput Biol Bioinforma. 2016; 13(4):689–705.

  11. Gao J, Song B, Ke W, Hu X. Balanceali: Multiple ppi network alignment with balanced high coverage and consistency. IEEE Trans Nanobioscience. 2017; 16(5):333–40.

  12. Notredame C, Higgins DG, Heringa J. T-coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000; 302(1):205–17.

  13. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic Acids Res. 1997; 25(17):3389–402.

  14. Hu J, Reinert K. Localali: an evolutionary-based local alignment approach to identify functionally conserved modules in multiple networks. Bioinformatics. 2015; 31(3):363–72.

  15. Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, Duesbury M, Dumousseau M, Feuermann M, Hinz U, et al. The intact molecular interaction database in 2012. Nucleic Acids Res. 2012; 40(D1):841–6.

  16. Consortium U. The universal protein resource (uniprot) in 2010. Nucleic Acids Res. 2010; 38(suppl 1):142–8.

  17. Accessed 12 Jan 2018.

  18. Remmele CW, Luther CH, Balkenhol J, Dandekar T, Müller T, Dittrich MT. Integrated inference and evaluation of host-fungi interaction networks. Front Microbiol. 2015; 6(764):1–18.

  19. Huntley RP, Sawford T, Mutowo-Meullenet P, Shypitsyna A, Bonilla C, Martin MJ, O’Donovan C. The goa database: gene ontology annotation updates for 2015. Nucleic Acids Res. 2015; 43(D1):1057–63.

  20. Yu N, Li Z, Yu Z. A survey on encoding schemes for genomic data representation and feature learning from signal processing to machine learning. Big Data Min Analytics. 2018; 1(3):191–210.

  21. Bérard S, Chateau A, Pompidor N, Guertin P, Bergeron A, Swenson KM. Aligning the unalignable: bacteriophage whole genome alignments. BMC Bioinformatics. 2016; 17(1):17–30.

  22. Song B, Gao J, Hu X. Identifying conserved protein complexes across multiple species via network alignment. In: Proceedings of the 13th International Symposium on Bioinformatics Research and Applications (ISBRA 2017). Honolulu: LNCS: 2017. p. 1008–1009.


The abridged abstract of this work was previously published in the Proceedings of the 13th International Symposium on Bioinformatics Research and Applications (ISBRA 2017), Lecture Notes in Computer Science: Bioinformatics Research and Applications [22].


Publication of this article was supported partially by National Natural Science Foundation of China (NSFC) under grant 61471369, 61672536.

About this supplement

This article has been published as part of BMC Bioinformatics Volume 19 Supplement 9, 2018: Selected articles from the 13th International Symposium on Bioinformatics Research and Applications (ISBRA 2017): bioinformatics. The full contents of the supplement are available online at

Author information




JG, BS, XH and JW conceived the study and developed the model. BS wrote the code, cooperated with JG. JG and FY participated in algorithm development. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jianliang Gao.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Gao, J., Song, B., Hu, X. et al. ConnectedAlign: a PPI network alignment method for identifying conserved protein complexes across multiple species. BMC Bioinformatics 19, 286 (2018).



  • Network alignment
  • Big data
  • Graph data analysis