We implemented the web server WebNetCoffee using C++, HTML, PHP, CSS and MySQL. The graph library LEMON version 1.2.3 [29] was used in the implementation of NetCoffee. The Apache HTTP Server and MySQL provide the underlying web server environment. The server runs on an Intel(R) Xeon(R) E5-2682 v4 CPU @ 2.50GHz.
The workflow of NetCoffee
Given a set of networks G1,G2,⋯,Gk, k≥3, each network can be modeled as a graph Gi=(Vi,Ei), where Vi and Ei represent the proteins and interactions of the i-th network. The proteins aligned in one group form a matchset, which is a subset of \(\cup _{i=1}^{k}V_{i}\). The global network alignment problem is to search for a set of mutually disjoint matchsets for two or more PPI networks. We assume that sequence similarity and topological similarity imply functional conservation of proteins across species.
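The definition above can be made concrete with a small check. The following is a minimal sketch, not part of NetCoffee itself: the function name `is_valid_alignment` and the representation of networks as a dictionary of node sets are assumptions made for illustration. It verifies that every matchset is a subset of the union of all node sets and that the matchsets are mutually disjoint.

```python
def is_valid_alignment(networks, matchsets):
    """Check the two conditions from the problem definition.

    networks:  dict mapping a network name to its set of protein identifiers (V_i)
    matchsets: iterable of sets of protein identifiers
    (Both representations are illustrative assumptions.)
    """
    all_nodes = set().union(*networks.values())  # union of V_1 ... V_k
    seen = set()
    for matchset in matchsets:
        if not matchset <= all_nodes:   # a matchset must be a subset of the union
            return False
        if seen & matchset:             # matchsets must be mutually disjoint
            return False
        seen |= matchset
    return True


# Toy example with made-up protein identifiers.
networks = {"G1": {"a1", "a2"}, "G2": {"b1", "b2"}, "G3": {"c1"}}
print(is_valid_alignment(networks, [{"a1", "b1", "c1"}, {"a2", "b2"}]))
print(is_valid_alignment(networks, [{"a1", "b1"}, {"a1", "c1"}]))
```

The second call reuses protein `a1` in two matchsets, so it violates the disjointness condition.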
The NetCoffee algorithm [17] adopts an integrated model that measures the similarity of a given pair of nodes using both topology and sequence information. It employs simulated annealing to optimize a target function, aiming to find an optimal global one-to-one mapping based on the similarity of both network topology and protein sequences. There are four major steps: 1) building PPI networks and a library of bipartite graphs; 2) calculating integrated weights using triplet extension; 3) collecting candidate edges with maximum matching; 4) optimization with the simulated annealing approach. NetCoffee distinguishes itself from other existing algorithms by its speed and its biologically meaningful alignments. It can perform a GAMN job on three or more PPI networks. The alignment result consists of many matchsets, each of which represents a putative functional ortholog group.
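To illustrate the optimization in step 4, the following is a generic simulated annealing sketch on a toy two-network problem. It is not NetCoffee's actual implementation: the function `anneal`, its parameters, the cooling schedule and the toggle move are all assumptions chosen for brevity. Candidates are weighted pairs `(u, v, weight)`, and the state is a one-to-one selection of candidates whose total weight is to be maximized.

```python
import math
import random


def anneal(candidates, t_start=1.0, t_end=0.01, cooling=0.95, steps_per_t=50, seed=0):
    """Generic simulated annealing over one-to-one selections of candidate pairs.

    candidates: list of (u, v, weight) tuples (hypothetical toy input).
    Returns (best_indices, best_score).
    """
    if not candidates:
        return set(), 0.0
    rng = random.Random(seed)
    chosen, score = set(), 0.0
    best, best_score = set(), 0.0
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            i = rng.randrange(len(candidates))
            u, v, w = candidates[i]
            if i in chosen:
                # Move: drop a selected pair.
                removed, added, delta = {i}, set(), -w
            else:
                # Move: add a pair and evict pairs that share u or v (keeps one-to-one).
                removed = {j for j in chosen
                           if candidates[j][0] == u or candidates[j][1] == v}
                added = {i}
                delta = w - sum(candidates[j][2] for j in removed)
            # Metropolis criterion: always accept improvements,
            # accept worse moves with probability exp(delta / t).
            if delta >= 0 or rng.random() < math.exp(delta / t):
                chosen = (chosen - removed) | added
                score += delta
                if score > best_score:
                    best, best_score = set(chosen), score
        t *= cooling  # geometric cooling schedule
    return best, best_score


# Toy candidates with made-up weights.
candidates = [("a1", "b1", 0.9), ("a1", "b2", 0.4), ("a2", "b1", 0.3), ("a2", "b2", 0.8)]
best, best_score = anneal(candidates)
print(sorted(candidates[i][:2] for i in best), round(best_score, 3))
```

Tracking the best state seen so far, rather than returning the final state, makes the result independent of late uphill moves that were accepted by chance.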
Using WebNetCoffee
WebNetCoffee provides a simple web interface for performing GAMN tasks. The home page briefly introduces the foundation of NetCoffee and the available online datasets. The help page guides a new user step by step through performing a GAMN task and querying the results with a jobid. It also gives a more detailed description of each panel in the result page.
Users can launch a WebNetCoffee job on either the online datasets or their own datasets (see Fig. 1a,b). Each job can be assigned a user-specified job title. The parameter alpha, which balances the contributions of the topology and sequence scores in the alignment result, defaults to 0.5. For jobs on the online datasets, a total of 15 species from four databases are available. Users can choose three or more species for each GAMN task. To keep computational tasks manageable, WebNetCoffee allows a job on 3-5 networks from BioGRID and STRING, and on 3-11 networks from IntAct and DIP. Each dataset file uploaded by a user must be smaller than 200 MB.
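The submission constraints above can be sketched as a small validation routine. This is an illustrative sketch only, not the server's code: all names are hypothetical, the interpretation of the upload limit as 200 MiB is an assumption, and so is the valid range 0 to 1 for alpha. The function `integrated_score` further assumes that alpha enters as a convex combination of the two scores; which of the two terms alpha weights is a guess made for the example.

```python
# Allowed number of networks per job for each online database, as stated in the text.
NETWORK_LIMITS = {"BioGRID": (3, 5), "STRING": (3, 5), "IntAct": (3, 11), "DIP": (3, 11)}
MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # assumption: the limit is read as 200 MiB


def integrated_score(topology, sequence, alpha=0.5):
    """Assumed convex combination; which term alpha weights is a guess."""
    return alpha * topology + (1.0 - alpha) * sequence


def validate_online_job(database, species, alpha=0.5):
    """Return a list of error messages; an empty list means the job is acceptable."""
    errors = []
    if database not in NETWORK_LIMITS:
        errors.append(f"unknown database: {database}")
    else:
        low, high = NETWORK_LIMITS[database]
        count = len(set(species))  # count each species once
        if not low <= count <= high:
            errors.append(f"{database} jobs need {low}-{high} networks, got {count}")
    if not 0.0 <= alpha <= 1.0:  # assumed valid range for alpha
        errors.append(f"alpha must lie in [0, 1], got {alpha}")
    return errors


def uploads_within_limit(file_sizes):
    """True if every uploaded file is smaller than the size limit."""
    return all(size < MAX_UPLOAD_BYTES for size in file_sizes)


# Placeholder species names, for illustration only.
print(validate_online_job("STRING", ["sp1", "sp2", "sp3"]))
print(validate_online_job("STRING", ["sp1", "sp2", "sp3", "sp4", "sp5", "sp6"]))
```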
In case many tasks are submitted to the server simultaneously, we designed a job queue to manage all the jobs. Each submitted job first enters the job queue and waits for a time slot on the server. A watchdog manages the submitted jobs on a first come, first served (FCFS) basis. It checks the status of the job queue in the background at regular intervals. The earliest job starts to run when a time slot is assigned to it. Users can query their results through a jobid within one week after the job has finished; after that, the results expire automatically. To protect privacy, each user can only see the jobs submitted by themselves (from the same IP address) in the job list. WebNetCoffee also allows users to set a password before launching a job, which prevents privacy leaks when multiple users share the same IP address.
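The queueing behaviour described above can be sketched as follows. This is a minimal in-memory model, not the server's implementation: the class `JobQueue`, its method names and the single-slot default are assumptions. One call to `poll` corresponds to one periodic pass of the watchdog, and `purge_expired` drops results that finished more than one week ago.

```python
from collections import deque
from datetime import datetime, timedelta

RETENTION = timedelta(weeks=1)  # results are kept for one week after a job finishes


class JobQueue:
    """Hypothetical FCFS queue with a fixed number of execution slots."""

    def __init__(self, slots=1):  # the number of slots is an assumption
        self.slots = slots
        self.waiting = deque()    # job ids in submission order
        self.running = set()
        self.finished = {}        # job id -> finish time

    def submit(self, job_id):
        self.waiting.append(job_id)

    def poll(self):
        """One watchdog pass: start the earliest waiting jobs while slots are free."""
        started = []
        while self.waiting and len(self.running) < self.slots:
            job_id = self.waiting.popleft()
            self.running.add(job_id)
            started.append(job_id)
        return started

    def finish(self, job_id, now):
        self.running.remove(job_id)
        self.finished[job_id] = now

    def purge_expired(self, now):
        """Remove and return the jobs whose results are older than the retention period."""
        expired = [j for j, t in self.finished.items() if now - t > RETENTION]
        for j in expired:
            del self.finished[j]
        return expired


queue = JobQueue()
queue.submit("job-a")
queue.submit("job-b")
print(queue.poll())   # job-a starts first; job-b waits for the slot
queue.finish("job-a", datetime(2024, 1, 1))
print(queue.poll())   # now job-b starts
print(queue.purge_expired(datetime(2024, 1, 9)))  # job-a finished over a week earlier
```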
In the result page, the first part presents statistics of the test datasets, including the numbers of nodes and edges of each input network, the final alignment score and the input parameters (see Fig. 1c). To visualize the process of simulated annealing, the convergence curve is plotted in the second part. From Fig. 1d, users can see how fast the computation converges to a stable score. The third part is a large table split across many pages, each of which contains at most ten matchsets (see Fig. 1e). Each matchset implies a group of functionally conserved proteins, which can be used for “annotation transfer”. Additionally, our web server provides information from openly accessible databases to annotate the alignment results, such as UniProt IDs and gene ontology annotations. Each protein is linked to its GenPept page in the openly accessible NCBI Protein database. Users can check the sequence similarity of the proteins and download their function annotations (GO terms) with a single click. Users can also search for specific matchsets by entering a pattern in a search box. For example, if the substring “P535” is queried in the search box, the protein accession identifiers that match the pattern “%P535%” are extracted from the result table. In the visualization column, a graphical view of the induced sub-networks extracted from the PPI networks is provided (see Fig. 1f). The test dataset and alignment results of each task are available in the result page, which makes it easy to run other methods on our online datasets and to compare the alignment quality.
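The substring search and the ten-per-page layout described above can be sketched with an SQL `LIKE` query. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the MySQL backend; the table `matchset_protein`, its columns, the helper names and the accession identifiers in the demo are all hypothetical.

```python
import sqlite3

PAGE_SIZE = 10  # the result table shows at most ten matchsets per page


def build_demo_db(rows):
    """Create an in-memory table of (matchset_id, accession) rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE matchset_protein (matchset_id INTEGER, accession TEXT)")
    conn.executemany("INSERT INTO matchset_protein VALUES (?, ?)", rows)
    return conn


def search_matchsets(conn, substring, page=0):
    """Return one page of matchset ids containing an accession that matches %substring%."""
    pattern = f"%{substring}%"
    rows = conn.execute(
        "SELECT DISTINCT matchset_id FROM matchset_protein "
        "WHERE accession LIKE ? ORDER BY matchset_id LIMIT ? OFFSET ?",
        (pattern, PAGE_SIZE, page * PAGE_SIZE),  # bound parameters, no string splicing
    ).fetchall()
    return [row[0] for row in rows]


# Made-up accession identifiers, for illustration only.
conn = build_demo_db([(1, "P53501"), (1, "Q00001"), (2, "P35500"), (3, "AP5359")])
print(search_matchsets(conn, "P535"))
```

Passing the pattern as a bound parameter avoids SQL injection, although `%` and `_` typed by the user would still act as wildcards unless they are escaped.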