Collective prediction of protein functions from protein-protein interaction networks
 Qingyao Wu^{1, 2},
 Yunming Ye^{1, 2},
 Michael K Ng^{3},
 Shen-Shyang Ho^{4} and
 Ruichao Shi^{1, 2}
https://doi.org/10.1186/1471-2105-15-S2-S9
© Wu et al.; licensee BioMed Central Ltd. 2014
Published: 24 January 2014
Abstract
Background
Automated assignment of functions to unknown proteins is one of the most important tasks in computational biology. The development of experimental methods for genome-scale analysis of molecular interaction networks offers new ways to infer protein function from protein-protein interaction (PPI) network data. Existing techniques for collective classification (CC) usually increase accuracy for network data, wherein instances are interlinked with each other, by using a large amount of labeled data for training. However, labeled data are time-consuming and expensive to obtain, whereas large amounts of unlabeled data are easy to obtain. Thus, more sophisticated methods are needed to exploit the unlabeled data to increase the accuracy of protein function prediction.
Results
In this paper, we propose an effective Markov chain based CC algorithm (ICAM) to tackle the label deficiency problem in CC for interrelated proteins from PPI networks. Our idea is to model the problem using two distinct Markov chain classifiers to make separate predictions with regard to attribute features from protein data and relational features from relational information. The ICAM learning algorithm combines the results of the two classifiers to compute the ranks of labels to indicate the importance of a set of labels to an instance, and uses an ICA framework to iteratively refine the learning models for improving performance of protein function prediction from PPI networks in the paucity of labeled data.
Conclusion
Experimental results on the real-world Yeast protein-protein interaction datasets show that our proposed ICAM method is better than the other ICA-type methods given limited labeled training data. This approach can serve as a valuable tool for the study of protein function prediction from PPI networks.
Background
We have witnessed a revolution in sequencing technologies in the last decade, and the biological sciences are undergoing an explosion in the amount of genome sequence data. There is increasing interest in using computational methods to identify the biological functions of protein sequences [1], as determining protein functions experimentally is time-consuming and cannot keep up with the fast growth of newly found proteins [2].
Various studies have applied machine learning methods to protein data from biological experiments to predict the functions of unknown proteins (e.g. [3, 4]). Classical computational approaches for protein function prediction represent each protein as a set of features, and employ machine learning algorithms to automatically predict the protein function based on these features. The most well-established methods [5] are BLAST [6], based on sequence; PROSITE [7], based on sequence motifs; and PFAM [8], based on profile methods.
In recent years, the development of experimental methods for genome-scale analysis of molecular interaction networks has offered new ways to infer protein function in the context of a protein-protein interaction (PPI) network, wherein proteins and detected PPIs are represented by nodes and edges, respectively. The basic idea is that the direct interaction partners of a protein are likely to share similar biological functions [9]. Assignment of protein functions using PPI data has also been extensively studied, with approaches such as neighborhood counting methods [10], graph theoretic methods [11], hierarchical clustering-based methods [12] and graph clustering methods [13]. Although many efforts have been made in protein function prediction, most of them were based either on sequence similarity, which ignores the protein interactions, or on PPI information without using attributes derived from the content of the protein sequence. The former often fails if a query protein has little or no sequence similarity to any protein with known labels; the latter has a similar problem if there is insufficient relevant PPI information.
To explicitly use both the content of the data and the link information of the PPI network to improve prediction performance, collective classification (CC) has been proposed. It has received considerable attention in the last decade. Various CC algorithms have been proposed in the literature [14], such as the iterative classification algorithm (ICA) [15], Gibbs sampling (Gibbs) [16], and variants of the weighted-vote relational neighbor algorithm (wvRN) [17]. Here, we focus on ICA-type approaches, which use a local classifier, such as kNN, to infer the class labels of related instances. The key idea is to construct new relational feature vectors by summarizing the label information from neighboring nodes, and then use the relational features together with the attribute features derived from the content of the data to learn local classifiers for prediction.
This paper describes an effective Markov chain based CC algorithm (ICAM) to tackle the label deficiency problem in CC for protein function prediction from PPI networks. Our idea is to model the classifier M_{ AR } via the Markov chain with restart. The Markov chain model computes the ranks of labels to indicate the importance of a set of labels to an instance by propagating the label information in a graph constructed from labeled and unlabeled data. The ICAM algorithm further refines the Markov chain model using an ICA framework to generate the possible labels for a given instance. By these techniques, M_{ AR } can be learned more effectively. Experiments on the real-world Yeast PPI datasets have demonstrated that our proposed ICAM method improves the classification performance when compared with the ICA-type CC methods. The main contributions of this paper are as follows.

We study the label deficiency problem of collective classification (CC) and show that the protein function prediction problem from PPI networks can be formulated as a CC task.

We extend the ICA-type CC algorithm and propose the ICAM algorithm to leverage the unlabeled portion of the data to improve the classification performance of CC via the Markov chain with restart.

We demonstrate the effectiveness of our proposed ICAM algorithm using the Yeast benchmark datasets. We find that ICAM leads to significant accuracy gains compared to other ICA-type methods when only a limited amount of labeled data is available.
Methods
Preliminaries
Assume that the PPI network data are represented as a graph G(V, E, X_{ A }, Y, c), where V is a set of nodes and E is a set of edges representing the interactions between the instances. Each instance/node v_{ i } ∈ V is described by an attribute vector x_{ i } ∈ X_{ A }. Each Y_{ i } ∈ Y is a set of labels for v_{ i }, and c is the number of possible labels. Assume that we have a set of labeled nodes V^{ K } ⊂ V with known labels Y^{ K } = {Y_{ i } | v_{ i } ∈ V^{ K }}, and the task is to predict the labels Y^{ U } for unlabeled nodes V^{ U } = V − V^{ K }. In this paper, we are primarily interested in generating a ranking of possible labels for a given protein such that its correct functions receive a higher ranking than the less likely ones.
The ICAM algorithm
Inspired by the ICA approach, we introduce the ICAM algorithm for collective classification. The algorithm is summarized in Algorithm 1. Similar to the ICA framework, the ICAM algorithm has two parts: bootstrap and iterative inference. The bootstrap part learns an attribute-only classifier M_{ A } from the known nodes, and uses M_{ A } to predict labels for the unknown nodes V^{ U } (steps 1-2). In the iterative inference part, the relational features X_{ R } are updated based on the estimated class labels of the data (step 4). Specifically, X_{ R } of the (i + 1)-th iteration is based on the known and predicted labels from the i-th iteration. Next, the algorithm trains a collective classifier M_{ AR } using both attribute features X_{ A } and relational features X_{ R } to compute the labels for unlabeled data. The iterative process stops when the predictions of M_{ AR } stabilize or a fixed number of iterations is reached.
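The bootstrap and iterative inference parts described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `ica_loop`, the toy 1-nearest-neighbour helpers `learn_1nn` and `predict_1nn`, the aggregator `count_labels`, and the four-node graph are all hypothetical, and each node carries a single label for simplicity.

```python
def ica_loop(x_attr, known, neighbors, learn, predict, aggregate, n_iter=10):
    """Generic ICA-style loop: bootstrap on attributes, then iterate.

    x_attr:    dict node -> attribute vector (list of floats)
    known:     dict node -> label, for the labeled nodes only
    neighbors: dict node -> list of adjacent nodes
    learn, predict, aggregate: caller-supplied callables (hypothetical API)
    """
    unknown = [v for v in x_attr if v not in known]

    # Bootstrap: train an attribute-only classifier on the known nodes
    # and use it to label the unknown nodes.
    m_a = learn({v: x_attr[v] for v in known}, known)
    pred = {v: predict(m_a, x_attr[v]) for v in unknown}

    for _ in range(n_iter):
        # Relational features from the known and currently predicted labels.
        x_rel = aggregate(neighbors, {**known, **pred})
        full = {v: x_attr[v] + x_rel[v] for v in x_attr}
        # Retrain on attribute + relational features of the known nodes.
        m_ar = learn({v: full[v] for v in known}, known)
        new_pred = {v: predict(m_ar, full[v]) for v in unknown}
        if new_pred == pred:  # predictions have stabilized
            break
        pred = new_pred
    return pred


# --- toy stand-ins so the sketch runs end to end (assumptions) ---
def learn_1nn(features, labels):
    # A 1-NN "model" is just the stored labeled examples.
    return [(features[v], labels[v]) for v in features]

def predict_1nn(model, x):
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(model, key=lambda ex: sq_dist(ex[0], x))[1]

def count_labels(neighbors, labels, classes=(0, 1)):
    # One count per class over each node's neighbors.
    return {v: [sum(1 for u in nbs if labels.get(u) == c) for c in classes]
            for v, nbs in neighbors.items()}

x_attr = {"a": [0.0], "b": [1.0], "c": [0.4], "d": [0.6]}
known = {"a": 0, "b": 1}
neighbors = {"a": ["c"], "c": ["a"], "b": ["d"], "d": ["b"]}
result = ica_loop(x_attr, known, neighbors, learn_1nn, predict_1nn, count_labels)
```

Passing the learner, predictor and aggregator as arguments keeps the loop independent of the particular local classifier, which is how ICA-type methods differ from one another.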
An important component of the ICA algorithm is to build relational features that summarize the relational information, and to construct new feature vectors to train the classifier M_{ AR }. For instance, Neville et al. [15] summarize the labels of neighboring nodes as relational features, as illustrated in Figure 1(b), where node "B" has two positive neighboring nodes and two negative neighboring nodes. Here, the relational feature vector is <2, 2>, which is appended to the original feature vector <x_{i,1}, x_{i,2}, ⋯> to give the new feature vector <x_{i,1}, x_{i,2}, ⋯, 2, 2>. ICA-type CC methods usually increase accuracy for network data by using a large amount of labeled data to train M_{ AR }. In this scenario, the supervision knowledge can be effectively propagated in the network and improve the learning accuracy [18]. However, labeled data are time-consuming to obtain, so the number of labeled instances is very limited, and most of the nodes may not link to any labeled node, as illustrated in Figure 1(a). As a result, the prediction accuracy of the collective classifier M_{ AR } decreases greatly.
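The count-based relational feature in the node "B" example can be reproduced with a few lines of Python. This is only a sketch: the function names, the neighbor identifiers `n1` to `n4`, and the attribute values `[0.3, 0.7]` are made up for illustration.

```python
def relational_features(node, neighbors, labels, classes):
    """Count how many neighbors of `node` carry each class label."""
    counts = [0] * len(classes)
    for nb in neighbors[node]:
        lab = labels.get(nb)  # None when the neighbor has no label yet
        if lab in classes:
            counts[classes.index(lab)] += 1
    return counts

def augment(x_attr, node, neighbors, labels, classes):
    """Append the relational counts to the node's attribute vector."""
    return list(x_attr[node]) + relational_features(node, neighbors, labels, classes)

# Node "B" with two positive and two negative neighbors.
neighbors = {"B": ["n1", "n2", "n3", "n4"]}
labels = {"n1": "+", "n2": "+", "n3": "-", "n4": "-"}
x_attr = {"B": [0.3, 0.7]}  # hypothetical attribute values
classes = ["+", "-"]

b_rel = relational_features("B", neighbors, labels, classes)
b_full = augment(x_attr, "B", neighbors, labels, classes)
```

When most neighbors are unlabeled, the counts stay at or near zero, which is the label deficiency situation the paragraph describes.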
Algorithm 1 ICAM (V, E, X_{ A }, Y^{ K }, n)
Input:
V = nodes, E = edges, X_{ A } = attribute feature vectors, Y^{ K } = labels of known nodes, n = # of iterations
Procedure:
1: M_{ A } = learnClassifier(X_{ A }, Y^{ K });
2: Y^{ U } = predict(M_{ A }, X_{ A }^{ U });
3: for t = 0 to n do
4: X_{ R } = aggregation(V, E, Y^{ K } ∪ Y^{ U });
5: Retrain M_{ AR } = learnClassifier(X_{ A }, X_{ R }, Y^{ K });
6: Y^{ U } = predict(M_{ AR }, V, E, X_{ A }^{ U }, X_{ R }^{ U }, Y^{ K });
7: end for
8: return Y^{ U }
where γ is a constant independent of Y_{ i }. The attribute classifier to estimate p(Y_{ i } | x_{ i }) is referred to as M_{ A }, and the relational classifier to estimate p(Y_{ i } | r_{ i }) is referred to as M_{ R }.
There are two main advantages of this prediction method. First, this method allows us to train classifiers M_{ A } and M_{ R } for attribute features X_{ A } and relational features X_{ R } in parallel. Second, in the collective inference process, the classifier M_{ R } can be retrained in each iteration based on the reconstructed relational features X_{ R } to improve the prediction accuracy of the collective classifier M_{ AR }.
Various traditional supervised learning methods can be used to train M_{ A } and M_{ R }, where the classifier, such as kNN, naive Bayes or logistic regression [16, 20], is learned from separate training data with a large number of labeled instances. However, to deal with the label deficiency problem in PPI networks, we propose to use a transductive learning method that acquires additional information from unlabeled data to improve the classification performance. Specifically, we set up Markov chain based learning models to estimate p(Y_{ i } | x_{ i }) and p(Y_{ i } | r_{ i }).
Markov chain based learning
where u = [u_{ j }] is the steady-state probability of relevance scores of different nodes, P is the affinity matrix associated with the instances in the Markov chain transition probability graph, and q is the label distribution vector whose elements are 1 for labeled instances and 0 for others. Here, the steady-state probability (relevance score of the instances) captures the global structure of the graph and the relationship between the nodes. The advantage of this random walk procedure is that it converges to a unique solution for any initial u(0). The process converges fast, needing just a few iterations. The random walk and related methods have been shown to have good performance on the learning tasks mentioned above. In the following, we introduce the learning of M_{ A } via the Markov chain with restart using all the instances (both labeled and unlabeled). The process of learning M_{ R } is similar.
where ||x_{ i } − x_{ j }|| is the Euclidean distance between the i-th feature vector and the j-th feature vector in X_{ A }. The parameter σ is a positive number to control the linkage in the manifold [26]. The m-by-m matrix A, with its (i, j)-th entry given by a_{i,j}, is always nonnegative. Similar to (1), applying the Gaussian kernel to r_{ i } ∈ X_{ R } leads to the affinity matrix R for relational features. We then set up Markov chain models for classifiers M_{ A } and M_{ R } based on A and R, respectively.
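A Gaussian-kernel affinity matrix of this kind can be sketched as below. The exact form of the kernel is an assumption here: the sketch uses the common choice a_{i,j} = exp(−||x_i − x_j||² / (2σ²)), and the function name `gaussian_affinity` and the three toy points are illustrative only.

```python
import math

def gaussian_affinity(X, sigma=1.0):
    """Build an m-by-m affinity matrix from m feature vectors.

    Assumed kernel: a_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)).
    Every entry is positive and the diagonal is exactly 1.
    """
    m = len(X)
    A = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            sq_dist = sum((p - q) ** 2 for p, q in zip(X[i], X[j]))
            A[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return A

# Three toy instances in two dimensions.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
A = gaussian_affinity(X, sigma=1.0)
```

A smaller σ makes the affinity fall off faster with distance, so fewer instances are effectively linked; the same function can be applied to relational feature vectors to obtain R.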
For the classifier M_{ A }, we construct an m-by-m Markov transition probability matrix P by normalizing the entries of A with respect to each column, i.e., each column sum of P is equal to one, $\sum_{i} [P]_{i,j} = 1$. With such P, we model the probabilities of visiting the other instances from the current instance in a Markov chain transition probability graph, in which all the labeled and unlabeled instances are linked together. Intuitively, a random walker starts from nodes with known labels and propagates label information from the labeled instances to the unlabeled instances. The walker iteratively visits the neighborhood of its current node according to the transition probability graph based on A.
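The column normalization that turns an affinity matrix into a transition probability matrix can be sketched as follows. The function name `column_normalize` and the small matrix are hypothetical; the sketch assumes every column has a positive sum, which holds for a Gaussian-kernel affinity because its diagonal entries are positive.

```python
def column_normalize(A):
    """Divide each column of A by its sum so that every column sums to one."""
    m = len(A)
    col_sums = [sum(A[i][j] for i in range(m)) for j in range(m)]
    return [[A[i][j] / col_sums[j] for j in range(m)] for i in range(m)]

# Toy symmetric affinity matrix (made-up values).
A = [[1.0, 0.5, 0.1],
     [0.5, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
P = column_normalize(A)
col_totals = [sum(P[i][j] for i in range(3)) for j in range(3)]
```

Entry P[i][j] can then be read as the probability of moving from instance j to instance i in one step. Note that P is generally not symmetric even when A is.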
Next we use the idea of topic-sensitive PageRank [27] as a Markov chain with restart [25] to solve the learning problem. The random walker has a probability α of returning to the labeled instances at each step. This can be interpreted as follows: during each iteration, each instance receives label information from its neighbors via the random walk, and also retains its initial label information. The parameter α specifies the relative weight of the information from its neighbors and its initial label information. Using this approach, we compute the steady-state probabilities that the random walker finally stays at different instances. These steady-state probabilities give a ranking of labels that indicates the importance of a set of labels to an unlabeled instance.
where l_{ d } is the number of instances with the label class d in the training data.
The steady probability distribution matrix U is computed iteratively from an initial matrix U_{0} in which each column is a probability distribution vector. The overall algorithm is summarized in Algorithm 2.
Algorithm 2 Markov Chain based Classifier
Input: P, Q and U_{0}, α, and the tolerance ϵ
Output: the steady probability distribution matrix U
Procedure:
1: Set t = 1;
2: Compute U_{ t } = (1 − α)PU_{t−1} + αQ;
3: If ||U_{ t } − U_{t−1}|| < ϵ, then stop and set U = U_{ t }; otherwise set t = t + 1 and go to Step 2.
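Algorithm 2 can be sketched in plain Python as below. Several details are assumptions rather than the authors' choices: the norm in the stopping test is taken to be the largest absolute entry-wise change, Q is built so that column d holds 1/l_d for instances labeled with class d (following the description of l_d), U_{0} is set to Q, and the function name `markov_chain_scores` and the three-instance example are made up.

```python
def markov_chain_scores(P, Q, alpha=0.95, tol=1e-10, max_iter=1000):
    """Iterate U_t = (1 - alpha) * P * U_{t-1} + alpha * Q until it settles.

    P: m-by-m column-stochastic transition matrix (list of lists)
    Q: m-by-c label matrix whose columns are probability distributions
    """
    m, c = len(Q), len(Q[0])
    U = [row[:] for row in Q]  # U_0 = Q (assumption)
    for _ in range(max_iter):
        U_new = [[(1 - alpha) * sum(P[i][k] * U[k][d] for k in range(m))
                  + alpha * Q[i][d]
                  for d in range(c)]
                 for i in range(m)]
        # Assumed norm: largest absolute change over all entries.
        change = max(abs(U_new[i][d] - U[i][d])
                     for i in range(m) for d in range(c))
        U = U_new
        if change < tol:
            break
    return U

# Toy problem: instance 0 has class 0, instance 2 has class 1,
# instance 1 is unlabeled and more similar to instance 0.
A = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.3],
     [0.1, 0.3, 1.0]]
col = [sum(A[i][j] for i in range(3)) for j in range(3)]
P = [[A[i][j] / col[j] for j in range(3)] for i in range(3)]
Q = [[1.0, 0.0],   # one labeled instance per class, so 1 / l_d = 1
     [0.0, 0.0],
     [0.0, 1.0]]
U = markov_chain_scores(P, Q, alpha=0.95)
```

Because P is column-stochastic and each column of Q sums to one, each column of U also sums to one, and the row of an unlabeled instance can be read as its label ranking.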
Experimental results
In this section, we compare the performance of the ICAM algorithm with other ICA-type collective classification algorithms: ICA, Gibbs and ICML. We show that the proposed algorithm outperforms these algorithms when only a limited amount of labeled training data is available.
KDD Cup 2001 data and baselines
The first experiment is conducted on Yeast gene function prediction from KDD Cup 2001 [28]. The dataset includes 1,243 genes and 1,806 interactions among pairs of genes whose encoded proteins physically interact with one another. These interaction relationships are symmetric. The protein functions are autocorrelated in this dataset, and a subset of the data has been withheld for testing. The task is to predict the functions of the proteins encoded by the genes. There are 14 functions, and a protein can have one or several functions.
We compare our proposed method with the following three baseline learners:
1. ICA. The Iterative Classification Algorithm (ICA) proposed by Neville et al. [15] is one of the simplest and most popular CC methods, and is frequently used as a baseline for CC evaluation in previous studies. For the multi-label problem, we transform it into multiple single-label prediction problems using the one-against-all strategy and employ ICA to make predictions for each single-label problem.
2. Gibbs. This baseline is another ICA-type CC algorithm using the ICA iterative classification framework. In each iteration, Gibbs resamples the label of each node based on the estimated label distribution [16]. We also use the one-against-all strategy to convert the multi-label problem into multiple single-label problems for the Gibbs algorithm.
3. ICML. This baseline is a multi-label CC algorithm proposed by Kong et al. [29]. ICML extends the ICA algorithm to multi-label problems by considering dependencies among the label set in the iteration process.
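The one-against-all transformation used for the ICA and Gibbs baselines can be sketched as follows. The function name `one_against_all` and the toy label sets are hypothetical; each resulting binary target vector would be handed to a separate single-label learner.

```python
def one_against_all(Y, c):
    """Turn multi-label targets into c binary target vectors.

    Y: list of label sets, one per instance (labels are 0 .. c-1)
    Returns a list of c lists; entry [d][i] is 1 if instance i has label d.
    """
    return [[1 if d in y else 0 for y in Y] for d in range(c)]

# Three toy instances over three possible labels.
Y = [{0, 2}, {1}, {0}]
binary_targets = one_against_all(Y, 3)
```

Since each binary problem is solved on its own, any dependence between labels is lost, which is the limitation ICML addresses.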
In the experiments, we use kNN as the node classifier for ICA, Gibbs and ICML. The parameter k was automatically selected in the range of 10 to 30 at an increment of 5 using 3-fold cross validation on the training set. For the proposed ICAM method, we learn the classifiers M_{ A } and M_{ R } using Markov chain based models to perform separate predictions. We set the value of α in the Markov chain model to 0.95, as suggested in [23].
Evaluation criteria
We evaluate the performance of our proposed method with four multi-label evaluation measures: average precision, coverage, ranking loss, and one-error. They are commonly used for the evaluation of multi-label learning algorithms.
Consider a multi-label dataset D = {(x_{ i }, Y_{ i }) | 1 ≤ i ≤ m}, where ${x}_{i}\in \mathcal{X}$ is an instance, ${Y}_{i}\subseteq \mathcal{Y}$ is the set of true labels of x_{ i }, and Y_{ i } = (Y_{i1}, Y_{i2}, ..., Y_{ic}) ∈ {0, 1}^{ c }. Here Y_{ij} = 1 when x_{ i } belongs to the j-th label and Y_{ij} = 0 otherwise, and c is the number of possible labels. The evaluation measures are defined using the following two outputs provided by the learning algorithms: s(x_{ i }, l) returns a real value that indicates the confidence for the class label l to be a proper label of x_{ i }; rank_{ s }(x_{ i }, l) returns the rank of class label l derived from s(x_{ i }, l).
where $\mathcal{R}_{i} = \{(l_{1}, l_{2}) \mid h(x_{i}, l_{1}) \le h(x_{i}, l_{2}),\ (l_{1}, l_{2}) \in Y_{i} \times \bar{Y}_{i}\}$.
where $\mathcal{P}_{i} = \{l' \in Y_{i} \mid rank_{s}(x_{i}, l') \le rank_{s}(x_{i}, l)\}$.
The smaller the values of coverage, ranking loss and one-error, the better the performance. For average precision, a larger value indicates better performance, so we report 1 − average precision. Thus, for all evaluation metrics, the smaller the value the better the performance.
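The four measures can be sketched in Python as below. The formulas follow the usual multi-label definitions (as in Schapire and Singer [30]) and are an assumption about the exact equations used here; ties in the scores are broken by label index, every instance is assumed to have at least one relevant and one irrelevant label, and all function names and toy scores are made up.

```python
def ranks(scores):
    """Rank 1 goes to the highest score; ties are broken by label index."""
    order = sorted(range(len(scores)), key=lambda l: (-scores[l], l))
    r = [0] * len(scores)
    for pos, l in enumerate(order, start=1):
        r[l] = pos
    return r

def one_error(S, Y):
    # Fraction of instances whose top-ranked label is not a true label.
    return sum(1 for s, y in zip(S, Y) if ranks(s).index(1) not in y) / len(S)

def coverage(S, Y):
    # Average depth in the ranking needed to cover all true labels.
    return sum(max(ranks(s)[l] for l in y) - 1 for s, y in zip(S, Y)) / len(S)

def ranking_loss(S, Y):
    # Average fraction of (true, false) label pairs that are mis-ordered.
    total = 0.0
    for s, y in zip(S, Y):
        y_bar = [l for l in range(len(s)) if l not in y]
        bad = sum(1 for l1 in y for l2 in y_bar if s[l1] <= s[l2])
        total += bad / (len(y) * len(y_bar))
    return total / len(S)

def average_precision(S, Y):
    # For each true label, the share of true labels ranked at or above it.
    total = 0.0
    for s, y in zip(S, Y):
        r = ranks(s)
        total += sum(sum(1 for l2 in y if r[l2] <= r[l]) / r[l] for l in y) / len(y)
    return total / len(S)

# Two toy instances over three labels.
S = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.3]]
Y = [{0, 2}, {2}]
results = (one_error(S, Y), coverage(S, Y), ranking_loss(S, Y),
           1.0 - average_precision(S, Y))
```

In the second toy instance the only true label is ranked second, so it contributes an error to one-error, a depth of 1 to coverage, and one mis-ordered pair out of two to ranking loss.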
Results on KDD Cup 2001 data
In this experiment, we test the performance of our proposed ICAM algorithm on the KDD Cup 2001 dataset. We randomly select 50% of the data as the training set, and use the remaining 50% as the test set. The experiment is repeated 10 times with randomly selected training/test splits (each with a different random seed), and we report the mean and standard deviation of each compared algorithm over the same 10 trials.
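The repeated random-split protocol can be sketched as follows. The function `repeated_holdout`, the seed scheme, and the placeholder metric `toy_metric` are hypothetical, and the sketch uses the sample standard deviation, which is an assumption since the text does not say which estimator was used.

```python
import random
import statistics

def repeated_holdout(n_items, evaluate, n_runs=10, train_frac=0.5, base_seed=0):
    """Run `evaluate(train_idx, test_idx)` on n_runs random splits.

    Returns the mean and sample standard deviation of the scores.
    """
    scores = []
    for run in range(n_runs):
        rng = random.Random(base_seed + run)  # a different seed for each run
        idx = list(range(n_items))
        rng.shuffle(idx)
        cut = int(train_frac * n_items)
        scores.append(evaluate(idx[:cut], idx[cut:]))
    return statistics.mean(scores), statistics.stdev(scores)

def toy_metric(train_idx, test_idx):
    # Placeholder for "train a model, then score it on the test set".
    assert not set(train_idx) & set(test_idx)  # the two parts never overlap
    return len(test_idx) / (len(train_idx) + len(test_idx))

mean_score, std_score = repeated_holdout(1243, toy_metric)
```

Reusing the same seeds for every compared algorithm means all methods see the same 10 splits, which is what makes their means and standard deviations comparable.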
Table 1: The performance (mean ± standard deviation) of the compared algorithms on the Yeast protein dataset.
Methods  Coverage  Ranking Loss  One-error  1 − Average Precision 

ICA  4.217 ± 0.273  0.140 ± 0.013  0.042 ± 0.005  0.155 ± 0.005 
Gibbs  4.319 ± 0.195  0.148 ± 0.008  0.043 ± 0.005  0.154 ± 0.006 
ICML  4.409 ± 0.091  0.153 ± 0.006  0.043 ± 0.007  0.162 ± 0.006 
ICAM  3.748 ± 0.164  0.100 ± 0.008  0.041 ± 0.005  0.151 ± 0.005 
We can see from the figure that ICAM (the black line) has the best performance in general. ICAM outperforms the other algorithms across different numbers of training instances, especially when the training set is small. Specifically, when the number of training instances is 200, ICAM achieves a coverage improvement of 0.4916 over the second best method, Gibbs (ICAM: 4.2213 versus Gibbs: 4.7129), and a 0.0439 improvement in ranking loss (ICAM: 0.1184 versus ICML: 0.1623). As the size of the training data increases, ICAM consistently achieves better performance than the other learning algorithms across all evaluation criteria.
We find that ICAM outperforms the other ICA-type methods substantially in terms of coverage. On the other hand, ICAM only slightly outperforms the other methods in terms of one-error. We note that one-error and coverage are two different quantitative measures. One-error evaluates how many times the top-ranked label is not in the set of possible labels. Thus, if the goal of a prediction system is to assign a single function to a protein (single-label classification), the one-error is identical to the test error. Coverage, in contrast, measures how far we need, on average, to go down the list of labels in order to cover all the possible labels assigned to a protein. Coverage is loosely related to precision at the level of perfect recall [30]. The experimental results indicate that the top-ranked label predictions from the other ICA-type methods are as accurate as those from ICAM, but the predictions from ICAM are more complete. A reasonable explanation for this finding is that the ICA-type methods focus on the single-label setting: the multi-label problem is first transformed into multiple single-label prediction problems, and the ICA-type methods then use independent classifiers induced from labeled training data for each problem. As a result, ICA-type approaches ignore the effect of unlabeled data and the interdependence of the protein functions. In contrast, our proposed ICAM approach is based on a Markov chain based transductive learning method that uses both labeled and unlabeled data for label propagation. The Markov chain based method takes the correlation of the classes into account to effectively compute a ranking of labels for an instance. Therefore, ICAM provides an opportunity to leverage the individual ICA-type classifiers to achieve higher coverage of predictions.
Results on KDD Cup 2002 data
To validate the effectiveness of the proposed method when only a limited number of positive labeled training instances are available, we conduct additional experiments on a relatively large-scale Yeast dataset from KDD Cup 2002. It consists of 4507 instances (i.e., genes) from experiments with a set of Saccharomyces cerevisiae (Yeast) strains. Each instance is described by various types of information that characterize the gene associated with the instance. The data sources for describing the instances include abstracts from the scientific literature (MEDLINE), gene localization and functions. We represent each instance by a feature vector with 20545 dimensions. Pairs of genes whose encoded proteins physically interact with one another are linked, and the resulting protein-protein interaction network consists of 1218 links.
Each instance is labeled with one of three class labels: "nc", "control" and "change". The "change" label indicates instances in which the activity of the hidden system was significantly changed, but the activity of the control system was not significantly changed. The goal of the KDD Cup 2002 task is to learn a model that can accurately predict the genes that affect the hidden system but not the control system. In this case, the positive class consists of those genes with the "change" label and the negative class consists of those genes with either the "nc" or the "control" label. This partition is highly imbalanced: the rate of positive instances is only 1.2%. Therefore, we base our evaluation on Receiver Operating Characteristic (ROC) curves, which show the true positive rate of a classifier as a function of its false positive rate. ROC curves are commonly used for evaluating highly skewed binary classification problems. Recent work has shown that ROC curves have a deep connection to precision-recall (PR) curves [32].
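An ROC curve of the kind described above can be traced by sweeping a threshold over the classifier scores. The sketch below is illustrative only: the function names `roc_points` and `auc`, the scores, and the 0/1 labels are made up, and the area under the curve is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    scores: confidence that each instance is positive
    labels: 1 for positive, 0 for negative (both classes must occur)
    """
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(order):
        # Instances with equal scores cross the threshold together.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if labels[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Toy imbalanced example: 2 positives among 6 instances.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0, 0, 0]
pts = roc_points(scores, labels)
area = auc(pts)
```

Both axes are rates within their own class, so the curve does not change when the proportion of negatives grows, which is one reason ROC analysis is popular for skewed problems.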
Experiments on collaboration networks
In this section, we compare the performance of the proposed ICAM algorithm with other collective classification algorithms on two multi-label collaboration network datasets to validate the effectiveness of the proposed method more thoroughly. These collaboration network datasets are collected from the DBLP computer science bibliography website, and were used in prior work to study multi-label collective classification problems [29]. Their characteristics are listed in Table 2. Specifically, we extract DBLP co-authorship networks that contain authors who published papers during the years 2000-2010 as the nodes of the networks, and link any two authors who have collaborated with each other. At each node, we extract a bag-of-words representation of all the paper titles published by the author, and use it as the attributes of the node. Each author has one or multiple research topics of interest from 6 research areas. The representative conferences from each area are selected as class labels. If an author has published papers in any of these conferences, we assume the author is interested in the corresponding research class. The task is to classify each author with a set of multiple research classes of interest. The conferences corresponding to the class labels of the two datasets (DBLP-A and DBLP-B) are given as follows.
DBLP-A:
1. Database: ICDE, VLDB, SIGMOD, PODS, EDBT
2. Data Mining: KDD, ICDM, SDM, PKDD, PAKDD
3. Artificial Intelligence: IJCAI, AAAI
4. Information Retrieval: SIGIR, ECIR
5. Computer Vision: CVPR
6. Machine Learning: ICML, ECML
DBLP-B:
1. Algorithms & Theory: STOC, FOCS, SODA, COLT
2. Natural Language Processing: ACL, ANLP, COLING
3. Bioinformatics: ISMB, RECOMB
4. Networking: SIGCOMM, MOBICOM, INFOCOM
5. Operating Systems: SOSP, OSDI
6. Parallel Computing: POD, ICS
Table 2: The description of the datasets used in the experiments on collaboration networks.
Datasets  Number of Instances  Number of Attributes  Number of Links  Number of Classes 

DBLP-A  23,806  12,588  150,042  6 
DBLP-B  16,020  8,595  95,108  6 
Conclusion
In this paper, we studied the label deficiency problem in collective classification (CC). We showed that the protein function prediction problem from PPI networks can be formulated as a CC problem, and proposed an effective and novel Markov chain based CC learning algorithm, namely ICAM. It focuses on how to use labeled and unlabeled data to enhance the classification performance on PPI network data. Experimental results on two real-world Yeast PPI network datasets and two collaboration network datasets showed that our proposed ICAM method is effective in learning CC tasks when labeled data are scarce. In the future, we will consider other semi-supervised learning techniques for collective classification in PPI network data, and we will also study other complex biological networks, for example through heterogeneous network classification.
Declarations
Acknowledgements
This research was supported in part by NSFC under Grant No. 61272538, National Key Technology R&D Program of MOST China under Grant No. 2012BAK17B08, National Commonweal Technology R&D Program of AQSIQ China under Grant No. 201310087, and Shenzhen Science and Technology Program under Grant No. CXY201107010206A. M.K. Ng's research was supported in part by Centre for Mathematical Imaging and Vision, HKRGC Grant No. 201812.
Declarations
The publication costs for this article were funded by the corresponding author (Y. Ye).
This article has been published as part of BMC Bioinformatics Volume 15 Supplement 2, 2014: Selected articles from the Twelfth Asia Pacific Bioinformatics Conference (APBC 2014): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/15/S2.
References
1. Eisenberg D, Marcotte EM, Xenarios I, Yeates TO: Protein function in the post-genomic era. Nature. 2000, 405 (6788): 823-826. 10.1038/35015694.
2. Pandey G, Kumar V, Steinbach M, Meyers CL: Computational Approaches to Protein Function Prediction. 2012, Wiley-Interscience
3. Clare A, King RD: Predicting gene function in Saccharomyces cerevisiae. Bioinformatics. 2003, 19 (suppl 2): 42-49.
4. Borgwardt KM, Ong CS, Schönauer S, Vishwanathan S, Smola AJ, Kriegel HP: Protein function prediction via graph kernels. Bioinformatics. 2005, 21 (suppl 1): 47-56. 10.1093/bioinformatics/bti1007.
5. Sleator RD, Walsh P: An overview of in silico protein function prediction. Archives of microbiology. 2010, 192 (3): 151-155. 10.1007/s00203-010-0549-9.
6. Altschul SF: Evaluating the statistical significance of multiple distinct local alignments. Theoretical and Computational Methods in Genome Research. 1997, 1-14.
7. Sigrist CJ, Cerutti L, De Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A, Hulo N: Prosite, a protein domain database for functional characterization and annotation. Nucleic acids research. 2010, 38 (suppl 1): 161-166.
8. Finn RD, Mistry J, Schuster-Böckler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R: Pfam: clans, web tools and services. Nucleic acids research. 2006, 34 (suppl 1): 247-251.
9. Sharan R, Ulitsky I, Shamir R: Network-based prediction of protein function. Molecular systems biology. 2007, 3 (1)
10. Chua HN, Sung WK, Wong L: Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics. 2006, 22 (13): 1623-1630. 10.1093/bioinformatics/btl145.
11. Nabieva E, Jim K, Agarwal A, Chazelle B, Singh M: Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics. 2005, 21 (suppl 1): 302-310. 10.1093/bioinformatics/bti1054.
12. Brun C, Chevenet F, Martin D, Wojcik J, Guénoche A, Jacq B: Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network. Genome biology. 2004, 5 (1): 66.
13. Adamcsek B, Palla G, Farkas IJ, Derényi I, Vicsek T: Cfinder: locating cliques and overlapping modules in biological networks. Bioinformatics. 2006, 22 (8): 1021-1023. 10.1093/bioinformatics/btl039.
14. McDowell L, Aha DW: Semi-supervised collective classification via hybrid label regularization. Proceedings of the 29th International Conference on Machine Learning. 2012
15. Neville J, Jensen D: Iterative classification in relational data. Proc AAAI-2000 Workshop on Learning Statistical Models from Relational Data. 2000, 13-20.
16. Jensen D, Neville J, Gallagher B: Why collective inference improves relational classification. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2004, 593-598.
17. Macskassy SA, Provost F: Classification in networked data: A toolkit and a univariate case study. The Journal of Machine Learning Research. 2007, 8: 935-983.
18. Shi X, Li Y, Yu P: Collective prediction with latent graphs. Proceedings of the 20th ACM International Conference on Information and Knowledge Management. 2011, 1127-1136.
19. McDowell L, Aha D: Semi-supervised collective classification via hybrid label regularization. 2012, 975-982. arXiv preprint arXiv:1206.6467
20. McDowell LK, Gupta KM, Aha DW: Case-based collective classification. Proceedings of the 20th International FLAIRS Conference. 2007, 399-404.
21. Macropol K, Can T, Singh A: Rrw: repeated random walks on genome-scale protein networks for local cluster discovery. BMC bioinformatics. 2009, 10 (1): 283. 10.1186/1471-2105-10-283.
22. Li X, Ng MK, Ye Y: Multicomm: Finding community structure in multidimensional networks. IEEE Transactions on Knowledge and Data Engineering. 2013, 99 (1)
23. Wu Q, Ng MK, Ye Y: Markov-miml: A Markov chain-based multi-instance multi-label learning algorithm. Knowledge and Information Systems.
24. Ng MK, Wu Q, Ye Y: Co-transfer learning via joint transition probability graph based method. Proceedings of the 1st International Workshop on Cross Domain Knowledge Discovery in Web and Social Network Mining. 2012, 1-9. ACM
25. Tong H, Faloutsos C, Pan JY: Random walk with restart: fast solutions and applications. Knowledge and Information Systems. 2008, 14 (3): 327-346. 10.1007/s10115-007-0094-2.
26. Zelnik-Manor L, Perona P: Self-tuning spectral clustering. Advances in Neural Information Processing Systems. 2004, 1601-1608.
27. Haveliwala TH: Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering. 2003, 784-796.
28. Cheng J, Hatzis C, Hayashi H, Morishita S, Page D, Sese J: Kdd cup 2001 report. ACM SIGKDD Explorations Newsletter. 2002, 3 (2): 47-64. 10.1145/507515.507523.
29. Kong X, Shi X, Yu PS: Multi-label collective classification. SIAM International Conference on Data Mining (SDM). 2011, 618-629.
30. Schapire RE, Singer Y: Boostexter: A boosting-based system for text categorization. Machine learning. 2000, 39 (2-3): 135-168.
31. Zhou ZH, Zhang ML, Huang SJ, Li YF: Multi-instance multi-label learning. Artificial Intelligence. 2012, 176 (1): 2291-2320. 10.1016/j.artint.2011.10.002.
32. Davis J, Goadrich M: The relationship between precision-recall and roc curves. Proceedings of the 23rd International Conference on Machine Learning. 2006, 233-240.
33. Chang CC, Lin CJ: Libsvm: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST). 2011, 2 (3): 27
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.