Learning a Markov Logic network for supervised gene regulatory network inference
 Céline Brouard^{1},
 Christel Vrain^{2},
 Julie Dubois^{1, 2},
 David Castel^{3},
 Marie-Anne Debily^{3, 4} and
 Florence d’Alché-Buc^{1, 5}
DOI: 10.1186/1471-2105-14-273
© Brouard et al.; licensee BioMed Central Ltd. 2013
Received: 16 August 2012
Accepted: 3 September 2013
Published: 12 September 2013
Abstract
Background
Gene regulatory network inference remains a challenging problem in systems biology despite the numerous approaches that have been proposed. When substantial knowledge on a gene regulatory network is already available, supervised network inference is appropriate. Such a method builds a binary classifier able to assign a class (Regulation/No regulation) to an ordered pair of genes. Once learnt, the pairwise classifier can be used to predict new regulations. In this work, we explore the framework of Markov Logic Networks (MLN) that combine features of probabilistic graphical models with the expressivity of first-order logic rules.
Results
We propose to learn a Markov Logic network, i.e. a set of weighted rules that conclude on the predicate “regulates”, starting from a known gene regulatory network involved in the switch proliferation/differentiation of keratinocyte cells, a set of experimental transcriptomic data and various descriptions of genes all encoded into first-order logic. As training data are unbalanced, we use asymmetric bagging to learn a set of MLNs. The prediction of a new regulation can then be obtained by averaging predictions of individual MLNs. As a side contribution, we propose three in silico tests to assess the performance of any pairwise classifier in various network inference tasks on real datasets. A first test consists of measuring the average performance on a balanced edge prediction problem; a second one deals with the ability of the classifier, once enhanced by asymmetric bagging, to update a given network. Finally, our main result concerns a third test that measures the ability of the method to predict regulations with a new set of genes. As expected, MLN, when provided with only numerical discretized gene expression data, does not perform as well as a pairwise SVM in terms of AUPR. However, when a more complete description of gene properties is provided by heterogeneous sources, MLN achieves the same performance as a black-box model such as a pairwise SVM while providing relevant insights on the predictions.
Conclusions
The numerical studies show that MLN achieves very good predictive performance while opening the door to some interpretability of the decisions. Besides the ability to suggest new regulations, such an approach allows experimental data to be cross-validated against existing knowledge.
Background
Gene regulatory network inference has received a lot of attention over the last decade due to the abundance of high-throughput data. A gene regulatory network (see for instance [1]) usually refers to a set of genes whose expression varies over time due to the inhibitive or inductive roles of regulators. Deciphering these regulations at work in the cell will provide a thorough understanding of the cell behaviour and will eventually aid in controlling or repairing it when needed. Inference of gene regulatory networks as a problem of empirical inference fits the framework of machine learning as described in [2]. Three main families of inference algorithms have been developed so far: (1) unsupervised model-free approaches that use information theory to extract a non-oriented graph of dependence between variables, (2) unsupervised reverse-modeling approaches that model the network behavior as a (dynamical) system [3] and (3) supervised edge prediction approaches that focus on the graph of regulation and only predict the presence/absence of regulations [4–7]. In the first family, relevance networks like ARACNE [8], CLR [9] and TD-ARACNE [10] use a mutual information score between the expression profiles of each pair of genes and, given a threshold, decide whether or not to predict an interaction. The second family is based on a model of the behavior of the network, either static or dynamic. In the case of static models devoted to steady-state data, Gaussian Graphical Models (GGM) [11, 12] make it possible to build a linear regression model that expresses how one gene can be predicted using the set of remaining genes. Interestingly, GGM build a network using partial correlation coefficients, providing a stronger measure of dependence compared to the correlation coefficients used in relevance networks.
A powerful approach to regression and network inference based on an ensemble of randomized regression trees [13] has also proven to outperform competitors in inferring gene regulatory networks in recent DREAM competitions. Bayesian networks [14] provide another important approach in static modeling. Learning a Bayesian network involves learning the acyclic oriented graph that describes the parental relations between variables and the conditional probabilities that govern the behavior of the network. While appropriate to gene regulation cascades, Bayesian networks cannot, however, model cycles in the network. Other models incorporating dynamical modeling have therefore been proposed in the literature: dynamical Bayesian networks and differential equations [15–17].
Taking a different angle, supervised edge prediction methods build a decision function that associates a class label to a pair of vertices (genes or proteins) without searching for a model of the network behavior. These methods assume that the network to infer is partially known and that information on the vertices is available. They have been mainly developed for protein-protein interaction network inference, using kernel methods [18–23]. The principle underlying [20, 21] is to build pairwise Support Vector Machines (SVM) with an appropriate definition of kernels between pairs of proteins from a kernel defined between individual proteins. Pairwise kernels can also be combined into a linear combination (usually an average) to deal with multiple sources of information. In [23], another point of view is taken: local models (still SVMs) are attached to each target protein in order to predict whether a candidate protein interacts with the considered target, and these models are then combined. Recently, the work of [22] has shown that the local model is equivalent to a pairwise SVM considering a local definition of a pairwise kernel.
In the case of gene regulatory network inference, the supervised setting of edge prediction has been explored less. It was first introduced by Qian et al. [4] using gene expression as the unique descriptor and further developed by Mordelet et al. with the SIRENE method [5]. Similarly to [23], SIRENE estimates a local model for each transcription factor and then combines all local models together. The method requires a list of known transcription factors that serve as targets. Other advances in supervised edge inference address the lack of true negative examples and therefore focus on learning from positive and unlabeled examples only. Some methods develop strategies to select reliable negative examples from the unlabeled set and then solve a classical balanced binary classification problem [24, 25]; others adjust the probability of being positive estimated by a classifier trained on positive and unlabeled examples [6, 7, 26].
Choosing between the three kinds of network inference methods, namely model-free, model-driven and supervised approaches, relies on the goal of the study. Model-free approaches give a good first network approximation when only one kind of data is available. Reverse-modeling delivers a model of the network that can be used to predict its behavior but requires a sufficient amount of observations, if possible acquired with different initial conditions or perturbations. Supervised edge prediction is relevant when a sufficiently large set of regulations is known a priori and various sources of gene annotations are available. It will be especially meaningful when the biologist wants to increase the corpus of existing knowledge.
This paper deals with the latter prediction problem. We assume that a directed graph of regulations is known partially for a target set of genes. For instance, it may result from the biologists’ experience and careful mining of the literature. Besides the graph structure, we also suppose that a set of various descriptors of genes and their products is available for the target set of genes, such as gene expression data, Gene Ontology (GO) annotation, protein-protein interactions and gene locations on chromosomes. Our goal is to build a decision function that predicts whether an ordered pair of regulator and regulee candidates belongs to the class Regulation or No Regulation.
In this work, we address four issues raised by supervised edge prediction and implement the whole approach on a new experimental dataset related to the ID2 genetic regulatory network in human keratinocytes. The first issue concerns the available sources of information about genes and proteins. These sources provide multiple views of the data which are by definition heterogeneous and very often highly structured. The second issue is related to network inference interpretability: many of the proposed methods are black boxes, while biologists are interested in how the predictions have been obtained. The third issue, as raised by many authors, deals with imbalanced data: very few positive examples of “regulation” are available compared to the huge number of negative examples of “no regulation”. Finally, the fourth issue we tackle in this paper concerns the performance assessment of a supervised edge prediction tool. Although the best performance assessment comes when biologists go back to the experimental laboratory to test predictions of new regulations with additional and independent experiments, there is a lot of room for in silico studies to measure the ability of an edge prediction tool to provide evidence for regulations. The first and the second issues call for a common framework of representation for all the views of the data. For that purpose, we use first-order logic to represent both data and background knowledge. In order to benefit from the tools of statistical learning and to avoid some of the weaknesses of pure inductive logic programming raised, for instance, in [27], we choose a Markov Logic network (MLN) [28, 29] as the edge predictor. An MLN makes predictions using a set of weighted first-order logic rules, thus providing interesting insights on decisions.
The third issue is systematically addressed by using asymmetric bagging [30, 31], a well-known and generic method that adapts a classifier devoted to well-balanced tasks to unbalanced tasks, which was also discussed in [6] among other approaches. It is worth noting that we do not solve the issue of false negatives, i.e. the fact that among the “no regulation” examples, there might be “regulation” examples that have not been validated yet. The reader interested in this issue is invited to study the works of Cerulo et al. [7] and Mordelet & Vert [6]. Finally, as a fourth contribution, we define and perform three typical numerical studies that can be carried out in order to test a machine learning method devoted to edge prediction: the first is a basic test with artificially balanced samples in which we just test the ability of the learning method to obtain good performance; the second consists of building a regulation predictor in a realistic setting from unbalanced datasets using asymmetric bagging and measuring its ability to discover regulations that were not known before; in the third and last study, we proceed in the same way but test the ability of the classifier to correctly label pairs of genes made of genes from the training network and genes coming from a new candidate set. In order to assess the performance of the MLN-based approach, we define a pairwise Support Vector Machine (SVM) devoted to ordered pairs of genes and use it as a baseline using a straightforward simplification of the tensor product pairwise kernel. Kernel-based methods as well as first-order logic provide a framework to take into account different sources and features of the data: in this study, two simple definitions of pairwise kernels that combine multiple pairwise kernels expressing heterogeneous information are proposed.
While the goal of the study is to take advantage of the heterogeneity of features to describe a pair of genes, we also study the behavior of MLN compared to pairwise SVM in the case of a single source of quantitative information such as gene expression.
In order to show the interest of solving these four issues, we have applied our approach to the ID2 genetic regulatory network in human keratinocytes and a new dataset of gene expression using RNA interference. The ID2 protein (Inhibitor of Differentiation 2) acts as a negative regulator of basic helix-loop-helix transcription factors. Previous studies have suggested a potential role for ID2 in epidermis homeostasis reflected by the high expression level of ID2 in proliferating keratinocytes and its downregulation upon the onset of differentiation [32]. However, the precise implications of ID2 in the process, and in particular its genetic interactions, remain largely unknown. In an attempt to decipher the ID2 genetic regulation network in human keratinocytes, we conducted a transcriptomic analysis by microarray experiments of HaCaT cells presenting stable overexpression or transient knockdown achieved by RNA interference of ID2 expression. As a starting point, we retrieved the regulatory networks associated with the differentially expressed genes in cells with high and low levels of ID2 from the Ingenuity Pathway Analysis (IPA) database. We selected a subset of these networks with ontologies of interest for the biologists (cell cycle regulation, cancer, gene expression and signal transduction), merged the corresponding networks and kept only the transcriptional/expression regulations between the genes. The resulting network was finally used to label the couples of genes as a training set.
Methods
Learning directed edges from a set of known regulations
Let $\mathcal{G}$ be the set of genes of interest. We want to learn a function h that takes the descriptors of a gene G_{1} and a gene G_{2} and predicts if the gene G_{1} regulates G_{2}. Two types of descriptors are considered: descriptors of genes, for instance protein locations within the cell, and relationships between genes reflecting, for instance, if two genes are located on the same chromosome. Let us denote by $\mathcal{X}$ the set of descriptors on genes and by $\mathcal{R}$ the set of relations. A special descriptor expresses the class: given an ordered pair of two genes G_{1} and G_{2}, it is true when G_{1} regulates G_{2}.
In this work we have chosen to use a first-order logic representation, which allows for an easy representation of several objects (here, genes) and their relationships. Facts representing information about objects and their relations are expressed by atomic expressions, called atoms. They are written P(t_{1},…,t_{ n }), where P is a predicate and t_{1},…,t_{ n } are terms; a term being either a variable or a constant. In the remainder, strings corresponding to constants will start with uppercase letters and strings corresponding to variables with lowercase letters. An atom is said to be ground if all its variables are set to specific values. A ground atom can be true or false, depending on the truth value of the property it expresses. It can therefore be seen as a boolean variable.
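A decision rule consistent with the notation defined in the next sentence thresholds the posterior probability of the class atom. The exact form shown here is a reconstruction under that assumption, not a quotation:

```latex
h(G_1, G_2) =
\begin{cases}
1 & \text{if } P\big(\mathrm{Regulates}(G_1, G_2) = 1 \mid
      \mathcal{X}_1 = x_1,\ \mathcal{X}_2 = x_2,\
      \mathcal{R}_{12} = r_{12},\ \mathcal{B} = b\big) \ge \theta, \\
0 & \text{otherwise,}
\end{cases}
```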
where ${\mathcal{X}}_{i}={x}_{i}$ represents the description of G_{ i }, ${\mathcal{R}}_{12}={r}_{12}$ represents the relations between G_{1} and G_{2} and $\mathcal{B}=b$ represents the background knowledge. θ is a threshold, whose value will be discussed in the experiments. As shown by this formalization, the learning framework we consider is beyond the classical framework of machine learning in which data is represented only by attributes; it belongs to the ILP (Inductive Logic Programming) domain, a subfield of machine learning that aims at studying relational learning in first-order formalisms [33].
The model we have chosen is a Markov Logic Network, as introduced in [29]. Such a model is defined as a set of weighted first-order formulas. In this paper, we consider only a subset of first-order logic, composed of rules A_{1}∧…∧A_{ n } ⇒ Regulates(g_{1},g_{2}), where A_{1}, …, A_{ n } are atoms. Such restrictions correspond to Horn clauses. The left-hand side of the rule (A_{1}∧…∧A_{ n }) is called the body of the rule whereas the right-hand side is called the head of the rule.
Learning a Markov Logic network
Statistical relational learning (SRL) relates to a subfield of machine learning that combines first-order logic rules with probabilistic graphical frameworks. Among the promising approaches to SRL, Markov Logic Networks (MLNs) introduced by Richardson and Domingos [28, 29] are an appealing model. An MLN $\mathcal{M}$ is defined by a set of formulas F = {f_{ i } | i = 1,…,p} and a weight vector w of dimension p, where the clause f_{ i } has an associated weight w_{ i } (negative or positive) that reflects its importance. Therefore, an MLN provides a way of softening first-order logic and encapsulating the weight learning into a probabilistic framework.
where the predicate Processbio(Gname,Proc) says that gene G is involved in the biological process annotation Proc of Gene Ontology [34].
 1. Processbio(B,Cell_proliferation) ∧ Processbio(A,Negative_regulation_of_cell_proliferation) ⇒ Regulates(A,B)
 2. Processbio(A,Cell_proliferation) ∧ Processbio(B,Negative_regulation_of_cell_proliferation) ⇒ Regulates(B,A)
 3. Processbio(A,Cell_proliferation) ∧ Processbio(A,Negative_regulation_of_cell_proliferation) ⇒ Regulates(A,A)
 4. Processbio(B,Cell_proliferation) ∧ Processbio(B,Negative_regulation_of_cell_proliferation) ⇒ Regulates(B,B)
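For reference, the joint distribution that the standard MLN definition of Richardson and Domingos [28, 29] attaches to a possible world x is the log-linear form below. It is restated here from that standard definition, using the symbols w_{ i }, n_{ i } and Z of the surrounding text:

```latex
P(X = x) = \frac{1}{Z} \exp\left( \sum_{i=1}^{p} w_i \, n_i(x) \right)
```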
where n_{ i }(x) is the number of true groundings of the clause f_{ i } in the world x, and $Z={\sum}_{x'}\exp\left({\sum}_{i=1}^{p}{w}_{i}{n}_{i}(x')\right)$ is the partition function used for normalization.
For instance, if we consider a world where Processbio(B,Cell_proliferation) and Processbio(A,Negative_regulation_of_cell_proliferation) are true and the other ground atoms are false, then the first instantiated clause is false in this world, whereas all the other instantiated clauses are true (because their premises are false, which makes the logical implication true). Thus, the number of true groundings of the clause (1) is 3.
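In the discriminative setting, the ground atoms are split into evidence atoms X and query atoms Y (here, the groundings of Regulates). The conditional distribution then takes the usual MLN form below, restated from the standard definition. $Z_x$ and $\mathcal{F}_Y$ are notational assumptions for the evidence-dependent normalizer and for the set of clauses containing at least one query atom:

```latex
P(Y = y \mid X = x) = \frac{1}{Z_x} \exp\left( \sum_{i \in \mathcal{F}_Y} w_i \, n_i(x, y) \right)
```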
where n_{ i }(x,y) is the number of true groundings of f_{ i } in the world (x,y).
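The grounding count in the worked example can be checked mechanically. The sketch below is illustrative only: the first-order clause is an assumption inferred from the four listed groundings, and the tuple encoding of atoms is a hypothetical choice, not the representation used by Alchemy.

```python
from itertools import product

# World from the worked example: only these two ground atoms are true.
WORLD = {
    ("Processbio", "B", "Cell_proliferation"),
    ("Processbio", "A", "Negative_regulation_of_cell_proliferation"),
}


def clause_is_true(regulator, target, world):
    """Truth value of one grounding of the (assumed) clause
    Processbio(target, Cell_proliferation)
      AND Processbio(regulator, Negative_regulation_of_cell_proliferation)
      => Regulates(regulator, target)."""
    body = (
        ("Processbio", target, "Cell_proliferation") in world
        and ("Processbio", regulator,
             "Negative_regulation_of_cell_proliferation") in world
    )
    head = ("Regulates", regulator, target) in world
    return (not body) or head  # material implication


def count_true_groundings(constants, world):
    """n_i(x): number of groundings of the clause that hold in the world."""
    return sum(
        clause_is_true(r, t, world) for r, t in product(constants, repeat=2)
    )


print(count_true_groundings(["A", "B"], WORLD))  # prints 3
```

Adding the atom Regulates(A,B) to the world makes the one violated grounding true, so the count rises to 4.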
Learning the candidate rules with Aleph
The system Aleph, developed by Srinivasan [35], is a well-known ILP learner that implements the method proposed in [36]. Aleph, like other relational learners, takes as input ground atoms corresponding to positive and negative examples and background knowledge. It also needs language biases, which restrict the set of clauses that can be generated and thereby reduce the size of the search space. These restrictions can correspond to information specified on the predicates, like the place where they occur in the rule, the types of their arguments or the way they will be used (instantiated or not). In our case, we specified that the predicate Regulates occurs in the head of the rule, and the other ones in the body of the rule. Other constraints, such as the maximum number of atoms in a clause or the number of variables, can be defined in order to restrict the form of the rules that can be learned.
 1. Select a positive example not yet covered by a rule.
 2. Build the most specific clause r that covers this example and that satisfies the language biases from the background knowledge. This clause is called the “bottom clause”.
 3. Search for a clause more general than the bottom clause: perform a top-down search (from the most general to the most specific clause) in the search space bounded by r.
 4. Add the clause with the best score to the current theory and prune redundant clauses.
 5. Repeat until all positive examples are covered.
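The covering loop in these steps can be illustrated with a toy propositional stand-in. This is not Aleph's implementation: examples are reduced to sets of hypothetical feature names, the "bottom clause" is simply the seed example's feature set, and the score P − N (positives covered minus negatives covered) is an assumed choice.

```python
from itertools import combinations


def covers(rule, example):
    """A rule (set of required features) covers an example containing them all."""
    return rule <= example


def learn_rules(positives, negatives, max_body=2):
    """Greedy covering: one rule per iteration until no positive is left."""
    theory = []
    uncovered = list(positives)
    while uncovered:
        seed = uncovered[0]            # step 1: pick an uncovered positive
        bottom = frozenset(seed)       # step 2: most specific rule (toy version)
        if not bottom:
            raise ValueError("toy sketch assumes non-empty examples")
        best_rule, best_score = None, None
        # step 3: general-to-specific search, bounded by the bottom clause
        for size in range(1, max_body + 1):
            for body in combinations(sorted(bottom), size):
                rule = frozenset(body)
                p = sum(covers(rule, e) for e in uncovered)
                n = sum(covers(rule, e) for e in negatives)
                if best_score is None or p - n > best_score:
                    best_rule, best_score = rule, p - n
        theory.append(best_rule)       # step 4: keep the best-scoring rule
        # step 5: drop the positives that the new rule covers, then repeat
        uncovered = [e for e in uncovered if not covers(best_rule, e)]
    return theory


positives = [
    frozenset({"samechro", "inteprot"}),
    frozenset({"samechro", "expmore"}),
    frozenset({"inteprot", "expless"}),
]
negatives = [frozenset({"expmore"}), frozenset({"expless"})]
theory = learn_rules(positives, negatives)
```

Every candidate rule is a subset of the seed's features, so the selected rule always covers the seed and the loop terminates. Pruning of redundant clauses (part of step 4) is left out of the sketch.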
Weight learning
Richardson & Domingos [29] proposed performing generative weight learning for a fixed set of clauses by optimizing the pseudo-log-likelihood. Several approaches have been proposed for discriminative learning, where the conditional log-likelihood is optimized instead [37, 38]. Huynh & Mooney [39] introduced a weight learning algorithm that targets the case of MLNs containing only non-recursive clauses. In this particular case, each clause contains only one target predicate, thus the grounding of the clauses will contain only one grounded target predicate. This means that the query atoms are all independent given the background atoms. Because of this special assumption on the structure of the model, their approach can perform exact inference when calculating the expected number of true groundings of a clause. Recently, Huynh & Mooney [40] have introduced a discriminative weight learning method based on a max-margin framework.
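Under this independence assumption, the conditional likelihood factorizes over the query atoms. The form below follows the non-recursive formulation of Huynh & Mooney [39] as we understand it and should be read as a sketch; any regularization term is omitted:

```latex
P(Y = y \mid X = x) = \prod_{j} P(Y_j = y_j \mid X = x)
= \prod_{j}
\frac{\exp\left( \sum_{i \in \mathcal{F}_{Y_j}} w_i \, n_i(x, y_{[Y_j = y_j]}) \right)}
     {\exp\left( \sum_{i \in \mathcal{F}_{Y_j}} w_i \, n_i(x, y_{[Y_j = 0]}) \right)
      + \exp\left( \sum_{i \in \mathcal{F}_{Y_j}} w_i \, n_i(x, y_{[Y_j = 1]}) \right)}
```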
where ${\mathcal{F}}_{{Y}_{j}}$ is the set of clauses concluding on the target atom Y_{ j }, and ${n}_{i}(x,{y}_{[{Y}_{j}={y}_{j}]})$ is the number of true groundings of the ith clause when the atom Y_{ j } is set to the value y_{ j }. For finding the vector of weights w optimizing this objective function, we used the limited-memory BFGS algorithm [41] implemented in the software ALCHEMY [42].
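Because each query atom is independent of the others given the evidence, its posterior reduces to a two-way softmax over weighted grounding counts. The sketch below illustrates that computation only; the function name and the example weight are hypothetical, and it does not reproduce Alchemy's code.

```python
import math


def posterior_true(weights, n_if_true, n_if_false):
    """P(Y_j = 1 | x) for a single query atom.

    weights     -- w_i for the clauses concluding on Y_j
    n_if_true   -- true-grounding counts of those clauses when Y_j = 1
    n_if_false  -- the same counts when Y_j = 0
    """
    s1 = sum(w * n for w, n in zip(weights, n_if_true))
    s0 = sum(w * n for w, n in zip(weights, n_if_false))
    m = max(s0, s1)  # subtract the max before exponentiating, for stability
    e1, e0 = math.exp(s1 - m), math.exp(s0 - m)
    return e1 / (e0 + e1)


# One rule with weight 1.5 whose body holds for this gene pair: the rule is
# satisfied if Regulates is true (count 1) and violated if it is false (count 0).
p_body_true = posterior_true([1.5], [1], [0])

# Same rule, but the body does not hold: the rule is satisfied either way,
# so it carries no information about this pair.
p_body_false = posterior_true([1.5], [1], [1])
```

A pair would then be labeled as a regulation when this probability reaches the threshold θ.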
Materials for inference of the ID2 genetic regulatory network
Data
We conducted a transcriptomic analysis of microarray experiments of HaCaT cells presenting distinct expression levels of ID2. We analyzed three conditions: wild-type cells (wt), stable overexpression (prcID2) or transient knockdown achieved by RNA interference (siID2) of ID2 expression and their corresponding controls. Differentially expressed genes in prcID2 or siID2 versus the corresponding control cells were identified by a t-test analysis using a p-value cut-off of 0.005, a fold-change threshold of 1.5 and Benjamini & Hochberg multiple testing correction [43]. The resulting genes were mapped to genetic networks as defined by IPA tools and the significantly enriched networks associated with cell cycle regulation, cancer, gene expression and signal transduction were merged. In this merged network, only edges and their associated nodes (genes) corresponding to expression/transcriptional regulations were conserved. Genes with incomplete information for all the features were removed. This process led to the selection of a network containing a set of 63 genes, denoted by ${\mathcal{G}}_{A}$.
In order to use MLNs, we need to describe known properties of genes within the first-order logic setting.
Encoding data

The predicate Expwt(Gname,L) states that the expression level of gene G in the wild-type cells is L. In the following results, expression level values were discretized using equal-width discretization [44]: we divided the interval of gene expression values into 5 intervals of equal width.

Similarly, the predicate that states that the expression level of gene G is L is Expsiid2(Gname,L) when the expression of ID2 has been decreased, and Expprcid2(Gname,L) when it has been increased.
Three other predicates express an increase, a decrease or a lack of change of the expression level between the experiment on the wild-type cells and the other experiments: Expmore(Gname,Exp), Expless(Gname,Exp) and Expsame(Gname,Exp), where Exp is either Prcid2 or Siid2.
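The equal-width discretization behind the Expwt predicate can be sketched as follows. The gene names, the level constants L1 to L5 and the helper names are hypothetical; only the predicate name and the choice of 5 equal-width intervals come from the text.

```python
def equal_width_levels(values, n_bins=5):
    """Map each value to a level in 1..n_bins using equal-width intervals
    spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [1 for _ in values]  # all values identical: a single level
    levels = []
    for v in values:
        index = int((v - lo) / width)
        # the maximum value falls on the upper edge; keep it in the last bin
        levels.append(min(index, n_bins - 1) + 1)
    return levels


def expwt_atoms(expression, n_bins=5):
    """Turn a {gene: wild-type expression} mapping into ground atoms."""
    genes = sorted(expression)
    levels = equal_width_levels([expression[g] for g in genes], n_bins)
    return [f"Expwt({g},L{lvl})" for g, lvl in zip(genes, levels)]


atoms = expwt_atoms({"GeneA": 0.0, "GeneB": 10.0, "GeneC": 5.0})
```

The same helper could be reused for the Expsiid2 and Expprcid2 predicates by changing the predicate name in the format string.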
In order to characterize regulatory interactions, we used other features describing genes. Some of these features concern proteins and not directly genes.
● Physical interaction between proteins: Physical interaction between proteins can provide a hint about the role played by the genes coding for these proteins. In our study, we used the protein interaction data from the IntAct database [45]. We encoded the information of a physical interaction by a predicate containing the name of the genes that are assumed to code the proteins: Inteprot(G_{1}name,G_{2}name).
● Subcellular localization of proteins: Another interesting piece of information about proteins is their localization in the cell. All proteins were analyzed using the Ingenuity Pathway Analysis Knowledge Base (Ingenuity Systems, www.ingenuity.com), and we encoded the information on the subcellular localization by a predicate ProtLocCell(Gname,Loc) where G is the name of the gene that codes for the protein and Loc is the name of the cellular compartment where the protein was found.
● Biological processes: We used Gene Ontology [34] to describe the genes by the biological processes in which they are involved. To do so, we have defined a predicate Processbio(Gname,Proc), which says that a gene G is involved in the process Proc.
● Chromosomal location of genes: We extracted the gene locations on chromosomes and chromosomal bands from the Entrez Gene database [46]. This information is encoded by the predicates Locchro(Gname,Chro) and Locband(Gname,Arm_begin,Band_begin,Arm_end,Band_end). From these predicates, we built two other predicates that we used instead: Samechro(G_{1}name,G_{2}name) and Sameband(G_{1}name,G_{2}name). These predicates provide information on the proximity between the gene locations of G_{1} and G_{2}.
Choice of a baseline for comparison
This pairwise kernel is the asymmetric version of the kernel proposed in [20, 22] for pairs of proteins to solve supervised protein-protein interaction network inference tasks. Alternative definitions of pairwise kernel have also been proposed, like the metric learning pairwise kernel [47] and the cartesian kernel [48, 49].
where $\bar{k}({G}_{j},{G}_{k})=\frac{1}{6}{\sum}_{i=1}^{6}{k}_{i}({G}_{j},{G}_{k})$.
Note that kernels are appropriate tools to gather heterogeneous sources of information into the same framework and that combining multiple kernels allows active data integration. However, once an SVM is built, it is hard to open the “black box” and interpret the decision function.
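The two ways of combining several gene-level kernels into a kernel on ordered pairs can be sketched as below. Several points are assumptions: the asymmetric tensor-product form k(G1,G3)·k(G2,G4), the mapping of the names "Pairwise sum" and "Sum" to the two functions, and the toy gene kernels and gene records, which are hypothetical stand-ins for the six kernels of the study.

```python
def pairwise_kernel(k, pair1, pair2):
    """Asymmetric tensor-product kernel on ordered pairs (assumed form):
    regulators are compared with regulators, targets with targets."""
    (g1, g2), (g3, g4) = pair1, pair2
    return k(g1, g3) * k(g2, g4)


def mean_kernel(kernels):
    """Gene-level average kernel, k_bar = (1/m) * sum_i k_i."""
    def k_bar(a, b):
        return sum(k(a, b) for k in kernels) / len(kernels)
    return k_bar


def sum_kernel(kernels, pair1, pair2):
    """'Sum' (assumed): pairwise kernel built on the averaged gene kernel."""
    return pairwise_kernel(mean_kernel(kernels), pair1, pair2)


def pairwise_sum_kernel(kernels, pair1, pair2):
    """'Pairwise sum' (assumed): average of the individual pairwise kernels."""
    total = sum(pairwise_kernel(k, pair1, pair2) for k in kernels)
    return total / len(kernels)


# Hypothetical toy kernels: 1.0 when two genes share an attribute value.
def k_chro(a, b):
    return 1.0 if a["chro"] == b["chro"] else 0.0


def k_loc(a, b):
    return 1.0 if a["loc"] == b["loc"] else 0.0


A = {"chro": "1", "loc": "Nucleus"}
B = {"chro": "2", "loc": "Cytoplasm"}
C = {"chro": "1", "loc": "Cytoplasm"}
D = {"chro": "2", "loc": "Nucleus"}

v_pairwise_sum = pairwise_sum_kernel([k_chro, k_loc], (A, B), (C, D))
v_sum = sum_kernel([k_chro, k_loc], (A, B), (C, D))
```

On this toy input the two combinations differ (0.5 versus 0.25), which shows that averaging before or after taking the product is a genuine modeling choice.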
Results and discussion
Description of the experimental studies
Summary of the three experimental studies
Study  Positive set  Negative set  Protocol 

1  ${R}_{1}^{+}$  ${R}_{1,i}^{-}$, i = 1,…,30  10-CV on ${R}_{1}^{+}\cup {R}_{1,i}^{-}$ 
2  ${R}_{1}^{+}$  ${R}_{1,i}^{-}$, i = 1,…,30  AB on ${R}_{1}^{+}\cup {R}_{1,i}^{-}$, test on ${R}_{2}^{+}\setminus {R}_{1}^{+}$ 
3  ${R}_{2}^{+}$  ${R}_{2,i}^{-}$, i = 1,…,30  AB on ${R}_{2}^{+}\cup {R}_{2,i}^{-}$, test on R_{3} 
In the first study, we considered the set of 106 regulations provided by Ingenuity in 2007 between the genes in ${\mathcal{G}}_{A}$, denoted by ${R}_{1}^{+}$. All the unknown regulations ($\left|\overline{{R}_{1}^{+}}\right|=3863$) were considered as negative examples. The goal of this first study was to test a Markov Logic Network on a well-balanced classification task.
For the second study, we considered the set ${R}_{2}^{+}$ of regulations provided by Ingenuity in 2009 for the same set of genes ${\mathcal{G}}_{A}$. We found that 51 new regulations had been added by Ingenuity between 2007 and 2009 and we were interested in the prediction task on the updated network. Usual bagging applied to an unbalanced dataset will provide biased classifiers. To build a classifier appropriate for an unbalanced prediction task, we used asymmetric bagging [30, 31].
In supervised classification, asymmetric bagging consists of performing random sampling only on the overrepresented class, such that the number of examples in the subsample is equal to the number of examples in the underrepresented class. This way, each generated predictor is trained on a balanced dataset. The predictions of these predictors on the test set are then combined to provide a single prediction. Studies described in [30, 31] have shown that asymmetric bagging provides better results than normal bagging on unbalanced datasets.
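A minimal sketch of asymmetric bagging follows. It assumes the negative class is the larger one and that the bagged prediction is a plain average of the individual scores; the function names and the stand-in models are hypothetical.

```python
import random


def asymmetric_bags(positives, negatives, n_bags, seed=0):
    """Build n_bags balanced training sets: every bag keeps all positives and
    draws (without replacement) as many negatives as there are positives.
    Assumes len(negatives) >= len(positives)."""
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        sampled_negatives = rng.sample(negatives, len(positives))
        bags.append((list(positives), sampled_negatives))
    return bags


def bagged_score(models, x):
    """Average the scores of the individual predictors for one test item."""
    return sum(model(x) for model in models) / len(models)


# Toy usage: 5 positive pairs, 40 negative pairs, 30 bags as in the studies.
positives = list(range(5))
negatives = list(range(100, 140))
bags = asymmetric_bags(positives, negatives, n_bags=30)

# Stand-in predictors returning fixed probabilities, for illustration only.
score = bagged_score([lambda x: 0.2, lambda x: 0.6], x=None)
```

In practice one classifier (an MLN or a pairwise SVM here) would be trained per bag, and `models` would hold their scoring functions.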
In the last study, we solved a network completion task in real conditions. We selected a new set of genes ${\mathcal{G}}_{B}$ and tried to infer the known regulations between the genes of ${\mathcal{G}}_{B}$ and ${\mathcal{G}}_{A}$. Asymmetric bagging was also applied.
The lists of genes in ${\mathcal{G}}_{A}$ and ${\mathcal{G}}_{B}$ are given in Additional file 1 and details on Aleph parameters are available in Additional file 2. Regarding Alchemy, we used the implementation of the discriminative weight learning procedure and tested different values of the regularization parameter λ.
Evaluation metric
We used the areas under the ROC and Precision-Recall (PR) curves as evaluation metrics, denoted by AUC-ROC and AUC-PR respectively. These curves were obtained by varying the threshold θ from 0 to 1 in order to predict regulations from posterior probabilities. A ROC curve shows the behavior of the True Positive Rate (also called recall), $\mathrm{TPR}=\frac{tp}{p}$, as a function of the False Positive Rate, $\mathrm{FPR}=\frac{fp}{n}$, while a PR curve shows the behavior of the precision, $\mathrm{Precision}=\frac{tp}{tp+fp}$, as a function of the recall. Here tp and fp are the numbers of true and false positives, and p and n are the numbers of positive and negative examples. A ROC curve expresses the price to be paid in terms of wrongly predicted negative examples when retrieving correctly a number of positive cases. A PR curve, usually plotted in information retrieval tasks, puts emphasis on the confidence of positive predictions. We standardized our precision-recall curves similarly to what was proposed in [50].
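One point of each curve can be computed from the posterior scores as in the sketch below. The function name is hypothetical, and returning a precision of 1.0 when nothing is predicted positive is an assumed convention.

```python
def rates_at_threshold(scores, labels, theta):
    """Return (TPR, FPR, precision) when predicting 'regulation' for every
    pair whose posterior score is at least theta. Labels are 1 or 0."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= theta and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= theta and y == 0)
    p = sum(labels)        # number of positive examples
    n = len(labels) - p    # number of negative examples
    tpr = tp / p
    fpr = fp / n
    # Assumed convention: precision is 1.0 when no pair is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    return tpr, fpr, precision


scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
point_mid = rates_at_threshold(scores, labels, theta=0.5)    # (0.5, 0.5, 0.5)
point_high = rates_at_threshold(scores, labels, theta=0.85)  # (0.5, 0.0, 1.0)
```

Sweeping theta from 0 to 1 and collecting these triples yields the (FPR, TPR) points of the ROC curve and the (recall, precision) points of the PR curve.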
Average cross-validation measurements on balanced samples
We first tested the performance of an MLN and compared it to that of a pairwise SVM on a well-balanced classification task. To do that, we subsampled the negative example set and generated subsamples of negative examples of the same size as the positive example set.
Averaged AUCs for cross-validation measurements on balanced samples using MLNs
MLN  

λ  AUC-ROC  AUC-PR 
20  80.8 ± 6.1  82.7 ± 5.4 
50  84.3 ± 3.5  85.5 ± 4.0 
100  84.4 ± 2.8  86.2 ± 3.2 
500  83.4 ± 2.7  86.0 ± 2.7 
750  83.3 ± 2.8  85.8 ± 2.8 
Averaged AUCs for cross-validation measurements on balanced samples using SVMs
Pairwise SVM  

C  Pairwise sum  Sum  
AUC-ROC  AUC-PR  AUC-ROC  AUC-PR  
0.001  70.9 ± 3.5  73.1 ± 3.4  82.5 ± 2.3  84.3 ± 2.1 
0.01  70.9 ± 3.5  73.1 ± 3.4  82.5 ± 2.3  84.3 ± 2.1 
0.1  70.9 ± 3.5  73.1 ± 3.4  82.5 ± 2.3  84.3 ± 2.1 
1  76.4 ± 3.1  78.7 ± 3.0  85.2 ± 2.8  87.3 ± 2.5 
10  77.5 ± 3.2  79.4 ± 3.5  84.3 ± 3.4  86.3 ± 3.1 
100  77.5 ± 3.2  79.4 ± 3.5  84.3 ± 3.4  86.3 ± 3.1 
1000  77.5 ± 3.2  79.4 ± 3.5  84.3 ± 3.4  86.3 ± 3.1 
Prediction on the updated graph
In this second study, we addressed a network completion task while keeping the same set of nodes. Two years after the dataset described previously was obtained, the tool Ingenuity was used again to provide an updated set ${R}_{2}^{+}$ of regulations between the 63 genes of interest on this date. We noticed that 51 new regulations were discovered by Ingenuity between these two dates. We were therefore interested in the prediction task of the updated graph, i.e. to see if we could retrieve these new regulations from the data of 2007. We used the dataset ${R}_{1}^{+}$ from 2007 containing 106 regulations as positive training set and tried to infer the 51 new regulations in ${R}_{2}^{+}\setminus {R}_{1}^{+}$ using asymmetric bagging. To that end, we randomly sampled 30 negative training sets ${R}_{1,i}^{-}$, i = 1,…,30 with ${R}_{1,i}^{-}\subseteq \overline{{R}_{1}^{+}}\setminus {R}_{2}^{+}$ and $\left|{R}_{1,i}^{-}\right|=\left|{R}_{1}^{+}\right|$.
We selected the threshold maximizing the averaged F_{1}-measure, i.e. the harmonic mean of precision and recall, which is high only when both precision and recall are high.
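Threshold selection by F_{1} can be sketched as follows. This simplified version scores a single vector of predictions over a hypothetical grid of candidate thresholds; the averaging over bagged predictors used in the study is not reproduced.

```python
def f1_at(scores, labels, theta):
    """F1-measure when predicting positive for every score >= theta."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= theta and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= theta and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < theta and y == 1)
    if tp == 0:
        return 0.0  # no true positive: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    return max(candidates, key=lambda theta: f1_at(scores, labels, theta))


scores = [0.9, 0.8, 0.7, 0.3]
labels = [1, 1, 0, 0]
theta_star = best_threshold(scores, labels, candidates=[0.2, 0.75, 0.85])
```

On this toy input the middle threshold separates the two classes perfectly, so it is the one selected.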
Prediction of regulations on the updated network
Bagged MLNs  

λ  TPR  
50  64.7  
100  72.6  
500  80.4  
750  84.3  
1000  90.2  
2000  88.2  
5000  84.3  
Bagged pairwise SVMs  
C  Pairwise sum TPR  Sum TPR
0.001  90.2  58.8
0.01  88.3  58.8 
0.1  88.3  58.8 
1  74.5  52.9 
10  64.7  43.1 
100  64.7  43.1 
1000  64.7  43.1 
Prediction with a new set of genes
For the third statistical analysis, we addressed a network completion task in which new candidate nodes are added. We used a dataset refined in the biology laboratory: 209 high-confidence genes differentially expressed in prcID2 versus the corresponding control cells were identified. From these genes, we selected 37 genes that were not part of ${\mathcal{G}}_{A}$ and for which we had an annotation for each predicate. These genes were also required to have at least one regulation link with one of the genes from ${\mathcal{G}}_{A}$ or with one gene of this new set. From these 37 genes, we selected a subset of 24 genes, called ${\mathcal{G}}_{B}$, that had at least one biological process annotation from GO in common with genes from ${\mathcal{G}}_{A}$. The goal of this study was to complete a known network using an additional set of candidate genes, which is usually the problem of interest for biologists. We used Ingenuity to retrieve the known regulations between genes from ${\mathcal{G}}_{A}$ and ${\mathcal{G}}_{B}$, being aware that the absence of a regulation from the literature does not mean that it does not exist, only that it has not been discovered yet. We called this set ${R}_{3}^{+}$.
Prediction of regulations between the set of genes ${\mathcal{G}}_{\mathit{A}}$ and ${\mathcal{G}}_{\mathit{B}}$
Bagged MLNs  

λ  AUC-ROC  AUC-PR
50  72.8  6.7  
100  73.1  7.7  
500  73.2  9.2  
750  73.4  9.5  
1000  73.1  9.5  
5000  73.0  9.8  
10000  72.8  9.5  
Bagged pairwise SVMs  
C  Pairwise sum AUC-ROC  Pairwise sum AUC-PR  Sum AUC-ROC  Sum AUC-PR
0.001  62.8  4.0  66.2  7.8 
0.01  62.8  4.0  66.2  7.8 
0.1  62.8  4.0  66.2  7.8 
1  65.3  7.7  67.4  8.6 
10  65.4  6.1  67.5  8.3 
100  65.4  6.1  67.5  8.3 
1000  65.4  6.1  67.5  8.3 
Although each predictor was trained on a balanced dataset, with the same number of positive and negative examples of regulation, this test was made under real conditions: we considered the whole set of positive ($\left|{R}_{3}^{+}\right|=55$) and negative ($\left|\overline{{R}_{3}^{+}}\right|=2969$) examples to assess the performance in prediction. On the test-training interactions, the predictor with bagged MLNs performed quite well, showing an AUC-ROC of about 0.73. This is a good result: it implies little degradation in performance, in particular for the false positive rate, which increases only slightly. The AUC values obtained with bagged MLNs are above the values obtained with the two bagged SVMs. We performed a statistical test in order to compare the AUC-ROC values obtained with the different classifiers. We used the nonparametric test on Mann-Whitney statistics developed by [51] and the implementation provided by the R package pROC [52]. The obtained p-values are given in Additional file 4. We observe from these results that, for the comparison between bagged MLNs and bagged pairwise sum SVMs, the p-values are less than 0.05, and therefore that the corresponding AUC-ROC values are significantly different. Regarding the comparison between bagged MLNs and bagged sum SVMs, the difference between AUC-ROC values is not significant, indicating similar predictive performance.
The best AUC-PR of bagged MLNs (9.8) exceeds that of the best pairwise SVM (8.6). Therefore, in a real prediction task, e.g. a network completion task, MLN exhibits a very interesting behaviour, even if the AUC-PR still needs to be increased.
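To help read these figures, the sketch below shows how the two summary measures can be computed from scores, and what a random ranking would achieve with 55 positive and 2969 negative pairs. The AUC-ROC is estimated by the Mann-Whitney statistic; the AUC-PR is approximated by average precision, which is one common estimator and not necessarily the interpolation used in the study. Function names and toy scores are assumptions.

```python
def auc_roc(scores_pos, scores_neg):
    """Mann-Whitney estimate: P(positive scored above negative), ties count 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Class balance of the third test, taken from the text.
n_pos, n_neg = 55, 2969
# Expected precision of a random ranking is roughly the positive rate.
random_precision = n_pos / (n_pos + n_neg)
```

With this class balance the positive rate is about 1.8%, so an AUC-PR close to 10% remains low in absolute terms but is several times higher than what a random ranking would be expected to reach.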
Prediction of regulations between the set of genes ${\mathcal{G}}_{\mathit{A}}$ and ${\mathcal{G}}_{\mathit{B}}$ when using only gene expression data as descriptors
Bagged MLNs  

λ  AUC-ROC  AUC-PR
50  61.5  2.4  
100  62.5  2.5  
500  59.5  2.3  
750  64.6  2.5  
1000  64.9  2.5  
5000  64.0  2.5  
10000  62.7  2.4  
Bagged pairwise SVMs  
C  Pairwise sum AUC-ROC  Pairwise sum AUC-PR  Sum AUC-ROC  Sum AUC-PR
0.001  60.2  3.0  62.8  3.9 
0.01  60.2  3.0  62.8  3.9 
0.1  60.2  3.0  62.8  3.9 
1  62.8  4.2  64.8  6.4 
10  60.9  4.8  64.0  6.1 
100  60.9  4.8  64.0  6.1 
1000  60.9  4.8  64.0  6.1 
To conclude, we have shown in this section that the bagged sum SVM performs well in Task1 and Task3, while the bagged pairwise sum SVM performs well in Task2. Unlike the SVM classifiers, MLNs behaved well in all three tasks. Another interesting criterion for choosing a network inference method is its ability to provide insight into the decisions it takes.
Resulting logical rules
 1. ProtLoccell(g_{2},Plasma_membrane) ∧ Expsiid2(g_{2},Level3) ∧ Expsiid2(g_{1},Level3) ⇒ Regulates(g_{1},g_{2})
 2. Processbio(g_{2},Cell_proliferation) ∧ Processbio(g_{1},Negative_regulation_of_cel_proliferation) ⇒ Regulates(g_{1},g_{2})
 3. Expsiid2(g_{1},Level3) ∧ Expprcid2(g_{1},Level4) ∧ Expsiid2(g_{2},Level4) ∧ Expprcid2(g_{2},Level5) ⇒ Regulates(g_{1},g_{2})
 4. Expprcid2(g_{1},Level5) ∧ Expwt(g_{2},Level2) ∧ Expprcid2(g_{2},Level4) ⇒ Regulates(g_{1},g_{2})
The first rule means that a gene overexpressed in the transient knockdown of ID2 regulates genes that are overexpressed in the same condition and that code for proteins located in the plasma membrane. Obviously, this rule alone is far too general, but within a set of rules with positive and negative weights it brings a piece of evidence for regulation. The second rule may seem trivial, but it has been retrieved from data: it says that genes involved in the negative regulation of cell proliferation regulate genes involved in cell proliferation. The third rule means that an increase in the expression of G_{1} and G_{2} in the condition of overexpression of ID2, compared to the transient knockdown of ID2, indicates a regulation between G_{1} and G_{2}. The last rule indicates that a high expression value of G_{1} in the prcID2 condition, together with an increase in the expression of G_{2} between the wild-type condition and prcID2, implies the existence of a regulation between these two genes.
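The way several weighted rules combine into one prediction can be sketched as follows. Under simplifying assumptions (every premise atom is observed, every rule concludes on Regulates for the same ordered pair, and the model contains no other formula), the conditional probability of a regulation in a Markov Logic network reduces to a logistic function of the summed weights of the rules whose premises hold. The weights, gene names and facts below are invented for illustration; they are not the learnt values.

```python
import math

# Rules 1 and 2 of the list above; the weights are hypothetical.
RULES = [
    (1.2, [("ProtLoccell", "g2", "Plasma_membrane"),
           ("Expsiid2", "g2", "Level3"),
           ("Expsiid2", "g1", "Level3")]),
    (0.8, [("Processbio", "g2", "Cell_proliferation"),
           ("Processbio", "g1", "Negative_regulation_of_cel_proliferation")]),
]


def premises_hold(premises, facts, g1, g2):
    """True if every premise atom, grounded on (g1, g2), is a known fact."""
    binding = {"g1": g1, "g2": g2}
    return all((pred, binding[var], value) in facts
               for pred, var, value in premises)


def regulation_probability(rules, facts, g1, g2):
    """Logistic function of the summed weights of the rules that fire."""
    total = sum(w for w, prem in rules if premises_hold(prem, facts, g1, g2))
    return 1.0 / (1.0 + math.exp(-total))


# Made-up facts about two hypothetical genes.
FACTS = {
    ("Processbio", "GENE_X", "Negative_regulation_of_cel_proliferation"),
    ("Processbio", "GENE_Y", "Cell_proliferation"),
}
p_xy = regulation_probability(RULES, FACTS, "GENE_X", "GENE_Y")  # rule 2 fires
p_yx = regulation_probability(RULES, FACTS, "GENE_Y", "GENE_X")  # no rule fires
```

When no rule fires the sketch returns 0.5, and each additional satisfied rule shifts the probability up or down according to the sign of its weight, which is why a very general rule can still contribute useful evidence.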
These rules are examples of what has been obtained in a first attempt to build a complete strategy for obtaining a supervised edge predictor. However, the quality of the learnt rules strongly depends on the nature of the chosen predicates and on the ILP learning phase. We notice that the rules can be substantially improved if the biologist makes some constraints on them explicit. For instance, one might require that the premises of each rule include at least one relation on each of the two input genes. We will favor this research direction in the future.
Another piece of information that can be extracted from the learnt MLN concerns the frequency with which each predicate appears in the premises of the rules. In our experimental studies, the chromosomal location of genes did not appear as an important property to conclude about regulation.
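Both inspections, the constraint on premises and the statistics on predicates, are easy to automate once the rules are available in a structured form. The sketch below encodes the premises of the four example rules as (predicate, variable) pairs; this encoding and the function names are assumptions made for the example.

```python
from collections import Counter

# Premises of the four example rules, reduced to (predicate, gene variable).
RULE_PREMISES = [
    [("ProtLoccell", "g2"), ("Expsiid2", "g2"), ("Expsiid2", "g1")],
    [("Processbio", "g2"), ("Processbio", "g1")],
    [("Expsiid2", "g1"), ("Expprcid2", "g1"),
     ("Expsiid2", "g2"), ("Expprcid2", "g2")],
    [("Expprcid2", "g1"), ("Expwt", "g2"), ("Expprcid2", "g2")],
]


def mentions_both_genes(premises):
    """Constraint: the premises carry a relation on g1 and a relation on g2."""
    return {var for _, var in premises} >= {"g1", "g2"}


def predicate_usage(rules):
    """Number of rules in which each predicate appears at least once."""
    return Counter(pred
                   for premises in rules
                   for pred in {p for p, _ in premises})


usage = predicate_usage(RULE_PREMISES)
all_valid = all(mentions_both_genes(premises) for premises in RULE_PREMISES)
```

A predicate that never occurs in any premise simply gets a count of zero in `usage`, which is how a property such as chromosomal location would show up as uninformative.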
Conclusions
Recent years have witnessed the preeminence of numerical and statistical machine learning tools in computational biology. Among them, kernel-based methods present numerous advantages, including the ability to deal with heterogeneous sources of information by encoding them into similarities (kernels). On top of that, multiple kernel learning makes it possible to select sources of information through the learning of sparse linear combinations of kernels [19, 53, 54]. However, kernel-based methods remain black boxes: with nonlinear kernels, the decision function learnt with an SVM is not interpretable at all. This is an inherent drawback of SVMs, because biologists are generally interested not only in the prediction made by a classifier but also in the reason why a given example has been labeled in a given way.
This work explores another direction through a new hybrid tool based on first-order logic representation and probabilistic graphical modeling. Once learnt, an MLN provides a weighted set of rules that conclude on the target predicate, here the regulates predicate. To our knowledge, this work is the first application of MLNs to gene regulatory network inference and one of the very first real applications of MLNs to noisy, medium-scale biological data. As described in the previous sections, learning an MLN involves several steps, including data encoding, the choice of constraints and hyperparameters in the ILP learner and the weight learner, and an appropriate learning protocol for achieving the learning task. All these steps require a high level of collaboration between biologists and computer scientists, which is facilitated by the common language of first-order logic. On the one hand, the encoding process can therefore be seen as a limitation, since each new application requires specific work on the choice and the definition of the predicates to be used. Compared to kernel design, this step is expensive. On the other hand, it produces a corpus of interpretable facts and rules encoding the nature of the relationship between genes that the biologist can inspect. Moreover, it is worth pointing out that it is relatively easy in this context to impose known rules or to perform incremental learning at the level of the rule learner. There is also a lot of relevant information that could be made available and that we did not incorporate to describe genes. For instance, adding knowledge of regulatory motifs of genes and DNA-binding sites of regulatory proteins could improve the performance of the predictor. This means that a proper representation of sequences should be described either directly in first-order logic, as was done in [55], or using an extension of first-order logic to sequence variables like those of [56]. This is certainly a direction to be explored in future work.
Another issue is scalability to larger networks composed of thousands of genes. This would be a concern for pairwise kernel-based methods, for instance in the last task, where the Gram matrix between training and test data has to be computed. For MLNs, scaling to thousands of genes should be made possible by the latest improvements in MLN learning implemented in FELIX [57], which uses dual decomposition.
Another interesting question is how decision trees compare with MLNs. Decision trees are usually built from attribute-value representations but have been extended to first-order logic in [58]. They also provide a set of interpretable rules, but in a less general form than MLNs. In a decision tree, rules are factorisable and a given example to be classified satisfies only one rule. In contrast, in an MLN devoted to supervised classification, a given example can satisfy many rules. Interestingly, using decision trees to learn compact representations of MLNs has recently been proposed in [59].
Finally, the biologist interested in the ID2 genetic regulatory network in human keratinocytes gets two main results from this work, in addition to a set of facts and rules describing the network. First, learning such a supervised pairwise classifier can be seen as a cross-validation of both the experiments and the existing literature. The ability of the learning algorithm to build a good edge prediction tool shows that text mining and careful curation can produce networks that are consistent. Conversely, the experimental data measured in the wet laboratory are shown to make sense. Second, the last in silico study provides a list of predicted regulations with new candidate genes: some of them are already known, while others, currently considered as false positives, may involve new regulators and new targets. This calls for an experimental wet-lab validation to test the relevance of the potential new regulations.
Declarations
Acknowledgements
We thank Xavier Gidrol (CEA) and Vincent Frouin (CEA) for their fruitful comments at the beginning of the study. This work was supported by the Agence Nationale de la Recherche [grant ANR05ARA]. The work of CB and FAB was completed using funding from [ANR09SYSC009].
References
 1. Levine M, Davidson EH: Gene regulatory networks for development. PNAS. 2005, 102 (14): 4936-4942. 10.1073/pnas.0408031102.
 2. Learning and Inference in Computational Systems Biology. Edited by: Lawrence N, Girolami M, Rattray M, Sanguinetti G. 2010, Cambridge: MIT Press.
 3. Sima C, Hua J, Jung S: Inference of gene regulatory networks using time-series data: a survey. Curr Genomics. 2009, 10 (6): 416-429. 10.2174/138920209789177610.
 4. Qian J, Lin J, Luscombe NM, Yu H, Gerstein M: Prediction of regulatory networks: genome-wide identification of transcription factor targets from gene expression data. Bioinformatics. 2003, 19 (15): 1917-1926. 10.1093/bioinformatics/btg347.
 5. Mordelet F, Vert JP: SIRENE: supervised inference of regulatory networks. Bioinformatics. 2008, 24 (16): i76-i82.
 6. Mordelet F, Vert JP: A bagging SVM to learn from positive and unlabeled examples. ArXiv e-prints. 2010.
 7. Cerulo L, Elkan C, Ceccarelli M: Learning gene regulatory networks from only positive and unlabeled data. BMC Bioinformatics. 2010, 11: 228. 10.1186/1471-2105-11-228.
 8. Margolin A, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera R, Califano A: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006, 7 (Suppl 1): S7. 10.1186/1471-2105-7-S1-S7.
 9. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski, Cottarel G, Kasif S, Gardner TS: Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 2007, 5: e8. 10.1371/journal.pbio.0050008.
 10. Zoppoli P, Morganella S, Ceccarelli M: TimeDelay-ARACNE: reverse engineering of gene networks from time-course data by an information theoretic approach. BMC Bioinformatics. 2010, 11: 154. 10.1186/1471-2105-11-154.
 11. Schafer J, Strimmer K: A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Mol Biol. 2005, 4: Article 32.
 12. de la Fuente A, Bing N, Hoeschele I, Mendes P: Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics. 2004, 20 (18): 3565-3574. 10.1093/bioinformatics/bth445.
 13. Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P: Inferring regulatory networks from expression data using tree-based methods. PLoS ONE. 2010, 5: e12776. 10.1371/journal.pone.0012776.
 14. Friedman N, Linial M, Nachman I, Pe’er D: Using Bayesian networks to analyze expression data. J Comput Biol. 2000, 7: 601-620. 10.1089/106652700750050961.
 15. Gardner TS, di Bernardo D, Lorenz D, Collins JJ: Inferring genetic networks and identifying compound mode of action via expression profiling. Science. 2003, 301 (5629): 102-105. 10.1126/science.1081900.
 16. Chen KC, Wang TY, Tseng HH, Huang CYF, Kao CY: A stochastic differential equation model for quantifying transcriptional regulatory network in Saccharomyces cerevisiae. Bioinformatics. 2005, 21 (12): 2883-2890. 10.1093/bioinformatics/bti415.
 17. Bansal M, Gatta GD, di Bernardo D: Inference of gene regulatory networks and compound mode of action from time course gene expression profiles. Bioinformatics. 2006, 22 (7): 815-822. 10.1093/bioinformatics/btl003.
 18. Yamanishi Y, Vert JP, Kanehisa M: Protein network inference from multiple genomic data: a supervised approach. Bioinformatics. 2004, 20: i363-i370.
 19. Kato T, Tsuda K, Asai K: Selective integration of multiple biological data for supervised network inference. Bioinformatics. 2005, 21 (10): 2488-2495. 10.1093/bioinformatics/bti339.
 20. Ben-Hur A, Noble WS: Kernel methods for predicting protein-protein interactions. Bioinformatics. 2005, 21 (suppl 1): i38-i46. 10.1093/bioinformatics/bti1016.
 21. Martin S, Roe D, Faulon JL: Predicting protein-protein interactions using signature products. Bioinformatics. 2005, 21: 218-226. 10.1093/bioinformatics/bth483.
 22. Hue M, Vert JP: On learning with kernels for unordered pairs. Proceedings of the 27th International Conference on Machine Learning; Haifa, Israel. Edited by: Furnkranz J, Joachims T. 2010, Omnipress, 463-470.
 23. Bleakley K, Biau G, Vert JP: Supervised reconstruction of biological networks with local models. Bioinformatics. 2007, 23 (13): i57-i65. 10.1093/bioinformatics/btm204.
 24. Ceccarelli M, Cerulo L: Selection of negative examples in learning gene regulatory networks. Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine Workshop. 2009, Washington: IEEE Computer Society, 56-61.
 25. Cerulo L, Paduano V, Zoppoli P, Ceccarelli M: A negative selection heuristic to predict new transcriptional targets. BMC Bioinformatics. 2013, 14 (Suppl 1): S3. 10.1186/1471-2105-14-S1-S3.
 26. Elkan C, Noto K: Learning classifiers from only positive and unlabeled data. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08. Las Vegas, NV, USA. 2008, New York: ACM, 213-220.
 27. Giordana A, Saitta L: Phase transitions in relational learning. Mach Learn. 2000, 41 (2): 217-251. 10.1023/A:1007620705405.
 28. Richardson M, Domingos P: Markov Logic: a unifying framework for statistical relational learning. Introduction to Statistical Relational Learning. Edited by: Getoor L, Taskar B. 2007, Cambridge: the MIT Press, 339-371.
 29. Richardson M, Domingos P: Markov Logic networks. Mach Learn. 2006, 62 (1-2): 107-136.
 30. Kubat M, Matwin S: Addressing the Curse of Imbalanced Training Sets: One-Sided Selection. Proceedings of the 14th International Conference on Machine Learning; Nashville, Tennessee, USA. Edited by: Douglas HF. 1997, San Francisco: Morgan Kaufmann, 179-186.
 31. Tao D, Tang X, Li X, Wu X: Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval. IEEE Trans Pattern Anal Mach Intell. 2006, 28 (7): 1088-1099.
 32. Langlands K, Down GA, Kealey T: Id proteins are dynamically expressed in normal epidermis and dysregulated in squamous cell carcinoma. Cancer Res. 2000, 60: 5929-5933.
 33. De Raedt L: Logical and Relational Learning. 2008, Berlin, Heidelberg: Springer.
 34. Consortium TGO: Gene ontology: tool for the unification of biology. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
 35. Srinivasan A: The Aleph manual. 2007, [http://www.comlab.ox.ac.uk/activities/machinelearning/Aleph/aleph.html]
 36. Muggleton S, De Raedt L: Inductive logic programming: theory and methods. J Logic Program. 1994, 19 (20): 629-679.
 37. Singla P, Domingos P: Discriminative training of Markov Logic Networks. Proceedings of the 20th national conference on Artificial intelligence - Volume 2; Pittsburgh. 2005, Menlo Park: The AAAI Press, 868-873.
 38. Lowd D, Domingos P: Efficient weight learning for Markov Logic Networks. Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases; Warsaw, Poland. 2007, Berlin Heidelberg: Springer, 200-211.
 39. Huynh T, Mooney R: Discriminative structure and parameter learning for Markov Logic Networks. Proceedings of the 25th International Conference on Machine Learning. 2008, Helsinki: Omnipress, 416-423.
 40. Huynh TN, Mooney RJ: Max-margin weight learning for Markov logic networks. Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I, Bled, Slovenia. 2009, Berlin Heidelberg: Springer, 564-579.
 41. Liu DC, Nocedal J: On the limited memory BFGS method for large scale optimization. Math Program. 1989, 45 (3): 503-528.
 42. Kok S, Sumner M, Richardson M, Singla P, Poon H, Lowd D, Wang J, Domingos P: The Alchemy system for statistical relational AI. Tech. rep., Department of Computer Science and Engineering, University of Washington, Seattle, WA; 2009. http://alchemy.cs.washington.edu
 43. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodol). 1995, 57: 289-300.
 44. Li Y, Liu L, Bai X, Cai H, Ji W, Guo D, Zhu Y: Comparative study of discretization methods of microarray data for inferring transcriptional regulatory networks. BMC Bioinformatics. 2010, 11: 520. 10.1186/1471-2105-11-520.
 45. Aranda B, Achuthan P, Alam-Faruque Y, Armean I, Bridge A, Derow C, M Feuermann ATG, Kerrien S, Khadake J, Kerssemakers J, Leroy C, Menden M, Michaut M, L Montecchi-Palazzi S, Neuhauser N, Orchard S, Perreau V, Roechert B, van Eijk K, Hermjakob H: The IntAct molecular interaction database in 2010. Nucleic Acids Res. 2010, 38 (Database issue): 525-531.
 46. Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2011, 39 (suppl 1): D52-D57.
 47. Vert JP, Qiu J, Noble W: A new pairwise kernel for biological network inference with support vector machines. BMC Bioinformatics. 2007, 8 (Suppl 10): S8. 10.1186/1471-2105-8-S10-S8.
 48. Kashima H, Kato T, Yamanishi Y, Sugiyama M, Tsuda K: Link propagation: a fast semi-supervised learning algorithm for link prediction. Proceedings of the 9th SIAM International Conference on Data Mining; Sparks, Nevada, USA. 2009, SIAM, 1099-1110.
 49. Kashima H, Oyama S, Yamanishi Y, Tsuda K: Cartesian kernel: an efficient alternative to the pairwise kernel. IEICE Trans Inf Syst. 2010, E93-D (10): 2672-2679. 10.1587/transinf.E93.D.2672.
 50. Davis J, Goadrich M: The relationship between precision-recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh. 2006, OmniPress, 233-240.
 51. DeLong ER, DeLong DM, Clarke-Pearson DL: Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988, 44 (3): 837-845. 10.2307/2531595.
 52. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, Muller M: pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011, 12: 77. 10.1186/1471-2105-12-77.
 53. Lanckriet GRG, Deng M, Cristianini N, Jordan MI, Noble WS: Kernel-based data fusion and its application to protein function prediction in yeast. Proceedings of the Pacific Symposium on Biocomputing; Hawaii. 2004, Singapore: World Scientific, 300-311.
 54. Gönen M, Alpaydin E: Multiple kernel learning algorithms. J Mach Learn Res. 2011, 12: 2211-2268.
 55. Muggleton S, King RD, Stenberg MJ: Protein secondary structure prediction using logic-based machine learning. Protein Eng. 1992, 5 (7): 647-657. 10.1093/protein/5.7.647.
 56. Kutsia T, Buchberger B: Predicate logic with sequence variables and sequence function symbols. MKM, Volume 3119 of Lecture Notes in Computer Science. Edited by: Asperti A, Bancerek G, Trybulec A. 2004, Berlin Heidelberg: Springer-Verlag, 205-219.
 57. Niu F, Zhang C, Re C, Shavlik J: Scaling inference for Markov Logic via dual decomposition. Proceeding of the 12th IEEE International Conference on Data Mining (ICDM); Brussels, Belgium. 2012, Washington: IEEE Computer Society, 1032-1037.
 58. Blockeel H, Raedt LD: Top-down induction of first-order logical decision trees. Artif Intell. 1998, 101 (1-2): 285-297. 10.1016/S0004-3702(98)00034-4.
 59. Khosravi H, Schulte O, Hu J, Gao T: Learning compact Markov logic networks with decision trees. Mach Learn. 2012, 89 (3): 257-277. 10.1007/s10994-012-5307-6.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.