 Research article
 Open Access
Learning gene regulatory networks from only positive and unlabeled data
 Luigi Cerulo^{1, 2},
 Charles Elkan^{3} and
 Michele Ceccarelli^{1, 2}
https://doi.org/10.1186/1471-2105-11-228
© Cerulo et al; licensee BioMed Central Ltd. 2010
Received: 28 October 2009
Accepted: 5 May 2010
Published: 5 May 2010
Abstract
Background
Recently, supervised learning methods have been exploited to reconstruct gene regulatory networks from gene expression data. The reconstruction of a network is modeled as a binary classification problem for each pair of genes. A statistical classifier is trained to recognize the relationships between the activation profiles of gene pairs. This approach has been proven to outperform previous unsupervised methods. However, the supervised approach raises open questions. In particular, although known regulatory connections can safely be assumed to be positive training examples, obtaining negative examples is not straightforward, because definite knowledge is typically not available that a given pair of genes do not interact.
Results
A recent advance in research on data mining is a method capable of learning a classifier from only positive and unlabeled examples, that does not need labeled negative examples. Applied to the reconstruction of gene regulatory networks, we show that this method significantly outperforms the current state of the art of machine learning methods. We assess the new method using both simulated and experimental data, and obtain major performance improvement.
Conclusions
Compared to unsupervised methods for gene network inference, supervised methods are potentially more accurate, but for training they need a complete set of known regulatory connections. A supervised method that can be trained using only positive and unlabeled data, as presented in this paper, is especially beneficial for the task of inferring gene regulatory networks, because only an incomplete set of known regulatory connections is available in public databases such as RegulonDB, TRRD, KEGG, Transfac, and IPA.
Keywords
 Support Vector Machine
 Feature Vector
 Gene Regulatory Network
 Unlabeled Data
 Unsupervised Method
Background
Inferring the topology of gene regulatory networks is fundamental to understanding the complexity of the interdependencies among gene up- and down-regulation. Characterizing transcriptional cis-regulation experimentally at a genome scale is still an expensive challenge, even for well-studied model organisms. In silico methods represent a promising direction: through a reverse engineering approach, they aim to extract gene regulatory networks from prior biological knowledge and from available genomic and postgenomic data. Different model architectures for reverse engineering gene regulatory networks from gene expression data have been proposed in the literature [1]. Such models represent biological regulations as a network where nodes represent the interacting elements, e.g. genes, proteins, and metabolites, while edges represent the presence of interaction activities between such network components. Four main network model architectures can be distinguished: i) information theory models, ii) boolean network models, iii) differential and difference equation models, iv) Bayesian models.
Information theory models correlate two genes by means of a correlation coefficient and a threshold. Two genes are predicted to interact if the correlation coefficient of their expression levels is above a threshold. For example, TD-ARACNE [2], ARACNE [3], and CLR [4] infer the network structure with a statistical score derived from the mutual information and a set of pruning heuristics.
Boolean networks use a binary variable to represent the state of a gene activity and a directed graph, where edges are represented by boolean functions, to represent the interactions between genes. REVEAL [5] is an algorithm that infers a boolean network model from gene expression data. Differential and difference equations describe gene expression changes as a function of the expression level of other genes. They are particularly suitable for modeling the dynamic behavior of gene networks. The basic mathematical model of such approaches is a set of Ordinary Differential Equations (ODE) [6].
A large variety of machine learning algorithms have been proposed in the literature and are available as working tools [12]. In the context of gene regulatory networks, a first attempt has been made with Bayesian Networks, Linear Regression, Decision Trees, and Support Vector Machines (SVM) [13]. Among these, the Support Vector Machine algorithm has attracted the attention of the bioinformatics community.
SIRENE [14] is the state-of-the-art method for the reconstruction of gene regulatory networks with a Support Vector Machine algorithm. The authors test SIRENE on a benchmark experiment of Escherichia coli genes composed of a compendium of gene expression data and a set of known regulations. A critical point of a binary supervised classifier is that its input normally consists of both positive and negative examples. Although previously known regulatory connections can safely be taken as a partial set of positive training examples, the choice of negative examples is not straightforward, as little or no information is available about whether a given pair of genes does not interact. The only available information is a partial set of interacting gene pairs, i.e. positive examples, and unlabeled data, which could include both positive and negative examples. A commonly adopted solution is to consider all, or a random subset of, the unlabeled examples as negative [14]. Whatever the supervised algorithm, training with false negatives could affect the performance of the classifier, as it wrongly learns potentially positive examples as negatives. Learning from only positive and unlabeled data is a hot topic in the data mining literature on the classification of web documents [15, 16]. Such methods differ from semi-supervised learning, i.e. learning with a small set of labeled examples (both positive and negative), in that the classification algorithm learns from a small subset of positive examples and a huge set of unlabeled examples (both negative and positive). In the literature two main classes of approaches can be distinguished:

Selection of reliable negatives. The first class of approaches depends on a starting selection of reliable negative examples that usually depends on the application domain [17, 18]. In [16] a two-step strategy has been proposed for text classification domains: in the first step a set of reliable negative examples is selected from the unlabeled set by using the term frequency-inverse document frequency measure (tf-idf); in the second step a sequence of classifiers is applied and then the best classifier is selected. In [19] a similar approach is used to predict non-coding RNA genes, where the first set of negative examples is built by maximizing the distances of negative sample points to the known positive sample points, using a distance metric built upon the RNA sequence. Such a negative set is iteratively refined by using a binary classifier based on the current positive and negative examples until no additional negative examples can be found. In [20] we proposed a method applied to gene regulatory networks that selects a reliable set of negatives by exploiting the known network topology.

Probability estimate correction. The second class of approaches does not need labeled negative examples and basically tries to adjust the probability of being positive estimated by a traditional classifier trained with labeled and unlabeled examples. A general purpose method has been proposed in [15], where the authors show that, under certain circumstances, a classifier trained from only positive and unlabeled examples predicts probabilities that differ by only a constant factor from the true conditional probabilities of being positive. Such a result is used to show how to learn a classifier from a non-traditional training set.
In this paper we show that the probability estimation approach introduced in [15], called PosOnly, is a viable solution to the problem of learning gene regulatory networks without negative examples. It turns the problem of classifying between positive and negative samples into the "simpler" problem of separating labeled from unlabeled samples, under the assumption that the labeled positive examples are sampled uniformly at random from all positive examples. To this end we compare the PosOnly method with two recently proposed approaches to the supervised inference of regulatory networks: the traditional approach that considers unlabeled examples as negatives [14] (SVMOnly), and a method aimed at the selection of reliable negative examples [20] (PSEUDORANDOM).
Methods
PosOnly method
where V is a validation set drawn in the same manner as the training set and P ⊆ V is the set of labeled (i.e. positive) examples in V. A threshold, usually set to 0.5, determines whether x belongs to the positive class, p(y = 1|x) > 0.5, or to the negative class, p(y = 0|x) > 0.5.
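The estimate of c that these definitions refer to can be sketched as follows, assuming the simplest of the estimators described in [15]: the classifier output is averaged over the labeled examples of the validation set. The symbol g(x) for the classifier's estimate of p(s = 1|x) and the hat notation are assumptions introduced here for illustration.

```latex
\hat{c} = \frac{1}{|P|} \sum_{x \in P} g(x),
\qquad g(x) \approx p(s = 1 \mid x),
\qquad p(y = 1 \mid x) \approx \frac{g(x)}{\hat{c}}
```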
PSEUDORANDOM method
where TC(P) is the transitive closure of P, i.e. the graph composed of the same nodes as P and the set of edges (g_{i}, g_{j}) such that there is a non-null path from g_{i} to g_{j} in P, while Transpose(X) is the graph containing the edges of X reversed. Such a set is further extended with a small fraction of candidate negatives drawn randomly from N ∪ Q.
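The two graph operators named above can be illustrated with a minimal sketch. It only shows how TC and Transpose behave on a toy edge set; how they are combined into the candidate negative set is defined in [20] and is not reproduced here. The function names and the gene labels are hypothetical.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Return every pair (a, b) joined by a non-null directed path a -> ... -> b."""
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)
    closure = set()
    for start in list(successors):
        stack = list(successors[start])
        seen = set()
        while stack:  # depth-first walk from `start`
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(successors.get(node, ()))
    return closure


def transpose(edges):
    """Return the same edges with their direction reversed."""
    return {(dst, src) for src, dst in edges}


# Toy set of known regulations (hypothetical gene names).
known = {("g1", "g2"), ("g2", "g3")}
tc = transitive_closure(known)
reversed_tc = transpose(tc)
```

On this toy input the closure adds the indirect pair (g1, g3) to the two known edges, and the transpose simply flips each pair.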
Research questions
In the following we detail: i) the research questions we aim to answer in this paper; and ii) the methods we followed to pursue this aim. The main goal is to evaluate, by means of a benchmark experiment, the performance of the two approaches introduced in the previous section, PosOnly and PSEUDORANDOM, that address the problem of learning gene regulations from positive-only data. Such approaches are then compared with a classifier trained with labeled and unlabeled examples (SVMOnly) and with the most widely used unsupervised information theoretic methods, ARACNE [3] and CLR [4]. In particular we aim to answer:

RQ1: How do PosOnly, PSEUDORANDOM, and SVMOnly performances vary with the percentage of known positives? In particular, this research question aims to compare the performances of PosOnly, PSEUDORANDOM, and SVMOnly when the percentage of known positives varies from 10% to 100%.

RQ2: How do PosOnly, PSEUDORANDOM, and SVMOnly performances vary with the number of genes composing a regulatory network? In particular, this research question aims to evaluate the performances of PosOnly, PSEUDORANDOM, and SVMOnly when network size varies from 10 to 500.

RQ3: How do PosOnly, PSEUDORANDOM, and SVMOnly performances compare with unsupervised information theoretic approaches, such as ARACNE and CLR? In particular, this research question aims to compare supervised learning approaches, PosOnly, PSEUDORANDOM, and SVMOnly, with unsupervised information theoretic approaches at different network sizes and at different percentages of known positives.
The learning scheme, the datasets used, and the benchmark process adopted to answer the above research questions are introduced in the following. To compare PosOnly, PSEUDORANDOM, SVMOnly, and the unsupervised methods we performed a stratified 10-fold cross validation, assuming different percentages of known positive examples within a gene regulatory network of size G. To perform an assessment, a gold standard of the network is necessary. Simulated networks are widely used to test gene network inference algorithms, as the complete set of gene-gene interactions is available. This is not true of experimental data, where only a partial set of interactions is known from the literature and is usually collected in public databases. This calls for different evaluation processes depending on whether simulated or experimental data are used.
Learning scheme
For both PosOnly and SVMOnly we used the Support Vector Machine (SVM), with Platt scaling [21], to estimate the probability p(s = 1|x). In the case of SVMOnly this probability is assumed to coincide with p(y = 1|x); in the case of PosOnly it is instead scaled by the empirical estimate c ≃ p(s = 1|y = 1) to obtain p(y = 1|x) ≃ p(s = 1|x) / c. For comparison purposes we also used the Support Vector Machine with Platt scaling for PSEUDORANDOM, which is trained with the set of known positives and the set of negatives selected with the transitive closure heuristic.
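The difference between the two schemes can be illustrated with a minimal sketch. It assumes that the classifier scores g(x) ≈ p(s = 1|x) have already been computed (for example by an SVM with Platt scaling) and that c is estimated as the mean score over the labeled examples of a validation set, as in the simplest estimator of [15]. The function names, the toy scores, and the clipping of adjusted probabilities to 1 are assumptions made for this example.

```python
def estimate_c(validation_scores, validation_labeled):
    """Estimate c = p(s=1|y=1) as the mean score over labeled (s = 1) validation examples."""
    labeled_scores = [g for g, s in zip(validation_scores, validation_labeled) if s == 1]
    if not labeled_scores:
        raise ValueError("validation set contains no labeled positives")
    return sum(labeled_scores) / len(labeled_scores)


def posonly_probability(score, c):
    """Adjust p(s=1|x) into an estimate of p(y=1|x); clipping to 1 is an assumption."""
    return min(1.0, score / c)


# Toy validation scores g(x) and their labeled/unlabeled flags s.
scores = [0.40, 0.30, 0.50, 0.10, 0.05]
labeled = [1, 1, 1, 0, 0]

c = estimate_c(scores, labeled)
adjusted = [posonly_probability(g, c) for g in scores]

# SVMOnly thresholds the raw score; PosOnly thresholds the adjusted probability.
svmonly_positive = [g > 0.5 for g in scores]
posonly_positive = [p > 0.5 for p in adjusted]
```

With these toy numbers no raw score exceeds 0.5, so the SVMOnly rule predicts no positives, while the adjusted probabilities place the first three examples above the threshold.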
subject to: y^{ T }α = 0
where C and γ are parameters that can be set empirically with a grid search cross validation [23].
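For context, the constraint above belongs to the dual problem of the soft-margin SVM. A sketch of that problem is given below, assuming the standard C-SVC formulation used by LIBSVM [22] with a Gaussian (RBF) kernel. The symbols Q, e (the vector of all ones), and l (the number of training examples) follow that formulation and are assumptions rather than notation confirmed by this text.

```latex
\min_{\alpha} \; \frac{1}{2} \alpha^{T} Q \alpha - e^{T} \alpha
\quad \text{subject to} \quad y^{T} \alpha = 0, \quad 0 \le \alpha_{i} \le C, \; i = 1, \dots, l,
\qquad Q_{ij} = y_{i} y_{j} K(x_{i}, x_{j}),
\qquad K(x_{i}, x_{j}) = \exp\left(-\gamma \lVert x_{i} - x_{j} \rVert^{2}\right)
```

In this form C bounds the dual variables, and hence the penalty on margin violations, while γ controls the width of the kernel, which is why the two are tuned jointly.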
Benchmark process with simulated data
The process consists of the following three steps:
1) Random generation of a gene-gene regulatory network of G genes
We generated simulated data with GeneNetWeaver http://gnw.sourceforge.net, a tool used to generate in silico benchmarks in the DREAM3 challenge initiative [24, 25]. The GeneNetWeaver tool is able to obtain network topologies of a given size G by randomly extracting subnetworks from the gene-to-gene interaction networks of Escherichia coli or Saccharomyces cerevisiae (Yeast). The tool generates steady-state levels for the wild-type and the null-mutant knockdown strains for each gene. This means that for a network of G genes there are G + 1 experiments (wild-type and knockdown of every gene), leading to a feature vector composed of 2 × (G + 1) attributes. The data correspond to noisy measurements of mRNA levels, which have been normalized such that the maximum value in a given dataset is one. Autoregulatory interactions were removed, i.e. no self-interactions are considered in the networks. As reported in the DREAM3 documentation, the tool takes great care to generate both network structure and dynamics that are biologically plausible.
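The size of the feature vector can be made concrete with a small sketch. It assumes that the vector of a candidate pair is the concatenation of the two genes' expression profiles, regulator first; the text only states the resulting length of 2 × (G + 1), so the ordering, the function name, and the toy expression values are assumptions.

```python
from itertools import permutations


def pair_feature_vector(expression, regulator, target):
    """Concatenate the expression profiles of two genes into one feature vector."""
    return list(expression[regulator]) + list(expression[target])


# Toy data: G = 3 genes, each measured in G + 1 = 4 experiments
# (wild-type plus one knockdown per gene). Values are illustrative only.
expression = {
    "g1": [0.9, 0.1, 0.8, 0.7],
    "g2": [0.5, 0.2, 0.0, 0.6],
    "g3": [0.3, 0.1, 0.4, 0.0],
}

# Ordered pairs without self-interactions: G * (G - 1) candidates.
candidate_pairs = list(permutations(expression, 2))
features = {pair: pair_feature_vector(expression, *pair) for pair in candidate_pairs}
```

Each of the six ordered pairs is thus described by 2 × (3 + 1) = 8 attributes.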
2) Random selection of P non-self interactions which are assumed to be known
This leads to a remaining set Q of non-self interactions assumed to be unknown, and a set N of all non-interactions. The fraction of P with respect to P ∪ Q, i.e. the percentage of known positives, is assumed to vary from P = 10% to P = 100%. In a learning scheme, P is the set of labeled, and positive, examples, and Q ∪ N is the set of unlabeled examples. For each network of size G, the second step is repeated for ten random selections of the P positives.
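The partition described in this step can be sketched as follows, assuming that interactions are directed (regulator, target) pairs and that the gold-standard edge set is fully known, as is the case for simulated networks. The function name, the seed handling, and the toy network are hypothetical.

```python
import itertools
import random


def partition_pairs(genes, true_edges, known_fraction, seed=0):
    """Split all ordered non-self gene pairs into P (known), Q (hidden), and N (none)."""
    rng = random.Random(seed)
    interactions = sorted(edge for edge in true_edges if edge[0] != edge[1])
    n_known = round(known_fraction * len(interactions))
    known = set(rng.sample(interactions, n_known))        # P: labeled positives
    hidden = set(interactions) - known                    # Q: unlabeled positives
    all_pairs = set(itertools.permutations(genes, 2))
    non_interactions = all_pairs - set(interactions)      # N: true negatives
    return known, hidden, non_interactions


genes = ["g1", "g2", "g3", "g4"]
true_edges = {("g1", "g2"), ("g1", "g3"), ("g2", "g4"), ("g3", "g4")}

P, Q, N = partition_pairs(genes, true_edges, known_fraction=0.5)
unlabeled = Q | N  # what the learner sees as unlabeled examples
```

Repeating the call with different seeds gives the repeated random selections of P for a fixed network.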
3) Cross validation of PosOnly, PSEUDORANDOM, and SVMOnly classification performances
Confusion matrix of the ith cross validation trial
Actual | Predicted positive | Predicted negative
P_{i} ∪ Q_{i} | TP_{i} | FN_{i}
N_{i} | FP_{i} | TN_{i}
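The entries of the confusion matrix can be turned into the indexes discussed in the results with a minimal sketch, assuming the standard definitions of precision, recall, and F-Measure (their harmonic mean). How the values of the individual trials are aggregated is not shown here; the function name and the toy counts are hypothetical.

```python
def precision_recall_f(tp, fp, fn):
    """Compute precision, recall, and F-Measure from the counts of one trial."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy counts for one cross validation trial.
pr, rc, f = precision_recall_f(tp=30, fp=10, fn=20)
```

Note that TN does not enter any of the three indexes, which makes them suitable for networks where non-interactions vastly outnumber interactions.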
Benchmark process with experimental data
To overcome computational limitations with the huge amount of experimental data, we set up a benchmark process similar to the one adopted to evaluate the SIRENE supervised approach [14]. SIRENE predicts regulations in Escherichia coli by splitting the problem of regulatory network inference into many local binary classification subproblems, each associated with a Transcription Factor (TF). For each TF, we train an SVM classifier with a Gaussian kernel to discriminate between genes known to be regulated and genes known not to be regulated by the TF, based on the expression patterns of such genes. The SIRENE-inspired benchmark process we adopted with experimental data consists of the following steps:
1) Selection of an experimental gene-gene network
As experimental data we used the Escherichia coli expression and regulation data made publicly available by [26], widely used in the literature as an experimental benchmark [14]. The expression data, collected under different experimental conditions, consist of 445 E. coli Affymetrix Antisense2 microarray expression profiles for 4345 genes. Such data were standardized to zero mean and unit standard deviation. The regulation data consist of 3293 experimentally confirmed regulations between 154 TFs and 1211 genes, extracted from the RegulonDB (version 5) database [27].
2) Random selection of P* genes regulated by a given TF, assumed to be known
With experimental data, the complete set of gene-gene interactions is unknown and the partitions P, Q, and N cannot be simulated. To distinguish them from the actual partitions referred to above, we name such partitions P*, the set of genes regulated by a given TF and assumed to be known; Q*, the set of interactions assumed to be unknown; and N*, the set of all non-interactions. The fraction of P* with respect to P* ∪ Q*, i.e. the percentage of known positives, is assumed to vary from P = 10% to P = 100%. In a learning scheme, P* is the set of labeled, and positive, examples, and Q* ∪ N* is the set of unlabeled examples. The second step is repeated for each TF for ten random selections of the P* positives.
3) Cross validation of PosOnly, PSEUDORANDOM, and SVMOnly classification performances
The validation consists of a stratified tenfold cross validation and proceeds as follows. Partition P*, Q*, and N* randomly into three subsets each of roughly the same size (P_{1}, Q_{1}, N_{1}), ..., ( , , ). For each ith partition a trial is performed with one subset reserved for testing ( , , ), and the other two for training the classifier. A cross validation of a classifier's performance leads to precision and recall indexes, PR* and RC*, which need to be correctly interpreted. As P* ⊆ P, Q* ⊆ Q, and N ⊆ N*, it is easy to see that PR* ≤ PR and . Hence, the value of precision, PR*, constitutes a lower-bound estimate of the actual precision, while the value of recall, RC*, can be correctly characterized when , which is the percentage of actually known gene-gene interactions, can be estimated in advance.
However, in domains such as Escherichia coli and Saccharomyces cerevisiae this can be assumed to be very high ( ~ 1), which means that the fraction of unknown gene-gene regulations is very low.
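The stratified partitioning used in this step can be sketched as follows: each of the three sets is shuffled and dealt out separately, so that every subset keeps roughly the same proportions of P*, Q*, and N*. The number of subsets k is left as a parameter, and the function name, the group keys, and the toy items are hypothetical.

```python
import random


def stratified_folds(groups, k, seed=0):
    """Deal the items of each group round-robin into k folds of similar composition."""
    rng = random.Random(seed)
    folds = [{name: [] for name in groups} for _ in range(k)]
    for name, items in groups.items():
        shuffled = sorted(items)
        rng.shuffle(shuffled)
        for index, item in enumerate(shuffled):
            folds[index % k][name].append(item)
    return folds


groups = {
    "P": [f"p{i}" for i in range(6)],   # known positives
    "Q": [f"q{i}" for i in range(3)],   # hidden positives
    "N": [f"n{i}" for i in range(9)],   # non-interactions
}
folds = stratified_folds(groups, k=3)

# One trial: the first fold is held out for testing, the others are used for training.
test_fold = folds[0]
train_positives = [item for fold in folds[1:] for item in fold["P"]]
```

Iterating the held-out index over all folds gives one trial per fold, each producing its own confusion matrix.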
Selection of C and γ parameters
The selected C and γ SVM parameters
Network size | C | γ
10 | 500 | 0.05
50 | 500 | 0.01
100 | 500 | 0.005
500 | 500 | 0.001
Results and Discussion
In this section we discuss the results answering RQ1, RQ2, and RQ3 obtained in the context of simulated and, where possible, experimental data. To allow for replicability, raw data are available at the following URL: https://www.scoda.unisannio.it/rawdata/bmcbioinformatics1009.tgz.
RQ1: How do PosOnly, PSEUDORANDOM, and SVMOnly performances vary with the percentage of known positives?
Results on Simulated data
Results on Experimental data
Figures 7 and 8 show the precision and recall obtained at different percentages of known positives. The precision of PSEUDORANDOM and SVMOnly decrease with the percentage of known positives, instead their recall decrease showing a similar behavior, although SVMOnly exhibits a better precision while PSEUDORANDOM exhibits a better recall. The precision of PosOnly increases with the percentage of known positives but is always lower than that exhibited by PSEUDORANDOM and SVMOnly. In contrast, the recall of PosOnly is always higher than that exhibited by PSEUDORANDOM and SVMOnly: it decreases in the interval between P = 10% and P = 50%, reaching a minimum of 0.56, and then increases, reaching a maximum of 0.76 at P = 100%.
Figure 9 shows the combination of precision and recall performance by means of the F-Measure. It can be noticed that on the experimental dataset, too, all algorithms exhibit a progressive increase in performance when the number of known positives grows from 10% to 100%, reaching an almost convergent value at P = 100%. PosOnly outperforms both PSEUDORANDOM and SVMOnly, showing a statistically significant difference when the percentage of known positives is lower than P = 50%.
RQ2: How do PosOnly, PSEUDORANDOM, and SVMOnly performances vary with the number of genes composing a regulatory network?
This research question can be answered only in the context of simulated data as in experimental data the number of genes cannot be varied.
Results on Simulated data
RQ3: How do PosOnly, PSEUDORANDOM, and SVMOnly performances compare with unsupervised information theoretic approaches, such as ARACNE and CLR?
Results on Simulated data
Figure 11 is particularly suitable for showing the minimum percentage of known positives at which the learning methods start to outperform the unsupervised information theoretic methods, i.e. the intersection between the supervised and unsupervised curves. It can be noticed that PosOnly outperforms the unsupervised methods, or at least exhibits similar performance, at every percentage of known positives, especially for large networks, while the curves of PSEUDORANDOM and SVMOnly intersect those of the unsupervised information theoretic methods at different percentages of known positives. In particular, such an intersection depends on the network size in both organisms and is lower for larger networks. This is mainly due to the fact that unsupervised methods work better with small networks, making supervised methods more suitable for large networks.
Results on Experimental data
Conclusions
We performed an experimental evaluation of a supervised learning algorithm, namely PosOnly, which is able to learn from only positive and unlabeled examples. Such a method is particularly suitable in the context of gene regulatory networks, where a partial set of known regulatory connections is available in public databases. In such contexts it is crucial to take into account that the only available information is a partial set of gene-gene interactions, i.e. positive examples, and unlabeled data which could include both positive and negative examples.
The data mining community has developed a number of approaches to deal with such a problem. In this paper we adopted the approach introduced in [15] and compared it, through a benchmark experiment performed with simulated and experimental data, with a negative selection method introduced in [20] (PSEUDORANDOM) and with the current state of the art of supervised methods, namely SVMOnly [14]. We showed that PosOnly significantly outperforms both PSEUDORANDOM and SVMOnly on simulated data, whereas it exhibits a slightly lower performance on experimental data. A comparison with unsupervised information theoretic methods showed that their performance decreases drastically with the number of genes composing a regulatory network, whereas the performance of PosOnly, PSEUDORANDOM, and SVMOnly decreases more slowly.
If one uses the PosOnly and SVMOnly methods to rank candidates, then the rankings should be the same. They are indeed the same in our experiments. In this case, the contribution of [15] is to show that the simple SVMOnly method actually is correct, something that is not obvious. At first sight the SVMOnly method is too naive as a solution to the positive-only problem; surprisingly, it is valid if all that is needed is a ranking of test examples.
If one wants to estimate probabilities for test examples, or if one wants to categorize candidates correctly at any given threshold (either 0.5 or some other value), then it is not correct to use probabilities produced by a standard classifier, whereas it is correct to use adjusted probabilities obtained with the PosOnly method. This happens, for example, if one wants to infer the overall gene regulatory network and a decision must be performed to classify the presence or absence of an arc between a pair of nodes/genes.
Note that the PosOnly method used in this paper is not the only valid way of obtaining correct probabilities. The paper [15] provides two other methods that are somewhat more complicated. In this research we use only the simplest method since it works well and will be easy for other researchers to apply. Any evaluation measure that is sensitive only to rank will indicate that the PosOnly and SVMOnly methods have equal performance. An example of such a measure is AUC, the area under the receiver operating characteristic (ROC) curve. However, measures that are sensitive to the correctness of conditional probabilities, for example mean squared error, will show that PosOnly performs better. Measures that are sensitive to the correctness of thresholds for making decisions, including the F-Measure as used in our research, will also show that PosOnly performs better.
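The distinction between rank-sensitive and threshold-sensitive measures can be illustrated with a small sketch. Dividing every score by the same positive constant c leaves the ordering of the candidates unchanged but moves them relative to a fixed threshold. The scores and the value of c below are toy values chosen for illustration, not results from the experiments.

```python
# Raw classifier scores g(x) ~ p(s=1|x), as used by SVMOnly (toy values).
scores = [0.45, 0.30, 0.20, 0.05]
c = 0.5  # assumed estimate of p(s=1|y=1)

# PosOnly-style adjustment: the same constant divides every score.
adjusted = [g / c for g in scores]


def ranking(values):
    """Indexes of the candidates sorted from highest to lowest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


same_ranking = ranking(scores) == ranking(adjusted)

# Decisions at the usual 0.5 threshold differ between the two schemes.
decisions_raw = [g > 0.5 for g in scores]
decisions_adjusted = [p > 0.5 for p in adjusted]
```

Here the ranking is identical, so a measure such as AUC would not distinguish the two methods, whereas the thresholded decisions differ for the first two candidates.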
Results presented in this paper are partial and no general conclusions can be drawn. Threats to validity can affect the results reported in the previous section. In particular, our results can be affected by the limitations of the synthetic network generation tool and by the measurement errors in the experimental microarray data.
Threats to external validity, concerning the possibility of generalizing our findings, affect the study, although we evaluated the heuristics on two model organisms and on a statistically significant sample of random regulatory networks. Nevertheless, analyses on further organisms are desirable, as well as the use of different simulated network generation tools. The study can, however, be replicated, as the tools are available for download, as are the simulated and experimental datasets. The benchmark process is detailed in the Methods section and we made raw data available for replication purposes.
Although more data are needed to validate these results empirically, a biological validation is also necessary to test the effectiveness of such approaches in real contexts. With respect to other gene network inference models, supervised methods need a set of known regulatory connections to be available in order to learn the prediction model. As more genomic data become available, such a limitation becomes less critical, and we believe that machine learning methods could play a crucial role in the inference of new gene regulatory connections.
Declarations
Acknowledgements
We would like to thank the anonymous reviewers for their very constructive comments on early versions of this manuscript. This work was supported by a research project funded by MiUR (Ministero dell'Università e della Ricerca) under grant PRIN200820085CH22F.
Authors’ Affiliations
References
1. Hecker M, Lambeck S, Toepfer S, van Someren E, Guthke R: Gene regulatory network inference: Data integration in dynamic models - A review. Bio Systems 2008, 96(1):86–103.
2. Zoppoli P, Morganella S, Ceccarelli M: TimeDelay-ARACNE: Reverse engineering of gene networks from time-course data by an information theoretic approach. BMC Bioinformatics 2010, 11: 154. 10.1186/1471-2105-11-154
3. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 2006, 7(Suppl 1):S7. 10.1186/1471-2105-7-S1-S7
4. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS: Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles. PLoS Biol 2007, 5: e8. 10.1371/journal.pbio.0050008
5. Liang S, Fuhrman S, Somogyi R: Reveal, a general reverse engineering algorithm for inference of genetic network architectures. Pac Symp Biocomput 1998, 18–29.
6. de Jong H: Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol 2002, 9: 67–103. 10.1089/10665270252833208
7. Werhli AV, Husmeier D: Reconstructing gene regulatory networks with bayesian networks by combining expression data with multiple sources of prior knowledge. Stat Appl Genet Mol Biol 2007, 6: Article15.
8. Morganella S, Zoppoli P, Ceccarelli M: IRIS: a method for reverse engineering of regulatory relations in gene networks. BMC Bioinformatics 2009, 10: 444. 10.1186/1471-2105-10-444
9. Ben-Hur A, Noble WS: Kernel methods for predicting protein-protein interactions. Bioinformatics 2005, 21(suppl 1):i38–46. 10.1093/bioinformatics/bti1016
10. Yamanishi Y, Bach F, Vert JP: Glycan classification with tree kernels. Bioinformatics 2007, 23(10):1211–1216. 10.1093/bioinformatics/btm090
11. Song J, Yuan Z, Tan H, Huber T, Burrage K: Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure. Bioinformatics 2007, 23(23):3147–3154. 10.1093/bioinformatics/btm505
12. Witten IH, Frank E: Data mining: practical machine learning tools and techniques. Morgan Kaufmann series in data management systems, Morgan Kaufmann; 2005.
13. Grzegorczyk M, Husmeier D, Werhli AV: Reverse Engineering Gene Regulatory Networks with Various Machine Learning Methods. Analysis of Microarray Data 2008.
14. Mordelet F, Vert JP: SIRENE: supervised inference of regulatory networks. Bioinformatics 2008, 24(16):i76–82. 10.1093/bioinformatics/btn273
15. Elkan C, Noto K: Learning classifiers from only positive and unlabeled data. KDD '08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, New York, NY, USA: ACM 2008, 213–220.
16. Liu B, Dai Y, Li X, Lee WS, Yu PS: Building Text Classifiers Using Positive and Unlabeled Examples. ICDM '03: Proceedings of the Third IEEE International Conference on Data Mining, Washington, DC, USA: IEEE Computer Society 2003, 179.
17. Yu H, Han J, Chang KCC: PEBL: Web Page Classification without Negative Examples. IEEE Transactions on Knowledge and Data Engineering 2004, 16: 70–81. 10.1109/TKDE.2004.1264823
18. Li X, Liu B: Learning to Classify Texts Using Positive and Unlabeled Data. IJCAI-03, Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, Acapulco, Mexico, August 9–15, 2003, 587–594.
19. Wang C, Ding C, Meraz RF, Holbrook SR: PSoL: a positive sample only learning algorithm for finding non-coding RNA genes. Bioinformatics 2006, 22(21):2590–2596. 10.1093/bioinformatics/btl441
20. Ceccarelli M, Cerulo L: Selection of negative examples in learning gene regulatory networks. Bioinformatics and Biomedicine Workshop, 2009. BIBMW 2009. IEEE International Conference on 2009, 56–61.
21. Lin HT, Lin CJ, Weng RC: A note on Platt's probabilistic outputs for support vector machines. Mach Learn 2007, 68(3):267–276. 10.1007/s10994-007-5018-6
22. Chang CC, Lin CJ: LIBSVM: a library for support vector machines. 2001. [http://www.csie.ntu.edu.tw/~cjlin/libsvm]
23. Hsu CW, Chang CC, Lin CJ: A practical guide to support vector classification. Department of Computer Science and Information Engineering, National Taiwan University; 2003.
24. Marbach D, Schaffter T, Mattiussi C, Floreano D: Generating Realistic In Silico Gene Networks for Performance Assessment of Reverse Engineering Methods. Journal of Computational Biology 2009, 16(2):229–239. 10.1089/cmb.2008.09TT
25. Stolovitzky G, Monroe D, Califano A: Dialogue on Reverse-Engineering Assessment and Methods: The DREAM of High-Throughput Pathway Inference. Annals of the New York Academy of Sciences 2007, 1115: 1–22. 10.1196/annals.1407.021
26. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS: Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles. PLoS Biol 2007, 5: e8. 10.1371/journal.pbio.0050008
27. Salgado H, Gama-Castro S, Peralta-Gil M, Díaz-Peredo E, Sánchez-Solano F, Santos-Zavaleta A, Martínez-Flores I, Jiménez-Jacinto V, Bonavides-Martínez C, Segura-Salazar J, Martínez-Antonio A, Collado-Vides J: RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions. Nucleic Acids Res 2006, 34(Database issue).
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.