minet: An R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information
Patrick E. Meyer^{1}, Frédéric Lafitte^{1} and Gianluca Bontempi^{1}
https://doi.org/10.1186/1471-2105-9-461
© Meyer et al; licensee BioMed Central Ltd. 2008
Received: 02 July 2008
Accepted: 29 October 2008
Published: 29 October 2008
Abstract
Results
This paper presents the R/Bioconductor package minet (version 1.1.6) which provides a set of functions to infer mutual information networks from a dataset. Once fed with a microarray dataset, the package returns a network where nodes denote genes, edges model statistical dependencies between genes and the weight of an edge quantifies the statistical evidence of a specific (e.g. transcriptional) gene-to-gene interaction. Four different entropy estimators are made available in the package minet (empirical, Miller-Madow, Schurmann-Grassberger and shrink) as well as four different inference methods, namely relevance networks, ARACNE, CLR and MRNET. Also, the package integrates accuracy assessment tools, like F-scores, PR-curves and ROC-curves, in order to compare the inferred network with a reference one.
Conclusion
The package minet provides a series of tools for inferring transcriptional networks from microarray data. It is freely available from the Comprehensive R Archive Network (CRAN) as well as from the Bioconductor website.
Background
Modelling transcriptional interactions by large networks of interacting elements and determining how these interactions can be effectively learned from measured expression data are two important issues in systems biology [1]. It should be noted that by focusing only on transcript data, the inferred network should not be considered as a proper biochemical regulatory network, but rather as a gene-to-gene network where many physical connections between macromolecules might be hidden by shortcuts. In spite of these evident limitations, the bioinformatics community has made important advances in this domain over the last few years [2, 3]. In particular, mutual information networks have been successfully applied to transcriptional network inference [4–6]. Such methods, which typically rely on the estimation of mutual information between all pairs of variables, have recently held the attention of the bioinformatics community for the inference of very large networks (up to several thousand nodes) [4, 7–9].
R is a widely used open source language and environment for statistical computing and graphics [10] which has become a de facto standard in statistical modeling, data analysis, biostatistics and machine learning [11]. An important feature of the R environment is that it integrates generic data analysis and visualization functionalities with off-the-shelf packages implementing the latest advances in computational statistics. Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data [12], mainly based on the R programming language. This paper introduces the new R/Bioconductor package minet, where the acronym stands for Mutual Information NETwork inference. This package is freely available on the CRAN package repository [10] as well as on the Bioconductor website [12].
1 Mutual information networks
Mutual information networks are a subcategory of network inference methods. The rationale of this family of methods is to infer a link between a couple of nodes if it has a high score based on mutual information [9].
Mutual information network inference proceeds in two steps. The first step is the computation of the mutual information matrix (MIM), a square matrix whose (i, j)-th element

$MIM_{ij} = I(X_i; X_j) \quad (1)$

is the mutual information between $X_i$ and $X_j$, where $X_i \in X$, i = 1,...,n, is a discrete random variable denoting the expression level of the i-th gene. The second step is the computation of an edge score for each pair of nodes by an inference algorithm that takes the MIM matrix as input.
The adoption of mutual information in network inference tasks can be traced back to Chow and Liu's tree algorithm [13, 14]. Mutual information provides a natural generalization of the correlation since it is a nonlinear measure of dependency. Hence, mutual information makes it possible to build generalized correlation networks (relevance networks [7]) as well as conditional independence graphs (e.g. ARACNE [8]). An advantage of these methods is their ability to deal with up to several thousands of variables, also in the presence of a limited number of samples. This is made possible by the fact that the MIM computation requires only $\frac{n(n-1)}{2}$ estimations of a bivariate mutual information term. Since each bivariate estimation can be computed quickly and has low variance even for a small number of samples, this family of methods is well suited to microarray data. Note that since mutual information is a symmetric measure, it is not possible to derive the direction of an edge using a mutual information network inference technique. Nevertheless, the orientation of the edges can be obtained by using algorithms like IC, which are well known in the graphical modelling community [15].
1.1 Relevance Network
The relevance network approach [7] has been introduced in gene clustering and was successfully applied to infer relationships between RNA expressions and chemotherapeutic susceptibility [6]. The approach consists in inferring a genetic network where a pair of genes {X_{ i }, X_{ j }} is linked by an edge if the mutual information I(X_{ i }; X_{ j }) is larger than a given threshold I_{0}. The complexity of the method is O(n^{2}) since all pairwise interactions are considered.
Note that this method does not eliminate all the indirect interactions between genes. For example, if gene X_{1} regulates both gene X_{2} and gene X_{3}, this would cause a high mutual information between the pairs {X_{1}, X_{2}}, {X_{1}, X_{3}} and {X_{2}, X_{3}}. As a consequence, the algorithm will set an edge between X_{2} and X_{3} although these two genes interact only through gene X_{1}.
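As an illustration, the relevance-network thresholding rule can be sketched in a few lines of R. This is a minimal sketch, not the package's implementation; mim is assumed to be a precomputed mutual information matrix and I0 a user-chosen threshold:

# Keep an edge only where the pairwise mutual information exceeds I0.
relnet.sketch <- function(mim, I0) {
  net <- mim
  net[net < I0] <- 0   # remove edges with too low mutual information
  diag(net) <- 0       # no self-loops
  net
}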
1.2 CLR Algorithm
The CLR algorithm is an extension of the relevance network approach. Instead of considering the raw mutual information I(X_i; X_j) between two genes, it derives a score from the empirical distribution of the mutual information values: the weight of the pair {X_i, X_j} is $z_{ij} = \sqrt{z_i^2 + z_j^2}$, where $z_i = \max\left(0, \frac{I(X_i; X_j) - \mu_i}{\sigma_i}\right)$ and μ_i and σ_i are respectively the sample mean and standard deviation of the empirical distribution of the values I(X_i; X_k), k = 1,...,n. The CLR algorithm was successfully applied to decipher the E. coli transcriptional regulatory network (TRN) [4]. CLR has O(n^2) complexity once the MIM is computed.
1.3 ARACNE
The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) [8] is based on the Data Processing Inequality [16]. This inequality states that, if gene X_1 interacts with gene X_3 through gene X_2, then

$I(X_1; X_3) \le \min\left(I(X_1; X_2), I(X_2; X_3)\right).$
ARACNE starts by assigning to each pair of nodes a weight equal to the mutual information. Then, as in relevance networks, all edges for which I(X_{ i }; X_{ j }) <I_{0} are removed, with I_{0} a given threshold. Eventually, the weakest edge of each triplet is interpreted as an indirect interaction and is removed if the difference between the two lowest weights is above a threshold W_{0}. Note that by increasing I_{0} the number of inferred edges is decreased while the opposite effect is obtained by increasing W_{0}.
If the network is a tree and only pairwise interactions are present, the method guarantees the reconstruction of the original network, once it is provided with the exact MIM. ARACNE's complexity is O(n^3) since the algorithm considers all triplets of genes. In [8], the method was able to recover components of the TRN in mammalian cells and outperformed Bayesian networks and relevance networks on several inference tasks.
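For illustration, the DPI pruning step can be sketched as a plain triple loop in R. This is a non-optimized sketch, not minet's C++ implementation; net is assumed to be a MIM already thresholded at I_0 (e.g. with relnet.sketch above) and W0 the tolerance of the rule:

aracne.sketch <- function(net, W0 = 0) {
  n <- nrow(net)
  if (n < 3) return(net)
  for (i in 1:(n - 2)) for (j in (i + 1):(n - 1)) for (k in (j + 1):n) {
    w <- c(net[i, j], net[i, k], net[j, k])
    # apply the DPI to triangles: remove the weakest edge of the triplet
    # if the difference between the two lowest weights exceeds W0
    if (all(w > 0) && sort(w)[2] - min(w) > W0) {
      m <- which.min(w)
      if (m == 1) { net[i, j] <- net[j, i] <- 0 }
      else if (m == 2) { net[i, k] <- net[k, i] <- 0 }
      else { net[j, k] <- net[k, j] <- 0 }
    }
  }
  net
}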
1.4 MRNET
MRNET [9] infers a network using the maximum relevance/minimum redundancy (MRMR) feature selection method [17, 18]. The idea consists in performing a series of supervised MRMR gene selection procedures where each gene in turn plays the role of the target output.
Given a target output Y and a set V of input variables, the MRMR greedy search starts by selecting the variable having the highest mutual information with Y. Then, at each step, given the set S of already selected variables, it chooses the variable

$X_j^{MRMR} = \arg\max_{X_j \in V \setminus S} (u_j - r_j) \quad (4)$

that maximizes the score $s_j = u_j - r_j$, where $u_j$ is a relevance term and $r_j$ is a redundancy term. More precisely,

$u_j = I(X_j; Y)$

is the mutual information of $X_j$ with the target variable Y, and

$r_j = \frac{1}{|S|} \sum_{X_k \in S} I(X_j; X_k)$

measures the average redundancy of $X_j$ to the already selected variables $X_k \in S$. At each step of the algorithm, the selected variable is expected to allow an efficient trade-off between relevance and redundancy. It has been shown in [19] that the MRMR criterion is an optimal "pairwise" approximation of the conditional mutual information $I(X_i; X_j | S)$ between any two genes $X_i$ and $X_j$ given the set S of selected variables.
The MRNET approach consists in repeating this selection procedure for each target gene by setting Y = X_i and V = X \ {X_i}, i = 1,...,n, where X is the set of the expression levels of all genes. For each pair {X_i, X_j}, MRMR returns two (not necessarily equal) scores s_i and s_j according to (4). The score of the pair {X_i, X_j} is then computed by taking the maximum of s_i and s_j. A specific network can then be inferred by deleting all the edges whose score lies below a given threshold I_0 (as in relevance networks, CLR and ARACNE). Thus, the algorithm infers an edge between X_i and X_j either when X_i is a well-ranked predictor of X_j (s_i > I_0) or when X_j is a well-ranked predictor of X_i (s_j > I_0).
An effective implementation of the best-first search for quadratic problems is available in [20]. This implementation has an O(f × n) complexity for selecting f features using a best-first search strategy. It follows that MRNET has an O(f × n^2) complexity since the feature selection step is repeated for each of the n genes. In other terms, the complexity ranges between O(n^2) and O(n^3) according to the value of f. In practice, the selection of features stops once a variable obtains a negative score.
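The following non-optimized R sketch illustrates this scoring scheme (assuming mim is a precomputed mutual information matrix; the function name and the naive greedy loop are illustrative only, as minet's mrnet() is implemented in C++ with the best-first search mentioned above):

mrnet.sketch <- function(mim) {
  n <- nrow(mim)
  score <- matrix(0, n, n)
  for (i in 1:n) {                  # each gene in turn is the target Y
    selected <- integer(0)
    candidates <- setdiff(1:n, i)
    while (length(candidates) > 0) {
      u <- mim[i, candidates]       # relevance u_j = I(X_j; Y)
      if (length(selected) == 0) {
        r <- rep(0, length(candidates))
      } else {                      # average redundancy to selected variables
        r <- rowMeans(mim[candidates, selected, drop = FALSE])
      }
      s <- u - r                    # MRMR score s_j = u_j - r_j
      best <- which.max(s)
      if (s[best] <= 0) break      # selection stops at a negative score
      score[i, candidates[best]] <- s[best]
      selected <- c(selected, candidates[best])
      candidates <- candidates[-best]
    }
  }
  pmax(score, t(score))            # edge score = max(s_i, s_j)
}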
Implementation of the inference algorithms in minet
All the algorithms discussed above are available in the minet package. The RELNET algorithm is implemented by simply running the command build.mim, which returns the MIM; this matrix can itself be considered as a weighted adjacency matrix of the network. ARACNE, CLR and MRNET are implemented by the commands aracne(mim), clr(mim) and mrnet(mim), respectively, each returning a weighted adjacency matrix of the network.
It should be noted that the modularity of the minet package makes it possible to assess network inference methods on similarity matrices other than the MIM [21].
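For instance, using the syn.data example dataset described in Section 5, the three algorithms can be run on the same mutual information matrix as follows (a usage sketch based on the functions above):

library(minet)
data(syn.data)
# discretize once, estimate the MIM once, then apply each inference algorithm
disc <- discretize(syn.data, "equalfreq", nbins = 10)
mim <- build.mim(disc, "mi.empirical")
net.aracne <- aracne(mim)
net.clr <- clr(mim)
net.mrnet <- mrnet(mim)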
2 Mutual information estimation
An information-theoretic network inference technique aims at identifying connections between two genes (variables) by estimating the amount of information common to any pair of genes. Mutual information is a measure of the statistical dependency between two discrete random variables. An important property of this measure is that it is not restricted to the identification of linear relations between the random variables [16].
If X is a continuous random variable taking values between a and b, the interval [a, b] can be discretized by partitioning it into $|\mathcal{X}|$ subintervals, called bins, where the symbol $\mathcal{X}$ denotes the set of bin indices. We also use $nb(x_k)$ to denote the number of data points in the k-th bin and the symbol $m = \sum_{k \in \mathcal{X}} nb(x_k)$ to denote the number of samples. If X is a random vector, each element $X_i$ can be discretized separately into $|\mathcal{X}_i|$ bins with index set $\mathcal{X}_i$.
The mutual information can be expressed in terms of entropies as

$I(X_i; X_j) = H(X_i) + H(X_j) - H(X_i, X_j), \quad (5)$

where H(X) denotes the entropy of the variable X and $I(X_i; X_j)$ is the mutual information between the random variables $X_i$ and $X_j$.
Hence, each mutual information computation demands the estimation of three entropy terms (Eq. 5). A fast entropy estimation is therefore essential for an effective network inference based on mutual information. Entropy estimation has gained much interest in feature selection and network inference over the last decade [22]. Most approaches focus on reducing the bias inherent to entropy estimation. In this section, some of the fastest and most used entropy estimators are reviewed. Other interesting approaches can be found in [22–26].
2.1 Empirical and Miller-Madow corrected estimators
The empirical (or plug-in) estimator is the entropy of the empirical distribution, $\widehat{H}^{emp} = -\sum_{k \in \mathcal{X}} \frac{nb(x_k)}{m} \log \frac{nb(x_k)}{m}$. It has been shown that the variance of the empirical estimator is upper-bounded by $var(\widehat{H}^{emp}) \le \frac{(\log m)^2}{m}$, which depends only on the number of samples, whereas the asymptotic bias of the estimate, $bias(\widehat{H}^{emp}) = -\frac{|\mathcal{X}| - 1}{2m}$, depends also on the number of bins $|\mathcal{X}|$ [23]. When $|\mathcal{X}| \gg m$, this estimator can still have a low variance but the bias can become very large [23].
The Miller-Madow corrected estimator compensates for this bias: $\widehat{H}^{mm} = \widehat{H}^{emp} + \frac{|\mathcal{X}_{\ne 0}| - 1}{2m}$, where $|\mathcal{X}_{\ne 0}|$ is the number of bins with non-zero probability. This correction, while adding no computational cost to the empirical estimator, reduces the bias without changing the variance. As a result, the Miller-Madow estimator is often preferred to the naive empirical entropy estimator.
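As an illustration, both estimators can be sketched in a few lines of R (x being a vector of discretized values; this is a minimal sketch, not the package's C++ implementation):

entropy.emp <- function(x) {
  p <- table(x) / length(x)    # empirical bin probabilities (non-zero bins only)
  -sum(p * log(p))             # plug-in entropy, in nats
}
entropy.mm <- function(x) {
  nonzero <- length(unique(x))                      # bins with non-zero counts
  entropy.emp(x) + (nonzero - 1) / (2 * length(x))  # Miller-Madow correction
}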
2.2 Shrink entropy estimator
Shrinkage is a general technique to improve an estimator for a small sample size [3]. Here it consists in replacing each empirical bin probability by a convex combination of the uniform distribution and the empirical one, $\widehat{p}_\lambda(x_k) = \lambda \frac{1}{|\mathcal{X}|} + (1 - \lambda)\,\widehat{p}^{emp}(x_k)$, with shrinkage intensity λ ∈ [0, 1]. As the value of λ tends to one, the estimated entropy is moved toward the maximal entropy (uniform probability), whereas when λ is zero the estimated entropy tends to the value of the empirical one.
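A minimal sketch of this estimator follows; the shrinkage intensity lambda is left as an explicit argument here, whereas the package chooses an optimal value analytically (not reproduced in this sketch), and x is assumed to hold bin indices in 1..nbins:

entropy.shrink.sketch <- function(x, nbins, lambda) {
  p.emp <- tabulate(x, nbins) / length(x)           # empirical probabilities
  p.sh <- lambda / nbins + (1 - lambda) * p.emp     # shrink toward uniform
  p.sh <- p.sh[p.sh > 0]                            # guard against log(0)
  -sum(p.sh * log(p.sh))                            # plug-in entropy of p.sh
}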
2.3 The Schurmann-Grassberger Estimator
The Schurmann-Grassberger estimator is a Bayesian estimator relying on a Dirichlet prior over the bin probabilities, whose density is $Dir(p; \beta) = \frac{\Gamma\left(\sum_{i \in \mathcal{X}} \beta_i\right)}{\prod_{i \in \mathcal{X}} \Gamma(\beta_i)} \prod_{i \in \mathcal{X}} p_i^{\beta_i - 1}$, where β_i is the prior probability of an event x_i and Γ(·) is the gamma function (see [25, 27, 29] for more details).
In case of no a priori knowledge, the β_k are assumed to be equal (β_k = N, k ∈ $\mathcal{X}$) so that no event is a priori more probable than another. Note that using a Dirichlet prior with parameters N is equivalent to adding N ≥ 0 "pseudo-counts" to each bin i ∈ $\mathcal{X}$. The prior actually provides the estimator with the information that $|\mathcal{X}|N$ counts have been observed in previous experiments. From that viewpoint, $|\mathcal{X}|N$ becomes the a priori sample size.
The entropy can then be estimated in a Bayesian framework directly from the counts as

$\widehat{H}^{dir} = \frac{1}{m + |\mathcal{X}|N} \sum_{k \in \mathcal{X}} (nb(x_k) + N)\left(\psi(m + |\mathcal{X}|N + 1) - \psi(nb(x_k) + N + 1)\right)$

with $\psi(z) = \frac{d \ln \Gamma(z)}{dz}$ the digamma function.
Various choices of prior parameters have been proposed in the literature [29–31]. Schurmann and Grassberger have proposed the prior $N = \frac{1}{|\mathcal{X}|}$ [32], which has been retained in the package.
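The estimator above translates directly into R using the built-in digamma function. This is an illustrative sketch mirroring the formula, not the package's C++ implementation; x is assumed to hold bin indices in 1..nbins:

entropy.sg.sketch <- function(x, nbins, N = 1 / nbins) {
  counts <- tabulate(x, nbins)   # bin counts nb(x_k)
  m <- length(x)
  A <- m + nbins * N             # a posteriori sample size m + |X|N
  sum((counts + N) * (digamma(A + 1) - digamma(counts + N + 1))) / A
}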
Implementation of estimators in minet
The mutual information matrix is computed in minet by the function build.mim, which takes two arguments:

 1.

the data frame dataset, which stores the gene expression dataset or a generic dataset where columns contain variables/features and rows contain outcomes/samples;

 2.

the string mi, which denotes the routine used to estimate the mutual information.

The package makes four estimation routines available: "mi.empirical", "mi.shrink", "mi.sg" and "mi.mm" (default: "mi.empirical"), each referring to one of the estimation techniques explained above.
3 Discretization methods
All the estimators discussed in the previous section have been designed for discrete variables. If the random variable X is continuous and takes values between a and b, it is then required to partition the interval [a, b] into $|\mathcal{X}|$ subintervals in order to adopt a discrete entropy estimator. The two most widely used discretization algorithms are the equal width and the equal frequency quantization, explained in the next sections. Other discretization methods can be found in [33–35].
3.1 Equal Width
The principle of the equal width discretization is to divide the range [a_i, b_i] of each variable X_i, i ∈ {1, 2,...,n}, in the dataset into $|\mathcal{X}_i|$ subintervals of equal size: $[a_i, a_i + \frac{b_i - a_i}{|\mathcal{X}_i|}[, [a_i + \frac{b_i - a_i}{|\mathcal{X}_i|}, a_i + 2\frac{b_i - a_i}{|\mathcal{X}_i|}[, \ldots, [a_i + \frac{(|\mathcal{X}_i| - 1)(b_i - a_i)}{|\mathcal{X}_i|}, b_i + \epsilon[$. Note that an ε is added to the last interval in order to include the maximal value in one of the $|\mathcal{X}_i|$ bins. This discretization scheme has an O(m) complexity cost per variable.
3.2 Global Equal Width
The principle of the global equal width discretization is the same as that of the equal width (Sec. 3.1), except that the considered range [a, b] is not the range of each individual random variable but the range of the random vector composed of all the variables in the dataset. In other words, a and b are respectively the minimal and the maximal value of the whole dataset.
3.3 Equal Frequency
The equal frequency discretization scheme consists in partitioning the range [a_i, b_i] of each variable X_i in the dataset into $|\mathcal{X}_i|$ intervals, each containing the same number $m/|\mathcal{X}_i|$ of data points. As a result, the sizes of the intervals can differ. Note that if the $|\mathcal{X}_i|$ intervals have exactly equal frequencies, the computation of the entropy is straightforward: it is $\log |\mathcal{X}_i|$. However, there can be more than $m/|\mathcal{X}_i|$ identical values in a vector of measurements. In such a case, one of the bins will be denser than the others and the resulting entropy will differ from $\log |\mathcal{X}_i|$. It should be noted that this discretization is reported in some papers as one of the most efficient methods (e.g. for naive Bayes classification) [35].
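For a single variable, the two binning schemes can be sketched with base R functions (minet's discretize(), described below, provides optimized implementations for whole datasets):

equal.width.bins <- function(x, nbins) {
  # cut() with a single number divides range(x), slightly extended to include
  # the extremes (the role of the epsilon above), into nbins equal intervals
  cut(x, breaks = nbins, labels = FALSE)
}
equal.freq.bins <- function(x, nbins) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = nbins + 1))
  # unique() guards against repeated quantiles caused by identical values,
  # the case discussed above where one bin becomes denser than the others
  cut(x, breaks = unique(breaks), labels = FALSE, include.lowest = TRUE)
}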
Implementation of discretization strategies in minet
The discretization is performed in minet by the function
discretize(dataset, disc = "equalfreq", nbins = sqrt(nrow(dataset)))
where

- dataset is the dataset to be discretized,

- disc is a string which can take three values: "equalfreq", "equalwidth" or "globalequalwidth" (default: "equalfreq"),

- nbins is the number of bins to be used for discretization, by default set to $\sqrt{m}$ with m the number of samples [35]. Note that the binning selectors used by the built-in R hist() function, such as nclass.FD(dataset), nclass.scott(dataset) and nclass.Sturges(dataset), can also be used here.
4 Assessment of the network inference algorithm
A network inference problem can be seen as a binary decision problem where the inference algorithm plays the role of a classifier: for each pair of nodes, the algorithm either returns an edge or not. Each pair of nodes can thus be assigned a positive label (an edge) or a negative one (no edge).
A positive label (an edge) predicted by the algorithm is considered as a true positive (TP) or as a false positive (FP) depending on whether the corresponding edge is present or not in the underlying true network, respectively. Analogously, a negative label is considered as a true negative (TN) or a false negative (FN) depending on whether the corresponding edge is absent or present in the underlying true network, respectively. Note that all mutual information network inference methods use a threshold value in order to delete the edges having too low a score. Hence, for each threshold value, a confusion matrix can be computed.
4.1 ROC curves
The true positive rate $TPR = \frac{TP}{TP + FN}$, also known as recall or sensitivity, is the fraction of true edges that are recovered, while the false positive rate $FPR = \frac{FP}{FP + TN}$ is the fraction of absent edges that are wrongly inferred.
A Receiver Operating Characteristic (ROC) curve is a graphical plot of the TPR vs. the FPR for a binary classifier system as the threshold is varied [36]. A perfect classifier would yield a point in the upper left corner of the ROC space (having coordinates [0, 1]), representing 100% TPR (all true positives are found) and 0% FPR (no false positives are returned). A completely random guess gives a point along the diagonal line (the so-called line of no-discrimination) which goes from the bottom-left to the top-right corner. Points above the diagonal indicate good classification results, while points below the line indicate poor results.
4.2 PR curves
It is generally recommended [37] to use receiver operating characteristic (ROC) curves when evaluating binary decision problems in order to avoid effects related to the chosen threshold. However, ROC curves can present an overly optimistic view of an algorithm's performance if there is a large skew in the class distribution, as typically encountered in transcriptional network inference because of sparseness. To tackle this problem, precision-recall (PR) curves have been cited as an alternative to ROC curves [38].
The precision $p = \frac{TP}{TP + FP}$ denotes the fraction of inferred edges that are correct, and the recall $r = \frac{TP}{TP + FN}$, also known as true positive rate (TPR), denotes the fraction of real edges that are correctly inferred. These quantities depend on the threshold chosen to return a binary decision. The PR curve is a diagram which plots the precision (p) versus the recall (r) for different values of the threshold on a two-dimensional coordinate system.
4.3 F-scores
A scalar measure combining both quantities is the F-measure

$F = \frac{2pr}{p + r},$

which is the harmonic average of precision and recall. A weighted version is the $F_\beta$-score

$F_\beta = (1 + \beta^2)\frac{pr}{\beta^2 p + r},$

where β is a parameter denoting the weight of the recall. Two commonly used F-scores are the $F_2$-measure, which weights recall twice as much as precision, and the $F_{0.5}$-measure, which weights precision twice as much as recall. In transcriptional network inference, precision is often a more desirable feature than recall since it is expensive to verify experimentally whether a gene regulates another.
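A small numerical illustration of these measures, computed in R from hypothetical confusion-matrix counts:

tp <- 40; fp <- 10; fn <- 20
p <- tp / (tp + fp)                # precision = 0.8
r <- tp / (tp + fn)                # recall = 2/3
fbeta <- function(p, r, beta) (1 + beta^2) * p * r / (beta^2 * p + r)
fbeta(p, r, 1)     # harmonic mean of p and r: about 0.73
fbeta(p, r, 2)     # F2, recall weighted twice as much: about 0.69
fbeta(p, r, 0.5)   # F0.5, precision weighted twice as much: about 0.77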
Assessment functionalities in minet
In order to benchmark the inference methods, the package provides a number of assessment tools. The validate(net, ref.net, steps = 50) function compares an inferred network net to a reference network ref.net, described by a Boolean adjacency matrix. The assessment process consists in removing the inferred edges having a score below a given threshold and in computing the related confusion matrix, for steps threshold values ranging from the minimum to the maximum edge weight. The resulting data frame, containing the list of all the steps confusion matrices, is returned and made available for further analysis.
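A usage sketch of the assessment workflow (the functions are listed in the table of the Availability section), assuming net is an inferred weighted network and ref.net the Boolean reference adjacency matrix:

table <- validate(net, ref.net, steps = 50)  # one confusion matrix per threshold
pr(table)        # precision and recall at each threshold
rates(table)     # true and false positive rates at each threshold
fscores(table)   # F-scores at each threshold
show.roc(table)  # plot the corresponding ROC curve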
5 Example
Once the R platform is launched, the package, its description and its vignette can be loaded using the following commands:
library(minet)
library(help = minet)
vignette("minet")
A demo script (demo(demo)) shows the main functionalities of the package that we describe in the following.
In order to infer a network with the minet package, four steps are required:

 1. data discretization,

 2. MIM computation,

 3. network inference,

 4. normalization of the network (optional).
The function minet(dataset, method, estimator, disc, nbins) takes the following arguments: dataset, a matrix or a data frame containing the microarray data; method, the inference algorithm (such as ARACNE, CLR or MRNET); estimator, the entropy estimator used for the computation of mutual information (empirical, Miller-Madow, shrink, Schurmann-Grassberger); disc, the binning algorithm (i.e. equal frequency or equal width interval); and nbins, which sets the number of bins to use. The final step of the minet function is the normalization using the norm(net) function. This step normalizes all the weights of the inferred adjacency matrix between 0 and 1. Hence, the minet function returns the inferred network as a weighted adjacency matrix with values ranging from 0 to 1 where the higher the weight, the stronger the evidence that a gene-gene interaction exists.
For demo purposes the package also makes available the dataset syn.data, representing the expression of 50 genes in 100 experiments. This dataset has been synthetically generated from the network syn.net using the microarray data generator SynTReN [40]. The dataset can be loaded with data(syn.data) and the corresponding original network with data(syn.net).
Note that the command res <- minet(syn.data, "mrnet", "mi.shrink", "equalwidth", 10) is a compact way to execute the following sequence of instructions:
discdata <- discretize(syn.data, "equalwidth", 10)
mim <- build.mim(discdata, "mi.shrink")
net <- mrnet(mim)
res <- norm(net)
In order to plot a PR-curve (see Figure 3), the functions validate and show.pr can be used:
table <- validate(res, syn.net)
show.pr(table)
The inferred network can be displayed using the Rgraphviz package [41]:
library(Rgraphviz)
graph <- as(res, "graphNEL")
plot(graph)
Note that, for the sake of computational efficiency, all the inference functions as well as the entropy estimators are implemented in C++. As a reference, a network of five hundred variables may be inferred in less than one minute on an Intel Pentium 4 at 2 GHz with 512 MB of DDR SDRAM.
6 Conclusion
Transcriptional network inference is a key issue for understanding the relationships between the genes of an organism. Nevertheless, few public-domain tools are available when a thorough comparison of existing approaches is at stake. A new, freely available R/Bioconductor package has been introduced in this paper. This package makes available to biologists and bioinformatics practitioners a set of tools to infer networks from microarray datasets with a large number (several thousands) of genes. Four information-theoretic methods of network inference (i.e. relevance networks, CLR, ARACNE and MRNET), four different entropy estimators (i.e. empirical, Miller-Madow, Schurmann-Grassberger and shrink) and three validation tools (i.e. F-scores, PR curves and ROC curves) are implemented in the package. We deem that this tool is an effective answer to the increasing need for comparative tools in the growing domain of transcriptional network inference from expression data.
Availability and requirements
Available functions of the package minet (version 1.1.6)
Function  Usage 

minet(data, method, estimator, disc, nbins)  Network inference from data 
discretize(data, disc, nbins)  Unsupervised discretization 
build.mim(data, estimator)  Mutual information matrix estimation. The estimator can be "mi.empirical", "mi.mm", "mi.shrink" or "mi.sg". 
mrnet(mim)  MRNET algorithm 
aracne(mim)  ARACNE algorithm 
clr(mim)  CLR algorithm 
norm(net)  Matrix/network normalization 
validate(net1, net2, steps)  Computes confusion matrices 
pr(table)  Computes precisions and recalls from confusion matrices 
rates(table)  Computes true positive rates and false positive rates from confusion matrices 
show.pr(table)  Displays precisionrecall curves from confusion matrices 
show.roc(table)  Displays receiver operating characteristic curves from confusion matrices 
fscores(table)  Returns a vector of F_β-scores from confusion matrices 
Declarations
Acknowledgements
This work was partially funded by the Communauté Française de Belgique under ARC grant no. 04/09-307. The authors thank their colleague Catharina Olsen for her appreciable comments, suggestions and testing of the package functionalities. The authors also thank Korbinian Strimmer as well as the reviewers for their useful comments on the package and the paper.
References
 1. van Someren EP, Wessels LFA, Backer E, Reinders MJT: Genetic network modeling. Pharmacogenomics 2002, 3(4):507–525. doi:10.1517/14622416.3.4.507
 2. Gardner TS, Faith J: Reverse-engineering transcription control networks. Physics of Life Reviews 2005, 2.
 3. Schäfer J, Strimmer K: An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics 2005, 21(6):754–764. doi:10.1093/bioinformatics/bti062
 4. Faith J, Hayete B, Thaden J, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins J, Gardner T: Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles. PLoS Biology 2007, 5.
 5. Basso K, Margolin A, Stolovitzky G, Klein U, Dalla-Favera R, Califano A: Reverse engineering of regulatory networks in human B cells. Nature Genetics 2005, 37.
 6. Butte AJ, Tamayo P, Slonim D, Golub T, Kohane I: Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proceedings of the National Academy of Sciences 2000, 97(22):12182–12186. doi:10.1073/pnas.220392197
 7. Butte AJ, Kohane IS: Mutual Information Relevance Networks: Functional Genomic Clustering Using Pairwise Entropy Measurements. Pac Symp Biocomput 2000, 418–429.
 8. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera RD, Califano A: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 2006, 7.
 9. Meyer PE, Kontos K, Lafitte F, Bontempi G: Information-Theoretic Inference of Large Transcriptional Regulatory Networks. EURASIP J Bioinform Syst Biol 2007, 79879.
 10. Ihaka R, Gentleman R: R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics 1996, 5. [http://www.R-project.org]
 11. Venables WN, Ripley BD: Modern Applied Statistics with S. Fourth edition. Springer; 2002.
 12. Gentleman RC, Carey VJ, Bates DJ, Bolstad BM, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth GK, Tierney L, Yang YH, Zhang J: Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology 2004, 5.
 13. Cheng J, Greiner R, Kelly J, Bell D, Liu W: Learning Bayesian Networks from Data: An Information-Theory Based Approach. Artificial Intelligence 2002, 137.
 14. Chow C, Liu C: Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory 1968, 14.
 15. Pearl J: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc; 1988.
 16. Cover TM, Thomas JA: Elements of Information Theory. New York: John Wiley; 1990.
 17. Tourassi GD, Frederick ED, Markey MK, Floyd CE: Application of the mutual information criterion for feature selection in computer-aided diagnosis. Medical Physics 2001, 28(12):2394–2402. doi:10.1118/1.1418724
 18. Peng H, Long F, Ding C: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 2005, 27(8):1226–1238. doi:10.1109/TPAMI.2005.159
 19. Ding C, Peng H: Minimum Redundancy Feature Selection From Microarray Gene Expression Data. Journal of Bioinformatics and Computational Biology 2005, 3(2):185–205. doi:10.1142/S0219720005001004
 20. Merz P, Freisleben B: Greedy and Local Search Heuristics for Unconstrained Binary Quadratic Programming. Journal of Heuristics 2002, 8(2). doi:10.1023/A:1017912624016
 21. Olsen C, Meyer PE, Bontempi G: On the Impact of Entropy Estimator in Transcriptional Regulatory Network Inference. In 5th International Workshop on Computational Systems Biology (WSCB 08). Edited by: Ahdesmäki M, Strimmer K, Radde N, Rahnenführer J, Klemm K, Lähdesmäki H, Yli-Harja O. Tampere International Center for Signal Processing; 2008:41.
 22. Daub CO, Steuer R, Selbig J, Kloska S: Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data. BMC Bioinformatics 2004, 5.
 23. Paninski L: Estimation of entropy and mutual information. Neural Computation 2003, 15(6):1191–1253. doi:10.1162/089976603321780272
 24. Beirlant J, Dudewicz EJ, Györfi L, van der Meulen E: Nonparametric Entropy Estimation: An Overview. International Journal of Mathematical and Statistical Sciences 1997, 6.
 25. Nemenman I, Bialek W, de Ruyter van Steveninck R: Entropy and information in neural spike trains: Progress on the sampling problem. Phys Rev E Stat Nonlin Soft Matter Phys 2004, 69(5 Pt 2):056111.
 26. Darbellay G, Vajda I: Estimation of the information by an adaptive partitioning of the observation space. IEEE Transactions on Information Theory 1999.
 27. Hausser J: Improving entropy estimation and inferring genetic regulatory networks. Master's thesis, National Institute of Applied Sciences, Lyon; 2006. [http://strimmerlab.org/publications/mschausser.pdf]
 28. Schäfer J, Strimmer K: A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Mol Biol 2005, 4(Article 32).
 29. Wu L, Neskovic P, Reyes E, Festa E, Heindel W: Classifying n-back EEG data using entropy and mutual information features. European Symposium on Artificial Neural Networks 2007.
 30. Beerenwinkel N, Schmidt B, Walter H, Kaiser R, Lengauer T, Hoffmann D, Korn K, Selbig J: Diversity and complexity of HIV-1 drug resistance: A bioinformatics approach to predicting phenotype from genotype. Proc Natl Acad Sci U S A 2002, 99(12):8271–8276. doi:10.1073/pnas.112177799
 31. Krichevsky R, Trofimov V: The performance of universal coding. IEEE Transactions on Information Theory 1981.
 32. Schurmann T, Grassberger P: Entropy estimation of symbol sequences. Chaos 1996.
 33. Dougherty J, Kohavi R, Sahami M: Supervised and Unsupervised Discretization of Continuous Features. International Conference on Machine Learning 1995, 194–202.
 34. Liu H, Hussain F, Tan CL, Dash M: Discretization: An Enabling Technique. Data Mining and Knowledge Discovery 2002, 6.
 35. Yang Y, Webb GI: On why discretization works for naive-Bayes classifiers. Proceedings of the 16th Australian Joint Conference on Artificial Intelligence 2003.
 36. Davis J, Goadrich M: The Relationship Between Precision-Recall and ROC Curves. Proceedings of the 23rd International Conference on Machine Learning 2006.
 37. Provost F, Fawcett T, Kohavi R: The case against accuracy estimation for comparing induction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA; 1998:445–453.
 38. Bockhorst J, Craven M: Markov Networks for Detecting Overlapping Elements in Sequence Data. In Advances in Neural Information Processing Systems 17. Edited by: Saul LK, Weiss Y, Bottou L. Cambridge, MA: MIT Press; 2005:193–200.
 39. Sokolova M, Japkowicz N, Szpakowicz S: Beyond Accuracy, F-score and ROC: a Family of Discriminant Measures for Performance Evaluation. Proceedings of the AAAI'06 Workshop on Evaluation Methods for Machine Learning 2006.
 40. Van den Bulcke T, Van Leemput K, Naudts B, van Remortel P, Ma H, Verschoren A, De Moor B, Marchal K: SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms. BMC Bioinformatics 2006, 7:43. doi:10.1186/1471-2105-7-43
 41. Carey VJ, Gentry J, Whalen E, Gentleman R: Network Structures and Algorithms in Bioconductor. Bioinformatics 2005, 21:135–136. doi:10.1093/bioinformatics/bth458
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.