 Methodology article
 Open Access
Discovering causal interactions using Bayesian network scoring and information gain
 Zexian Zeng^{1},
 Xia Jiang^{2} and
 Richard Neapolitan^{1}
https://doi.org/10.1186/s1285901610848
© Zeng et al. 2016
 Received: 27 November 2015
 Accepted: 14 May 2016
 Published: 26 May 2016
Abstract
Background
The problem of learning causal influences from data has recently attracted much attention. Standard statistical methods can have difficulty learning discrete causes that interact to affect a target, because the assumptions in these methods often do not model discrete causal relationships well. An important task then is to learn such interactions from data. Motivated by the problem of learning epistatic interactions from datasets developed in genome-wide association studies (GWAS), researchers conceived new methods for learning discrete interactions. However, many of these methods do not differentiate a model representing a true interaction from a model representing non-interacting causes with strong individual effects. The recent algorithm MBSIGain addresses this difficulty by using Bayesian network learning and information gain to discover interactions from high-dimensional datasets. However, MBSIGain requires marginal effects to detect interactions containing more than two causes. If the dataset is not high-dimensional, we can avoid this shortcoming by doing an exhaustive search.
Results
We develop ExhaustiveIGain, which is like MBSIGain but does an exhaustive search. We compare the performance of ExhaustiveIGain to MBSIGain using low-dimensional simulated datasets based on interactions with marginal effects and ones based on interactions without marginal effects. Their performance is similar on the datasets based on marginal effects. However, ExhaustiveIGain compellingly outperforms MBSIGain on the datasets based on 3- and 4-cause interactions without marginal effects. We apply ExhaustiveIGain to investigate how clinical variables interact to affect breast cancer survival, and obtain results that agree with judgements of a breast cancer oncologist.
Conclusions
We conclude that the combined use of information gain and Bayesian network scoring enables us to discover higher order interactions with no marginal effects if we perform an exhaustive search. We further conclude that ExhaustiveIGain can be effective when applied to real data.
Keywords
 Bayesian network
 Cause
 Interaction
 Information gain
 Epistasis
 Low-dimensional
 Breast cancer survival
Background
The problem of learning causal influences from passive data has attracted a good deal of attention in the past 30 years, and techniques have been developed and tested. The constraint-based technique for learning Bayesian networks is a well-known method [1], and has been implemented in the Tetrad package (http://www.phil.cmu.edu/tetrad/). This method orients edges which are compelled to be causal influences. Another method for learning Bayesian networks is the greedy equivalent search (GES) [2], which does not in itself distinguish which edges are compelled to be causal. However, post-processing of its resultant network can compel edges. Both these (and other) strategies assume the composition property, which states that if a variable Z and a set of variables S are not independent conditional on T, then there exists a variable X in S such that X and Z are not independent conditional on T [2]. When T is the empty set, this property simply states that if Z and S are not independent then there is an X in S such that Z and X are not independent. So, at least one variable in S must be correlated with Z. However, if two or more variables interact in some way to affect Z, there could be little marginal effect for each variable, and the observed data could easily not satisfy the composition property. Furthermore, if interacting variables have strong marginal effects, the causal learning algorithms do not distinguish them as interactions, but only as individual causes.
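A small worked example may make the failure of the composition property concrete. The sketch below uses a hypothetical toy model that is not taken from the datasets in this paper: X and Y are fair coins and Z = X XOR Y. It checks independence exactly by comparing joint and product probabilities; the helper names `marginal` and `independent` are ours.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: X and Y are fair coins and Z = X XOR Y.
# Outcomes are stored as (x, y, z) tuples with exact probabilities.
joint = {(x, y, x ^ y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

def marginal(joint, keep):
    """Sum the joint distribution over all positions not listed in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, Fraction(0)) + p
    return out

def independent(joint, a, b):
    """True if the variables at positions `a` are independent of those at `b`."""
    pa, pb, pab = marginal(joint, a), marginal(joint, b), marginal(joint, a + b)
    return all(
        pab.get(ka + kb, Fraction(0)) == pa[ka] * pb[kb]
        for ka in pa
        for kb in pb
    )

z_indep_x = independent(joint, (2,), (0,))     # Z versus X alone
z_indep_y = independent(joint, (2,), (1,))     # Z versus Y alone
z_indep_xy = independent(joint, (2,), (0, 1))  # Z versus the set {X, Y}
print(z_indep_x, z_indep_y, z_indep_xy)
```

In this toy model Z is independent of X and of Y taken singly, yet it is not independent of the set {X, Y}, so composition does not hold.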
So, the standard methods for learning causal influences do not learn that causes are interacting to cause a target, and do not even discover causes that are interacting with little or no marginal effect. An important task then is to learn such interactions from data. A method that does this could be a preliminary step before applying a causal learning algorithm. This paper concerns the development of a new method that does this in the case of discrete variables. We first provide some examples of situations where discrete variables interact.
Interaction examples
An example, which has recently received a lot of attention, is gene-gene interactions, called epistasis. Biologically, epistasis describes a situation where a variant at one locus prevents the variant at a second locus from manifesting its effect [3]. Epistasis between n loci is called pure epistasis if none of the loci individually is predictive of phenotype and is called strict epistasis if no proper multilocus subset of the loci is predictive of phenotype [4]. Epistasis has been defined statistically as a deviation from additivity in a model summarizing the relationship between multilocus genotypes and phenotype [5]. It is believed that much of genetic risk for disease is due to epistatic interactions [6–9]. A single nucleotide polymorphism (SNP) is a substitution of one base for another. Genome-wide association studies (GWAS) investigate many SNPs, often numbering in the millions, along with a phenotype such as disease status. By investigating single-locus associations, researchers have identified over 150 risk loci associated with 60 common diseases and traits [10–13]. However, these single-locus investigations would miss epistatic interactions with little marginal effect.
Another important example is the interaction of clinical or genomic variables with treatments to affect patient outcomes. For example, Herceptin is a treatment for breast cancer patients which is effective for HER2+ patients. So, Herceptin and HER2 status interact to affect survival. This is a well-known relationship. However, we now have large-scale breast cancer and other datasets [14] from which we can learn treatment-variable interactions that are not yet known. This knowledge will enable us to better provide precision medicine.
As another example, we are now obtaining abundant hospital data concerning workflow. These data can be analysed to determine good personnel combinations and sequencing [15].
Statistical interactions
The examples just shown are two extreme cases, providing us with clear examples of an interaction and a non-interaction. However, in general, there does not appear to be a dichotomous way to classify a discrete causal relationship as an interaction or a non-interaction. So, we propose a fuzzy set membership definition of a discrete interaction in the Methods Section.
Previous research on learning discrete interactions
The problem of learning genetic epistasis from GWAS datasets has recently inspired ample research on learning discrete interactions from high-dimensional datasets. Researchers applied standard statistical techniques including logistic regression [17,18], and regularized logistic regression [19,20]. However, many felt that regression may not work well at learning interacting loci because the assumptions in these models are too restrictive. So researchers applied machine learning strategies including modeling full interactions [21], using information gain [22], a technique called SNP Harvester [23], using ReliefF [24], applying random forests [25], a strategy called predictive rule inference [26], a method called Bayesian epistasis association mapping (BEAM) [27], the use of maximum entropy [28], Bayesian network learning [29–31], and Bayesian network learning combined with information gain [32]. A well-known new technique called Multifactor Dimensionality Reduction (MDR) [33] was also developed. MDR combines two or more variables into a single variable (hence leading to dimensionality reduction); this changes the representation space of the data and facilitates the detection of nonlinear interactions among the variables. MDR has been applied to detect epistatically interacting loci in hypertension [34], sporadic breast cancer [35], and type II diabetes [36]. Jiang et al. [37] evaluated the performance of 22 Bayesian network scoring criteria and MDR when learning two interacting SNPs with no marginal effects. Using 28,000 simulated datasets and a real Alzheimer's GWAS dataset, they found that several of the Bayesian network scoring criteria performed substantially better than other scores and MDR. The BN score that performed best was the Bayesian Dirichlet equivalent uniform (BDeu) score, which is based on the probability of the data given the model.
Henceforth, we refer to a candidate cause as a predictor. The multiple beam search algorithm (MBS) was developed in [29] to discover causal interactions. MBS starts by narrowing down the number of predictors using a Bayesian network scoring criterion (discussed in the Methods Section) to identify a best set of possible predictors. Next it starts a beam from each of these predictors. It performs greedy forward search on this beam by adding the predictor that increases the score the most. It stops when no predictor addition increases the score. Next MBS does greedy backward search on each beam by deleting the predictor whose removal increases the score the most. It stops when no predictor deletion increases the score. The set of predictors discovered in this manner is a candidate causal interaction. However, if two predictors each have a strong individual effect, they will have a high score together and will therefore be identified as an interaction, even if they do not interact. MBSIGain [32] resolves this difficulty. MBSIGain also uses MBS to develop beams and uses Bayesian network scoring to end the forward search. However, it uses information gain to choose the next predictor rather than adding the predictor that increases the score the most. In a comparison using 100 simulated 1000-predictor datasets with 15 interacting predictors involved in 5 interactions, MBSIGain substantially outperformed nine epistasis learning methods including MBS [29], LEAP [31], logistic regression [18], MDR [33] combined with a heuristic search, full interaction modeling [21], information gain alone [22], SNP Harvester [23], BEAM [27], and a technique that uses maximum entropy [28].
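The forward and backward phases of MBS described above can be sketched as follows. This is a minimal illustration and not the implementation from [29]: the function names, the `n_beams` parameter, and the toy scoring function are hypothetical, and any model score (for example a Bayesian network score) could be passed in as `score`. The sketch covers plain MBS only; MBSIGain would choose the next predictor by information gain instead of by score.

```python
def grow_beam(start, predictors, score):
    """Greedy forward search from `start`, then greedy backward search."""
    model = frozenset([start])
    best = score(model)
    # Forward phase: add the predictor that raises the score the most.
    while True:
        candidates = [(score(model | {p}), p) for p in predictors if p not in model]
        if not candidates:
            break
        new_score, p = max(candidates)
        if new_score <= best:
            break
        model, best = model | {p}, new_score
    # Backward phase: delete the predictor whose removal raises the score the most.
    while len(model) > 1:
        new_score, p = max((score(model - {q}), q) for q in model)
        if new_score <= best:
            break
        model, best = model - {p}, new_score
    return model, best

def mbs(predictors, score, n_beams):
    """Start one beam from each of the n_beams best single predictors."""
    ranked = sorted(predictors, key=lambda p: score(frozenset([p])), reverse=True)
    found = {}
    for start in ranked[:n_beams]:
        model, s = grow_beam(start, predictors, score)
        found[model] = s  # identical models found on different beams are merged
    return sorted(found.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy score: reward members of a hidden target set, penalize the rest.
TARGET = frozenset({"A", "B", "C"})

def toy_score(model):
    return len(model & TARGET) - 0.5 * len(model - TARGET)

results = mbs(["A", "B", "C", "D", "E"], toy_score, n_beams=2)
print(results)
```

With the toy score every beam converges on the same three predictors. With a real score and no marginal effects, the forward phase has nothing to guide its first additions, which is the weakness discussed in the rest of the paper.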
Methods
MBSIGain requires some marginal effect to detect interactions containing more than two predictors. If the dataset is not high-dimensional, we can alleviate this difficulty by instead doing an exhaustive search while using the model selection criteria in MBSIGain. However, the exhaustive search is not straightforward because we must not only score each candidate model M, but also check the submodels of M to see how much information is provided if we do not combine them into M. We develop ExhaustiveIGain, which does this.
We compare the performance of ExhaustiveIGain to MBSIGain using 10 simulated 40-predictor datasets based on 5 interactions with marginal effects, 16 simulated 40-predictor datasets based on two predictors interacting with no marginal effects, 16 simulated 40-predictor datasets based on 3 predictors interacting with no marginal effects, and 16 simulated 40-predictor datasets based on 4 predictors interacting with no marginal effects. We use ExhaustiveIGain to learn interactions from a real clinical breast cancer dataset.
Since ExhaustiveIGain uses Bayesian networks and information gain, we first review these.
Bayesian networks
Using a BN, we can determine probabilities of interest with a BN inference algorithm [16]. For example, using the BN in Fig. 1, if a patient has a smoking history (H = yes), a positive chest X-ray (X = pos), and a positive CAT scan (CT = pos), we can determine the probability of the patient having lung cancer (L = yes). That is, we can compute P(L = yes | H = yes, X = pos, CT = pos). Inference in BNs is NP-hard [47]. So, approximation algorithms are often employed [16].
It has been shown that the problem of learning a BN DAG model from data is NP-hard [50]. Consequently, heuristic search algorithms have been developed [16].
Information gain, interaction strength, and interaction power
If we repeat n trials of the experiment having outcome Z, then it is possible to show that the entropy H(Z) is the limit as n → ∞ of the expected value of the number of bits needed to report the outcome of every trial. Entropy provides a measure of our uncertainty in the value of Z in the sense that, as entropy increases, it takes more bits on the average to resolve our uncertainty. Entropy achieves its maximum value when P(z_{i}) = 1/m for all z_{i}, where m is the number of values Z can take, and its minimum value (0) when P(z_{j}) = 1 for some z_{j}.
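The two extreme cases just mentioned are easy to check numerically. The sketch below estimates entropy in bits from a list of observed values using the standard Shannon formula; the function name `entropy` and the sample lists are ours and are only illustrative.

```python
import math
from collections import Counter

def entropy(values):
    """Empirical Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

h_uniform = entropy(["a", "b", "c", "d"])   # four equally likely values, m = 4
h_certain = entropy(["a", "a", "a", "a"])   # a single value with probability 1
h_skewed = entropy(["a", "a", "a", "b"])    # in between the two extremes
print(h_uniform, h_certain, h_skewed)
```

For m equally likely values the entropy is log2(m) bits, so the uniform case with m = 4 gives 2 bits, and any less even distribution falls below that.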
Since A is a set, A ∪ {X} should technically be used in the IG expression. However, we represent this union by X, A. Interaction strength provides a measure of the increase in information gain obtained when X and A are known together relative to knowing each of them separately.
Since information gain (IG) is nonnegative, it is straightforward that IP(Z;M) ≤ 1. If M is causing Z with no marginal effects (e.g. pure, strict epistasis), the IP is 1. We would consider this a very strong interaction. When the IP is small, the increase in IG obtained by considering the variables together is small compared to considering them separately. We would consider this a weak interaction or no interaction at all.
So, in situations we often investigate, the IP is between 0 and 1, and therefore satisfies the notion of a fuzzy set [52], where the greater the value of the IP the greater membership the model has in the fuzzy set of interactions.
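To make the behaviour of the IP at its two extremes concrete, the sketch below estimates information gain from data and derives an interaction strength and interaction power from it. The normalization is an assumption on our part: we take IS(Z; X, A) = [IG(Z; X, A) − IG(Z; X) − IG(Z; A)] / IG(Z; X, A) and IP(Z; M) as the minimum IS over all ways of splitting M into two non-empty parts. This form is consistent with the properties stated above (IP ≤ 1, and IP = 1 when there are no marginal effects), but it should be read as an illustration and not as the paper's exact definition. All function names and both toy datasets are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, product

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def info_gain(rows, target, predictors):
    """IG(Z; A) = H(Z) - H(Z | A), estimated from a list of dict rows."""
    groups = {}
    for r in rows:
        groups.setdefault(tuple(r[p] for p in predictors), []).append(r[target])
    h_cond = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - h_cond

def interaction_strength(rows, target, part_a, part_b):
    """Assumed form: gain of the joint over the parts, relative to the joint gain."""
    joint = info_gain(rows, target, part_a + part_b)
    if joint == 0:
        return 0.0  # assumption: no information at all means no interaction
    separate = info_gain(rows, target, part_a) + info_gain(rows, target, part_b)
    return (joint - separate) / joint

def interaction_power(rows, target, model):
    """Assumed form: the weakest IS over all two-part splits of the model."""
    model = tuple(model)
    weakest = 1.0
    for k in range(1, len(model) // 2 + 1):
        for part_a in combinations(model, k):
            part_b = tuple(p for p in model if p not in part_a)
            weakest = min(weakest, interaction_strength(rows, target, part_a, part_b))
    return weakest

# Toy data: Z = X XOR Y has no marginal effects; Z = X is a lone individual cause.
xor_rows = [{"X": x, "Y": y, "Z": x ^ y} for x, y in product((0, 1), repeat=2)]
solo_rows = [{"X": x, "Y": y, "Z": x} for x, y in product((0, 1), repeat=2)]

ip_xor = interaction_power(xor_rows, "Z", ("X", "Y"))
ip_solo = interaction_power(solo_rows, "Z", ("X", "Y"))
print(ip_xor, ip_solo)
```

Under these assumptions the XOR model reaches the maximum IP of 1, while the model in which only X matters has IP 0, matching the strong and weak ends of the fuzzy membership scale.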
The IS and IP can be used to discover interactions. In the next section we develop algorithms for learning interactions that use the IS and the IP.
Interaction strength algorithms
We present the multiple beam search information gain (MBSIGain) and exhaustive search information gain (ExhaustiveIGain) algorithms, which use information gain and Bayesian network scoring to learn interactions. MBSIGain, which was previously developed in [32], does a heuristic search, while ExhaustiveIGain does an exhaustive search.
Reporting the noteworthiness of an interaction
Evaluation methodology
We evaluated ExhaustiveIGain by comparing it to MBSIGain using simulated datasets, and by applying it to a real breast cancer dataset. We discuss each of these next.
Simulated datasets
 1. S1, S2, S3, S4, S5
 2. S6, S7, S8
 3. S9, S10, S11
 4. S12, S13
 5. S14, S15
Each of these 5 interactions exhibits some marginal effect. As mentioned in the Background Section, MBSIGain [32] previously outperformed 9 other methods at interaction discovery using these 100 datasets. We developed 10 datasets based on these same interactions, but with only 40 total SNPs. Each dataset has 1000 cases and 1000 controls.
Urbanowicz et al. [54] created GAMETES, which is a software package for generating pure, strict epistatic models with random architectures. We used GAMETES to develop 2-SNP, 3-SNP, and 4-SNP models of pure epistatic interaction. That is, there are no marginal effects. The software allows the user to specify the heritability and the minor allele frequency (MAF). We used values of heritability ranging between 0.01 and 0.2, and values of MAF ranging between 0.1 and 0.4. Using these values, we generated 16 datasets based on pure, strict 2-SNP interactions, 16 datasets based on pure, strict 3-SNP interactions, and 16 datasets based on pure, strict 4-SNP interactions. The 2-SNP and 3-SNP based datasets contained 1000 cases and 1000 controls, and the 4-SNP based datasets contained 5000 cases and 5000 controls. All the simulated datasets are available in Additional file 1.
We used both MBSIGain and ExhaustiveIGain to analyze both sets of datasets. We ran both algorithms with all combinations of the following values of the threshold T in the algorithms: T = 0.1, 0.2; and the parameter α in the BDeu score: α = 9, 54, 128.
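For readers unfamiliar with the role of α, the sketch below computes the log of the standard BDeu score for a model in which the predictors are the parents of a single target node. The score equation used by the algorithms is not reproduced in this text, so this is the textbook form and is given under that assumption. The function name, the `arities` argument (number of states per variable), the toy XOR data, and the choice α = 1 in the example are all ours.

```python
import math
from collections import Counter
from itertools import product

def bdeu_log_score(rows, target, predictors, arities, alpha):
    """Log BDeu score of `target` with `predictors` as its parents (textbook form)."""
    r = arities[target]                             # number of target states
    q = math.prod(arities[p] for p in predictors)   # number of parent configurations
    n_j, n_jk = Counter(), Counter()
    for row in rows:
        j = tuple(row[p] for p in predictors)
        n_j[j] += 1                   # cases with parent configuration j
        n_jk[(j, row[target])] += 1   # ... that also have a given target state
    a_j, a_jk = alpha / q, alpha / (r * q)
    # Configurations and states that never occur contribute zero, so they are skipped.
    score = sum(math.lgamma(a_j) - math.lgamma(a_j + n) for n in n_j.values())
    score += sum(math.lgamma(a_jk + n) - math.lgamma(a_jk) for n in n_jk.values())
    return score

# Toy data: 100 cases in which Z = X XOR Y, all variables binary.
rows = [{"X": x, "Y": y, "Z": x ^ y} for x, y in product((0, 1), repeat=2)] * 25
arities = {"X": 2, "Y": 2, "Z": 2}

null_score = bdeu_log_score(rows, "Z", (), arities, alpha=1.0)
x_only_score = bdeu_log_score(rows, "Z", ("X",), arities, alpha=1.0)
full_score = bdeu_log_score(rows, "Z", ("X", "Y"), arities, alpha=1.0)
print(null_score, x_only_score, full_score)
```

In this toy case the two-predictor model scores far above both the empty model and the single-predictor model, whereas adding X alone does not help. A greedy search that adds one predictor at a time would therefore see no reason to start.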

Criterion 1: This criterion determines how well the method discovers the predictors in the interactions, but does not concern itself with whether the method discovers the actual interactions. First, the learned interactions are ordered by their scores. Then each predictor is ordered according to the first interaction in which it appears. Finally, the power according to criterion 1 is computed as follows:
$$ \mathrm{Power}_1(K)=\frac{1}{H\times M}\sum_{i=1}^{H} N_K(i) $$
where N_{K}(i) is the number of true interacting predictors appearing in the first K predictors learned for the ith dataset, M is the total number of interacting predictors in all interactions, and H is the number of datasets.

Criterion 2: This criterion measures how well a method discovers each of the interactions. The criterion uses the Jaccard index, which is as follows:
$$ \mathrm{Jaccard}(A,B)=\frac{\#\left(A\cap B\right)}{\#\left(A\cup B\right)}. $$
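The two quantities above can be computed as in the following sketch. The function names and the small example are hypothetical. The order of predictors within a single learned interaction is not specified in the text, so the sketch simply keeps the order in which they are listed, which is an assumption.

```python
def rank_predictors(interactions_by_score):
    """Order predictors by the first (best-scoring) interaction in which they appear."""
    ranked = []
    for interaction in interactions_by_score:
        for p in interaction:
            if p not in ranked:
                ranked.append(p)
    return ranked

def power_1(learned_per_dataset, true_predictors, k):
    """Power_1(K): average fraction of true predictors among the first K learned."""
    h = len(learned_per_dataset)   # H, the number of datasets
    m = len(true_predictors)       # M, the number of true interacting predictors
    hits = sum(
        len(set(rank_predictors(learned)[:k]) & true_predictors)
        for learned in learned_per_dataset
    )
    return hits / (h * m)

def jaccard(a, b):
    """Jaccard index of two sets; defined here as 0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical output for two datasets, each a list of interactions sorted by score.
learned = [
    [("S1", "S2"), ("S9", "S3")],
    [("S7", "S1")],
]
truth = {"S1", "S2", "S3"}

p1_at_2 = power_1(learned, truth, k=2)
j = jaccard({"S1", "S2", "S3"}, {"S2", "S3", "S4"})
print(p1_at_2, j)
```

In the example, the first two ranked predictors contain 2 true predictors in the first dataset and 1 in the second, giving 3 / (2 × 3) = 0.5. The two sets compared by the Jaccard index share 2 of 4 distinct predictors, also giving 0.5.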
Real dataset

age_at_diagnosis: This variable was discretized to the five ranges shown using the equal distribution discretization technique and breast cancer expert knowledge.

size: This variable was discretized to the three standard ranges shown.

lymph_nodes_positive: This variable was grouped into the six ranges shown.
Table 1 The clinical variables in the METABRIC dataset
| Variable | Description | Values |
| --- | --- | --- |
| age_at_diagnosis | age at diagnosis of the disease | 0–39, 39–54, 54–69, 69–84, 84–100 |
| menopausal_status | inferred menopausal status | pre, post |
| size | size of tumor in mm | 0–20, 20–50, 50–180 |
| lymph_nodes_positive | number of positive lymph nodes | 0, 1, 2–3, 4–5, 6–9, ≥ 10 |
| lymph_nodes_removed | number of lymph nodes removed | 0, 1–3, 4–9, 10–20, ≥ 21 |
| percent_nodes_positive | percent of removed nodes positive | 0–0.2, 0.2–0.4, 0.4–0.6, 0.6–0.8, 0.8–1 |
| grade | grade of disease | 1, 2, 3 |
| stage | composite of size and # positive nodes | 0, 1, 2, 3, 4 |
| histological | tumor histology | IDC, Other |
| ER_Expr | estrogen receptor expression | +, − |
| PR_Expr | progesterone receptor expression | +, − |
| HER2_status | HER2 expression | +, − |
| P53_mutation_status | whether P53 is mutated | +, − |
| chemo | whether patient had chemotherapy | yes, no |
| radiation | whether patient had radiation therapy | yes, no |
| hormone | whether patient had hormone therapy | yes, no |
The outcome variable is whether the patient died from breast cancer. If the person was known to have died from breast cancer, the number of days after initial consultation at which the patient died is recorded. If the person was not known to have died from breast cancer, the number of days after initial consultation at which the patient was last seen alive or died from another cause is recorded. If a patient was known to have died from breast cancer within x years after initial consultation or is known to have been alive x years after initial consultation, we say their breast cancer survival status is known x years after initial consultation. These data provide us with 1698 patients whose breast cancer survival status is known 5 years after initial consultation, 1228 patients whose breast cancer survival status is known 10 years after initial consultation, and 782 patients whose breast cancer survival status is known 15 years after initial consultation.
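The rule for deciding whether a patient's x-year status is known can be written down directly. The sketch below is only an illustration of that rule: the function name, its arguments, and the conversion of years to days (365.25 days per year) are assumptions, and the actual preprocessing of the METABRIC data may differ in such details.

```python
from typing import Optional

DAYS_PER_YEAR = 365.25  # assumption: the text does not state the conversion used

def bc_death_within(years: int, died_of_bc: bool, days: float) -> Optional[bool]:
    """Return True/False if x-year breast cancer death status is known, else None.

    `days` is the recorded time after initial consultation: time of breast cancer
    death if `died_of_bc`, otherwise time last seen alive or of death from another cause.
    """
    horizon = years * DAYS_PER_YEAR
    if died_of_bc and days <= horizon:
        return True    # known to have died from breast cancer within x years
    if days >= horizon:
        return False   # known to have been alive x years after consultation
    return None        # lost to follow-up or died of another cause before x years

early_bc_death = bc_death_within(5, died_of_bc=True, days=1000)
late_bc_death = bc_death_within(5, died_of_bc=True, days=3000)
long_follow_up = bc_death_within(5, died_of_bc=False, days=2000)
short_follow_up = bc_death_within(5, died_of_bc=False, days=1000)
print(early_bc_death, late_bc_death, long_follow_up, short_follow_up)
```

Patients for whom the function returns None are excluded at that horizon, which is why the number of usable patients shrinks as x grows from 5 to 15 years.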
We used ExhaustiveIGain to learn interactions that affect 5-year, 10-year, and 15-year breast cancer survival.
Results and discussion
Simulated datasets based on marginal effects
ExhaustiveIGain discovers on the average 7.5 models and MBSIGain discovers on the average 7 models. When there are 40 SNPs, there are 760,058 models containing between 2 and 5 SNPs. So, both methods exhibit the good discovery performance shown in Figs. 5 and 6 with very few false positives. Note that MBSIGain could discover at most 40 models because there are only 40 beams.
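The model count quoted above is the number of ways of choosing 2, 3, 4, or 5 SNPs out of 40, which can be verified in a few lines (the variable names are ours):

```python
import math

n_snps = 40
# Number of candidate models of each size k: "n choose k".
per_size = {k: math.comb(n_snps, k) for k in range(2, 6)}
total_models = sum(per_size.values())
print(per_size, total_models)
```

The individual terms are 780, 9,880, 91,390, and 658,008, so the 5-SNP models dominate the search space that an exhaustive search must cover.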
Simulated datasets based on pure, strict epistasis
We see from Fig. 8a that both methods discover the 2-SNP interaction very well. In fact, ExhaustiveIGain ranked the correct interaction first in 15 of the datasets and 3^{rd} in the remaining dataset, while MBSIGain ranked it first in 15 of the datasets and 4^{th} in the remaining dataset (this information is not in the figure). In the case of a 2-SNP interaction, MBSIGain effectively does an exhaustive search, explaining why it performs almost as well as ExhaustiveIGain. Its slightly worse performance is due to its different exit criteria concerning the score. It stops adding predictors when no predictor increases the score. On the other hand, ExhaustiveIGain checks whether any submodel has a higher score than the model being considered. ExhaustiveIGain achieves this performance with very few false discoveries. The average number of interactions discovered by ExhaustiveIGain is 2.0. On the other hand, the average number of interactions discovered by MBSIGain is 4.75.
Figure 8b shows that ExhaustiveIGain also discovers the 3-SNP interactions extremely well, while MBSIGain exhibits poor performance. This poor performance is to be expected. That is, when there are no marginal effects, if {S1,S2,S3} is our interaction, S2 or S3 would be chosen first on the beam initiating from S1 only by chance. In general, ExhaustiveIGain exhibited this good performance with a low false positive rate. The average number of interactions discovered for 15 of the datasets was 2.47. However, for one of the datasets, 100 interactions (the maximum reported) were identified.
As Fig. 8c shows, ExhaustiveIGain performed well for the 4-SNP interactions, but not as well as it did for the smaller models. This result indicates that higher order interactions are more difficult to discover. As expected, MBSIGain again showed very poor performance. For 14 of the datasets, the average number of interactions discovered by ExhaustiveIGain was 1.85. However, for two of the datasets, 100 interactions were discovered.
Real dataset
Table 2 The individual variable effects learned from the METABRIC dataset. The p-values were obtained using the chi-square test
| Variable | 5 year BC death: BNPP | 5 year BC death: p-value | 10 year BC death: BNPP | 10 year BC death: p-value | 15 year BC death: BNPP | 15 year BC death: p-value |
| --- | --- | --- | --- | --- | --- | --- |
| P53_mutation_status | 1 | 0 | 0.97 | 0.001 | 0.936 | 0.0004 |
| HER2_Status | 1 | 0 | 1 | 0 | 0.853 | 0.0006 |
| chemo | 1 | 0 | 1 | 0 | 0.999 | 0 |
| PR_category | 1 | 0 | 1 | 0 | 0.971 | 0.002 |
| hormone | 0.880 | 0.112 | 0.410 | 0.120 | 0.999 | 0 |
| radiation | 0.240 | 0.320 | 0.170 | 1 | 0.280 | 0.576 |
| ER_category | 1 | 0 | 1 | 0 | 0.889 | 0.002 |
| overall_stage | 1 | 0 | 1 | 0 | 1 | 0 |
| menopausal_status | 0.940 | 0.019 | 0.190 | 0.76 | 0.421 | 0.554 |
| histological | 0.450 | 0.0250 | 0.940 | 0.002 | 0.913 | 0.055 |
| lymph_nodes_pos | 1 | 0 | 1 | 0 | 1 | 0 |
| percent_nodes_positive | 1 | 0 | 1 | 0 | 0.999 | 0 |
| overall_grade | 1 | 0 | 1 | 0 | 0.999 | 0.0001 |
| size | 1 | 0 | 1 | 0 | 0.954 | 0.014 |
| age_at_diagnosis | 1 | 0 | 1 | 0 | 0.950 | 0.0003 |
| axillary_nodes_removed | 0.160 | 0.113 | 0.950 | 0.003 | 0.147 | 0.567 |
Table 3 The interactions learned from the METABRIC dataset
| Outcome | Interaction | BNPP | IP |
| --- | --- | --- | --- |
| 5 year BC death | histological, menopausal_status | 0.77 | 0.43 |
| 5 year BC death | histological, hormone | 0.93 | 0.47 |
| 10 year BC death | hormone, menopausal_status | 0.32 | 0.72 |
| 15 year BC death | histological, menopausal_status | 0.57 | 0.49 |
Table 3 shows the interactions learned from the METABRIC dataset that have IPs > 0.4. The data indicate that histological interacts with menopausal_status to affect both 5-year and 15-year breast cancer survival. A consultation with a breast cancer oncologist^{1} reveals that invasive ductal carcinoma (IDC) has a worse prognosis in premenopausal women, but other histological types do not. Furthermore, Table 2 indicates that neither histological nor menopausal_status is highly correlated with 5-year or 15-year breast cancer survival by itself. Table 3 also shows that hormone and menopausal_status interact to affect 10-year breast cancer survival. The breast cancer oncologist indicated that hormone therapy is more effective in postmenopausal women. As Table 2 shows, neither hormone nor menopausal_status is highly correlated with 10-year breast cancer survival by itself. Finally, Table 3 shows that histological and hormone interact to affect 5-year breast cancer survival. The oncologist stated that IDC might respond slightly worse to hormone therapy than other types, but that this difference is not well established.
Table 4 The average BNPPs and IPs of all 2, 3, 4, and 5 predictor models obtained from the METABRIC dataset
| Model | Avg. BNPP | Avg. IP |
| --- | --- | --- |
| 2-predictor models | 0.266 | 0.042 |
| 3-predictor models | 0.005 | −0.005 |
| 4-predictor models | 6.13 × 10^{−7} | 0.013 |
| 5-predictor models | 7.04 × 10^{−16} | 0.040 |
Conclusions
We compared ExhaustiveIGain to MBSIGain using simulated datasets based on interactions with marginal effects, and simulated datasets based on interactions with no marginal effects. MBSIGain performed as well as (actually slightly better than) ExhaustiveIGain when analysing the datasets based on interactions with marginal effects. MBSIGain is O(Rn^{2}) whereas ExhaustiveIGain is O(n^{R}), where n is the number of predictors and R is the maximum size of the models considered. So, our results indicate that MBSIGain achieves similar results to ExhaustiveIGain with this type of dataset, but much more efficiently. On the other hand, as could be expected, MBSIGain could not discover pure epistatic interactions involving more than two SNPs. ExhaustiveIGain performed very well at discovering 3-SNP interactions, and reasonably well at discovering 4-SNP interactions. We conclude from these results that the combined use of information gain and Bayesian network scoring enables us to discover higher order pure epistatic interactions if we perform an exhaustive search.
When we applied ExhaustiveIGain to a real breast cancer dataset to learn interactions affecting breast cancer survival, we learned interactions that agreed with the judgements of a breast cancer oncologist. We conclude that ExhaustiveIGain can be effective when applied to real data.
Abbreviations
BDeu, Bayesian Dirichlet equivalent uniform; BEAM, Bayesian epistasis association mapping; BN, Bayesian network; BNPP, Bayesian network posterior probability; DAG, directed acyclic graph; ExhaustiveIGain, exhaustive search information gain; GES, greedy equivalent search; GWAS, genome-wide association studies; IP, interaction power; IS, interaction strength; MAF, minor allele frequency; MBS, multiple beam search; MBSIGain, multiple beam search information gain; MDR, Multifactor Dimensionality Reduction; SNP, single nucleotide polymorphism.
Declarations
Acknowledgements
Not applicable.
Funding
This work was supported by National Library of Medicine grants number R00LM010822, R01LM011663, and R01LM011962.
Availability of data and materials
The simulated dataset(s) supporting the conclusions of this article are included in Additional file 1.
Authors’ contributions
Developed the algorithms: XJ. Conceived and designed the experiments: XJ. Performed the experiments: ZZ. Analysed the data: ZZ, XJ, RN. Wrote the paper: RN, XJ. All authors read and approved the final manuscript.
Authors’ information
Zexian Zeng is a Ph.D. student in bioinformatics at the Northwestern University Feinberg School of Medicine. Xia Jiang is assistant professor of biomedical informatics at the University of Pittsburgh. She has 13 years research experience in Bayesian network modeling, machine learning, and algorithm design, with the focus on solving problems in the clinical and biomedical domains. Richard Neapolitan is professor of biomedical informatics at Northwestern University. He is one of the leading researchers in uncertain reasoning in artificial intelligence, having written the seminal 1989 Bayesian network text Probabilistic Reasoning in Expert Systems, and more recently the 2004 text Learning Bayesian Networks. Drs. Jiang and Neapolitan collaborated on the 2012 text Contemporary Artificial Intelligence.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
 Spirtes P, Glymour C, Scheines R. Causation, prediction, and search. Boston: MIT Press; 2000.Google Scholar
 Chickering D, Meek C. Finding optimal Bayesian networks. In: Darwiche A, Friedman N, editors. Uncertainty in Artificial Intelligence; Proceedings of the Eighteenth Conference. San Mateo: Morgan Kaufmann; 2002.Google Scholar
 Cheverud J, Routman E. Epistasis and its contribution to genetic variance components. Genetics. 1995;139(3):1455.PubMedPubMed CentralGoogle Scholar
 Urbanowicz R, GranizoMackenzie A, Kiralis J, Moore JH. A classification and characterization of twolocus, pure, strict, epistatic models for simulation and detection. BioData Min. 2014;7:8.View ArticlePubMedPubMed CentralGoogle Scholar
 Fisher R. The correlation between relatives on the supposition of mendelian inheritance. Trans R Soc Edinburgh. 1918;52:399–433.View ArticleGoogle Scholar
 Galvin A, Ioannidis JPA, Dragani TA. Beyond genomewide association studies: genetic heterogeneity and individual predisposition to cancer. Trends Genet. 2010;26(3):132–41.View ArticleGoogle Scholar
 Manolio TA, Collins FS, Cox NJ, et al. Finding the missing heritability of complex diseases and complex traits. Nature. 2009;461:747–53.View ArticlePubMedPubMed CentralGoogle Scholar
 Mahr B. Personal genomics: The case of missing heritability. Nature. 2008;456:18–21.View ArticleGoogle Scholar
 Moore JH, Asselbergs FW, Williams SM. Bioinformatics challenges for genomewide association studies. Bioinformatics. 2010;26:445–55.View ArticlePubMedPubMed CentralGoogle Scholar
 Manolio TA, Collins FS. The HapMap and genomewide association studies in diagnosis and therapy. Annu Rev Med. 2009;60:443–56.View ArticlePubMedPubMed CentralGoogle Scholar
 Herbert A, Gerry NP, McQueen MB. A common genetic variant is associated with adult and childhood obesity. J Comput Biol. 2006;312:279–384.Google Scholar
 Spinola M, Meyer P, Kammerer S, et al. Association of the PDCD5 locus with long cancer risk and prognosis in smokers. Am J Hum Genet. 2001;55:27–46.Google Scholar
 Lambert JC, Heath S, Even G, et al. Genomewide association study identifies variants at CLU and CR1 associated with Alzheimer's disease. Nat Genet. 2009;41:1094–9.View ArticlePubMedGoogle Scholar
 Curtis C, Shah SP, Chin SF, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroup. Nature. 2012;486:346–52.PubMedPubMed CentralGoogle Scholar
 Soulakis ND, Carson MB, Lee YJ, Schneider DH, Skeehan CT, Scholtens DM. Visualizing collaborative electronic health record usage for hospitalized patients with heart failure. JAMIA. 2015;22(2):299–311.PubMedPubMed CentralGoogle Scholar
 Neapolitan RE. Learning Bayesian Networks. Upper Saddle River: Prentice Hall; 2004.Google Scholar
 Kooperberg C, Ruczinski I. Identifying interacting SNPs using Monte Carlo logic regression. Genet Epidemiol. 2005;28:157–70.View ArticlePubMedGoogle Scholar
 Agresti A. Categorical data analysis. 2nd ed. New York: Wiley; 2007.Google Scholar
 Park MY, Hastie T. Penalized logistic regression for detecting gene interactions. Biostatistics. 2008;9:30–50.View ArticlePubMedGoogle Scholar
 Wu TT, Chen YF, Hastie T, Sobel E, Lange K. Genomewide association analysis by lasso penalized logistic regression. Genome Analysis. 2009;25:714–21.Google Scholar
 Marchini J, Donnelly P, Cardon LR. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat Genet. 2005;37:413–7.
 Moore JH, Gilbert JC, Tsai CT, et al. A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility. J Theor Biol. 2006;241:252–61.
 Yang C, He Z, Wan X, et al. SNPHarvester: a filtering-based approach for detecting epistatic interactions in genome-wide association studies. Bioinformatics. 2009;25:504–11.
 Moore JH, White BC. Tuning ReliefF for genome-wide genetic analysis. In: Marchiori E, Moore JH, Rajapakse JC, editors. Proceedings of EvoBIO 2007. Berlin: Springer; 2007.
 Meng Y, Yang Q, Cuenco KT, et al. Two-stage approach for identifying single-nucleotide polymorphisms associated with rheumatoid arthritis using random forests and Bayesian networks. BMC Proc. 2007;1 Suppl 1:S56.
 Wan X, Yang C, Yang Q, et al. Predictive rule inference for epistatic interaction detection in genome-wide association studies. Bioinformatics. 2010;26(1):30–7.
 Zhang Y, Liu JS. Bayesian inference of epistatic interactions in case-control studies. Nat Genet. 2007;39:1167–73.
 Miller DJ, Zhang Y, Yu G, et al. An algorithm for learning maximum entropy probability models of disease risk that efficiently searches and sparingly encodes multilocus genomic interactions. Bioinformatics. 2009;25(19):2478–85.
 Jiang X, Barmada MM, Neapolitan RE, Visweswaran S, Cooper GF. A fast algorithm for learning epistatic genomic relationships. In: AMIA 2010 Symposium Proceedings. 2010. p. 341–5.
 Jiang X, Barmada MM, Cooper GF, Becich MJ. A Bayesian method for evaluating and discovering disease loci associations. PLoS One. 2011;6(8):e22075.
 Jiang X, Neapolitan RE. LEAP: biomarker inference through learning and evaluating association patterns. Genet Epidemiol. 2015;39(3):173–84.
 Jiang X, Jao J, Neapolitan RE. Learning predictive interactions using information gain and Bayesian network scoring. PLoS One. 2015. http://dx.doi.org/10.1371/journal.pone.0143247.
 Hahn LW, Ritchie MD, Moore JH. Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions. Bioinformatics. 2003;19:376–82.
 Moore JH, Williams SM. New strategies for identifying gene interactions in hypertension. Ann Med. 2002;34:88–95.
 Ritchie MD, Hahn LW, Roodi N, et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet. 2001;69:138–47.
 Cho YM, Ritchie MD, Moore JH, et al. Multifactor dimensionality reduction reveals a two-locus interaction associated with type 2 diabetes mellitus. Diabetologia. 2004;47:549–54.
 Jiang X, Neapolitan RE, Barmada MM, Visweswaran S. Learning genetic epistasis using Bayesian network scoring criteria. BMC Bioinformatics. 2011;12:89. doi:10.1186/1471-2105-12-89.
 Jensen FV, Nielsen TD. Bayesian Networks and Decision Graphs. New York: Springer; 2007.
 Neapolitan RE. Probabilistic Reasoning in Expert Systems. New York: Wiley; 1989.
 Pearl J. Probabilistic Reasoning in Intelligent Systems. Burlington: Morgan Kaufmann; 1988.
 Segal E, Pe'er D, Regev A, Koller D, Friedman N. Learning module networks. J Mach Learn Res. 2005;6:557–88.
 Friedman N, Linial M, Nachman I, Pe'er D. Using Bayesian networks to analyze expression data. In: Proceedings of the fourth annual international conference on computational molecular biology, Tokyo, Japan. 2000.
 Fishelson M, Geiger D. Optimizing exact genetic linkage computation. J Comput Biol. 2004;11:263–75.
 Neapolitan RE. Probabilistic Methods for Bioinformatics. Burlington: Morgan Kaufmann; 2009.
 Jiang X, Cooper GF. A real-time temporal Bayesian architecture for event surveillance and its application to patient-specific multiple disease outbreak detection. Data Min Knowl Disc. 2010;20(3):328–60.
 Jiang X, Wallstrom G, Cooper GF, Wagner MM. Bayesian prediction of an epidemic curve. J Biomed Inform. 2009;42(1):90–9.
 Cooper GF. The computational complexity of probabilistic inference using Bayesian belief networks. Artif Intell. 1990;42(2–3):393–405.
 Cooper GF, Herskovits E. A Bayesian method for the induction of probabilistic networks from data. Mach Learn. 1992;9:309–47.
 Heckerman D, Geiger D, Chickering D. Learning Bayesian networks: the combination of knowledge and statistical data. Technical report MSR-TR-94-09. Microsoft Research, 1995.
 Chickering M. Learning Bayesian networks is NP-complete. In: Fisher D, Lenz H, editors. Learning from Data: Artificial Intelligence and Statistics V. New York: Springer; 1996.
 Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948;27(3):379–423.
 Zadeh LA. Fuzzy sets. Inf Control. 1965;8:338–53.
 Chen L, Yu G, Langefeld CD, et al. Comparative analysis of methods for detecting interacting loci. BMC Genomics. 2011;12:344.
 Urbanowicz R, Kiralis J, Sinnott-Armstrong NA, et al. GAMETES: a fast, direct algorithm for generating pure, strict, epistatic models with random architectures. BioData Min. 2012;5(1):16. doi:10.1186/1756-0381-5-16.
 Fisher RA. On the ‘probable error’ of a coefficient of correlation deduced from a small sample. Metron. 1921;1:3–32.