Fig. 3 | BMC Bioinformatics

Fig. 3

From: A knowledge graph approach to predict and interpret disease-causing gene interactions

KG-based associative classifier training workflow. The diagram outlines how our framework uses labelled gene pairs and the path information in BOCK to train a rule-based model predicting pathogenic gene interactions. (1) Given a disease-associated gene pair (represented as \(D_1\) (\(G_S\),\(G_T\))), all paths in BOCK starting at the gene node \(G_S\) and ending at the gene node \(G_T\) are collected, up to a predetermined path length cutoff. Although this traversal disregards edge directionality, the original direction of the edges is encoded in the recorded paths; (2) Each path is assigned a reliability score based on the original edge weights. Paths are then aggregated into their metapaths (i.e. path types) (M); (3.a) Association rules (R) are mined by finding frequent patterns of metapaths occurring in disease-causing gene pairs (D). Rules are extended with additional metapath conditions as long as their support (i.e. the weighted frequency of the pattern) is greater than a defined threshold (minsup); (3.b) Rules can be extended with a unification condition (e.g. node \(G_X\) common to metapaths M1 and M2) if such a pattern remains frequent; (3.c) The mined rules are refined with path reliability thresholds that aim to filter out paths of lower quality while preserving a high rule support; (4) Using all pre-mined rules (R) and training data made of disease-causing gene pairs (D) as positive examples and a set of putative neutral gene pairs (N) as negative examples, a decision set model is trained by selecting a subset of predictive rules.
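Steps (1) and (2) of the caption can be illustrated with a minimal Python sketch on a toy graph. This is not the authors' implementation: the graph contents, the function names (`build_adjacency`, `find_paths`, `reliability`, `metapath`, `metapaths_for_pair`) and the string encoding of metapaths are hypothetical, and the reliability score is assumed here to be the product of edge weights, whereas the caption only states that it is based on the original edge weights.

```python
from collections import defaultdict

# Hypothetical toy graph (not BOCK): node types and directed, weighted, typed edges.
NODE_TYPE = {"G1": "Gene", "G2": "Gene", "P1": "Pathway"}
EDGES = [  # (source, edge_type, target, weight)
    ("G1", "partOf", "P1", 0.9),
    ("G2", "partOf", "P1", 0.8),
    ("G1", "interacts", "G2", 0.5),
]

def build_adjacency(edges):
    """Index each edge in both directions, keeping a marker of its original direction."""
    adj = defaultdict(list)
    for src, etype, dst, w in edges:
        adj[src].append((dst, etype, ">", w))  # traversed along the edge
        adj[dst].append((src, etype, "<", w))  # traversed against the edge
    return adj

def find_paths(adj, start, end, max_len):
    """Step (1): enumerate simple paths from start to end with at most max_len edges."""
    paths = []

    def dfs(node, visited, steps):
        if node == end and steps:
            paths.append(list(steps))
            return
        if len(steps) == max_len:
            return
        for nxt, etype, direction, w in adj[node]:
            if nxt not in visited:
                steps.append((etype, direction, nxt, w))
                dfs(nxt, visited | {nxt}, steps)
                steps.pop()

    dfs(start, {start}, [])
    return paths

def reliability(path):
    """Step (2), ASSUMPTION: score a path by the product of its edge weights."""
    score = 1.0
    for _, _, _, w in path:
        score *= w
    return score

def metapath(start, path):
    """Abstract a path into its type sequence, preserving original edge directions."""
    parts = [NODE_TYPE[start]]
    for etype, direction, node, _ in path:
        parts.append(f"-{etype}->" if direction == ">" else f"<-{etype}-")
        parts.append(NODE_TYPE[node])
    return "".join(parts)

def metapaths_for_pair(adj, g_s, g_t, max_len=3):
    """Group the reliability scores of all collected paths by metapath."""
    grouped = defaultdict(list)
    for p in find_paths(adj, g_s, g_t, max_len):
        grouped[metapath(g_s, p)].append(reliability(p))
    return dict(grouped)
```

Calling `metapaths_for_pair(build_adjacency(EDGES), "G1", "G2")` on this toy graph yields two metapaths, `Gene-partOf->Pathway<-partOf-Gene` and `Gene-interacts->Gene`, each with the list of reliability scores of its path instances. Such per-pair metapath sets are the kind of input the rule mining in step (3) would then operate on.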
