Fig. 1 | BMC Bioinformatics

From: Molecular complex detection in protein interaction networks through reinforcement learning

Fig. 1

Example trajectory of training the RL pipeline on a network by learning a value function. A This network comprises 7 nodes and 11 weighted edges. B A known complex consists of the nodes A, B, C, and E. C First, a seed edge (A, B) is identified, where the state (density) is S1 = 0.8 and the value function is V(0.8) = 0 (all densities are initialized to 0). Once a node is added, a reward of +0.2 is given if the node is in the training complex and −0.2 if it is absent. D We evaluate all possible neighbors, i.e., C and D, to add to the current subgraph {A, B}. Using the value iteration update rule (with γ = 0.5), we compute a value for the current state resulting from adding each neighbor. E Adding node C updates V(0.8) = +0.2. F Adding node D updates V(0.8) = −0.2. G The neighbor yielding the highest value (C) is added to the candidate complex, and the value of the original state S1 = 0.8 (subgraph {A, B}) is now +0.2. H We then evaluate all possible neighbors of the updated subgraph {A, B, C}, whose state is S2 = 0.57, i.e., D, E, and G. I Node D updates V(0.57) = −0.2. J Node E updates V(0.57) = +0.2. K Node G updates V(0.57) = −0.2. L Node E is added to the complex and V(0.57) is updated to +0.2. This process is repeated until growth terminates by adding an imaginary node with reward 0. As the remaining neighbors D, F, and G each have a reward of −0.2, the imaginary node is chosen because it yields the highest value (0.1). The candidate complex is then finalized. A new seed edge is chosen from the network and the process repeats.
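The selection step the caption walks through (panels C–L) is a one-step value-iteration backup over density states. The Python sketch below illustrates that step under stated assumptions: the function and variable names (grow_step, density, V) are illustrative rather than taken from the authors' code, the stop option is modeled as an imaginary node with reward 0 that leaves the state unchanged, and the rewards (+0.2 / −0.2) and γ = 0.5 follow the figure.

GAMMA = 0.5
REWARD_IN, REWARD_OUT, REWARD_STOP = 0.2, -0.2, 0.0

def density(nodes, edges):
    # Weighted density of the subgraph induced by `nodes`:
    # sum of internal edge weights over the number of node pairs.
    n = len(nodes)
    if n < 2:
        return 0.0
    w = sum(wt for (u, v), wt in edges.items() if u in nodes and v in nodes)
    return round(2.0 * w / (n * (n - 1)), 2)

def grow_step(subgraph, neighbors, edges, complex_nodes, V):
    # One value-iteration backup: for each candidate neighbor (and the
    # imaginary stop node), compute reward + GAMMA * V(next state),
    # keep the best, and write that value back to the current state.
    s = density(subgraph, edges)
    best_node = None                                  # None means "stop growing"
    best_value = REWARD_STOP + GAMMA * V.get(s, 0.0)  # stop option: state unchanged (assumption)
    for node in neighbors:
        r = REWARD_IN if node in complex_nodes else REWARD_OUT
        s_next = density(subgraph | {node}, edges)
        value = r + GAMMA * V.get(s_next, 0.0)
        if value > best_value:
            best_value, best_node = value, node
    V[s] = best_value        # e.g. V(0.8) becomes +0.2 after choosing node C
    return best_node, V

For the trajectory in the figure, calling grow_step({'A', 'B'}, {'C', 'D'}, edges, {'A', 'B', 'C', 'E'}, V) with all values initialized to 0 selects node C and sets V(0.8) = +0.2, matching panels D–G.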
