A reinforcement learning approach for protein–ligand binding pose prediction

BMC Bioinformatics

Table 1 Important notations in RL

Notation	Meaning
\(s \in {\mathcal{S}}\)	State
\(a \in {\mathcal{A}}\)	Action
\(r \in {\mathcal{R}}\)	Immediate reward
\(\gamma\)	Discount factor
\(G_{t}\)	The long-term (discounted) reward, or return: \(G_{t} = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}\)
\(\pi_{\theta}(a \mid s)\)	Actor (policy) model with parameters \(\theta\); it gives a probability distribution over actions given the state \(s\)
\(V_{\omega}^{\pi}(s)\)	Critic model with parameters \(\omega\); it depends on the policy \(\pi\) and outputs a score estimating the value of state \(s\), i.e. the expected return \(G_{t}\) under \(\pi\)
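To make the return \(G_{t}\) in the table above concrete, here is a minimal sketch for a finite episode, where the infinite sum simply stops at the last reward. It assumes `rewards[i]` stores \(R_{i+1}\) (the reward received after the action at step i); the function name `discounted_returns` is hypothetical and not taken from the paper.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute G_t for every step t of a finite episode.

    Assumption: rewards[i] holds R_{i+1}, so
    G_t = sum_k gamma**k * rewards[t + k], truncated at the episode end.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards using the recursion G_t = R_{t+1} + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Three steps with rewards 1, 0, 2 and gamma = 0.5
print(discounted_returns([1.0, 0.0, 2.0], 0.5))  # [1.5, 1.0, 2.0]
```

Computing the returns backwards reuses \(G_{t+1}\) at each step, so the cost is linear in the episode length instead of quadratic.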
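The actor \(\pi_{\theta}(a \mid s)\) and critic \(V_{\omega}^{\pi}(s)\) listed in the table are typically trained together. The sketch below shows a generic one-step actor-critic update on a toy problem. It is not the model used in this paper: the tabular softmax actor, the tabular critic, the TD-error update rule, the learning rates, and the names `actor_critic_step`, `theta` and `omega` as Python tables are all illustrative assumptions.

```python
import math
from typing import List


def softmax(prefs: List[float]) -> List[float]:
    """Turn action preferences into probabilities pi_theta(a | s)."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta: List[List[float]], omega: List[float],
                      s: int, a: int, r: float, s_next: int, done: bool,
                      gamma: float = 0.99, lr_actor: float = 0.1,
                      lr_critic: float = 0.1) -> float:
    """One-step actor-critic update; theta and omega are modified in place.

    Hypothetical parameterization (assumption, not from the paper):
      theta[s][a] -- softmax preference of action a in state s (actor)
      omega[s]    -- tabular estimate of V_omega^pi(s) (critic)
    Returns the TD error delta = r + gamma * V(s') - V(s).
    """
    # Critic target uses V(terminal) = 0.
    v_next = 0.0 if done else omega[s_next]
    delta = r + gamma * v_next - omega[s]

    # Critic: move V(s) toward the one-step target.
    omega[s] += lr_critic * delta

    # Actor: grad of log softmax w.r.t. theta[s][b] is 1[b == a] - pi(b | s).
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        grad = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += lr_actor * delta * grad
    return delta


# Two states, two actions; action 1 in state 0 yields reward 1.
theta = [[0.0, 0.0], [0.0, 0.0]]
omega = [0.0, 0.0]
delta = actor_critic_step(theta, omega, s=0, a=1, r=1.0, s_next=1, done=False)
print(delta, omega[0], softmax(theta[0]))
```

A positive TD error raises both the critic's value for state 0 and the actor's probability of the rewarded action; a negative one lowers them. The critic thus serves as a learned baseline that tells the actor whether an action turned out better or worse than expected.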

ISSN: 1471-2105