Table 1 Important notations in RL

From: A reinforcement learning approach for protein–ligand binding pose prediction

| Notation | Meaning |
| --- | --- |
| \(s \in \mathcal{S}\) | State |
| \(a \in \mathcal{A}\) | Action |
| \(r \in \mathcal{R}\) | Immediate reward |
| \(\gamma\) | Discount factor |
| \(G_{t}\) | Long-term reward (return): \(G_{t} = \sum_{k = 0}^{\infty} \gamma^{k} R_{t + k + 1}\) |
| \(\pi_{\theta}(a \mid s)\) | Actor model with parameters \(\theta\); it is a distribution over actions given the state |
| \(V_{\omega}^{\pi}(s)\) | Critic model with parameters \(\omega\); it depends on the policy model and outputs a score for the state \(s\) |
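
As a rough illustration of the long-term reward \(G_{t}\) defined in the table, the sketch below computes it for every step of a finite episode. It is not taken from the article: the function name `discounted_returns` and the `rewards` list are hypothetical, and the infinite sum is truncated at the end of the episode, which assumes all later rewards are zero. It uses the recursion \(G_{t} = R_{t + 1} + \gamma G_{t + 1}\), which follows directly from the definition.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return [G_0, ..., G_{T-1}] for a finite episode (illustrative helper).

    rewards[t] holds R_{t+1}, the reward received after acting at step t.
    Assumption: rewards after the end of the episode are zero.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    returns = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards using G_t = R_{t+1} + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


example = discounted_returns([1.0, 0.0, 2.0], gamma=0.5)
print(example)  # [1.5, 1.0, 2.0]
```

The backward sweep visits each reward once, so it avoids recomputing the sum from scratch at every time step.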