Deep Reinforcement Learning
(INCOMPLETE) Deep Reinforcement Learning is an exciting new field that encompasses many different fields: computer science, optimal control, statistics, machine learning, and so on. Its application are numerous.
Policy Gradient
Definition A Markov Decision Process contains
- \( \pi : \mathcal{S} \rightarrow \Delta(\mathcal(A)) \), the stochastic policy.
- \( \eta(\pi) = \mathbb{E} \left[ R_0 + \gamma R_1 + \gamma^2 R_2 +
… \right] \) - \( p: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \rightarrow \mathbb{R} \), the state transition probability
- \( \mu : \mathcal{S} \rightarrow \mathbb{R} \), the probability distribution over the initial state, \( s_0 \)
- \( \theta \in \mathbb{R}^n \), a vector of parameter that parameterizes the stochastic policy \(\pi\)
A policy gradient algorithm then calculate \( \nabla_ {\theta} \eta (\theta) \), and make proceed as a standard gradient ascent algorithm. We approximate the gradient using Monte Carlo estimation, since we don’t have access to the underlying probability distribution.
Theorem Monte Carlo estimation. Let \( X : \Omega \rightarrow \mathbb{R} ^ n \) be a random variable with probability distribution \(q\), and \(f : \mathbb {R}^n \rightarrow \mathbb{R} \). Then
Armed with this theorem, we can use a sample average to estimate the expectation.