Policy Gradient methods and Proximal Policy Optimization (PPO): diving into Deep RL!
A fantastic short video in which Xander introduces policy gradient methods for deep reinforcement learning. After a general overview, he dives into Proximal Policy Optimization (PPO), an algorithm designed at OpenAI that tries to strike a balance between sample efficiency and code complexity. PPO is the algorithm used to train the OpenAI Five system, and it is also used in a wide range of other tasks such as Atari games and robotic control.
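
To give a feel for why PPO is considered simple to implement, here is a minimal sketch of its clipped surrogate objective as described in the PPO paper. The blurb above does not spell out the formula, so the function name `ppo_clip_objective`, its argument names, and the default `clip_eps=0.2` are illustrative assumptions rather than anything taken from the video. The sketch only computes the objective for a batch of samples; it does not include the policy network, advantage estimation, or the optimizer.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch (to be maximized).

    All names and the default clip_eps are illustrative assumptions.
    new_log_probs / old_log_probs: log-probabilities of the taken actions
    under the current policy and the policy that collected the data.
    advantages: advantage estimates for those actions.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # One action became twice as likely and had a positive advantage:
    # the clipped term limits how much credit the update can claim.
    print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
```

The pessimistic `min` is the key design choice: once the new policy has moved more than `clip_eps` away from the old one in the direction the advantage favors, the objective stops rewarding further movement, which discourages destructively large policy updates.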