Reinforcement Learning Classic Algorithms: A3C
Paper abstract:
We propose a conceptually simple and
lightweight framework for deep reinforcement
learning that uses asynchronous gradient
descent for optimization of deep neural network
controllers. We present asynchronous variants of
four standard reinforcement learning algorithms
and show that parallel actor-learners have a
stabilizing effect on training allowing all four
methods to successfully train neural network
controllers. The best performing method, an
asynchronous variant of actor-critic, surpasses
the current state-of-the-art on the Atari domain
while training for half the time on a single
multi-core CPU instead of a GPU. Furthermore,
we show that asynchronous actor-critic succeeds
on a wide variety of continuous motor control
problems as well as on a new task of navigating
random 3D mazes using a visual input.
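
To make the phrase "parallel actor-learners" with asynchronous gradient updates more concrete, here is a minimal sketch under heavy simplifying assumptions. It is not the paper's implementation: it uses a hypothetical two-armed bandit instead of Atari, a two-logit softmax policy and a scalar baseline instead of a neural network, one-step rewards instead of n-step returns, and plain Python threads (which the GIL keeps from running truly in parallel). All names (`ARM_MEANS`, `worker`, `train_async`) are invented for illustration. What it does show is the core pattern: several workers act independently and write actor-critic updates into shared parameters without locking.

```python
import math
import random
import threading

# Hypothetical toy problem: a two-armed bandit where arm 1 pays more on average.
ARM_MEANS = [0.0, 1.0]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def worker(shared, steps, seed, lr_actor=0.1, lr_critic=0.1):
    """One actor-learner: acts with its own RNG, updates shared parameters lock-free."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Snapshot the shared parameters; they may be stale when we write back.
        logits = list(shared["logits"])
        value = shared["value"]
        probs = softmax(logits)

        # Sample an action from the current policy and observe a noisy reward.
        action = 0 if rng.random() < probs[0] else 1
        reward = ARM_MEANS[action] + rng.gauss(0.0, 0.1)

        # Advantage: observed reward minus the critic's baseline estimate.
        advantage = reward - value

        # Actor update: gradient of log pi(action) w.r.t. logits is onehot - probs.
        for i in range(len(logits)):
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            shared["logits"][i] += lr_actor * advantage * grad_log_pi

        # Critic update: move the baseline toward the observed reward.
        shared["value"] += lr_critic * advantage


def train_async(num_workers=4, steps_per_worker=2000, seed=0):
    """Run several workers against one shared parameter store and return it."""
    shared = {"logits": [0.0, 0.0], "value": 0.0}
    threads = [
        threading.Thread(target=worker, args=(shared, steps_per_worker, seed + k))
        for k in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["logits"], shared["value"]


if __name__ == "__main__":
    logits, value = train_async()
    print("policy:", softmax(logits), "baseline:", value)
```

Because each worker uses a different seed, the workers explore different action sequences at the same time, which is the intuition behind the stabilizing effect the abstract mentions. A realistic version would replace the threads with processes and the two logits with a shared neural network.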