
Atari 2600 Deep-Q Learning AI

Rl-atari implements a set of reinforcement learning algorithms that learn control policies for playing Atari games.
This implementation recreates the Deep Q-Network (DQN) model and configuration first proposed by Google DeepMind.
It combines computer vision (convolutional neural networks) with reinforcement learning (Q-learning) to train an agent that autonomously, i.e. without supervision, learns to play a computer game from pixel and reward inputs alone. The following deep-RL variants and features are implemented; their integrated combination is known as the Rainbow agent.

Training these agents takes a very long time, which means an extended turnaround to obtain new results. New progress and figures will be pushed as they come in.

Enabled  Algorithm                              Reference
✔️       Deep Q-Learning (DQN)                  https://arxiv.org/abs/1312.5602, https://www.nature.com/articles/nature14236
✔️       Double Q-Learning (DDQN)               https://arxiv.org/abs/1509.06461
✔️       Prioritized Experience Replay (PER)    https://arxiv.org/abs/1511.05952
✔️       Multi-step Learning                    https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
✔️       Dueling Network Architecture           https://arxiv.org/abs/1511.06581
✔️       Noisy Network                          https://arxiv.org/abs/1706.10295
✔️       Distributional Network (C51)           https://arxiv.org/abs/1707.06887
         Async Advantage Actor-Critic (A3C)     https://arxiv.org/abs/1602.01783
✔️       Rainbow Agent                          https://arxiv.org/abs/1710.02298
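The core difference between the first two entries is how the bootstrap target is computed. The following is not from the repository itself, just a minimal numpy sketch of the two update targets (function and argument names are illustrative):

```python
import numpy as np

def dqn_target(rewards, next_q_target, dones, gamma=0.99):
    """Standard DQN target: bootstrap from the max Q of the target network."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)

def ddqn_target(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN target: the online network selects the greedy action,
    the target network evaluates it, which reduces overestimation bias."""
    best_actions = next_q_online.argmax(axis=1)
    evaluated = next_q_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated
```

Both take batched arrays; `dones` masks out the bootstrap term at episode ends.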

Requirements

The virtual environment and dependencies are managed with Conda and Poetry, respectively. Conda builds the environment from the .yaml file; Poetry installs the exact requirements and dependencies from the .lock and .toml files.

Visualization

1. Agent play

Custom functions are written to output comprehensive animations of the agent playing episodes during training, with details such as Q-values, Value and Advantage stream values, the Z distribution, and activation maps. Animations are generated from functions in utils.py.

Example 1: Dueling agent

The following shows a (partially) trained DQN agent playing Space Invaders, attaining a training score of 2115. Displayed are:

  1. Left: the original Atari frame.
  2. Right: the preprocessed frame as viewed by the agent.
    • Overlaid on this are convolutional-network saliency maps that show pixel attribution in the agent's decision making.
    • A dueling network was used; the value stream is displayed in blue and the advantage stream in red.
  3. Middle: the max Q-value series as estimated by the agent, along with the Value and Advantage stream values.
  4. Bottom: Q-values for each action.

Note:

  • The animation is sped up to 60 fps.
  • Saliency/activation maps can often be rather noisy; however, some points of attention stand out, such as the agent focusing on the bonus (round) flying saucer.
6929_train_2115.0.mp4
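The Value and Advantage streams shown in the animation are combined into Q-values using the dueling aggregation of Wang et al. (2016). A minimal numpy sketch of that combination (not the repo's own code):

```python
import numpy as np

def dueling_q(value, advantage):
    """Combine dueling V and A streams into Q-values.
    Subtracting the mean advantage makes the V/A decomposition
    identifiable, as in the dueling-network paper."""
    return value + (advantage - advantage.mean(axis=1, keepdims=True))
```

`value` has shape (batch, 1) and `advantage` shape (batch, n_actions); broadcasting produces per-action Q-values.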

Example 2: Distributional agent

Same as Example 1, except the Q-values have been replaced with a categorical value distribution.

23744_train_2460.0.mp4
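In the C51 setting, the network outputs a probability distribution over a fixed support of 51 return atoms instead of a scalar Q-value; the scalar used for action selection is the distribution's expectation. A minimal numpy sketch (parameter values are the paper's defaults, not necessarily this repo's configuration):

```python
import numpy as np

def c51_expected_q(probs, v_min=-10.0, v_max=10.0, n_atoms=51):
    """Expected Q-value from a categorical (C51) distribution:
    Q(s, a) = sum_i z_i * p_i(s, a), with z the fixed support atoms."""
    z = np.linspace(v_min, v_max, n_atoms)  # evenly spaced return atoms
    return (probs * z).sum(axis=-1)
```

A uniform distribution over a symmetric support gives an expected value of zero; all the mass on the top atom gives v_max.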

2. Train statistics

Visual inspection of metrics during training is essential to measure model performance, analyze agent behavior, and predict progress and runtimes. The first plot below displays statistics aggregated per training episode: total reward score, episode length (in played frames), mean of max Q-values, mean TD-error, and runtime. The second plot shows the distribution of actions taken over time.

Plots are generated from functions in utils.py.

[Plot: per-episode training statistics]

[Plot: distribution of actions taken over time]

Implementation

AI agents are built on top of convolutional neural networks implemented in TensorFlow 2. Both a small (2 conv layers) and a large (3 conv layers) network are supported.

Example 1: a large network with Dueling (V and A) streams, as well as noisy layers.

[Diagram: large network with dueling streams and noisy layers]
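The noisy layers replace ε-greedy exploration with learned parametric noise on the weights (Fortunato et al., 2017). This is not the repo's implementation, just a minimal numpy sketch of a factorised-Gaussian noisy linear layer:

```python
import numpy as np

def noisy_linear(x, w_mu, w_sigma, b_mu, b_sigma, rng):
    """Factorised-Gaussian noisy layer: effective weights are
    mu + sigma * eps, with eps resampled on each forward pass."""
    def f(eps):
        # signed square-root scaling from the NoisyNet paper
        return np.sign(eps) * np.sqrt(np.abs(eps))
    eps_in = f(rng.standard_normal(w_mu.shape[1]))
    eps_out = f(rng.standard_normal(w_mu.shape[0]))
    w = w_mu + w_sigma * np.outer(eps_out, eps_in)  # factorised weight noise
    b = b_mu + b_sigma * eps_out
    return x @ w.T + b
```

With the sigma parameters at zero the layer reduces to an ordinary linear layer; training learns how much noise (exploration) each weight should carry.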

Example 2: a large network with Dueling (V and A) streams, as well as a distributional (C51) output.

[Diagram: large network with dueling streams and distributional (C51) output]
