Rl-atari implements a set of reinforcement learning algorithms that learn control policies for playing Atari games.
This implementation recreates the Deep Q-Network (DQN) model and configuration first proposed by Google DeepMind.
It combines computer vision (convolutional neural networks) with reinforcement learning (Q-learning) to train an agent
that autonomously (read: without supervision) learns how to play a computer game from pixel and reward inputs alone.
The following deep-RL variants and features are implemented; their integrated combination is known as the Rainbow agent.
Training these agents takes a very long time, which means an extended turnaround to obtain (new) results. New progress and figures will be pushed as they come in.
| Enabled | Algorithm | Reference |
|---|---|---|
| ✔️ | Deep Q-Learning (DQN) | https://arxiv.org/abs/1312.5602 https://www.nature.com/articles/nature14236 |
| ✔️ | Double Q-Learning (DDQN) | https://arxiv.org/abs/1509.06461 |
| ✔️ | Prioritized Experience Replay (PER) | https://arxiv.org/abs/1511.05952 |
| ✔️ | Multi-step Learning | https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf |
| ✔️ | Dueling Network Architecture | https://arxiv.org/abs/1511.06581 |
| ✔️ | Noisy Network | https://arxiv.org/abs/1706.10295 |
| ✔️ | Distributional Network (C51) | https://arxiv.org/abs/1707.06887 |
| | Async Advantage Actor-Critic (A3C) | https://arxiv.org/abs/1602.01783 |
| ✔️ | Rainbow Agent | https://arxiv.org/abs/1710.02298 |
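To make the Double Q-learning entry above concrete, here is a minimal numpy sketch of the DDQN target: the online network selects the greedy next action while the target network evaluates it, which is the decoupling that reduces Q-value overestimation. The function name and shapes are illustrative, not taken from this repo.

```python
import numpy as np

def double_dqn_target(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double Q-learning (DDQN) target, per van Hasselt et al. (2015).
    rewards, dones: shape (batch,); next_q_*: shape (batch, n_actions).
    Hypothetical helper -- shapes/names are assumptions for illustration."""
    # Action *selection* uses the online network ...
    best_actions = np.argmax(next_q_online, axis=1)
    # ... action *evaluation* uses the target network.
    next_values = next_q_target[np.arange(len(rewards)), best_actions]
    # Terminal states (dones == 1) bootstrap no future value.
    return rewards + gamma * (1.0 - dones) * next_values
```

In plain DQN both selection and evaluation would use `next_q_target`, which systematically overestimates values under noise.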
The virtual environment and dependencies are managed with Conda and Poetry, respectively. Conda builds the environment from the yaml file; Poetry installs the exact requirements and dependencies from the .lock and .toml files.
Custom functions output comprehensive animations of the agent playing episodes during training, with details such as Q-values, Value and Advantage stream values, the Z distribution, and activation maps. Animations are generated by functions in utils.py.
Example 1: Dueling agent
The following shows a (partially) trained DQN agent playing Space Invaders, attaining a training score of 2145. Displayed are:
- **Left:** Original Atari frame.
- **Right:** Preprocessed frame as viewed by the agent, overlaid with convolutional-network saliency maps that show pixel attribution in the agent's decision making. A dueling network was used; the value stream is displayed in blue and the advantage stream in red.
- **Middle:** Max Q-value series as estimated by the agent, along with the Value and Advantage stream values.
- **Bottom:** Q-values for each action.
Note:
- The animation is sped up to 60 fps.
- Saliency/activation maps can often be rather noisy, but some points of attention stand out, such as the agent focusing on the bonus (round) flying saucer.
6929_train_2115.0.mp4
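The saliency overlay described above can be sketched as follows: normalize the absolute input gradients to [0, 1] and alpha-blend a color tint onto the grayscale frame. This is a hypothetical helper mirroring what the utils.py animations do, not the repo's actual function; names, the color convention, and the blend factor are assumptions.

```python
import numpy as np

def overlay_saliency(frame_gray, grads, color=(1.0, 0.0, 0.0), alpha=0.6):
    """Blend a |gradient| heatmap onto a grayscale frame (illustrative sketch).
    frame_gray: (H, W) in [0, 1]; grads: (H, W) raw input gradients.
    color: tint for this stream, e.g. red for advantage, blue for value.
    Returns an (H, W, 3) RGB image."""
    sal = np.abs(grads)
    sal = sal / (sal.max() + 1e-8)                 # normalize saliency to [0, 1]
    rgb = np.repeat(frame_gray[..., None], 3, -1)  # grayscale -> RGB
    tint = np.array(color) * sal[..., None]        # colored heatmap
    # Blend: strong saliency pulls the pixel toward the tint color.
    return np.clip((1 - alpha * sal[..., None]) * rgb + alpha * tint, 0.0, 1.0)
```

In practice the gradients would come from differentiating the chosen stream's output with respect to the input frame (e.g. with `tf.GradientTape` in TensorFlow 2).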
Example 2: Distributional agent
Same as Example 1, except the Q-values are replaced with a categorical value distribution.
23744_train_2460.0.mp4
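For readers unfamiliar with C51, the categorical value distribution shown in this example reduces to scalar Q-values by taking the expectation over a fixed atom support. A minimal sketch, assuming the standard C51 support of 51 atoms on [-10, 10] (the actual bounds used here are not stated in this README):

```python
import numpy as np

def c51_q_values(logits, v_min=-10.0, v_max=10.0, n_atoms=51):
    """Turn a categorical (C51) value distribution into scalar Q-values.
    logits: (n_actions, n_atoms). Support bounds follow the C51 paper;
    treat the exact values as assumptions for this sketch."""
    z = np.linspace(v_min, v_max, n_atoms)                  # fixed atom support
    p = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    p = p / p.sum(axis=1, keepdims=True)                    # per-action probabilities
    return (p * z).sum(axis=1)                              # Q(s, a) = sum_i z_i * p_i(s, a)
```

The animation plots the full distribution `p` per action; the expectation above is what a greedy policy would act on.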
Visual inspection of metrics during training is essential for measuring model performance, analyzing agent behavior, and predicting progress and runtimes. The first plot below displays statistics aggregated per training episode: total reward (score), episode length (in played frames), mean of the max Q-values, mean TD-error, and runtime. The second plot shows the distribution of actions taken over time.
Plots are generated from functions in utils.py.
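The per-episode aggregation feeding the first plot can be sketched like this; the function and key names are illustrative, not the actual utils.py API:

```python
import numpy as np

def episode_stats(rewards, max_qs, td_errors):
    """Aggregate per-step logs into per-episode statistics
    (hypothetical helper mirroring the plotted quantities)."""
    return {
        "score": float(np.sum(rewards)),             # total reward this episode
        "length": len(rewards),                      # frames played
        "mean_max_q": float(np.mean(max_qs)),        # mean of max Q-values
        "mean_td_error": float(np.mean(td_errors)),  # mean TD-error
    }
```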
AI agents are built on top of convolutional neural networks implemented in TensorFlow 2. Both a small network (2 conv layers) and a large network (3 conv layers) are supported.
Example 1: A large network with Dueling (V and A) streams, as well as noisy layers.
Example 2: A large network with Dueling (V and A) streams, as well as a distributional (C51) output.
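The defining step of the dueling architecture in both examples is how the V and A streams are recombined into Q-values. A minimal numpy sketch of that aggregation (the network heads themselves are omitted; the function name is an assumption):

```python
import numpy as np

def dueling_aggregate(value, advantages):
    """Dueling aggregation (Wang et al., 2016): combine the value stream
    V(s) and advantage stream A(s, a) into Q-values. Subtracting the mean
    advantage makes the decomposition identifiable.
    value: (batch, 1); advantages: (batch, n_actions)."""
    return value + advantages - advantages.mean(axis=1, keepdims=True)
```

Without the mean-advantage baseline, a constant could be shifted freely between V and A; centering the advantages pins the split down so the two streams learn meaningfully separate quantities.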



