[![Github](https://img.shields.io/github/stars/lab-ml/nn?style=social)](https://github.com/lab-ml/nn)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/rl/ppo/experiment.ipynb)                    

## Proximal Policy Optimization - PPO

This is an experiment that trains an agent to play the Atari Breakout game using Proximal Policy Optimization (PPO).
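The `clip_range` setting configured later is the ε in PPO's clipped surrogate objective. As a minimal pure-Python sketch (the function name `clipped_surrogate` is our own and not part of `labml-nn`), the objective for a batch of samples is computed from the probability ratio between the new and old policies:

```python
import math


def clipped_surrogate(log_pi_new, log_pi_old, advantages, clip_range=0.1):
    """Mean PPO clipped surrogate objective (to be maximized).

    Hypothetical helper for illustration. The ratio
    r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t) is computed from log-probabilities,
    and each sample contributes min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t).
    """
    total = 0.0
    for new, old, adv in zip(log_pi_new, log_pi_old, advantages):
        ratio = math.exp(new - old)
        # Keep the ratio inside [1 - eps, 1 + eps]
        clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
        # Take the pessimistic (smaller) of the clipped and unclipped terms
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

Taking the minimum means that once the ratio moves past 1 ± ε in the direction that would increase the objective, the gradient through that sample vanishes, which keeps each update close to the policy that collected the data.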

Install the `labml-nn` package

In [None]:
!pip install labml-nn

Imports

In [None]:
from labml import experiment
from labml.configs import FloatDynamicHyperParam, IntDynamicHyperParam
from labml_nn.rl.ppo.experiment import Trainer

Create an experiment

In [None]:
experiment.create(name="ppo")

### Configurations

`IntDynamicHyperParam` and `FloatDynamicHyperParam` are dynamic hyperparameters
that you can change while the experiment is running.
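To illustrate the idea, here is a hypothetical stand-in for a dynamic hyperparameter; it is not labml's implementation. The assumption in this sketch is that training code reads the current value by calling the object on every use instead of caching it once, so a value changed from outside is picked up on the next read. The optional `range_` bounds mirror the `(0, 1e-3)` range given to the learning rate below.

```python
class DynamicParam:
    """Hypothetical dynamic hyperparameter (illustration only, not labml's class)."""

    def __init__(self, default, range_=None):
        self.range_ = range_
        self._value = default

    def __call__(self):
        # Training code calls the object to get the *current* value
        return self._value

    def set(self, value):
        # Reject values outside the allowed range, if one was given
        if self.range_ is not None:
            low, high = self.range_
            if not low <= value <= high:
                raise ValueError(f"{value} is outside {self.range_}")
        self._value = value
```

For example, `lr = DynamicParam(2.5e-4, (0, 1e-3))` followed by `lr.set(5e-4)` makes every later `lr()` return `5e-4`.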

In [None]:
configs = {
    # number of updates
    'updates': 10000,
    # number of epochs to train the model with sampled data
    'epochs': IntDynamicHyperParam(8),
    # number of worker processes
    'n_workers': 8,
    # number of steps to run on each process for a single update
    'worker_steps': 128,
    # number of mini-batches per epoch
    'batches': 4,
    # Value loss coefficient
    'value_loss_coef': FloatDynamicHyperParam(0.5),
    # Entropy bonus coefficient
    'entropy_bonus_coef': FloatDynamicHyperParam(0.01),
    # Clip range
    'clip_range': FloatDynamicHyperParam(0.1),
    # Learning rate (default 2.5e-4, adjustable within the range 0 to 1e-3)
    'learning_rate': FloatDynamicHyperParam(2.5e-4, (0, 1e-3)),
}
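These settings determine how much data each update uses. Assuming, as is standard for PPO, that each update collects `n_workers × worker_steps` samples and splits them into `batches` equal mini-batches, the sizes work out as follows (the helper `batch_sizes` is hypothetical):

```python
def batch_sizes(n_workers, worker_steps, batches, updates):
    # Samples collected per update: every worker runs worker_steps steps
    batch_size = n_workers * worker_steps
    # Each epoch splits the batch into `batches` equal mini-batches
    if batch_size % batches != 0:
        raise ValueError("batch size must be divisible by the number of mini-batches")
    mini_batch_size = batch_size // batches
    # Total agent steps collected over the whole run
    total_steps = batch_size * updates
    return batch_size, mini_batch_size, total_steps


print(batch_sizes(n_workers=8, worker_steps=128, batches=4, updates=10000))
# → (1024, 256, 10240000)
```

With `epochs=8`, each update then performs 8 × 4 = 32 gradient steps on mini-batches of 256 samples.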

Set experiment configurations

In [None]:
experiment.configs(configs)

Create trainer

In [None]:
trainer = Trainer(
    updates=configs['updates'],
    epochs=configs['epochs'],
    n_workers=configs['n_workers'],
    worker_steps=configs['worker_steps'],
    batches=configs['batches'],
    value_loss_coef=configs['value_loss_coef'],
    entropy_bonus_coef=configs['entropy_bonus_coef'],
    clip_range=configs['clip_range'],
    learning_rate=configs['learning_rate'],
)
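`value_loss_coef` and `entropy_bonus_coef` weight the terms of the loss the trainer minimizes. The sketch below shows a simplified form of the usual PPO loss, using plain mean squared error for the value term; the function `ppo_loss` is hypothetical, and the library's own loss may differ in details (for example, by clipping the value loss as well).

```python
def ppo_loss(policy_objective, values, returns, entropy,
             value_loss_coef=0.5, entropy_bonus_coef=0.01):
    """Simplified PPO loss to minimize (hypothetical helper)."""
    # Mean squared error between predicted values and sampled returns
    value_loss = sum((v - r) ** 2 for v, r in zip(values, returns)) / len(values)
    # Negate the surrogate objective (we minimize), add the weighted value loss,
    # and subtract the entropy bonus to encourage exploration
    return -policy_objective + value_loss_coef * value_loss - entropy_bonus_coef * entropy
```

Here `policy_objective` would be the clipped surrogate objective and `entropy` the mean entropy of the policy over the mini-batch.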

Start the experiment and run the training loop.

In [None]:
with experiment.start():
    trainer.run_training_loop()
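Each of the `updates` iterations samples a batch from the workers and then trains on it for `epochs` epochs. A common way to structure the training part, shown here as a sketch under the assumption that the batch is reshuffled every epoch (the generator `minibatch_indices` is hypothetical, not taken from `labml-nn`), is:

```python
import random


def minibatch_indices(batch_size, batches, epochs, seed=0):
    """Yield sample indices for each mini-batch; every epoch reshuffles the batch."""
    if batch_size % batches != 0:
        raise ValueError("batch size must be divisible by the number of mini-batches")
    rng = random.Random(seed)
    mini_batch_size = batch_size // batches
    for _ in range(epochs):
        indexes = list(range(batch_size))
        rng.shuffle(indexes)
        # Consecutive slices of the shuffled indices form the mini-batches
        for start in range(0, batch_size, mini_batch_size):
            yield indexes[start:start + mini_batch_size]
```

With the configuration above, `minibatch_indices(1024, 4, 8)` yields 32 mini-batches of 256 indices, and each group of 4 consecutive mini-batches covers the whole batch exactly once.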