Varuna Jayasiri 391fa39167 cleanup notebooks
2024-06-24 16:17:09 +05:30

{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "AYV_dMVDxyc2"
},
"source": [
"[![Github](https://img.shields.io/github/stars/labmlai/annotated_deep_learning_paper_implementations?style=social)](https://github.com/labmlai/annotated_deep_learning_paper_implementations)\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/rl/ppo/experiment.ipynb) \n",
"\n",
"## Proximal Policy Optimization - PPO\n",
"\n",
"This experiment trains an agent to play the Atari Breakout game using Proximal Policy Optimization (PPO)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AahG_i2y5tY9"
},
"source": [
"Install the `labml-nn` package"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ZCzmCrAIVg0L",
"outputId": "028e759e-0c9f-472e-b4b8-fdcf3e4604ee"
},
"source": [
"!pip install labml-nn"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Add the Atari ROMs (the environment does not work in Google Colab without them)"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"! wget http://www.atarimania.com/roms/Roms.rar\n",
"! mkdir -p /content/ROM/\n",
"! unrar e /content/Roms.rar /content/ROM/\n",
"! python -m atari_py.import_roms /content/ROM/"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {
"id": "SE2VUQ6L5zxI"
},
"source": [
"Imports"
]
},
{
"cell_type": "code",
"metadata": {
"id": "0hJXx_g0wS2C"
},
"source": [
"from labml import experiment\n",
"from labml.configs import FloatDynamicHyperParam, IntDynamicHyperParam\n",
"from labml_nn.rl.ppo.experiment import Trainer"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {
"id": "Lpggo0wM6qb-"
},
"source": [
"Create an experiment"
]
},
{
"cell_type": "code",
"metadata": {
"id": "bFcr9k-l4cAg"
},
"source": [
"experiment.create(name=\"ppo\")"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {
"id": "-OnHLi626tJt"
},
"source": [
"### Configurations\n",
"\n",
"`IntDynamicHyperParam` and `FloatDynamicHyperParam` are dynamic hyper parameters\n",
"that you can change while the experiment is running."
]
},
{
"cell_type": "code",
"metadata": {
"id": "Piz0c5f44hRo"
},
"source": [
"configs = {\n",
"    # Number of updates\n",
"    'updates': 10000,\n",
"    # Number of epochs to train the model with sampled data\n",
"    'epochs': IntDynamicHyperParam(8),\n",
"    # Number of worker processes\n",
"    'n_workers': 8,\n",
"    # Number of steps to run on each process for a single update\n",
"    'worker_steps': 128,\n",
"    # Number of mini-batches\n",
"    'batches': 4,\n",
"    # Value loss coefficient\n",
"    'value_loss_coef': FloatDynamicHyperParam(0.5),\n",
"    # Entropy bonus coefficient\n",
"    'entropy_bonus_coef': FloatDynamicHyperParam(0.01),\n",
"    # Clip range\n",
"    'clip_range': FloatDynamicHyperParam(0.1),\n",
"    # Learning rate\n",
"    'learning_rate': FloatDynamicHyperParam(2.5e-4, (0, 1e-3)),\n",
"}"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {
"id": "wwMzCqpD6vkL"
},
"source": [
"Set experiment configurations"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 17
},
"id": "e6hmQhTw4nks",
"outputId": "0e978879-5dcd-4140-ec53-24a3fbd547de"
},
"source": [
"experiment.configs(configs)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {
"id": "qYQCFt_JYsjd"
},
"source": [
"Create trainer"
]
},
{
"cell_type": "code",
"metadata": {
"id": "8LB7XVViYuPG"
},
"source": [
"trainer = Trainer(**configs)"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {
"id": "KJZRf8527GxL"
},
"source": [
"Start the experiment and run the training loop."
]
},
{
"cell_type": "code",
"metadata": {
"id": "aIAWo7Fw5DR8"
},
"source": [
"with experiment.start():\n",
"    trainer.run_training_loop()"
],
"outputs": [],
"execution_count": null
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [],
"name": "Proximal Policy Optimization - PPO",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}