{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "AYV_dMVDxyc2" }, "source": [ "[![GitHub](https://img.shields.io/github/stars/labmlai/annotated_deep_learning_paper_implementations?style=social)](https://github.com/labmlai/annotated_deep_learning_paper_implementations)\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/rl/ppo/experiment.ipynb) \n", "\n", "## Proximal Policy Optimization - PPO\n", "\n", "This experiment trains an agent to play the Atari game Breakout using Proximal Policy Optimization (PPO)." ] }, { "cell_type": "markdown", "metadata": { "id": "AahG_i2y5tY9" }, "source": [ "Install the `labml-nn` package" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ZCzmCrAIVg0L", "outputId": "028e759e-0c9f-472e-b4b8-fdcf3e4604ee" }, "source": [ "!pip install labml-nn" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": {}, "source": [ "Add the Atari ROMs (this step is required in Google Colab)" ] }, { "cell_type": "code", "metadata": {}, "source": [ "! wget http://www.atarimania.com/roms/Roms.rar\n", "! mkdir /content/ROM/\n", "! unrar e /content/Roms.rar /content/ROM/\n", "! 
python -m atari_py.import_roms /content/ROM/" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": { "id": "SE2VUQ6L5zxI" }, "source": [ "Imports" ] }, { "cell_type": "code", "metadata": { "id": "0hJXx_g0wS2C" }, "source": [ "from labml import experiment\n", "from labml.configs import FloatDynamicHyperParam, IntDynamicHyperParam\n", "from labml_nn.rl.ppo.experiment import Trainer" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": { "id": "Lpggo0wM6qb-" }, "source": [ "Create an experiment" ] }, { "cell_type": "code", "metadata": { "id": "bFcr9k-l4cAg" }, "source": [ "experiment.create(name=\"ppo\")" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": { "id": "-OnHLi626tJt" }, "source": [ "### Configurations\n", "\n", "`IntDynamicHyperParam` and `FloatDynamicHyperParam` are dynamic hyperparameters\n", "that you can change while the experiment is running." ] }, { "cell_type": "code", "metadata": { "id": "Piz0c5f44hRo" }, "source": [ "configs = {\n", " # Number of updates\n", " 'updates': 10000,\n", " # Number of epochs to train the model with sampled data\n", " 'epochs': IntDynamicHyperParam(8),\n", " # Number of worker processes\n", " 'n_workers': 8,\n", " # Number of steps to run on each process for a single update\n", " 'worker_steps': 128,\n", " # Number of mini-batches\n", " 'batches': 4,\n", " # Value loss coefficient\n", " 'value_loss_coef': FloatDynamicHyperParam(0.5),\n", " # Entropy bonus coefficient\n", " 'entropy_bonus_coef': FloatDynamicHyperParam(0.01),\n", " # Clip range\n", " 'clip_range': FloatDynamicHyperParam(0.1),\n", " # Learning rate\n", " 'learning_rate': FloatDynamicHyperParam(2.5e-4, (0, 1e-3)),\n", "}" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": { "id": "wwMzCqpD6vkL" }, "source": [ "Set the experiment configurations" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", 
"height": 17 }, "id": "e6hmQhTw4nks", "outputId": "0e978879-5dcd-4140-ec53-24a3fbd547de" }, "source": [ "experiment.configs(configs)" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": { "id": "qYQCFt_JYsjd" }, "source": [ "Create trainer" ] }, { "cell_type": "code", "metadata": { "id": "8LB7XVViYuPG" }, "source": [ "trainer = Trainer(**configs)" ], "outputs": [], "execution_count": null }, { "cell_type": "markdown", "metadata": { "id": "KJZRf8527GxL" }, "source": [ "Start the experiment and run the training loop." ] }, { "cell_type": "code", "metadata": { "id": "aIAWo7Fw5DR8" }, "source": [ "with experiment.start():\n", " trainer.run_training_loop()" ], "outputs": [], "execution_count": null } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [], "name": "Proximal Policy Optimization - PPO", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 4 }