diff --git a/notebooks/bonus-unit1/bonus-unit1.ipynb b/notebooks/bonus-unit1/bonus-unit1.ipynb index 93db85a..5725765 100644 --- a/notebooks/bonus-unit1/bonus-unit1.ipynb +++ b/notebooks/bonus-unit1/bonus-unit1.ipynb @@ -199,9 +199,17 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Python 3.10.11\n" + ] + } + ], "source": [ "# Colab's Current Python Version (Incompatible with ML-Agents)\n", "!python --version" @@ -600,7 +608,7 @@ }, "outputs": [], "source": [ - "!mlagents-push-to-hf --run-id=\"HuggyTraining\" --local-dir=\"./results/Huggy2\" --repo-id=\"ThomasSimonini/ppo-Huggy\" --commit-message=\"Huggy\"" + "!mlagents-push-to-hf --run-id=\"HuggyTraining\" --local-dir=\"./results/Huggy\" --repo-id=\"turbo-maikol/rl-course-bu1\" --commit-message=\"Huggy\"" ] }, { @@ -691,11 +699,21 @@ }, "gpuClass": "standard", "kernelspec": { - "display_name": "Python 3", + "display_name": "rl-env-bu1", + "language": "python", "name": "python3" }, "language_info": { - "name": "python" + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" } }, "nbformat": 4, diff --git a/notebooks/bonus-unit1/bonus_unit1.ipynb b/notebooks/bonus-unit1/bonus_unit1.ipynb deleted file mode 100644 index a85452b..0000000 --- a/notebooks/bonus-unit1/bonus_unit1.ipynb +++ /dev/null @@ -1,695 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "view-in-github" - }, - "source": [ - "\"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "2D3NL_e4crQv" - }, - "source": [ - "# Bonus Unit 1: Let's train Huggy the Dog 🐶 to fetch a stick" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "FMYrDriDujzX" - }, - 
"source": [ - "\"Bonus\n", - "\n", - "In this notebook, we'll reinforce what we learned in the first Unit by **teaching Huggy the Dog to fetch the stick and then play with it directly in your browser**\n", - "\n", - "⬇️ Here is an example of what **you will achieve at the end of the unit.** ⬇️ (launch ▶ to see)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "PnVhs1yYNyUF" - }, - "outputs": [], - "source": [ - "%%html\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "x7oR6R-ZIbeS" - }, - "source": [ - "### The environment 🎮\n", - "\n", - "- Huggy the Dog, an environment created by [Thomas Simonini](https://twitter.com/ThomasSimonini) based on [Puppo The Corgi](https://blog.unity.com/technology/puppo-the-corgi-cuteness-overload-with-the-unity-ml-agents-toolkit)\n", - "\n", - "### The library used 📚\n", - "\n", - "- [MLAgents](https://github.com/Unity-Technologies/ml-agents)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "60yACvZwO0Cy" - }, - "source": [ - "We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the Github Repo](https://github.com/huggingface/deep-rl-class/issues)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Oks-ETYdO2Dc" - }, - "source": [ - "## Objectives of this notebook 🏆\n", - "\n", - "At the end of the notebook, you will:\n", - "\n", - "- Understand **the state space, action space and reward function used to train Huggy**.\n", - "- **Train your own Huggy** to fetch the stick.\n", - "- Be able to play **with your trained Huggy directly in your browser**.\n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mUlVrqnBv2o1" - }, - "source": [ - "## This notebook is from Deep Reinforcement Learning Course\n", - "\"Deep" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "pAMjaQpHwB_s" - }, - "source": [ - "In this free course, you will:\n", - "\n", - "- 📖 Study Deep Reinforcement Learning in **theory and practice**.\n", - "- 🧑‍💻 Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, CleanRL and Sample Factory 2.0.\n", - "- 🤖 Train **agents in unique environments**\n", - "\n", - "And more check 📚 the syllabus 👉 https://simoninithomas.github.io/deep-rl-course\n", - "\n", - "Don’t forget to **sign up to the course** (we are collecting your email to be able to **send you the links when each Unit is published and give you information about the challenges and updates).**\n", - "\n", - "\n", - "The best way to keep in touch is to join our discord server to exchange with the community and with us 👉🏻 https://discord.gg/ydHrjt3WP5" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6r7Hl0uywFSO" - }, - "source": [ - "## Prerequisites 🏗️\n", - "\n", - "Before diving into the notebook, you need to:\n", - "\n", - "🔲 📚 **Develop an understanding of the foundations of Reinforcement learning** (MC, TD, Rewards hypothesis...) 
by doing Unit 1\n", - "\n", - "🔲 📚 **Read the introduction to Huggy** by doing Bonus Unit 1" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "DssdIjk_8vZE" - }, - "source": [ - "## Set the GPU 💪\n", - "- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n", - "\n", - "\"GPU" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sTfCXHy68xBv" - }, - "source": [ - "- `Hardware Accelerator > GPU`\n", - "\n", - "\"GPU" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clone the repository 🔽\n", - "\n", - "- We need to clone the repository, that contains **ML-Agents.**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%capture\n", - "# Clone the repository (can take 3min)\n", - "!git clone --depth 1 https://github.com/Unity-Technologies/ml-agents" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Setup the Virtual Environment 🔽\n", - "- In order for the **ML-Agents** to run successfully in Colab, Colab's Python version must meet the library's Python requirements.\n", - "\n", - "- We can check for the supported Python version under the `python_requires` parameter in the `setup.py` files. 
These files are required to set up the **ML-Agents** library for use and can be found in the following locations:\n", - " - `/content/ml-agents/ml-agents/setup.py`\n", - " - `/content/ml-agents/ml-agents-envs/setup.py`\n", - "\n", - "- Colab's Current Python version(can be checked using `!python --version`) doesn't match the library's `python_requires` parameter, as a result installation may silently fail and lead to errors like these, when executing the same commands later:\n", - " - `/bin/bash: line 1: mlagents-learn: command not found`\n", - " - `/bin/bash: line 1: mlagents-push-to-hf: command not found`\n", - "\n", - "- To resolve this, we'll create a virtual environment with a Python version compatible with the **ML-Agents** library.\n", - "\n", - "`Note:` *For future compatibility, always check the `python_requires` parameter in the installation files and set your virtual environment to the maximum supported Python version in the given below script if the Colab's Python version is not compatible*" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Colab's Current Python Version (Incompatible with ML-Agents)\n", - "!python --version" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Install virtualenv and create a virtual environment\n", - "!pip install virtualenv\n", - "!virtualenv myenv\n", - "\n", - "# Download and install Miniconda\n", - "!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\n", - "!chmod +x Miniconda3-latest-Linux-x86_64.sh\n", - "!./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local\n", - "\n", - "# Activate Miniconda and install Python ver 3.10.12\n", - "!source /usr/local/bin/activate\n", - "!conda install -q -y --prefix /usr/local python=3.10.12 ujson # Specify the version here\n", - "\n", - "# Set environment variables for Python and conda paths\n", - "!export 
PYTHONPATH=/usr/local/lib/python3.10/site-packages/\n", - "!export CONDA_PREFIX=/usr/local/envs/myenv" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Python Version in New Virtual Environment (Compatible with ML-Agents)\n", - "!python --version" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Installing the dependencies 🔽" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%capture\n", - "# Go inside the repository and install the package (can take 3min)\n", - "%cd ml-agents\n", - "!pip3 install -e ./ml-agents-envs\n", - "!pip3 install -e ./ml-agents" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "HRY5ufKUKfhI" - }, - "source": [ - "## Download and move the environment zip file in `./trained-envs-executables/linux/`\n", - "\n", - "- Our environment executable is in a zip file.\n", - "- We need to download it and place it to `./trained-envs-executables/linux/`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "C9Ls6_6eOKiA" - }, - "outputs": [], - "source": [ - "!mkdir ./trained-envs-executables\n", - "!mkdir ./trained-envs-executables/linux" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "IHh_LXsRrrbM" - }, - "source": [ - "We downloaded the file Huggy.zip from https://github.com/huggingface/Huggy using `wget`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "8xNAD1tRpy0_" - }, - "outputs": [], - "source": [ - "!wget \"https://github.com/huggingface/Huggy/raw/main/Huggy.zip\" -O ./trained-envs-executables/linux/Huggy.zip" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "8FPx0an9IAwO" - }, - "outputs": [], - "source": [ - "%%capture\n", - "!unzip -d ./trained-envs-executables/linux/ ./trained-envs-executables/linux/Huggy.zip" - ] - }, - { - "cell_type": 
"markdown", - "metadata": { - "id": "nyumV5XfPKzu" - }, - "source": [ - "Make sure your file is accessible" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EdFsLJ11JvQf" - }, - "outputs": [], - "source": [ - "!chmod -R 755 ./trained-envs-executables/linux/Huggy" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "dYKVj8yUvj55" - }, - "source": [ - "## Let's recap how this environment works\n", - "\n", - "### The State Space: what Huggy \"perceives.\"\n", - "\n", - "Huggy doesn't \"see\" his environment. Instead, we provide him information about the environment:\n", - "\n", - "- The target (stick) position\n", - "- The relative position between himself and the target\n", - "- The orientation of his legs.\n", - "\n", - "Given all this information, Huggy **can decide which action to take next to fulfill his goal**.\n", - "\n", - "\"Huggy\"\n", - "\n", - "\n", - "### The Action Space: what moves Huggy can do\n", - "\"Huggy\n", - "\n", - "**Joint motors drive huggy legs**. It means that to get the target, Huggy needs to **learn to rotate the joint motors of each of his legs correctly so he can move**.\n", - "\n", - "### The Reward Function\n", - "\n", - "The reward function is designed so that **Huggy will fulfill his goal** : fetch the stick.\n", - "\n", - "Remember that one of the foundations of Reinforcement Learning is the *reward hypothesis*: a goal can be described as the **maximization of the expected cumulative reward**.\n", - "\n", - "Here, our goal is that Huggy **goes towards the stick but without spinning too much**. 
Hence, our reward function must translate this goal.\n", - "\n", - "Our reward function:\n", - "\n", - "\"Huggy\n", - "\n", - "- *Orientation bonus*: we **reward him for getting close to the target**.\n", - "- *Time penalty*: a fixed-time penalty given at every action to **force him to get to the stick as fast as possible**.\n", - "- *Rotation penalty*: we penalize Huggy if **he spins too much and turns too quickly**.\n", - "- *Getting to the target reward*: we reward Huggy for **reaching the target**." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "NAuEq32Mwvtz" - }, - "source": [ - "## Create the Huggy config file\n", - "\n", - "- In ML-Agents, you define the **training hyperparameters into config.yaml files.**\n", - "\n", - "- For the scope of this notebook, we're not going to modify the hyperparameters, but if you want to try as an experiment, you should also try to modify some other hyperparameters, Unity provides very [good documentation explaining each of them here](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md).\n", - "\n", - "- But we need to create a config file for Huggy.\n", - "\n", - " - To do that click on Folder logo on the left of your screen.\n", - "\n", - " \"Create\n", - "\n", - " - Go to `/content/ml-agents/config/ppo`\n", - " - Right mouse click and create a new file called `Huggy.yaml`\n", - "\n", - " \"Create\n", - "\n", - "- Copy and paste the content below 🔽" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "loQ0N5jhXW71" - }, - "outputs": [], - "source": [ - "behaviors:\n", - " Huggy:\n", - " trainer_type: ppo\n", - " hyperparameters:\n", - " batch_size: 2048\n", - " buffer_size: 20480\n", - " learning_rate: 0.0003\n", - " beta: 0.005\n", - " epsilon: 0.2\n", - " lambd: 0.95\n", - " num_epoch: 3\n", - " learning_rate_schedule: linear\n", - " network_settings:\n", - " normalize: true\n", - " hidden_units: 512\n", - " num_layers: 3\n", - " 
vis_encode_type: simple\n", - " reward_signals:\n", - " extrinsic:\n", - " gamma: 0.995\n", - " strength: 1.0\n", - " checkpoint_interval: 200000\n", - " keep_checkpoints: 15\n", - " max_steps: 2e6\n", - " time_horizon: 1000\n", - " summary_freq: 50000" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "oakN7UHwXdCX" - }, - "source": [ - "- Don't forget to save the file!" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "r9wv5NYGw-05" - }, - "source": [ - "- **In the case you want to modify the hyperparameters**, in Google Colab notebook, you can click here to open the config.yaml: `/content/ml-agents/config/ppo/Huggy.yaml`\n", - "\n", - "- For instance **if you want to save more models during the training** (for now, we save every 200,000 training timesteps). You need to modify:\n", - " - `checkpoint_interval`: The number of training timesteps collected between each checkpoint.\n", - " - `keep_checkpoints`: The maximum number of model checkpoints to keep.\n", - "\n", - "=> Just keep in mind that **decreasing the `checkpoint_interval` means more models to upload to the Hub and so a longer uploading time**\n", - "We’re now ready to train our agent 🔥." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "f9fI555bO12v" - }, - "source": [ - "## Train our agent\n", - "\n", - "To train our agent, we just need to **launch mlagents-learn and select the executable containing the environment.**\n", - "\n", - "\"ml\n", - "\n", - "With ML Agents, we run a training script. We define four parameters:\n", - "\n", - "1. `mlagents-learn `: the path where the hyperparameter config file is.\n", - "2. `--env`: where the environment executable is.\n", - "3. `--run-id`: the name you want to give to your training run id.\n", - "4. 
`--no-graphics`: to not launch the visualization during the training.\n", - "\n", - "Train the model and use the `--resume` flag to continue training in case of interruption.\n", - "\n", - "> It will fail first time when you use `--resume`, try running the block again to bypass the error.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lN32oWF8zPjs" - }, - "source": [ - "The training will take 30 to 45min depending on your machine (don't forget to **set up a GPU**), go take a ☕️you deserve it 🤗." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "bS-Yh1UdHfzy" - }, - "outputs": [], - "source": [ - "!mlagents-learn ./config/ppo/Huggy.yaml --env=./trained-envs-executables/linux/Huggy/Huggy --run-id=\"Huggy2\" --no-graphics" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5Vue94AzPy1t" - }, - "source": [ - "## Push the agent to the 🤗 Hub\n", - "\n", - "- Now that we trained our agent, we’re **ready to push it to the Hub to be able to play with Huggy on your browser🔥.**" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "izT6FpgNzZ6R" - }, - "source": [ - "To be able to share your model with the community there are three more steps to follow:\n", - "\n", - "1️⃣ (If it's not already done) create an account to HF ➡ https://huggingface.co/join\n", - "\n", - "2️⃣ Sign in and then, you need to store your authentication token from the Hugging Face website.\n", - "- Create a new token (https://huggingface.co/settings/tokens) **with write role**\n", - "\n", - "\"Create\n", - "\n", - "- Copy the token\n", - "- Run the cell below and paste the token" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "rKt2vsYoK56o" - }, - "outputs": [], - "source": [ - "from huggingface_hub import notebook_login\n", - "notebook_login()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ew59mK19zjtN" - }, - "source": [ 
- "If you don't want to use a Google Colab or a Jupyter Notebook, you need to use this command instead: `huggingface-cli login`" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Xi0y_VASRzJU" - }, - "source": [ - "Then, we simply need to run `mlagents-push-to-hf`.\n", - "\n", - "\"ml" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "KK4fPfnczunT" - }, - "source": [ - "And we define 4 parameters:\n", - "\n", - "1. `--run-id`: the name of the training run id.\n", - "2. `--local-dir`: where the agent was saved, it’s results/, so in my case results/First Training.\n", - "3. `--repo-id`: the name of the Hugging Face repo you want to create or update. It’s always /\n", - "If the repo does not exist **it will be created automatically**\n", - "4. `--commit-message`: since HF repos are git repository you need to define a commit message." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "dGEFAIboLVc6" - }, - "outputs": [], - "source": [ - "!mlagents-push-to-hf --run-id=\"HuggyTraining\" --local-dir=\"./results/Huggy2\" --repo-id=\"ThomasSimonini/ppo-Huggy\" --commit-message=\"Huggy\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "yborB0850FTM" - }, - "source": [ - "Else, if everything worked you should have this at the end of the process(but with a different url 😆) :\n", - "\n", - "\n", - "\n", - "```\n", - "Your model is pushed to the hub. You can view your model here: https://huggingface.co/ThomasSimonini/ppo-Huggy\n", - "```\n", - "\n", - "It’s the link to your model repository. The repository contains a model card that explains how to use the model, your Tensorboard logs and your config file. 
**What’s awesome is that it’s a git repository, which means you can have different commits, update your repository with a new push, open Pull Requests, etc.**\n", - "\n", - "\"ml" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5Uaon2cg0NrL" - }, - "source": [ - "But now comes the best: **being able to play with Huggy online 👀.**" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "VMc4oOsE0QiZ" - }, - "source": [ - "## Play with your Huggy 🐕\n", - "\n", - "This step is the simplest:\n", - "\n", - "- Open the game Huggy in your browser: https://huggingface.co/spaces/ThomasSimonini/Huggy\n", - "\n", - "- Click on Play with my Huggy model\n", - "\n", - "\"load-huggy\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Djs8c5rR0Z8a" - }, - "source": [ - "1. In step 1, choose your model repository which is the model id (in my case ThomasSimonini/ppo-Huggy).\n", - "\n", - "2. In step 2, **choose what model you want to replay**:\n", - " - I have multiple ones, since we saved a model every 500000 timesteps.\n", - " - But since I want the more recent, I choose `Huggy.onnx`\n", - "\n", - "👉 What’s nice **is to try with different models steps to see the improvement of the agent.**" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "PI6dPWmh064H" - }, - "source": [ - "Congrats on finishing this bonus unit!\n", - "\n", - "You can now sit and enjoy playing with your Huggy 🐶. And don't **forget to spread the love by sharing Huggy with your friends 🤗**. 
And if you share about it on social media, **please tag us @huggingface and me @simoninithomas**\n", - "\n", - "\"Huggy\n", - "\n", - "\n", - "## Keep Learning, Stay awesome 🤗" - ] - } - ], - "metadata": { - "accelerator": "GPU", - "colab": { - "include_colab_link": true, - "private_outputs": true, - "provenance": [] - }, - "gpuClass": "standard", - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - }, - "language_info": { - "name": "python" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} diff --git a/notebooks/unit1/unit1.ipynb b/notebooks/unit1/unit1.ipynb index 06d62b0..3605d63 100644 --- a/notebooks/unit1/unit1.ipynb +++ b/notebooks/unit1/unit1.ipynb @@ -284,11 +284,21 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": { "id": "BE5JWP5rQIKf" }, - "outputs": [], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "KeyboardInterrupt\n", + "\n" + ] + } + ], "source": [ "# Virtual display\n", "from pyvirtualdisplay import Display\n", @@ -316,11 +326,24 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, "metadata": { "id": "cygWLPGsEQ0m" }, - "outputs": [], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.\n", + "Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.\n", + "Users of this version of Gym should be able to simply replace 'import gym' with 'import gymnasium as gym' in the vast majority of cases.\n", + "See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.\n", + "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit1/venv-u1/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + } + ], "source": [ "import gymnasium\n", "\n", @@ -353,7 +376,7 @@ "\n", "Let's look at an example, but first let's recall the RL loop.\n", "\n", - "\"The" + "\"The" ] }, { @@ -396,11 +419,59 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "metadata": { "id": "w7vOFlpA_ONz" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Action taken: 2\n", + " - reward: 3.0692112001439513\n", + "Action taken: 3\n", + " - reward: -2.0283326535021318\n", + "Action taken: 0\n", + " - reward: -2.013109629392062\n", + "Action taken: 1\n", + " - reward: -1.8387614642986694\n", + "Action taken: 3\n", + " - reward: -1.9646071472228346\n", + "Action taken: 2\n", + " - reward: 1.724712789874087\n", + "Action taken: 3\n", + " - reward: -2.0772821745045733\n", + "Action taken: 3\n", + " - reward: -2.263394443942046\n", + "Action taken: 2\n", + " - reward: 1.03422570110298\n", + "Action taken: 1\n", + " - reward: -1.9686919634781634\n", + "Action taken: 0\n", + " - reward: -1.880365204866706\n", + "Action taken: 3\n", + " - reward: -2.1378125038369533\n", + "Action taken: 2\n", + " - reward: 0.23407670781683693\n", + "Action taken: 0\n", + " - reward: -2.0440816329574147\n", + "Action taken: 0\n", + " - reward: -1.9836184981424765\n", + "Action taken: 2\n", + " - reward: 1.1548347711850055\n", + "Action taken: 1\n", + " - reward: -1.7956347801317054\n", + "Action taken: 0\n", + " - reward: -1.7729850216231284\n", + "Action taken: 2\n", + " - reward: 1.9191545079788284\n", + "Action taken: 3\n", + " - reward: -2.0884827451743875\n", + "Total reward: -18.720944184971565\n" + ] + } + ], "source": [ "import gymnasium as gym\n", "\n", @@ -410,6 +481,7 @@ "# Then we reset this environment\n", "observation, info = env.reset()\n", "\n", + "total_reward = 0\n", "for _ in range(20):\n", " # 
Take a random action\n", "    action = env.action_space.sample()\n", @@ -418,13 +490,16 @@ "    # Do this action in the environment and get\n", "    # next_state, reward, terminated, truncated and info\n", "    observation, reward, terminated, truncated, info = env.step(action)\n", - "\n", + "    print(f\" - reward: {reward}\")\n", + "    total_reward += reward\n", "    # If the game is terminated (in our case we land, crashed) or truncated (timeout)\n", + "    \n", "    if terminated or truncated:\n", "        # Reset the environment\n", "        print(\"Environment is reset\")\n", "        observation, info = env.reset()\n", "\n", + "print(\"Total reward:\", total_reward)\n", "env.close()" ] }, @@ -450,6 +525,29 @@ "---\n" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The state is an 8-dimensional vector: the coordinates of the lander in x & y, its linear velocities in x & y, its angle, its angular velocity, and two booleans that represent whether each leg is in contact with the ground or not.\n", + "\n", + "```\n", + "Box([ -2.5 -2.5 -10. -10. -6.2831855 -10. -0. -0. ], [ 2.5 2.5 10. 10. 6.2831855 10. 1. 1. ], (8,), float32)\n", + "Box(\n", + "[ \n", + "  x -2.5 y -2.5 \n", + "  vx -10. vy -10. \n", + "  angle -6.2831855 \n", + "  av -10. \n", + "  ll 1. \n", + "  rl 1. 
\n", + "],\n", + "size (8,)\n", + ", float32)\n", + "```\n", + "\n" + ] + }, { "cell_type": "markdown", "metadata": { @@ -461,11 +559,23 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 4, "metadata": { "id": "ZNPG0g_UGCfh" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "_____OBSERVATION SPACE_____ \n", + "\n", + "Observation Space Shape (8,)\n", + "Sample observation [ 53.21532 -87.118256 -0.84611297 3.4404945 0.7532178\n", + " 2.645675 0.9980984 0.40649492]\n" + ] + } + ], "source": [ "# We create our environment with gym.make(\"\")\n", "env = gym.make(\"LunarLander-v2\")\n", @@ -494,11 +604,23 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 5, "metadata": { "id": "We5WqOBGLoSm" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " _____ACTION SPACE_____ \n", + "\n", + "Action Space Shape 4\n", + "Action Space Sample 3\n" + ] + } + ], "source": [ "print(\"\\n _____ACTION SPACE_____ \\n\")\n", "print(\"Action Space Shape\", env.action_space.n)\n", @@ -549,7 +671,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 6, "metadata": { "id": "99hqQ_etEy1N" }, @@ -629,16 +751,24 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "metadata": { "id": "nxI6hT1GE4-A" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Using cuda device\n" + ] + } + ], "source": [ "# TODO: Define a PPO MlpPolicy architecture\n", "# We use MultiLayerPerceptron (MLPPolicy) because the input is a vector,\n", "# if we had frames as input we would use CnnPolicy\n", - "model =" + "model = PPO(\"MlpPolicy\", env=env, verbose=1)" ] }, { @@ -652,11 +782,19 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "metadata": { "id": "543OHYDfcjK4" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + 
"output_type": "stream", + "text": [ + "Using cuda device\n" + ] + } + ], "source": [ "# SOLUTION\n", "# We added some parameters to accelerate the training\n", @@ -669,7 +807,9 @@ " gamma = 0.999,\n", " gae_lambda = 0.98,\n", " ent_coef = 0.01,\n", - " verbose=1)" + " verbose=1\n", + ")\n", + "model_name = \"ppo-LunarLander-v2\"\n" ] }, { @@ -685,16 +825,16 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 14, "metadata": { "id": "qKnYkNiVp89p" }, "outputs": [], "source": [ "# TODO: Train it for 1,000,000 timesteps\n", - "\n", + "model.learn(total_timesteps=1_000_000)\n", "# TODO: Specify file name for model and save the model to file\n", - "model_name = \"ppo-LunarLander-v2\"\n" + "model.save(model_name)" ] }, { @@ -741,21 +881,238 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 9, "metadata": { "id": "yRpno0glsADy" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Mean reward: 252.266642 +/- 34.661009936952354\n" + ] + } + ], "source": [ "# TODO: Evaluate the agent\n", "# Create a new environment for evaluation\n", - "eval_env =\n", + "eval_env = Monitor(gym.make(\"LunarLander-v2\", render_mode='rgb_array'))\n", "\n", + "# Load model\n", + "model = PPO.load(model_name, env=eval_env)\n", "# Evaluate the model with 10 evaluation episodes and deterministic=True\n", - "mean_reward, std_reward =\n", + "mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10, deterministic=True)\n", "\n", "# Print the results\n", - "\n" + "print(f\"Mean reward: {mean_reward} +/- {std_reward}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. 
To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n", + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n", + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n", + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " - frame: 250\n", + " - frame: 500\n", + " - frame: 750\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. 
To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " - frame: 1000\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " - frame: 250\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n", + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n", + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. 
To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " - frame: 250\n", + " - frame: 500\n", + " - frame: 750\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " - frame: 1000\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " - frame: 250\n", + " - frame: 250\n", + " - frame: 500\n", + " - frame: 750\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. 
To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " - frame: 1000\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. 
To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + " - frame: 250\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. 
To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n", + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. 
To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n", + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n" + ] + } + ], + "source": [ + "from stable_baselines3.common.vec_env import DummyVecEnv\n", + "from stable_baselines3.common.monitor import Monitor\n", + "import imageio\n", + "import gym\n", + "import numpy as np\n", + "np.bool8 = np.bool_\n", + "\n", + "for i in range(30):\n", + " eval_env = DummyVecEnv([lambda: Monitor(gym.make(\"LunarLander-v2\", render_mode=\"rgb_array\"))])\n", + "\n", + " frames = []\n", + " obs = eval_env.reset()\n", + " done = False\n", + " while not done:\n", + " action, _ = model.predict(obs, deterministic=False)\n", + " obs, reward, done, info = eval_env.step(action)\n", + " done = done[0] # VecEnv returns a list\n", + " img = eval_env.envs[0].render() # returns RGB array\n", + " frames.append(img)\n", + " if len(frames) % 250 == 0:\n", + " print(f\" - frame: {len(frames)}\")\n", + "\n", + " imageio.mimsave(f'lunarlander_run-{i}.mp4', frames, fps=30)" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. 
To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n" + ] + }, + { + "ename": "", + "evalue": "", + "output_type": "error", + "traceback": [ + "\u001b[1;31mThe Kernel crashed while executing code in the current cell or a previous cell. \n", + "\u001b[1;31mPlease review the code in the cell(s) to identify a possible cause of the failure. \n", + "\u001b[1;31mClick here for more info. \n", + "\u001b[1;31mView Jupyter log for further details." + ] + } + ], + "source": [ + "imageio.mimsave('lunarlander_run.mp4', frames, fps=30)\n" ] }, { @@ -777,7 +1134,7 @@ "source": [ "#@title\n", "eval_env = Monitor(gym.make(\"LunarLander-v2\", render_mode='rgb_array'))\n", - "mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)\n", + "mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True, )\n", "print(f\"mean_reward={mean_reward:.2f} +/- {std_reward}\")" ] }, @@ -889,12 +1246,248 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 19, "metadata": { "id": "JPG7ofdGIHN8" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[38;5;4mℹ This function will save, evaluate, generate a video of your agent,\n", + "create a model card and push everything to the hub. 
It might take up to 1min.\n", + "This is a work in progress: if you encounter a bug, please open an issue.\u001b[0m\n", + "Saving video to /tmp/tmpztlifguu/-step-0-to-step-1000.mp4\n", + "MoviePy - Building video /tmp/tmpztlifguu/-step-0-to-step-1000.mp4.\n", + "MoviePy - Writing video /tmp/tmpztlifguu/-step-0-to-step-1000.mp4\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "ffmpeg version 6.1.1-3ubuntu5 Copyright (c) 2000-2023 the FFmpeg developers\n", + " built with gcc 13 (Ubuntu 13.2.0-23ubuntu3)\n", + " configuration: --prefix=/usr --extra-version=3ubuntu5 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --disable-omx --enable-gnutls --enable-libaom --enable-libass --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-openal --enable-opencl --enable-opengl --disable-sndio --enable-libvpl --disable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-ladspa --enable-libbluray --enable-libjack --enable-libpulse --enable-librabbitmq --enable-librist --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libx264 --enable-libzmq --enable-libzvbi --enable-lv2 --enable-sdl2 --enable-libplacebo --enable-librav1e --enable-pocketsphinx --enable-librsvg --enable-libjxl --enable-shared\n", + " libavutil 58. 29.100 / 58. 
29.100\n", + " libavcodec 60. 31.102 / 60. 31.102\n", + " libavformat 60. 16.100 / 60. 16.100\n", + " libavdevice 60. 3.100 / 60. 3.100\n", + " libavfilter 9. 12.100 / 9. 12.100\n", + " libswscale 7. 5.100 / 7. 5.100\n", + " libswresample 4. 12.100 / 4. 12.100\n", + " libpostproc 57. 3.100 / 57. 3.100\n", + "Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/tmpztlifguu/-step-0-to-step-1000.mp4':\n", + " Metadata:\n", + " major_brand : isom\n", + " minor_version : 512\n", + " compatible_brands: isomiso2avc1mp41\n", + " encoder : Lavf61.1.100\n", + " Duration: 00:00:20.00, start: 0.000000, bitrate: 51 kb/s\n", + " Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 600x400, 46 kb/s, 50 fps, 50 tbr, 12800 tbn (default)\n", + " Metadata:\n", + " handler_name : VideoHandler\n", + " vendor_id : [0][0][0][0]\n", + " encoder : Lavc61.3.100 libx264\n", + "Stream mapping:\n", + " Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))\n", + "Press [q] to stop, [?] for help\n", + "[libx264 @ 0x55b3ab1fa980] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2\n", + "[libx264 @ 0x55b3ab1fa980] profile High, level 3.1, 4:2:0, 8-bit\n", + "[libx264 @ 0x55b3ab1fa980] 264 - core 164 r3108 31e19f9 - H.264/MPEG-4 AVC codec - Copyleft 2003-2023 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00\n", + "Output #0, mp4, to '/tmp/tmp2etu86el/replay.mp4':\n", + " Metadata:\n", + " major_brand : isom\n", + " 
minor_version : 512\n", + " compatible_brands: isomiso2avc1mp41\n", + " encoder : Lavf60.16.100\n", + " Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(progressive), 600x400, q=2-31, 50 fps, 12800 tbn (default)\n", + " Metadata:\n", + " handler_name : VideoHandler\n", + " vendor_id : [0][0][0][0]\n", + " encoder : Lavc60.31.102 libx264\n", + " Side data:\n", + " cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A\n", + "frame= 0 fps=0.0 q=0.0 size= 0kB time=N/A bitrate=N/A speed=N/A \r" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "MoviePy - Done !\n", + "MoviePy - video ready /tmp/tmpztlifguu/-step-0-to-step-1000.mp4\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[out#0/mp4 @ 0x55b3ab12b880] video:110kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 11.302154%\n", + "frame= 1000 fps=0.0 q=-1.0 Lsize= 122kB time=00:00:19.94 bitrate= 50.1kbits/s speed=22.1x \n", + "[libx264 @ 0x55b3ab1fa980] frame I:4 Avg QP: 9.42 size: 2199\n", + "[libx264 @ 0x55b3ab1fa980] frame P:268 Avg QP:18.69 size: 158\n", + "[libx264 @ 0x55b3ab1fa980] frame B:728 Avg QP:20.11 size: 83\n", + "[libx264 @ 0x55b3ab1fa980] consecutive B-frames: 0.8% 4.2% 6.6% 88.4%\n", + "[libx264 @ 0x55b3ab1fa980] mb I I16..4: 92.1% 1.7% 6.2%\n", + "[libx264 @ 0x55b3ab1fa980] mb P I16..4: 0.1% 0.3% 0.1% P16..4: 1.1% 0.2% 0.1% 0.0% 0.0% skip:98.1%\n", + "[libx264 @ 0x55b3ab1fa980] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 1.7% 0.2% 0.0% direct: 0.0% skip:98.0% L0:55.7% L1:43.8% BI: 0.6%\n", + "[libx264 @ 0x55b3ab1fa980] 8x8 transform intra:15.7% inter:16.2%\n", + "[libx264 @ 0x55b3ab1fa980] coded y,uvDC,uvAC intra: 7.0% 9.7% 8.7% inter: 0.1% 0.2% 0.1%\n", + "[libx264 @ 0x55b3ab1fa980] i16 v,h,dc,p: 90% 5% 5% 0%\n", + "[libx264 @ 0x55b3ab1fa980] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 11% 4% 84% 0% 0% 0% 0% 0% 0%\n", + "[libx264 @ 0x55b3ab1fa980] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 15% 58% 2% 3% 1% 3% 1% 3%\n", + 
"[libx264 @ 0x55b3ab1fa980] i8c dc,h,v,p: 93% 3% 3% 0%\n", + "[libx264 @ 0x55b3ab1fa980] Weighted P-Frames: Y:0.0% UV:0.0%\n", + "[libx264 @ 0x55b3ab1fa980] ref P L0: 66.6% 1.8% 20.4% 11.1%\n", + "[libx264 @ 0x55b3ab1fa980] ref B L0: 68.4% 27.3% 4.2%\n", + "[libx264 @ 0x55b3ab1fa980] ref B L1: 93.3% 6.7%\n", + "[libx264 @ 0x55b3ab1fa980] kb/s:44.60\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[38;5;4mℹ Pushing repo turbo-maikol/rl-course-unit1 to the Hugging Face Hub\u001b[0m\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Processing Files (0 / 0) : | | 0.00B / 0.00B \n", + "Processing Files (1 / 1) : 0%| | 1.26kB / 408kB, 3.16kB/s \n", + "Processing Files (5 / 5) : 100%|██████████| 408kB / 408kB, 255kB/s \n", +
"Processing Files (5 / 5) : 100%|██████████| 408kB / 408kB, 185kB/s \n", + "New Data Upload : 100%|██████████| 406kB / 406kB, 185kB/s \n", + " ...unarLander-v2/pytorch_variables.pth: 100%|██████████| 1.26kB / 1.26kB \n", + " ...LunarLander-v2/policy.optimizer.pth: 100%|██████████| 88.9kB / 88.9kB \n", + " ...u86el/ppo-LunarLander-v2/policy.pth: 100%|██████████| 44.1kB / 44.1kB \n", + " .../tmp2etu86el/ppo-LunarLander-v2.zip: 100%|██████████| 149kB / 149kB \n", + " /tmp/tmp2etu86el/replay.mp4 : 100%|██████████| 125kB / 125kB \n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[38;5;4mℹ Your model is pushed to the Hub. You can view your model here:\n", + "https://huggingface.co/turbo-maikol/rl-course-unit1/tree/main/\u001b[0m\n" + ] + }, + { + "data": { + "text/plain": [ + "CommitInfo(commit_url='https://huggingface.co/turbo-maikol/rl-course-unit1/commit/3de80d180623b404c50319ea857ba782dccad4c9', commit_message='Model trained with PPO on LunarLander-v2 for the DEEP RL huggingface course', commit_description='', oid='3de80d180623b404c50319ea857ba782dccad4c9', pr_url=None, repo_url=RepoUrl('https://huggingface.co/turbo-maikol/rl-course-unit1', endpoint='https://huggingface.co', repo_type='model', repo_id='turbo-maikol/rl-course-unit1'), pr_revision=None, pr_num=None)" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ + "import os\n", + "from dotenv import load_dotenv\n", + "load_dotenv()\n", + "\n", + "\n", "import gymnasium as gym\n", "from stable_baselines3.common.vec_env import DummyVecEnv\n", "from stable_baselines3.common.env_util import make_vec_env\n", @@ -903,29 +1496,32 
@@ "\n", "## TODO: Define a repo_id\n", "## repo_id is the id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2\n", - "repo_id =\n", + "repo_id = \"turbo-maikol/rl-course-unit1\"\n", "\n", "# TODO: Define the name of the environment\n", - "env_id =\n", + "env_id = \"LunarLander-v2\"\n", "\n", "# Create the evaluation env and set the render_mode=\"rgb_array\"\n", "eval_env = DummyVecEnv([lambda: Monitor(gym.make(env_id, render_mode=\"rgb_array\"))])\n", "\n", "\n", "# TODO: Define the model architecture we used\n", - "model_architecture = \"\"\n", + "model_architecture = \"PPO\"\n", "\n", "## TODO: Define the commit message\n", - "commit_message = \"\"\n", + "commit_message = \"Model trained with PPO on LunarLander-v2 for the DEEP RL huggingface course\"\n", "\n", "# method save, evaluate, generate a model card and record a replay video of your agent before pushing the repo to the hub\n", - "package_to_hub(model=model, # Our trained model\n", - " model_name=model_name, # The name of our trained model\n", - " model_architecture=model_architecture, # The model architecture we used: in our case PPO\n", - " env_id=env_id, # Name of the environment\n", - " eval_env=eval_env, # Evaluation Environment\n", - " repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2\n", - " commit_message=commit_message)" + "package_to_hub(\n", + " model=model, # Our trained model\n", + " model_name=model_name, # The name of our trained model\n", + " model_architecture=model_architecture, # The model architecture we used: in our case PPO\n", + " env_id=env_id, # Name of the environment\n", + " eval_env=eval_env, # Evaluation Environment\n", + " repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2\n", + " 
commit_message=commit_message,\n", + " token=os.getenv(\"HF_HUB_TOKEN\")\n", + ")" ] }, { @@ -1066,9 +1662,9 @@ "# 1. Install pickle5 (we done it at the beginning of the colab)\n", "# 2. Create a custom empty object we pass as parameter to PPO.load()\n", "custom_objects = {\n", - " \"learning_rate\": 0.0,\n", - " \"lr_schedule\": lambda _: 0.0,\n", - " \"clip_range\": lambda _: 0.0,\n", + " \"learning_rate\": 0.0,\n", + " \"lr_schedule\": lambda _: 0.0, \n", + " \"clip_range\": lambda _: 0.0,\n", "}\n", "\n", "checkpoint = load_from_hub(repo_id, filename)\n", @@ -1163,18 +1759,21 @@ }, "gpuClass": "standard", "kernelspec": { - "display_name": "Python 3.9.7", + "display_name": "venv-u1", "language": "python", "name": "python3" }, "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", "name": "python", - "version": "3.9.7" - }, - "vscode": { - "interpreter": { - "hash": "ed7f8024e43d3b8f5ca3c5e1a8151ab4d136b3ecee1e3fd59e0766ccc55e1b10" - } + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" } }, "nbformat": 4, diff --git a/notebooks/unit2/unit2.ipynb b/notebooks/unit2/unit2.ipynb index e9ae624..5df36f4 100644 --- a/notebooks/unit2/unit2.ipynb +++ b/notebooks/unit2/unit2.ipynb @@ -3,8 +3,8 @@ { "cell_type": "markdown", "metadata": { - "id": "view-in-github", - "colab_type": "text" + "colab_type": "text", + "id": "view-in-github" }, "source": [ "\"Open" @@ -36,6 +36,9 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "DPTBOv9HYLZ2" + }, "source": [ "###🎮 Environments:\n", "\n", @@ -48,10 +51,7 @@ "- [Gymnasium](https://gymnasium.farama.org/)\n", "\n", "We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues)." 
- ], - "metadata": { - "id": "DPTBOv9HYLZ2" - } + ] }, { "cell_type": "markdown", @@ -72,14 +72,14 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "viNzVbVaYvY3" + }, "source": [ "## This notebook is from the Deep Reinforcement Learning Course\n", "\n", "\"Deep" - ], - "metadata": { - "id": "viNzVbVaYvY3" - } + ] }, { "cell_type": "markdown", @@ -156,28 +156,31 @@ }, { "cell_type": "markdown", - "source": [ - "# Let's code our first Reinforcement Learning algorithm 🚀" - ], "metadata": { "id": "HEtx8Y8MqKfH" - } + }, + "source": [ + "# Let's code our first Reinforcement Learning algorithm 🚀" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "Kdxb1IhzTn0v" + }, "source": [ "To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to push your trained Taxi model to the Hub and **get a result of >= 4.5**.\n", "\n", "To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model, **the result = mean_reward - std of reward**\n", "\n", "For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process" - ], - "metadata": { - "id": "Kdxb1IhzTn0v" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "4gpxC1_kqUYe" + }, "source": [ "## Install dependencies and create a virtual display 🔽\n", "\n", @@ -194,10 +197,7 @@ "The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. 
It has versioning, metrics, visualizations and other features that will allow you to easily collaborate with others.\n", "\n", "You can see here all the Deep RL models available (if they use Q Learning) here 👉 https://huggingface.co/models?other=q-learning" - ], - "metadata": { - "id": "4gpxC1_kqUYe" - } + ] }, { "cell_type": "code", @@ -212,53 +212,53 @@ }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "n71uTX7qqzz2" + }, + "outputs": [], "source": [ "!sudo apt-get update\n", "!sudo apt-get install -y python3-opengl\n", "!apt install ffmpeg xvfb\n", "!pip3 install pyvirtualdisplay" - ], - "metadata": { - "id": "n71uTX7qqzz2" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "To make sure the new installed libraries are used, **sometimes it's required to restart the notebook runtime**. The next cell will force the **runtime to crash, so you'll need to connect again and run the code starting from here**. Thanks to this trick, **we will be able to run our virtual screen.**" - ], "metadata": { "id": "K6XC13pTfFiD" - } + }, + "source": [ + "To make sure the new installed libraries are used, **sometimes it's required to restart the notebook runtime**. The next cell will force the **runtime to crash, so you'll need to connect again and run the code starting from here**. 
Thanks to this trick, **we will be able to run our virtual screen.**" + ] }, { "cell_type": "code", - "source": [ - "import os\n", - "os.kill(os.getpid(), 9)" - ], + "execution_count": null, "metadata": { "id": "3kuZbWAkfHdg" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "import os\n", + "os.kill(os.getpid(), 9)" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "DaY1N4dBrabi" + }, + "outputs": [], "source": [ "# Virtual display\n", "from pyvirtualdisplay import Display\n", "\n", "virtual_display = Display(visible=0, size=(1400, 900))\n", "virtual_display.start()" - ], - "metadata": { - "id": "DaY1N4dBrabi" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", @@ -276,7 +276,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "metadata": { "id": "VcNvOAQlysBJ" }, @@ -287,10 +287,8 @@ "import random\n", "import imageio\n", "import os\n", - "import tqdm\n", "\n", - "import pickle5 as pickle\n", - "from tqdm.notebook import tqdm" + "import pickle5 as pickle" ] }, { @@ -354,14 +352,21 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 35, "metadata": { "id": "IzJnb8O3y8up" }, "outputs": [], "source": [ "# Create the FrozenLake-v1 environment using 4x4 map and non-slippery version and render_mode=\"rgb_array\"\n", - "env = gym.make() # TODO use the correct parameters" + "\n", + "desc=[\n", + " \"SFFF\", \n", + " \"FHFH\", \n", + " \"FFFH\", \n", + " \"HFFG\"\n", + "]\n", + "env = gym.make(\"FrozenLake-v1\", map_name=\"4x4\", desc=desc, is_slippery=False, render_mode=\"rgb_array\") # TODO use the correct parameters" ] }, { @@ -411,11 +416,22 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 34, "metadata": { "id": "ZNPG0g_UGCfh" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "_____OBSERVATION SPACE_____ \n", + "\n", + "Observation Space 
Discrete(16)\n", + "Sample observation 0\n" + ] + } + ], "source": [ "# We create our environment with gym.make(\"\")- `is_slippery=False`: The agent always moves in the intended direction due to the non-slippery nature of the frozen lake (deterministic).\n", "print(\"_____OBSERVATION SPACE_____ \\n\")\n", @@ -441,11 +457,23 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 10, "metadata": { "id": "We5WqOBGLoSm" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " _____ACTION SPACE_____ \n", + "\n", + "Action Space Shape 4\n", + "Action Space Sample 2\n" + ] + } + ], "source": [ "print(\"\\n _____ACTION SPACE_____ \\n\")\n", "print(\"Action Space Shape\", env.action_space.n)\n", @@ -488,22 +516,31 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 11, "metadata": { "id": "y3ZCdluj3k0l" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 16 possible states\n", + "There are 4 possible actions\n" + ] + } + ], "source": [ - "state_space =\n", + "state_space = env.observation_space.n\n", "print(\"There are \", state_space, \" possible states\")\n", "\n", - "action_space =\n", + "action_space = env.action_space.n\n", "print(\"There are \", action_space, \" possible actions\")" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 12, "metadata": { "id": "rCddoOXM3UQH" }, @@ -511,19 +548,47 @@ "source": [ "# Let's create our Qtable of size (state_space, action_space) and initialized each values at 0 using np.zeros. 
np.zeros needs a tuple (a,b)\n", "def initialize_q_table(state_space, action_space):\n", - " Qtable =\n", + " \"\"\"Is not a matrix, is an array and we can locate each game cell later with `current_row * ncols + current_col`\"\"\"\n", + " Qtable = np.zeros((state_space, action_space))\n", " return Qtable" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 13, "metadata": { "id": "9YfvrqRt3jdR" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "array([[0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.],\n", + " [0., 0., 0., 0.]])" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "Qtable_frozenlake = initialize_q_table(state_space, action_space)" + "Qtable_frozenlake = initialize_q_table(state_space, action_space)\n", + "Qtable_frozenlake" ] }, { @@ -595,17 +660,30 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 14, "metadata": { "id": "E3SCLmLX5bWG" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "np.int64(0)" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "def greedy_policy(Qtable, state):\n", " # Exploitation: take the action with the highest state, action value\n", - " action =\n", + " action = np.argmax(Qtable[state])\n", "\n", - " return action" + " return action\n", + "\n", + "greedy_policy(Qtable_frozenlake, 2)" ] }, { @@ -638,7 +716,7 @@ "id": "flILKhBU3yZ7" }, "source": [ - "##Define the epsilon-greedy policy 🤖\n", + "## Define the epsilon-greedy policy 🤖\n", "\n", "Epsilon-greedy is the training policy that handles the 
exploration/exploitation trade-off.\n", "\n", @@ -655,7 +733,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 15, "metadata": { "id": "6Bj7x3in3_Pq" }, @@ -663,15 +741,15 @@ "source": [ "def epsilon_greedy_policy(Qtable, state, epsilon):\n", " # Randomly generate a number between 0 and 1\n", - " random_num =\n", + " random_num = np.random.random()\n", " # if random_num > epsilon --> exploitation\n", " if random_num > epsilon:\n", " # Take the action with the highest value given a state\n", " # np.argmax can be useful here\n", - " action =\n", + " action = greedy_policy(Qtable, state)\n", " # else --> exploration\n", " else:\n", - " action = # Take a random action\n", + " action = env.action_space.sample() # np.random.randint(0, Qtable[state].size) # Take a random action\n", "\n", " return action" ] }, @@ -724,7 +802,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 16, "metadata": { "id": "Y1tWn0tycWZ1" }, @@ -778,12 +856,13 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 17, "metadata": { "id": "paOynXy3aoJW" }, "outputs": [], "source": [ + "from tqdm import tqdm\n", "def train(n_training_episodes, min_epsilon, max_epsilon, decay_rate, env, max_steps, Qtable):\n", " for episode in tqdm(range(n_training_episodes)):\n", " # Reduce epsilon (because we need less and less exploration)\n", @@ -796,15 +875,16 @@ "\n", " # repeat\n", " for step in range(max_steps):\n", - " # Choose the action At using epsilon greedy policy\n", - " action =\n", + " # TODO: Choose the action At using epsilon greedy policy\n", + " action = epsilon_greedy_policy(Qtable, state, epsilon)\n", "\n", - " # Take action At and observe Rt+1 and St+1\n", - " # Take the action (a) and observe the outcome
state(s') and reward (r)\n", + " new_state, reward, terminated, truncated, info = env.step(action)\n", "\n", - " # Update Q(s,a):= Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]\n", - " Qtable[state][action] =\n", + " # TODO: Update Q(s,a):= Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]\n", + " old_Qsa = Qtable[state][action]\n", + " Qtable[state][action] = old_Qsa + learning_rate * (reward + gamma * np.max(Qtable[new_state]) - old_Qsa)\n", "\n", " # If terminated or truncated finish the episode\n", " if terminated or truncated:\n", @@ -874,11 +954,19 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 18, "metadata": { "id": "DPBxfjJdTCOH" }, - "outputs": [], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 10000/10000 [00:00<00:00, 11230.14it/s]\n" + ] + } + ], "source": [ "Qtable_frozenlake = train(n_training_episodes, min_epsilon, max_epsilon, decay_rate, env, max_steps, Qtable_frozenlake)" ] @@ -894,11 +982,37 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 19, "metadata": { "id": "nmfchsTITw4q" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "array([[0.73509189, 0.77378094, 0.77378094, 0.73509189],\n", + " [0.73509189, 0. , 0.81450625, 0.77378094],\n", + " [0.77378094, 0.857375 , 0.77378094, 0.81450625],\n", + " [0.81450625, 0. , 0.77378094, 0.77378094],\n", + " [0.77378094, 0.81450625, 0. , 0.73509189],\n", + " [0. , 0. , 0. , 0. ],\n", + " [0. , 0.9025 , 0. , 0.81450625],\n", + " [0. , 0. , 0. , 0. ],\n", + " [0.81450625, 0. , 0.857375 , 0.77378094],\n", + " [0.81450625, 0.9025 , 0.9025 , 0. ],\n", + " [0.857375 , 0.95 , 0. , 0.857375 ],\n", + " [0. , 0. , 0. , 0. ],\n", + " [0. , 0. , 0. , 0. ],\n", + " [0. , 0.9025 , 0.95 , 0.857375 ],\n", + " [0.9025 , 0.95 , 1. , 0.9025 ],\n", + " [0. , 0. , 0. , 0. 
]])" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "Qtable_frozenlake" ] @@ -916,7 +1030,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 20, "metadata": { "id": "jNl0_JO2cbkm" }, @@ -972,11 +1086,33 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 21, "metadata": { "id": "fAgB7s0HEFMm" }, - "outputs": [], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 100/100 [00:00<00:00, 12881.37it/s]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Mean_reward=1.00 +/- 0.00\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], "source": [ "# Evaluate our Agent\n", "mean_reward, std_reward = evaluate_agent(env, max_steps, n_eval_episodes, Qtable_frozenlake, eval_seed)\n", @@ -1018,7 +1154,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 23, "metadata": { "id": "Jex3i9lZ8ksX" }, @@ -1034,7 +1170,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 24, "metadata": { "id": "Qo57HBn3W74O" }, @@ -1065,6 +1201,11 @@ }, { "cell_type": "code", + "execution_count": 26, + "metadata": { + "id": "U4mdUTKkGnUd" + }, + "outputs": [], "source": [ "def push_to_hub(\n", " repo_id, model, env, video_fps=1, local_repo_path=\"hub\"\n", @@ -1194,12 +1335,7 @@ " )\n", "\n", " print(\"Your model is pushed to the Hub. 
You can view your model here: \", repo_url)" - ], - "metadata": { - "id": "U4mdUTKkGnUd" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", @@ -1269,7 +1405,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 27, "metadata": { "id": "FiMqxqVHg0I4" }, @@ -1311,24 +1447,153 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 28, "metadata": { "id": "5sBo2umnXpPd" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "{'env_id': 'FrozenLake-v1',\n", + " 'max_steps': 99,\n", + " 'n_training_episodes': 10000,\n", + " 'n_eval_episodes': 100,\n", + " 'eval_seed': [],\n", + " 'learning_rate': 0.7,\n", + " 'gamma': 0.95,\n", + " 'max_epsilon': 1.0,\n", + " 'min_epsilon': 0.05,\n", + " 'decay_rate': 0.0005,\n", + " 'qtable': array([[0.73509189, 0.77378094, 0.77378094, 0.73509189],\n", + " [0.73509189, 0. , 0.81450625, 0.77378094],\n", + " [0.77378094, 0.857375 , 0.77378094, 0.81450625],\n", + " [0.81450625, 0. , 0.77378094, 0.77378094],\n", + " [0.77378094, 0.81450625, 0. , 0.73509189],\n", + " [0. , 0. , 0. , 0. ],\n", + " [0. , 0.9025 , 0. , 0.81450625],\n", + " [0. , 0. , 0. , 0. ],\n", + " [0.81450625, 0. , 0.857375 , 0.77378094],\n", + " [0.81450625, 0.9025 , 0.9025 , 0. ],\n", + " [0.857375 , 0.95 , 0. , 0.857375 ],\n", + " [0. , 0. , 0. , 0. ],\n", + " [0. , 0. , 0. , 0. ],\n", + " [0. , 0.9025 , 0.95 , 0.857375 ],\n", + " [0.9025 , 0.95 , 1. , 0.9025 ],\n", + " [0. , 0. , 0. , 0. 
]])}" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "model" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 29, "metadata": { "id": "RpOTtSt83kPZ" }, - "outputs": [], + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "e4d5c292dab14baa940d2ed46f0dd484", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Fetching 1 files: 0%| | 0/1 [00:00\"Open" @@ -41,6 +41,9 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "ykJiGevCMVc5" + }, "source": [ "### 🎮 Environments:\n", "\n", @@ -51,10 +54,7 @@ "### 📚 RL-Library:\n", "\n", "- [RL-Baselines3-Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)" - ], - "metadata": { - "id": "ykJiGevCMVc5" - } + ] }, { "cell_type": "markdown", @@ -72,13 +72,13 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "TsnP0rjxMn1e" + }, "source": [ "## This notebook is from Deep Reinforcement Learning Course\n", "\"Deep" - ], - "metadata": { - "id": "TsnP0rjxMn1e" - } + ] }, { "cell_type": "markdown", @@ -114,12 +114,12 @@ }, { "cell_type": "markdown", - "source": [ - "We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the Github Repo](https://github.com/huggingface/deep-rl-class/issues)." - ], "metadata": { "id": "7kszpGFaRVhq" - } + }, + "source": [ + "We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the Github Repo](https://github.com/huggingface/deep-rl-class/issues)." 
+ ] }, { "cell_type": "markdown", @@ -142,6 +142,9 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "Nc8BnyVEc3Ys" + }, "source": [ "## Some advice 💡\n", "It's better to run this colab in a copy on your Google Drive, so that **if it times out** you still have the saved notebook on your Google Drive and do not need to fill everything from scratch.\n", "\n", "Also, we're going to **train it for 90 minutes with 1M timesteps**. Typing `!nvidia-smi` will tell you which GPU you're using.\n", "\n", "And if you want to train for more steps, such as 10 million, this will take about 9 hours, potentially resulting in Colab timing out. In that case, I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`." - ], - "metadata": { - "id": "Nc8BnyVEc3Ys" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "PU4FVzaoM6fC" + }, "source": [ "## Set the GPU 💪\n", "- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n", "\n", "\"GPU" - ], - "metadata": { - "id": "PU4FVzaoM6fC" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "KV0NyFdQM9ZG" + }, "source": [ "- `Hardware Accelerator > GPU`\n", "\n", "\"GPU" - ], - "metadata": { - "id": "KV0NyFdQM9ZG" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "wS_cVefO-aYg" + }, "source": [ "# Install RL-Baselines3 Zoo and its dependencies 📚\n", "\n", "If you see `ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.` **this is normal and it's not a critical error**: it's just a version conflict. But the packages we need are installed."
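Once the install cells that follow have finished, one quick way to confirm that claim is to check that the packages this unit actually depends on resolved correctly. This is a minimal sketch, not part of the original notebook, and the package names are assumptions based on the install commands in this unit:

```python
import importlib.util

def check_installed(packages):
    """Report which packages are importable, without actually importing them."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

# Assumed package names, based on the install cells in this unit.
print(check_installed(["rl_zoo3", "stable_baselines3", "gymnasium"]))
```

Any `False` entry means the resolver conflict did break something, and the corresponding install cell should be re-run.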
- ], - "metadata": { - "id": "wS_cVefO-aYg" - } + ] }, { "cell_type": "code", - "source": [ - "!pip install git+https://github.com/DLR-RM/rl-baselines3-zoo" - ], + "execution_count": null, "metadata": { "id": "S1A_E4z3awa_" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!pip install git+https://github.com/DLR-RM/rl-baselines3-zoo" + ] }, { "cell_type": "code", - "source": [ - "!apt-get install swig cmake ffmpeg" - ], + "execution_count": null, "metadata": { "id": "8_MllY6Om1eI" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!apt-get install swig cmake ffmpeg" + ] }, { "cell_type": "markdown", @@ -223,28 +223,28 @@ }, { "cell_type": "code", - "source": [ - "!pip install gymnasium[atari]\n", - "!pip install gymnasium[accept-rom-license]" - ], + "execution_count": null, "metadata": { "id": "NsRP-lX1_2fC" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!pip install gymnasium[atari]\n", + "!pip install gymnasium[accept-rom-license]" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "bTpYcVZVMzUI" + }, "source": [ "## Create a virtual display 🔽\n", "\n", "During the notebook, we'll need to generate a replay video. 
To do so, in Colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames).\n", "\n", "Hence the following cell will install the libraries and create and run a virtual screen 🖥" - ], - "metadata": { - "id": "bTpYcVZVMzUI" - } + ] }, { "cell_type": "code", @@ -262,18 +262,18 @@ }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BE5JWP5rQIKf" + }, + "outputs": [], "source": [ "# Virtual display\n", "from pyvirtualdisplay import Display\n", "\n", "virtual_display = Display(visible=0, size=(1400, 900))\n", "virtual_display.start()" - ], - "metadata": { - "id": "BE5JWP5rQIKf" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", @@ -360,7 +360,7 @@ }, "outputs": [], "source": [ - "!python -m rl_zoo3.train --algo ________ --env SpaceInvadersNoFrameskip-v4 -f _________ -c _________" + "!python -m rl_zoo3.train --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/ -c dqn.yml" ] }, { @@ -396,13 +396,185 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": { "id": "co5um_KeKbBJ" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Loading latest experiment, id=2\n", + "Loading logs/dqn/SpaceInvadersNoFrameskip-v4_2/SpaceInvadersNoFrameskip-v4.zip\n", + "A.L.E: Arcade Learning Environment (version 0.11.2+ecc1138)\n", + "[Powered by Stella]\n", + "Stacking 4 frames\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 1973\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2771\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 25.00\n", + "Atari Episode Length 1973\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2709\n",
+ "Atari Episode Score: 5.00\n", + "Atari Episode Length 2709\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 1943\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 35.00\n", + "Atari Episode Length 1891\n", + "Atari Episode Score: 15.00\n", + "Atari Episode Length 2727\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2749\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 1985\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 15.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 30.00\n", + "Atari Episode Length 2727\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2709\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2787\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2787\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 1927\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 30.00\n", + "Atari Episode Length 2077\n", + "Atari Episode Score: 35.00\n", + "Atari Episode Length 1973\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2709\n", + "Atari Episode Score: 30.00\n", + "Atari Episode Length 2749\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 15.00\n", + "Atari Episode Length 2699\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2709\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 1973\n", + "Atari Episode Score: 25.00\n", + "Atari Episode Length 2001\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2069\n", + "Atari 
Episode Score: 5.00\n", + "Atari Episode Length 2863\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 1973\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 1943\n", + "Atari Episode Score: 10.00\n", + "Atari Episode Length 2675\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2775\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2025\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2709\n", + "Atari Episode Score: 15.00\n", + "Atari Episode Length 2709\n", + "Atari Episode Score: 35.00\n", + "Atari Episode Length 2787\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 1925\n", + "Atari Episode Score: 15.00\n", + "Atari Episode Length 2699\n", + "Atari Episode Score: 10.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 35.00\n", + "Atari Episode Length 2709\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 1973\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2771\n", + "Atari Episode Score: 15.00\n", + "Atari Episode Length 2709\n", + "Atari Episode Score: 20.00\n", + "Atari Episode Length 1943\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 1973\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 1973\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 15.00\n", + "Atari Episode Length 2749\n", + "Atari Episode 
Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 1973\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 1943\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 15.00\n", + "Atari Episode Length 2025\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2749\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 0.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 30.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 30.00\n", + "Atari Episode Length 2769\n", + "Atari Episode Score: 5.00\n", + "Atari Episode Length 2769\n" + ] + } + ], "source": [ - "!python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps _________ --folder logs/" + "!python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps 50000 --folder logs/" ] }, { @@ -534,7 +706,7 @@ }, "outputs": [], "source": [ - "!python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 --repo-name _____________________ -orga _____________________ -f logs/" + "!python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 --repo-name rl-course-unit3 -orga turbo-maikol -f logs/" ] }, { @@ -627,11 +799,26 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": { "id": "OdBNZHy0NGTR" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Downloading from https://huggingface.co/sb3/dqn-BeamRiderNoFrameskip-v4\n", + "dqn-BeamRiderNoFrameskip-v4.zip: 100%|█████| 27.2M/27.2M [00:02<00:00, 12.6MB/s]\n", + "config.yml: 100%|██████████████████████████████| 548/548 [00:00<00:00, 4.99MB/s]\n", + "No normalization file\n", + "args.yml: 100%|████████████████████████████████| 887/887
[00:00<00:00, 4.13MB/s]\n", + "env_kwargs.yml: 100%|████████████████████████| 3.00/3.00 [00:00<00:00, 9.20kB/s]\n", + "train_eval_metrics.zip: 100%|████████████████| 244k/244k [00:00<00:00, 12.7MB/s]\n", + "Saving to rl_trained/dqn/BeamRiderNoFrameskip-v4_1\n" + ] + } + ], "source": [ "# Download model and save it into the logs/ folder\n", "!python -m rl_zoo3.load_from_hub --algo dqn --env BeamRiderNoFrameskip-v4 -orga sb3 -f rl_trained/" @@ -648,11 +835,35 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "metadata": { "id": "aOxs0rNuN0uS" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.\n", + "Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.\n", + "Users of this version of Gym should be able to simply replace 'import gym' with 'import gymnasium as gym' in the vast majority of cases.\n", + "See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.\n", + "Loading latest experiment, id=1\n", + "Loading rl_trained/dqn/BeamRiderNoFrameskip-v4_1/BeamRiderNoFrameskip-v4.zip\n", + "A.L.E: Arcade Learning Environment (version 0.11.2+ecc1138)\n", + "[Powered by Stella]\n", + "Stacking 4 frames\n", + "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit2/venv-u2/lib/python3.10/site-packages/stable_baselines3/common/save_util.py:167: UserWarning: Could not deserialize object exploration_schedule. 
Consider using `custom_objects` argument to replace this object.\n", + "Exception: 'bytes' object cannot be interpreted as an integer\n", + " warnings.warn(\n", + "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit2/venv-u2/lib/python3.10/site-packages/stable_baselines3/common/vec_env/patch_gym.py:95: UserWarning: You loaded a model that was trained using OpenAI Gym. We strongly recommend transitioning to Gymnasium by saving that model again.\n", + " warnings.warn(\n", + "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit2/venv-u2/lib/python3.10/site-packages/stable_baselines3/common/base_class.py:773: UserWarning: You are probably loading a DQN model saved with SB3 < 2.4.0, we truncated the optimizer state so you can save the model again to avoid issues in the future (see https://github.com/DLR-RM/stable-baselines3/pull/1963 for more info). Original error: loaded state dict contains a parameter group that doesn't match the size of optimizer's group \n", + "Note: the model should still work fine, this only a warning.\n", + " warnings.warn(\n" + ] + } + ], "source": [ "!python -m rl_zoo3.enjoy --algo dqn --env BeamRiderNoFrameskip-v4 -n 5000 -f rl_trained/ --no-render" ] @@ -734,12 +945,12 @@ }, { "cell_type": "markdown", - "source": [ - "See you on Bonus unit 2! 🔥" - ], "metadata": { "id": "Kc3udPT-RcXc" - } + }, + "source": [ + "See you on Bonus unit 2! 
🔥" + ] }, { "cell_type": "markdown", @@ -752,13 +963,15 @@ } ], "metadata": { + "accelerator": "GPU", "colab": { + "include_colab_link": true, "private_outputs": true, - "provenance": [], - "include_colab_link": true + "provenance": [] }, + "gpuClass": "standard", "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "venv-u2", "language": "python", "name": "python3" }, @@ -772,7 +985,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.6" + "version": "3.10.18" }, "varInspector": { "cols": { @@ -802,10 +1015,8 @@ "_Feature" ], "window_display": false - }, - "accelerator": "GPU", - "gpuClass": "standard" + } }, "nbformat": 4, "nbformat_minor": 0 -} \ No newline at end of file +} diff --git a/notebooks/unit4/unit4.ipynb b/notebooks/unit4/unit4.ipynb index 884eddd..afa1d5c 100644 --- a/notebooks/unit4/unit4.ipynb +++ b/notebooks/unit4/unit4.ipynb @@ -3,8 +3,8 @@ { "cell_type": "markdown", "metadata": { - "id": "view-in-github", - "colab_type": "text" + "colab_type": "text", + "id": "view-in-github" }, "source": [ "\"Open" @@ -36,15 +36,18 @@ }, { "cell_type": "markdown", - "source": [ - " \"Environments\"/\n" - ], "metadata": { "id": "s4rBom2sbo7S" - } + }, + "source": [ + " \"Environments\"/\n" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "BPLwsPajb1f8" + }, "source": [ "### 🎮 Environments: \n", "\n", @@ -58,10 +61,7 @@ "\n", "\n", "We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues)." 
- ], - "metadata": { - "id": "BPLwsPajb1f8" - } + ] }, { "cell_type": "markdown", @@ -120,6 +120,9 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "Bsh4ZAamchSl" + }, "source": [ "# Let's code Reinforce algorithm from scratch 🔥\n", "\n", @@ -132,58 +135,55 @@ "To find your result, go to the leaderboard and find your model, **the result = mean_reward - std of reward**. **If you don't see your model on the leaderboard, go at the bottom of the leaderboard page and click on the refresh button**.\n", "\n", "For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process\n" - ], - "metadata": { - "id": "Bsh4ZAamchSl" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "JoTC9o2SczNn" + }, "source": [ "## An advice 💡\n", "It's better to run this colab in a copy on your Google Drive, so that **if it timeouts** you still have the saved notebook on your Google Drive and do not need to fill everything from scratch.\n", "\n", "To do that you can either do `Ctrl + S` or `File > Save a copy in Google Drive.`" - ], - "metadata": { - "id": "JoTC9o2SczNn" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "PU4FVzaoM6fC" + }, "source": [ "## Set the GPU 💪\n", "- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n", "\n", "\"GPU" - ], - "metadata": { - "id": "PU4FVzaoM6fC" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "KV0NyFdQM9ZG" + }, "source": [ "- `Hardware Accelerator > GPU`\n", "\n", "\"GPU" - ], - "metadata": { - "id": "KV0NyFdQM9ZG" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "bTpYcVZVMzUI" + }, "source": [ "## Create a virtual display 🖥\n", "\n", "During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames). 
\n", "\n", "Hence the following cell will install the librairies and create and run a virtual screen 🖥" - ], - "metadata": { - "id": "bTpYcVZVMzUI" - } + ] }, { "cell_type": "code", @@ -203,18 +203,18 @@ }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Sr-Nuyb1dBm0" + }, + "outputs": [], "source": [ "# Virtual display\n", "from pyvirtualdisplay import Display\n", "\n", "virtual_display = Display(visible=0, size=(1400, 900))\n", "virtual_display.start()" - ], - "metadata": { - "id": "Sr-Nuyb1dBm0" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", @@ -245,14 +245,14 @@ }, { "cell_type": "code", - "source": [ - "!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit4/requirements-unit4.txt" - ], + "execution_count": null, "metadata": { "id": "e8ZVi-uydpgL" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit4/requirements-unit4.txt" + ] }, { "cell_type": "markdown", @@ -269,7 +269,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 25, "metadata": { "id": "V8oadoJSWp7C" }, @@ -290,46 +290,47 @@ "from torch.distributions import Categorical\n", "\n", "# Gym\n", - "import gym\n", - "import gym_pygame\n", + "import gymnasium as gym\n", + "# import gym_pygame\n", "\n", "# Hugging Face Hub\n", "from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub.\n", - "import imageio" + "import imageio\n", + "\n", + "%load_ext autoreload\n", + "%autoreload 2" ] }, { "cell_type": "markdown", + "metadata": { + "id": "RfxJYdMeeVgv" + }, "source": [ "## Check if we have a GPU\n", "\n", "- Let's check if we have a GPU\n", "- If it's the case you should see `device:cuda0`" - ], - "metadata": { - "id": "RfxJYdMeeVgv" - } - }, - { - "cell_type": "code", - "execution_count": null, - 
"metadata": { - "id": "kaJu5FeZxXGY" - }, - "outputs": [], - "source": [ - "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 5, "metadata": { - "id": "U5TNYa14aRav" + "id": "kaJu5FeZxXGY" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "cuda:0\n" + ] + } + ], "source": [ - "print(device)" + "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n", + "print(device)\n" ] }, { @@ -393,7 +394,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 6, "metadata": { "id": "POOOk15_K6KA" }, @@ -404,7 +405,7 @@ "env = gym.make(env_id)\n", "\n", "# Create the evaluation env\n", - "eval_env = gym.make(env_id)\n", + "eval_env = gym.make(env_id, render_mode=\"rgb_array\")\n", "\n", "# Get the state space and action space\n", "s_size = env.observation_space.shape[0]\n", @@ -413,11 +414,22 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "metadata": { "id": "FMLFrjiBNLYJ" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "_____OBSERVATION SPACE_____ \n", + "\n", + "The State Space is: 4\n", + "Sample observation [-0.92062986 -0.65902454 0.2579916 -0.6175645 ]\n" + ] + } + ], "source": [ "print(\"_____OBSERVATION SPACE_____ \\n\")\n", "print(\"The State Space is: \", s_size)\n", @@ -426,11 +438,23 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "metadata": { "id": "Lu6t4sRNNWkN" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " _____ACTION SPACE_____ \n", + "\n", + "The Action Space is: 2\n", + "Action Space Sample 1\n" + ] + } + ], "source": [ "print(\"\\n _____ACTION SPACE_____ \\n\")\n", "print(\"The Action Space is: \", a_size)\n", @@ -466,27 +490,43 @@ }, { "cell_type": "code", - "execution_count": null, + 
"execution_count": 21, "metadata": { "id": "w2LHcHhVZvPZ" }, - "outputs": [], + "outputs": [ + { + "ename": "NameError", + "evalue": "name 'nn' is not defined", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[21], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mclass\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mPolicy\u001b[39;00m(\u001b[43mnn\u001b[49m\u001b[38;5;241m.\u001b[39mModule):\n\u001b[1;32m 2\u001b[0m \u001b[38;5;66;03m# State # Action # hidden\u001b[39;00m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21m__init__\u001b[39m(\u001b[38;5;28mself\u001b[39m, s_size, a_size, h_size):\n\u001b[1;32m 4\u001b[0m \u001b[38;5;28msuper\u001b[39m(Policy, \u001b[38;5;28mself\u001b[39m)\u001b[38;5;241m.\u001b[39m\u001b[38;5;21m__init__\u001b[39m()\n", + "\u001b[0;31mNameError\u001b[0m: name 'nn' is not defined" + ] + } + ], "source": [ "class Policy(nn.Module):\n", + " # State # Action # hidden\n", " def __init__(self, s_size, a_size, h_size):\n", " super(Policy, self).__init__()\n", " # Create two fully connected layers\n", - "\n", - "\n", + " self.fc1 = nn.Linear(s_size, h_size)\n", + " self.fc2 = nn.Linear(h_size, a_size)\n", + " self.relu = nn.ReLU()\n", "\n", " def forward(self, x):\n", " # Define the forward pass\n", " # state goes to fc1 then we apply ReLU activation function\n", - "\n", + " x = self.relu(self.fc1(x))\n", " # fc1 outputs goes to fc2\n", + " x = self.fc2(x)\n", "\n", " # We output the softmax\n", - " \n", + " return F.softmax(x, dim=1)\n", + "\n", " def act(self, state):\n", " \"\"\"\n", " Given a state, take action\n", @@ -494,7 +534,7 @@ " state = torch.from_numpy(state).float().unsqueeze(0).to(device)\n", " probs = self.forward(state).cpu()\n", " m = Categorical(probs)\n", - " 
action = np.argmax(m)\n", + " action = m.sample() #torch.argmax(probs, dim=1)#np.argmax(m)\n", " return action.item(), m.log_prob(action)" ] }, @@ -554,7 +594,7 @@ "outputs": [], "source": [ "debug_policy = Policy(s_size, a_size, 64).to(device)\n", - "debug_policy.act(env.reset())" + "debug_policy.act(env.reset()[0])" ] }, { @@ -619,14 +659,14 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "c-20i7Pk0l1T" + }, "source": [ "- Since **we want to sample an action from the probability distribution over actions**, we can't use `action = np.argmax(m)` since it will always output the action that have the highest probability.\n", "\n", "- We need to replace with `action = m.sample()` that will sample an action from the probability distribution P(.|s)" - ], - "metadata": { - "id": "c-20i7Pk0l1T" - } + ] }, { "cell_type": "markdown", @@ -643,6 +683,9 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "QmcXG-9i2Qu2" + }, "source": [ "- When we calculate the return Gt (line 6) we see that we calculate the sum of discounted rewards **starting at timestep t**.\n", "\n", @@ -652,10 +695,7 @@ "\n", "We use an interesting technique coded by [Chris1nexus](https://github.com/Chris1nexus) to **compute the return at each timestep efficiently**. The comments explained the procedure. Don't hesitate also [to check the PR explanation](https://github.com/huggingface/deep-rl-class/pull/95)\n", "But overall the idea is to **compute the return at each timestep efficiently**." 
- ], - "metadata": { - "id": "QmcXG-9i2Qu2" - } + ] }, { "cell_type": "markdown", @@ -676,38 +716,72 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "state = array([-0.01473051, 0.02841404, 0.0272485 , -0.03844116], dtype=float32)\n", + "state = array([-0.01416223, 0.22313488, 0.02647968, -0.3224039 ], dtype=float32)\n", + "reward = 1.0\n", + "terminated = False\n", + "truncated = False\n", + "_ = {}\n" + ] + } + ], + "source": [ + "state, _ = env.reset()\n", + "print(f\"{state = }\")\n", + "state, reward, terminated, truncated, _ = env.step(1)\n", + "\n", + "print(f\"{state = }\")\n", + "print(f\"{reward = }\")\n", + "print(f\"{terminated = }\")\n", + "print(f\"{truncated = }\")\n", + "print(f\"{_ = }\")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, "metadata": { "id": "iOdv8Q9NfLK7" }, "outputs": [], "source": [ - "def reinforce(policy, optimizer, n_training_episodes, max_t, gamma, print_every):\n", + "def reinforce(policy, optimizer, n_training_episodes, max_t, gamma, print_every, max_patience = 50):\n", " # Help us to calculate the score during the training\n", - " scores_deque = deque(maxlen=100)\n", - " scores = []\n", + " scores = deque(maxlen=100)\n", + "\n", + " last_max = np.mean(scores)\n", + " patience = 0\n", " # Line 3 of pseudocode\n", " for i_episode in range(1, n_training_episodes+1):\n", - " saved_log_probs = []\n", - " rewards = []\n", - " state = # TODO: reset the environment\n", - " # Line 4 of pseudocode\n", + " rewards, saved_log_probs = [], []\n", + " state, _ = env.reset() # TODO: reset the environment\n", + "\n", + " # ========= Line 4 of pseudocode =========\n", " for t in range(max_t):\n", - " action, log_prob = # TODO get the action\n", + " action, log_prob = policy.act(state) # TODO get the action\n", " saved_log_probs.append(log_prob)\n", - " state, reward, done, _ = # TODO: take an env
step\n", + " state, reward, terminated, truncated, _ = env.step(action) # TODO: take an env step\n", " rewards.append(reward)\n", - " if done:\n", + " if terminated or truncated:\n", " break \n", - " scores_deque.append(sum(rewards))\n", + "\n", " scores.append(sum(rewards))\n", " \n", - " # Line 6 of pseudocode: calculate the return\n", + " # ========= Line 6 of pseudocode: calculate the return =========\n", " returns = deque(maxlen=max_t) \n", " n_steps = len(rewards) \n", + " \n", + " \"\"\"# ================ EXPLANATION ================\n", " # Compute the discounted returns at each timestep,\n", " # as the sum of the gamma-discounted return at time t (G_t) + the reward at time t\n", - " \n", + "\n", " # In O(N) time, where N is the number of time steps\n", " # (this definition of the discounted return G_t follows the definition of this quantity \n", " # shown at page 44 of Sutton&Barto 2017 2nd draft)\n", @@ -723,7 +797,6 @@ " # This is correct since the above is equivalent to (see also page 46 of Sutton&Barto 2017 2nd draft)\n", " # G_(t-1) = r_t + gamma*r_(t+1) + gamma*gamma*r_(t+2) + ...\n", " \n", - " \n", " ## Given the above, we calculate the returns at timestep t as: \n", " # gamma[t] * return[t] + reward[t]\n", " #\n", @@ -733,10 +806,11 @@ " \n", " ## Hence, the queue \"returns\" will hold the returns in chronological order, from t=0 to t=n_steps\n", " ## thanks to the appendleft() function which allows to append to the position 0 in constant time O(1)\n", - " ## a normal python list would instead require O(N) to do this.\n", + " ## a normal python list would instead require O(N) to do this.\"\"\"\n", + " disc_return_t = 0\n", " for t in range(n_steps)[::-1]:\n", - " disc_return_t = (returns[0] if len(returns)>0 else 0)\n", - " returns.appendleft( ) # TODO: complete here \n", + " returns.appendleft(disc_return_t * gamma + rewards[t]) \n", + " disc_return_t = returns[0]\n", " \n", " ## standardization of the returns is employed to make training more
stable\n", " eps = np.finfo(np.float32).eps.item()\n", @@ -746,21 +820,30 @@ " returns = torch.tensor(returns)\n", " returns = (returns - returns.mean()) / (returns.std() + eps)\n", " \n", - " # Line 7:\n", + " # ========= Line 7=========\n", " policy_loss = []\n", " for log_prob, disc_return in zip(saved_log_probs, returns):\n", " policy_loss.append(-log_prob * disc_return)\n", " policy_loss = torch.cat(policy_loss).sum()\n", " \n", - " # Line 8: PyTorch prefers gradient descent \n", + " # ========= Line 8: PyTorch prefers gradient descent =========\n", " optimizer.zero_grad()\n", " policy_loss.backward()\n", " optimizer.step()\n", " \n", + " mean = np.mean(scores)\n", " if i_episode % print_every == 0:\n", - " print('Episode {}\\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_deque)))\n", + " print('Episode {}\\tAverage Score: {:.2f}'.format(i_episode, mean))\n", + "\n", + " if last_max >= mean:\n", + " patience += 1\n", + " if patience >= max_patience:\n", + " print(' - Breaking at Episode {}\\t with average Score: {:.2f} for max patience {:.2f}'.format(i_episode, mean, last_max))\n", + " break\n", + " else:\n", + " last_max, patience = mean, 0\n", " \n", - " return scores" + " return list(scores)" ] }, { @@ -788,7 +871,7 @@ " for i_episode in range(1, n_training_episodes+1):\n", " saved_log_probs = []\n", " rewards = []\n", - " state = env.reset()\n", + " state, _ = env.reset()\n", " # Line 4 of pseudocode\n", " for t in range(max_t):\n", " action, log_prob = policy.act(state)\n", @@ -875,7 +958,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 12, "metadata": { "id": "utRe1NgtVBYF" }, @@ -886,17 +969,17 @@ " \"n_training_episodes\": 1000,\n", " \"n_evaluation_episodes\": 10,\n", " \"max_t\": 1000,\n", - " \"gamma\": 1.0,\n", + " \"gamma\": 0.99,\n", " \"lr\": 1e-2,\n", " \"env_id\": env_id,\n", - " \"state_space\": s_size,\n", - " \"action_space\": a_size,\n", + " \"state_space\": int(s_size),\n", + " \"action_space\": 
int(a_size),\n", "}" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 13, "metadata": { "id": "D3lWyVXBVfl6" }, @@ -909,18 +992,31 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 32, "metadata": { "id": "uGf-hQCnfouB" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Episode 25\tAverage Score: 500.00\n", + "Episode 50\tAverage Score: 500.00\n", + " - Breaking at Episode 51\t with average Score: 500.00 for max patience 500.00\n" + ] + } + ], "source": [ - "scores = reinforce(cartpole_policy,\n", - " cartpole_optimizer,\n", - " cartpole_hyperparameters[\"n_training_episodes\"], \n", - " cartpole_hyperparameters[\"max_t\"],\n", - " cartpole_hyperparameters[\"gamma\"], \n", - " 100)" + "scores = reinforce(\n", + " cartpole_policy,\n", + " cartpole_optimizer,\n", + " cartpole_hyperparameters[\"n_training_episodes\"], \n", + " cartpole_hyperparameters[\"max_t\"],\n", + " cartpole_hyperparameters[\"gamma\"], \n", + " 25,\n", + " 50\n", + ")" ] }, { @@ -935,7 +1031,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 20, "metadata": { "id": "3FamHmxyhBEU" }, @@ -950,20 +1046,60 @@ " \"\"\"\n", " episode_rewards = []\n", " for episode in range(n_eval_episodes):\n", - " state = env.reset()\n", - " step = 0\n", - " done = False\n", + " state, _ = env.reset()\n", " total_rewards_ep = 0\n", " \n", - " for step in range(max_steps):\n", + " for _ in range(max_steps):\n", " action, _ = policy.act(state)\n", - " new_state, reward, done, info = env.step(action)\n", + " new_state, reward, terminated, truncated, _ = env.step(action)\n", " total_rewards_ep += reward\n", " \n", - " if done:\n", + " if terminated or truncated:\n", + " break\n", + "\n", + " state = new_state\n", + " episode_rewards.append(total_rewards_ep)\n", + "\n", + " if episode % 100 == 0:\n", + " print(f\"Episode: {episode:.4f}, mean reward: {np.mean(episode_rewards):.4f}\")\n", + 
"\n", + " mean_reward = np.mean(episode_rewards)\n", + " std_reward = np.std(episode_rewards)\n", + "\n", + " return mean_reward, std_reward\n", + "\n", + "\n", + "def evaluate_agent_pygame(env, max_steps, n_eval_episodes, policy, game_p):\n", + " \"\"\"\n", + " Evaluate the agent for ``n_eval_episodes`` episodes and return the average reward and std of reward.\n", + " :param env: The evaluation environment\n", + " :param n_eval_episodes: Number of episodes to evaluate the agent\n", + " :param policy: The Reinforce agent\n", + " \"\"\"\n", + " episode_rewards = []\n", + " game_p.init()\n", + " actions_set = game_p.getActionSet()\n", + " for episode in range(n_eval_episodes):\n", + " game_p.reset_game()\n", + " state = np.array(list(game_p.getGameState().values()), dtype=np.float32)\n", + " \n", + " total_rewards_ep = 0\n", + " \n", + " for _ in range(max_steps):\n", + " action, _ = policy.act(state)\n", + " action = actions_set[action]\n", + " reward = game_p.act(action)\n", + " total_rewards_ep += reward\n", + " new_state = np.array(list(game_p.getGameState().values()), dtype=np.float32) \n", + " if game_p.game_over():\n", " break\n", + "\n", " state = new_state\n", " episode_rewards.append(total_rewards_ep)\n", + "\n", + " if episode % 100 == 0:\n", + " print(f\"Episode: {episode:.4f}, mean reward: {np.mean(episode_rewards):.4f}\")\n", + "\n", " mean_reward = np.mean(episode_rewards)\n", " std_reward = np.std(episode_rewards)\n", "\n", " return mean_reward, std_reward\n", @@ -981,16 +1117,36 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 34, "metadata": { "id": "ohGSXDyHh0xx" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Episode: 0.0000, mean reward: 500.0000\n" + ] + }, + { + "data": { + "text/plain": [ + "(np.float64(500.0), np.float64(0.0))" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "evaluate_agent(eval_env, \n",
cartpole_hyperparameters[\"max_t\"], \n", - " cartpole_hyperparameters[\"n_evaluation_episodes\"],\n", - " cartpole_policy)" + "evaluate_agent(\n", + " eval_env, \n", + " cartpole_hyperparameters[\"max_t\"], \n", + " cartpole_hyperparameters[\"n_evaluation_episodes\"],\n", + " cartpole_policy\n", + ")" ] }, { @@ -1019,6 +1175,11 @@ }, { "cell_type": "code", + "execution_count": 16, + "metadata": { + "id": "LIVsvlW_8tcw" + }, + "outputs": [], "source": [ "from huggingface_hub import HfApi, snapshot_download\n", "from huggingface_hub.repocard import metadata_eval_result, metadata_save\n", "\n", "import json\n", "import imageio\n", "\n", "import tempfile\n", "\n", "import os" - ], - "metadata": { - "id": "LIVsvlW_8tcw" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 17, "metadata": { "id": "Lo4JH45if81z" }, "outputs": [], "source": [ + "import pygame\n", + "\n", "def record_video(env, policy, out_directory, fps=30):\n", " \"\"\"\n", " Generate a replay video of the agent\n", " :param env\n", " :param Qtable: Qtable of our agent\n", " :param out_directory\n", " :param fps: how many frame per seconds (with taxi-v3 and frozenlake-v1 we use 1)\n", " \"\"\"\n", " images = [] \n", " done = False\n", - " state = env.reset()\n", - " img = env.render(mode='rgb_array')\n", + " state, _ = env.reset()\n", + " img = env.render()\n", " images.append(img)\n", - " while not done:\n", + " for frame in range(fps*100):\n", " # Take the action (index) that have the maximum expected future reward given that state\n", " action, _ = policy.act(state)\n", - " state, reward, done, info = env.step(action) # We directly put next_state = state for recording logic\n", - " img = env.render(mode='rgb_array')\n", + " state, reward, terminated, truncated, _ = env.step(action) # We directly put next_state = state for recording logic\n", + " img = env.render()\n", " images.append(img)\n", + "\n", + " if terminated or truncated:\n", + " break\n", + "\n", + "\n", + " print(\" - Terminated video loop, mimsave...\")\n", + " imageio.mimsave(out_directory, [np.array(img) for i, img in
enumerate(images)], fps=fps)\n", + "\n", + "\n", + "def record_video_pygame(env, policy, out_directory, game_p, fps=30):\n", + " \"\"\"\n", + " Generate a replay video of the agent\n", + " :param env\n", + " :param policy: the policy of our agent\n", + " :param out_directory\n", + " :param fps: how many frames per second to record\n", + " \"\"\"\n", + " images = [] \n", + " game_p.init()\n", + " actions_set = game_p.getActionSet()\n", + " game_p.reset_game()\n", + " state = np.array(list(game_p.getGameState().values()), dtype=np.float32) # TODO: reset the environment\n", + " \n", + " for frame in range(fps*100):\n", + " # Take the action (index) that has the maximum expected future reward given that state\n", + " action, _ = policy.act(state)\n", + " action = actions_set[action]\n", + " reward = game_p.act(action) # We directly put next_state = state for recording logic\n", + "\n", + "\n", + " surface = pygame.display.get_surface()\n", + " if surface is not None:\n", + " img = pygame.surfarray.array3d(surface) # shape (W,H,3)\n", + " img = np.transpose(img, (1, 0, 2)) # (H,W,3)\n", + " images.append(img)\n", + "\n", + " state = np.array(list(game_p.getGameState().values()), dtype=np.float32) \n", + " if game_p.game_over():\n", + " break\n", + "\n", + "\n", + " print(\" - Terminated video loop, mimsave...\")\n", + " imageio.mimsave(out_directory, [np.array(img) for i, img in enumerate(images)], fps=fps)" ] }, { "cell_type": "code", + "execution_count": 18, + "metadata": { + "id": "_TPdq47D7_f_" + }, + "outputs": [], "source": [ - "def push_to_hub(repo_id, \n", - " model,\n", - " hyperparameters,\n", - " eval_env,\n", - " video_fps=30\n", - " ):\n", + "def push_to_hub(\n", + " repo_id, \n", + " model,\n", + " hyperparameters,\n", + " eval_env,\n", + " video_fps=30\n", + "):\n", " \"\"\"\n", " Evaluate, Generate a video and Upload a model to Hugging Face Hub.\n", " This method does the complete pipeline:\n", @@ -1180,9 +1386,11 @@ "\n", " #
Step 6: Record a video\n", " video_path = local_directory / \"replay.mp4\"\n", - " record_video(env, model, video_path, video_fps)\n", + " print(\"VIDEO\")\n", + " record_video(eval_env, model, video_path, video_fps)\n", "\n", " # Step 7. Push everything to the Hub\n", + " print(\"PUSH\")\n", " api.upload_folder(\n", " repo_id=repo_id,\n", " folder_path=local_directory,\n", @@ -1190,26 +1398,155 @@ " )\n", "\n", " print(f\"Your model is pushed to the Hub. You can view your model here: {repo_url}\")" - ], - "metadata": { - "id": "_TPdq47D7_f_" - }, - "execution_count": null, - "outputs": [] + ] }, { - "cell_type": "markdown", - "metadata": { - "id": "w17w8CxzoURM" - }, + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], "source": [ - "### .\n", - "\n", - "By using `push_to_hub` **you evaluate, record a replay, generate a model card of your agent and push it to the Hub**.\n", + "def push_to_hub_pygame(\n", + " repo_id, \n", + " model,\n", + " hyperparameters,\n", + " eval_env,\n", + " game_p,\n", + " video_fps=30,\n", + "):\n", + " \"\"\"\n", + " Evaluate, Generate a video and Upload a model to Hugging Face Hub.\n", + " This method does the complete pipeline:\n", + " - It evaluates the model\n", + " - It generates the model card\n", + " - It generates a replay video of the agent\n", + " - It pushes everything to the Hub\n", "\n", - "This way:\n", - "- You can **showcase our work** 🔥\n", - "- You can **visualize your agent playing** 👀\n", + " :param repo_id: repo_id: id of the model repository from the Hugging Face Hub\n", + " :param model: the pytorch model we want to save\n", + " :param hyperparameters: training hyperparameters\n", + " :param eval_env: evaluation environment\n", + " :param video_fps: how many frame per seconds to record our video replay \n", + " \"\"\"\n", + "\n", + " _, repo_name = repo_id.split(\"/\")\n", + " api = HfApi()\n", + " \n", + " # Step 1: Create the repo\n", + " repo_url = api.create_repo(\n", + " 
repo_id=repo_id,\n", + " exist_ok=True,\n", + " )\n", + "\n", + " with tempfile.TemporaryDirectory() as tmpdirname:\n", + " local_directory = Path(tmpdirname)\n", + " \n", + " # Step 2: Save the model\n", + " torch.save(model, local_directory / \"model.pt\")\n", + "\n", + " # Step 3: Save the hyperparameters to JSON\n", + " with open(local_directory / \"hyperparameters.json\", \"w\") as outfile:\n", + " json.dump(hyperparameters, outfile)\n", + " \n", + " # Step 4: Evaluate the model and build JSON\n", + " mean_reward, std_reward = evaluate_agent_pygame(\n", + " eval_env, \n", + " hyperparameters[\"max_t\"],\n", + " hyperparameters[\"n_evaluation_episodes\"], \n", + " model,\n", + " game_p\n", + " )\n", + " # Get datetime\n", + " eval_datetime = datetime.datetime.now()\n", + " eval_form_datetime = eval_datetime.isoformat()\n", + "\n", + " evaluate_data = {\n", + " \"env_id\": hyperparameters[\"env_id\"], \n", + " \"mean_reward\": mean_reward,\n", + " \"n_evaluation_episodes\": hyperparameters[\"n_evaluation_episodes\"],\n", + " \"eval_datetime\": eval_form_datetime,\n", + " }\n", + "\n", + " # Write a JSON file\n", + " with open(local_directory / \"results.json\", \"w\") as outfile:\n", + " json.dump(evaluate_data, outfile)\n", + "\n", + " # Step 5: Create the model card\n", + " env_name = hyperparameters[\"env_id\"]\n", + " \n", + " metadata = {}\n", + " metadata[\"tags\"] = [\n", + " env_name,\n", + " \"reinforce\",\n", + " \"reinforcement-learning\",\n", + " \"custom-implementation\",\n", + " \"deep-rl-class\"\n", + " ]\n", + "\n", + " # Add metrics\n", + " eval = metadata_eval_result(\n", + " model_pretty_name=repo_name,\n", + " task_pretty_name=\"reinforcement-learning\",\n", + " task_id=\"reinforcement-learning\",\n", + " metrics_pretty_name=\"mean_reward\",\n", + " metrics_id=\"mean_reward\",\n", + " metrics_value=f\"{mean_reward:.2f} +/- {std_reward:.2f}\",\n", + " dataset_pretty_name=env_name,\n", + " dataset_id=env_name,\n", + " )\n", + "\n", + " # Merges 
both dictionaries\n", + " metadata = {**metadata, **eval}\n", + "\n", + " model_card = f\"\"\"\n", + " # **Reinforce** Agent playing **{env_id}**\n", + " This is a trained model of a **Reinforce** agent playing **{env_id}** .\n", + " To learn to use this model and train yours check Unit 4 of the Deep Reinforcement Learning Course: https://huggingface.co/deep-rl-course/unit4/introduction\n", + " \"\"\"\n", + "\n", + " readme_path = local_directory / \"README.md\"\n", + " readme = \"\"\n", + " if readme_path.exists():\n", + " with readme_path.open(\"r\", encoding=\"utf8\") as f:\n", + " readme = f.read()\n", + " else:\n", + " readme = model_card\n", + "\n", + " with readme_path.open(\"w\", encoding=\"utf-8\") as f:\n", + " f.write(readme)\n", + "\n", + " # Save our metrics to Readme metadata\n", + " metadata_save(readme_path, metadata)\n", + "\n", + " # Step 6: Record a video\n", + " video_path = local_directory / \"replay.mp4\"\n", + " print(\"VIDEO\")\n", + " record_video_pygame(eval_env, model, video_path, game_p, video_fps)\n", + "\n", + " # Step 7. Push everything to the Hub\n", + " print(\"PUSH\")\n", + " api.upload_folder(\n", + " repo_id=repo_id,\n", + " folder_path=local_directory,\n", + " path_in_repo=\".\",\n", + " )\n", + "\n", + " print(f\"Your model is pushed to the Hub. 
You can view your model here: {repo_url}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "w17w8CxzoURM" + }, + "source": [ + "### .\n", + "\n", + "By using `push_to_hub` **you evaluate, record a replay, generate a model card of your agent and push it to the Hub**.\n", + "\n", + "This way:\n", + "- You can **showcase our work** 🔥\n", + "- You can **visualize your agent playing** 👀\n", "- You can **share with the community an agent that others can use** 💾\n", "- You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉 https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard\n" ] @@ -1262,19 +1599,32 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 37, "metadata": { "id": "UNwkTS65Uq3Q" }, - "outputs": [], + "outputs": [ + { + "ename": "NameError", + "evalue": "name 'cartpole_policy' is not defined", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[37], line 4\u001b[0m\n\u001b[1;32m 1\u001b[0m repo_id \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mturbo-maikol/Reinforce-rl-course-unit4-cartpole\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;66;03m#TODO Define your repo id {username/Reinforce-{model-id}}\u001b[39;00m\n\u001b[1;32m 2\u001b[0m push_to_hub(\n\u001b[1;32m 3\u001b[0m repo_id,\n\u001b[0;32m----> 4\u001b[0m \u001b[43mcartpole_policy\u001b[49m, \u001b[38;5;66;03m# The model we want to save\u001b[39;00m\n\u001b[1;32m 5\u001b[0m cartpole_hyperparameters, \u001b[38;5;66;03m# Hyperparameters\u001b[39;00m\n\u001b[1;32m 6\u001b[0m eval_env, \u001b[38;5;66;03m# Evaluation environment\u001b[39;00m\n\u001b[1;32m 7\u001b[0m video_fps\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m30\u001b[39m\n\u001b[1;32m 8\u001b[0m 
)\n", + "\u001b[0;31mNameError\u001b[0m: name 'cartpole_policy' is not defined" + ] + } + ], "source": [ - "repo_id = \"\" #TODO Define your repo id {username/Reinforce-{model-id}}\n", - "push_to_hub(repo_id,\n", - " cartpole_policy, # The model we want to save\n", - " cartpole_hyperparameters, # Hyperparameters\n", - " eval_env, # Evaluation environment\n", - " video_fps=30\n", - " )" + "repo_id = \"turbo-maikol/Reinforce-rl-course-unit4-cartpole\" #TODO Define your repo id {username/Reinforce-{model-id}}\n", + "push_to_hub(\n", + " repo_id,\n", + " cartpole_policy, # The model we want to save\n", + " cartpole_hyperparameters, # Hyperparameters\n", + " eval_env, # Evaluation environment\n", + " video_fps=30\n", + ")" ] }, { @@ -1290,56 +1640,145 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "JNLVmKKVKA6j" + }, "source": [ "## Second agent: PixelCopter 🚁\n", "\n", "### Study the PixelCopter environment 👀\n", "- [The Environment documentation](https://pygame-learning-environment.readthedocs.io/en/latest/user/games/pixelcopter.html)\n" - ], - "metadata": { - "id": "JNLVmKKVKA6j" - } + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from ple.games.pixelcopter import Pixelcopter\n", + "from ple import PLE\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": 35, "metadata": { "id": "JBSc8mlfyin3" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "[119, None]" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "env_id = \"Pixelcopter-PLE-v0\"\n", - "env = gym.make(env_id)\n", - "eval_env = gym.make(env_id)\n", - "s_size = env.observation_space.shape[0]\n", - "a_size = env.action_space.n" + "# env_id = \"Pixelcopter-PLE-v0\"\n", + "# env = gym.make(env_id)\n", + "env = Pixelcopter()\n", + "p = PLE(env, fps=30, display_screen=True)\n", + "\n", + "p.init()\n", + "reward = 
0.0\n", + "\n", + "actions = p.getActionSet()\n", + "# for i in range(10_000):\n", + " # if p.game_over():\n", + " # print(f\"{p.reset_game()}\")\n", + " # print(f\"{i: }\")\n", + "\n", + " # print(f\" - {np.array(list(env.getGameState().values()), dtype=np.float32)}\")\n", + " # reward = p.act(np.random.randint(0,2))\n", + " # print(f\" - {reward = }\")\n", + "actions" ] }, { "cell_type": "code", - "source": [ - "print(\"_____OBSERVATION SPACE_____ \\n\")\n", - "print(\"The State Space is: \", s_size)\n", - "print(\"Sample observation\", env.observation_space.sample()) # Get a random observation" + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " - [24. 0. 7. 17. 48. 22. 31.]\n", + " - reward = 0.0\n" + ] + } ], + "source": [ + "if p.game_over():\n", + " print(f\"{p.reset_game()}\")\n", + "\n", + "print(f\" - {np.array(list(env.getGameState().values()), dtype=np.float32)}\")\n", + "reward = p.act(actions[0])\n", + "print(f\" - {reward = }\")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "s_size = len(env.getGameState())\n", + "a_size = 2" + ] + }, + { + "cell_type": "code", + "execution_count": 10, "metadata": { "id": "L5u_zAHsKBy7" }, - "execution_count": null, - "outputs": [] + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "_____OBSERVATION SPACE_____ \n", + "\n", + "The State Space is: 7\n" + ] + } + ], + "source": [ + "print(\"_____OBSERVATION SPACE_____ \\n\")\n", + "print(\"The State Space is: \", s_size)\n", + "# print(\"Sample observation\", env.observation_space.sample()) # Get a random observation" + ] }, { "cell_type": "code", - "source": [ - "print(\"\\n _____ACTION SPACE_____ \\n\")\n", - "print(\"The Action Space is: \", a_size)\n", - "print(\"Action Space Sample\", env.action_space.sample()) # Take a random action" - ], + "execution_count": 11, "metadata": { "id": "D7yJM9YXKNbq" }, 
- "execution_count": null, - "outputs": [] + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " _____ACTION SPACE_____ \n", + "\n", + "The Action Space is: 2\n" + ] + } + ], + "source": [ + "print(\"\\n _____ACTION SPACE_____ \\n\")\n", + "print(\"The Action Space is: \", a_size)\n", + "# print(\"Action Space Sample\", env.action_space.sample()) # Take a random action" + ] }, { "cell_type": "markdown", @@ -1366,17 +1805,17 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "aV1466QP8crz" + }, "source": [ "### Define the new Policy 🧠\n", "- We need to have a deeper neural network since the environment is more complex" - ], - "metadata": { - "id": "aV1466QP8crz" - } + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 27, "metadata": { "id": "I1eBkCiX2X_S" }, @@ -1386,9 +1825,20 @@ " def __init__(self, s_size, a_size, h_size):\n", " super(Policy, self).__init__()\n", " # Define the three layers here\n", + " self.fc1 = nn.Linear(s_size, h_size)\n", + " self.fc2 = nn.Linear(h_size, h_size*2)\n", + " self.fc3 = nn.Linear(h_size*2, a_size)\n", + " # self.fc4 = nn.Linear(h_size, a_size)\n", + " self.relu = nn.ReLU()\n", "\n", " def forward(self, x):\n", " # Define the forward process here\n", + " x = self.relu(self.fc1(x))\n", + " x = self.relu(self.fc2(x))\n", + " # x = self.relu(self.fc3(x))\n", + " # x = self.fc4(x) \n", + " x = self.fc3(x) \n", + "\n", " return F.softmax(x, dim=1)\n", " \n", " def act(self, state):\n", @@ -1401,15 +1851,20 @@ }, { "cell_type": "markdown", - "source": [ - "#### Solution" - ], "metadata": { "id": "47iuAFqV8Ws-" - } + }, + "source": [ + "#### Solution" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "wrNuVcHC8Xu7" + }, + "outputs": [], "source": [ "class Policy(nn.Module):\n", " def __init__(self, s_size, a_size, h_size):\n", @@ -1430,12 +1885,7 @@ " m = Categorical(probs)\n", " action = m.sample()\n", " return action.item(), 
m.log_prob(action)" - ], - "metadata": { - "id": "wrNuVcHC8Xu7" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", @@ -1449,16 +1899,135 @@ ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": { - "id": "y0uujOR_ypB6" + "id": "wyvXTJWm9GJG" }, + "source": [ + "### Train it\n", + "- We're now ready to train our agent 🔥." + ] + }, + { + "cell_type": "code", + "execution_count": 359, + "metadata": {}, "outputs": [], "source": [ + "from collections import Counter\n", + "\n", + "def reinforce_pygame(game_p, policy, optimizer, n_training_episodes, max_t, gamma, print_every, max_patience = None):\n", + " # Help us to calculate the score during the training\n", + " scores = deque(maxlen=100)\n", + " game_p.init()\n", + " actions_set = game_p.getActionSet()\n", + "\n", + " last_max = np.mean(scores)\n", + " patience = 0\n", + " # Line 3 of pseudocode\n", + " for i_episode in range(1, n_training_episodes+1):\n", + " rewards, saved_log_probs, actions = [], [], []\n", + " game_p.reset_game()\n", + " state = np.array(list(game_p.getGameState().values()), dtype=np.float32) # TODO: reset the environment\n", + "\n", + " # ========= Line 4 of pseudocode =========\n", + " for t in range(max_t):\n", + " action, log_prob = policy.act(state) # TODO get the action\n", + " action = actions_set[action]\n", + " actions.append(action)\n", + "\n", + " saved_log_probs.append(log_prob)\n", + " reward = game_p.act(action) # TODO: take an env step\n", + " rewards.append(reward)\n", + "\n", + " state = np.array(list(game_p.getGameState().values()), dtype=np.float32) \n", + " if game_p.game_over():\n", + " break\n", + "\n", + " scores.append(sum(rewards))\n", + " \n", + " # ========= Line 6 of pseudocode: calculate the return =========\n", + " returns = deque(maxlen=max_t) \n", + " n_steps = len(rewards) \n", + " \n", + " \"\"\"# ================ EXPLANATION ================\n", + " # Compute the discounted returns at each
timestep,\n", + " # as the sum of the gamma-discounted return at time t (G_t) + the reward at time t\n", + "\n", + " # In O(N) time, where N is the number of time steps\n", + " # (this definition of the discounted return G_t follows the definition of this quantity \n", + " # shown at page 44 of Sutton&Barto 2017 2nd draft)\n", + " # G_t = r_(t+1) + r_(t+2) + ...\n", + " \n", + " # Given this formulation, the returns at each timestep t can be computed \n", + " # by re-using the computed future returns G_(t+1) to compute the current return G_t\n", + " # G_t = r_(t+1) + gamma*G_(t+1)\n", + " # G_(t-1) = r_t + gamma* G_t\n", + " # (this follows a dynamic programming approach, with which we memorize solutions in order \n", + " # to avoid computing them multiple times)\n", + " \n", + " # This is correct since the above is equivalent to (see also page 46 of Sutton&Barto 2017 2nd draft)\n", + " # G_(t-1) = r_t + gamma*r_(t+1) + gamma*gamma*r_(t+2) + ...\n", + " \n", + " ## Given the above, we calculate the returns at timestep t as: \n", + " # gamma[t] * return[t] + reward[t]\n", + " #\n", + " ## We compute this starting from the last timestep to the first, in order\n", + " ## to employ the formula presented above and avoid redundant computations that would be needed \n", + " ## if we were to do it from first to last.\n", + " \n", + " ## Hence, the queue \"returns\" will hold the returns in chronological order, from t=0 to t=n_steps\n", + " ## thanks to the appendleft() function which allows to append to the position 0 in constant time O(1)\n", + " ## a normal python list would instead require O(N) to do this.\"\"\"\n", + " disc_return_t = 0\n", + " for t in range(n_steps)[::-1]:\n", + " returns.appendleft(disc_return_t * gamma + rewards[t]) \n", + " disc_return_t = returns[0]\n", + " \n", + " ## standardization of the returns is employed to make training more stable\n", + " eps = np.finfo(np.float32).eps.item()\n", + " \n", + " ## eps is the smallest representable float,
which is \n", + " # added to the standard deviation of the returns to avoid numerical instabilities\n", + " returns = torch.tensor(returns)\n", + " returns = (returns - returns.mean()) / (returns.std() + eps)\n", + " \n", + " # ========= Line 7=========\n", + " policy_loss = []\n", + " for log_prob, disc_return in zip(saved_log_probs, returns):\n", + " policy_loss.append(-log_prob * disc_return)\n", + " policy_loss = torch.cat(policy_loss).sum()\n", + " \n", + " # ========= Line 8: PyTorch prefers gradient descent =========\n", + " optimizer.zero_grad()\n", + " policy_loss.backward()\n", + " optimizer.step()\n", + " \n", + " mean = np.mean(scores)\n", + " if i_episode % print_every == 0:\n", + " print(f'Episode {i_episode} Average Score: {mean:.2f}. Move count {Counter(actions)}')\n", + "\n", + " if last_max >= mean:\n", + " patience += 1\n", + " if max_patience is not None and patience >= max_patience:\n", + " print(' - Breaking at Episode {}\\t with average Score: {:.2f} for max patience {:.2f}'.format(i_episode, mean, last_max))\n", + " break\n", + " else:\n", + " last_max, patience = mean, 0\n", + " \n", + " return list(scores)" ] }, { "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "env_id = \"Pixelcopter-PLE-v0\"\n", "pixelcopter_hyperparameters = {\n", " \"h_size\": 64,\n", - " \"n_training_episodes\": 50000,\n", + " \"n_training_episodes\": 100_000,\n", " \"n_evaluation_episodes\": 10,\n", " \"max_t\": 10000,\n", " \"gamma\": 0.99,\n", @@ -1466,74 +2035,135 @@ " \"env_id\": env_id,\n", " \"state_space\": s_size,\n", " \"action_space\": a_size,\n", - "}" + "}\n", + "\n", + "device = torch.device(\"cuda\")\n", + "# Create policy and place it to the device\n", + "# torch.manual_seed(50)\n", + "pixelcopter_policy = Policy(pixelcopter_hyperparameters[\"state_space\"], pixelcopter_hyperparameters[\"action_space\"], pixelcopter_hyperparameters[\"h_size\"]).to(device)\n", + "pixelcopter_optimizer = 
optim.Adam(pixelcopter_policy.parameters(), lr=pixelcopter_hyperparameters[\"lr\"])\n", + "\n", + "env = Pixelcopter()\n", + "game_p = PLE(env, fps=30, display_screen=True)\n", + "\n", + "# scores = reinforce_pygame(\n", + "# game_p,\n", + "# pixelcopter_policy,\n", + "# pixelcopter_optimizer,\n", + "# pixelcopter_hyperparameters[\"n_training_episodes\"], \n", + "# pixelcopter_hyperparameters[\"max_t\"],\n", + "# pixelcopter_hyperparameters[\"gamma\"], \n", + "# 100,\n", + "# )" ] }, { "cell_type": "markdown", - "source": [ - "### Train it\n", - "- We're now ready to train our agent 🔥." - ], "metadata": { - "id": "wyvXTJWm9GJG" - } - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "7mM2P_ckysFE" + "id": "8kwFQ-Ip85BE" }, - "outputs": [], "source": [ - "# Create policy and place it to the device\n", - "# torch.manual_seed(50)\n", - "pixelcopter_policy = Policy(pixelcopter_hyperparameters[\"state_space\"], pixelcopter_hyperparameters[\"action_space\"], pixelcopter_hyperparameters[\"h_size\"]).to(device)\n", - "pixelcopter_optimizer = optim.Adam(pixelcopter_policy.parameters(), lr=pixelcopter_hyperparameters[\"lr\"])" + "### Publish our trained model on the Hub 🔥" ] }, { "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "v1HEqP-fy-Rf" - }, + "execution_count": 363, + "metadata": {}, "outputs": [], "source": [ - "scores = reinforce(pixelcopter_policy,\n", - " pixelcopter_optimizer,\n", - " pixelcopter_hyperparameters[\"n_training_episodes\"], \n", - " pixelcopter_hyperparameters[\"max_t\"],\n", - " pixelcopter_hyperparameters[\"gamma\"], \n", - " 1000)" + "torch.save(pixelcopter_policy.state_dict(), \"./saved_model.pth\")" ] }, { - "cell_type": "markdown", - "source": [ - "### Publish our trained model on the Hub 🔥" + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": 
"execute_result" + } ], - "metadata": { - "id": "8kwFQ-Ip85BE" - } + "source": [ + "pixelcopter_hyperparameters = {\n", + " \"h_size\": 64,\n", + " \"n_training_episodes\": 100_000,\n", + " \"n_evaluation_episodes\": 10,\n", + " \"max_t\": 10000,\n", + " \"gamma\": 0.99,\n", + " \"lr\": 1e-4,\n", + " \"env_id\": env_id,\n", + " \"state_space\": s_size,\n", + " \"action_space\": a_size,\n", + "}\n", + "\n", + "# Create policy and place it to the device\n", + "# torch.manual_seed(50)\n", + "pixelcopter_policy = Policy(pixelcopter_hyperparameters[\"state_space\"], pixelcopter_hyperparameters[\"action_space\"], pixelcopter_hyperparameters[\"h_size\"]).to(device)\n", + "\n", + "pixelcopter_policy.load_state_dict(torch.load(\"./saved_model.pth\"))\n" + ] }, { "cell_type": "code", - "source": [ - "repo_id = \"\" #TODO Define your repo id {username/Reinforce-{model-id}}\n", - "push_to_hub(repo_id,\n", - " pixelcopter_policy, # The model we want to save\n", - " pixelcopter_hyperparameters, # Hyperparameters\n", - " eval_env, # Evaluation environment\n", - " video_fps=30\n", - " )" - ], + "execution_count": 40, "metadata": { "id": "6PtB7LRbTKWK" }, - "execution_count": null, - "outputs": [] + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Episode: 0.0000, mean reward: 18.0000\n", + "VIDEO\n", + " - Teminated video loop, mimsave...\n", + "PUSH\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Processing Files (0 / 0) : | | 0.00B / 0.00B \n", + "\u001b[A\n", + "Processing Files (1 / 1) : 100%|██████████| 40.3kB / 40.3kB, ???B/s \n", + "\u001b[A\n", + "\u001b[A\n", + "Processing Files (1 / 1) : 100%|██████████| 40.3kB / 40.3kB, 0.00B/s \n", + "New Data Upload : | | 0.00B / 0.00B, 0.00B/s \n", + " /tmp/tmpec45byqc/model.pt : 100%|██████████| 40.3kB / 40.3kB \n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Your model is pushed to the Hub. 
You can view your model here: https://huggingface.co/turbo-maikol/Reinforce-rl-course-unit4-pixelcopter\n" + ] + } + ], + "source": [ + "repo_id = \"turbo-maikol/Reinforce-rl-course-unit4-pixelcopter\" # your repo id {username/Reinforce-{model-id}}\n", + "\n", + "env = Pixelcopter()\n", + "game_p = PLE(env, fps=30, display_screen=True)\n", + "push_to_hub_pygame(\n", + " repo_id,\n", + " pixelcopter_policy, # The model we want to save\n", + " pixelcopter_hyperparameters, # Hyperparameters\n", + " env, # Evaluation environment\n", + " game_p,\n", + " video_fps=30,\n", + ")" + ] }, { "cell_type": "markdown", @@ -1585,8 +2215,6 @@ "metadata": { "accelerator": "GPU", "colab": { - "private_outputs": true, - "provenance": [], "collapsed_sections": [ "BPLwsPajb1f8", "L_WSo0VUV99t", @@ -1597,11 +2225,13 @@ "47iuAFqV8Ws-", "x62pP0PHdA-y" ], - "include_colab_link": true + "include_colab_link": true, + "private_outputs": true, + "provenance": [] }, "gpuClass": "standard", "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": ".venv", "language": "python", "name": "python3" }, @@ -1615,7 +2245,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.10" + "version": "3.10.18" } }, "nbformat": 4, diff --git a/notebooks/unit5/unit5.ipynb b/notebooks/unit5/unit5.ipynb index cb9ec8b..580dffc 100644 --- a/notebooks/unit5/unit5.ipynb +++ b/notebooks/unit5/unit5.ipynb @@ -277,7 +277,10 @@ "# Go inside the repository and install the package (can take 3min)\n", "%cd ml-agents\n", "!pip3 install -e ./ml-agents-envs\n", - "!pip3 install -e ./ml-agents" + "!pip3 install -e ./ml-agents\n", + "\n", + "!uv pip install -e ./ml-agents-envs\n", + "!uv pip install -e ./ml-agents\n" ] }, { @@ -584,7 +587,7 @@ }, "outputs": [], "source": [ - "!mlagents-push-to-hf --run-id= # Add your run id --local-dir= # Your local dir --repo-id= # Your repo id --commit-message= # Your commit message" 
+ "!mlagents-push-to-hf --run-id=\"SnowballTarget1\" --local-dir=\"./results/SnowballTarget1\" --repo-id=\"turbo-maikol/rl-course-unit5-snowball\" --commit-message=\"First Push\"" ] }, { @@ -796,7 +799,9 @@ }, "outputs": [], "source": [ - "!mlagents-push-to-hf --run-id= # Add your run id --local-dir= # Your local dir --repo-id= # Your repo id --commit-message= # Your commit message" + "# !mlagents-push-to-hf --run-id= # Add your run id --local-dir= # Your local dir --repo-id= # Your repo id --commit-message= # Your commit message\n", + "\n", + "!mlagents-push-to-hf --run-id=\"Pyramids Training\" --local-dir=\"./results/Pyramids Training\" --repo-id=\"turbo-maikol/rl-course-unit5-pyramids\" --commit-message=\"First Push\"\n" ] }, { diff --git a/notebooks/unit6/unit6.ipynb b/notebooks/unit6/unit6.ipynb index e5d0081..2584c5a 100644 --- a/notebooks/unit6/unit6.ipynb +++ b/notebooks/unit6/unit6.ipynb @@ -1,31 +1,10 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [], - "private_outputs": true, - "collapsed_sections": [ - "tF42HvI7-gs5" - ], - "include_colab_link": true - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - }, - "accelerator": "GPU", - "gpuClass": "standard" - }, "cells": [ { "cell_type": "markdown", "metadata": { - "id": "view-in-github", - "colab_type": "text" + "colab_type": "text", + "id": "view-in-github" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", + "metadata": { + "id": "-PTReiOw-RAN" + }, "source": [ "# Unit 6: Advantage Actor Critic (A2C) using Robotics Simulations with Panda-Gym 🤖\n", "\n", @@ -43,37 +25,37 @@ "- `Reach`: the robot must place its end-effector at a target position.\n", "\n", "After that, you'll be able **to train in other robotics tasks**.\n" - ], - "metadata": { - "id": "-PTReiOw-RAN" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "QInFitfWno1Q" + }, "source": [ "### 🎮 
Environments:\n", "\n", "- [Panda-Gym](https://github.com/qgallouedec/panda-gym)\n", "\n", - "###📚 RL-Library:\n", + "### 📚 RL-Library:\n", "\n", "- [Stable-Baselines3](https://stable-baselines3.readthedocs.io/)" - ], - "metadata": { - "id": "QInFitfWno1Q" - } + ] }, { "cell_type": "markdown", - "source": [ - "We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues)." - ], "metadata": { "id": "2CcdX4g3oFlp" - } + }, + "source": [ + "We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues)." + ] }, { "cell_type": "markdown", + "metadata": { + "id": "MoubJX20oKaQ" + }, "source": [ "## Objectives of this notebook 🏆\n", "\n", @@ -85,13 +67,13 @@ "- Be able to **push your trained agent and the code to the Hub** with a nice video replay and an evaluation score 🔥.\n", "\n", "\n" - ], - "metadata": { - "id": "MoubJX20oKaQ" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "DoUNkTExoUED" + }, "source": [ "## This notebook is from the Deep Reinforcement Learning Course\n", "\"Deep\n", @@ -108,34 +90,34 @@ "\n", "\n", "The best way to keep in touch is to join our discord server to exchange with the community and with us 👉🏻 https://discord.gg/ydHrjt3WP5" - ], - "metadata": { - "id": "DoUNkTExoUED" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "BTuQAUAPoa5E" + }, "source": [ "## Prerequisites 🏗️\n", "Before diving into the notebook, you need to:\n", "\n", "🔲 📚 Study [Actor-Critic methods by reading Unit 6](https://huggingface.co/deep-rl-course/unit6/introduction) 🤗 " - ], - "metadata": { - "id": "BTuQAUAPoa5E" - } + ] }, { "cell_type": "markdown", - "source": [ - "# Let's train our first robots 🤖" - ], "metadata": { "id": "iajHvVDWoo01" - } + }, + "source": [ + "# 
Let's train our first robots 🤖" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "zbOENTE2os_D" + }, "source": [ "To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to push your trained model to the Hub and get the following results:\n", "\n", @@ -144,46 +126,43 @@ "To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model, **the result = mean_reward - std of reward**\n", "\n", "For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process" - ], - "metadata": { - "id": "zbOENTE2os_D" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "PU4FVzaoM6fC" + }, "source": [ "## Set the GPU 💪\n", "- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n", "\n", "\"GPU" - ], - "metadata": { - "id": "PU4FVzaoM6fC" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "KV0NyFdQM9ZG" + }, "source": [ "- `Hardware Accelerator > GPU`\n", "\n", "\"GPU" - ], - "metadata": { - "id": "KV0NyFdQM9ZG" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "bTpYcVZVMzUI" + }, "source": [ "## Create a virtual display 🔽\n", "\n", "During the notebook, we'll need to generate a replay video. 
To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames).\n", "\n", "Hence the following cell will install the librairies and create and run a virtual screen 🖥" - ], - "metadata": { - "id": "bTpYcVZVMzUI" - } + ] }, { "cell_type": "code", @@ -202,21 +181,24 @@ }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ww5PQH1gNLI4" + }, + "outputs": [], "source": [ "# Virtual display\n", "from pyvirtualdisplay import Display\n", "\n", "virtual_display = Display(visible=0, size=(1400, 900))\n", "virtual_display.start()" - ], - "metadata": { - "id": "ww5PQH1gNLI4" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "e1obkbdJ_KnG" + }, "source": [ "### Install dependencies 🔽\n", "\n", @@ -228,48 +210,60 @@ "- `huggingface_hub`: Library allowing anyone to work with the Hub repositories.\n", "\n", "⏲ The installation can **take 10 minutes**." - ], - "metadata": { - "id": "e1obkbdJ_KnG" - } + ] }, { "cell_type": "code", - "source": [ - "!pip install stable-baselines3[extra]\n", - "!pip install gymnasium" - ], + "execution_count": null, "metadata": { "id": "TgZUkjKYSgvn" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!pip install stable-baselines3[extra]\n", + "!pip install gymnasium" + ] }, { "cell_type": "code", - "source": [ - "!pip install huggingface_sb3\n", - "!pip install huggingface_hub\n", - "!pip install panda_gym" - ], + "execution_count": null, "metadata": { "id": "ABneW6tOSpyU" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!uv pip install stable-baselines3[extra] gymnasium huggingface_sb3 huggingface_hub panda_gym" + ] }, { "cell_type": "markdown", - "source": [ - "## Import the packages 📦" - ], "metadata": { "id": "QTep3PQQABLr" - } + }, + "source": [ + "## Import the packages 📦" + ] }, { "cell_type": "code", + "execution_count": 1, + "metadata": { + 
"id": "HpiB8VdnQ7Bk" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit6/venv-u6/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + } + ], "source": [ + "%load_ext autoreload\n", + "%autoreload 2\n", + "\n", "import os\n", "\n", "import gymnasium as gym\n", @@ -283,15 +277,13 @@ "from stable_baselines3.common.env_util import make_vec_env\n", "\n", "from huggingface_hub import notebook_login" - ], - "metadata": { - "id": "HpiB8VdnQ7Bk" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "lfBwIS_oAVXI" + }, "source": [ "## PandaReachDense-v3 🦾\n", "\n", @@ -310,26 +302,38 @@ "\n", "This way **the training will be easier**.\n", "\n" - ], - "metadata": { - "id": "lfBwIS_oAVXI" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "frVXOrnlBerQ" + }, "source": [ "### Create the environment\n", "\n", "#### The environment 🎮\n", "\n", "In `PandaReachDense-v3` the robotic arm must place its end-effector at a target position (green ball)." 
- ], - "metadata": { - "id": "frVXOrnlBerQ" - } + ] }, { "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "zXzAu3HYF1WD" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "argv[0]=--background_color_red=0.8745098114013672\n", + "argv[1]=--background_color_green=0.21176470816135406\n", + "argv[2]=--background_color_blue=0.1764705926179886\n" + ] + } + ], "source": [ "env_id = \"PandaReachDense-v3\"\n", "\n", @@ -339,28 +343,47 @@ "# Get the state space and action space\n", "s_size = env.observation_space.shape\n", "a_size = env.action_space" - ], - "metadata": { - "id": "zXzAu3HYF1WD" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], "source": [ - "print(\"_____OBSERVATION SPACE_____ \\n\")\n", - "print(\"The State Space is: \", s_size)\n", - "print(\"Sample observation\", env.observation_space.sample()) # Get a random observation" - ], + "s_size = env.observation_space.sample()[\"observation\"].shape[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 17, "metadata": { "id": "E-U9dexcF-FB" }, - "execution_count": null, - "outputs": [] + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "_____OBSERVATION SPACE_____ \n", + "\n", + "The State Space is: 6\n", + "Sample observation OrderedDict([('achieved_goal', array([-5.6249957, 3.2377138, 9.631121 ], dtype=float32)), ('desired_goal', array([-5.9595466, 4.739131 , -3.3849702], dtype=float32)), ('observation', array([-3.4746149 , -1.6921669 , -9.1196995 , 1.4088092 , 0.84349155,\n", + " -9.425635 ], dtype=float32))])\n" + ] + } + ], + "source": [ + "print(\"_____OBSERVATION SPACE_____ \\n\")\n", + "print(\"The State Space is: \", s_size)\n", + "print(\"Sample observation\", env.observation_space.sample()) # Get a random observation" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "g_JClfElGFnF" + }, "source": [ "The observation 
space **is a dictionary with 3 different elements**:\n", "- `achieved_goal`: (x,y,z) the current position of the end-effector.\n", @@ -368,45 +391,57 @@ "- `observation`: position (x,y,z) and velocity of the end-effector (vx, vy, vz).\n", "\n", "Given it's a dictionary as observation, **we will need to use a MultiInputPolicy policy instead of MlpPolicy**." - ], - "metadata": { - "id": "g_JClfElGFnF" - } + ] }, { "cell_type": "code", + "execution_count": 18, + "metadata": { + "id": "ib1Kxy4AF-FC" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + " _____ACTION SPACE_____ \n", + "\n", + "The Action Space is: Box(-1.0, 1.0, (3,), float32)\n", + "Action Space Sample [-0.28385562 -0.9789819 -0.80975497]\n" + ] + } + ], "source": [ "print(\"\\n _____ACTION SPACE_____ \\n\")\n", "print(\"The Action Space is: \", a_size)\n", "print(\"Action Space Sample\", env.action_space.sample()) # Take a random action" - ], - "metadata": { - "id": "ib1Kxy4AF-FC" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "5MHTHEHZS4yp" + }, "source": [ "The action space is a vector with 3 values:\n", "- Control x, y, z movement" - ], - "metadata": { - "id": "5MHTHEHZS4yp" - } + ] }, { "cell_type": "markdown", - "source": [ - "### Normalize observation and rewards" - ], "metadata": { "id": "S5sXcg469ysB" - } + }, + "source": [ + "### Normalize observation and rewards" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "1ZyX6qf3Zva9" + }, "source": [ "A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html).\n", "\n", @@ -415,140 +450,9205 @@ "We also normalize rewards with this same wrapper by adding `norm_reward = True`\n", "\n", "[You should check the documentation to fill this cell](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)" - ], - "metadata": { - "id": 
"1ZyX6qf3Zva9" - } + ] }, { "cell_type": "code", + "execution_count": 20, + "metadata": { + "id": "1RsDtHHAQ9Ie" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "argv[0]=--background_color_red=0.8745098114013672\n", + "argv[1]=--background_color_green=0.21176470816135406\n", + "argv[2]=--background_color_blue=0.1764705926179886\n", + "argv[0]=--background_color_red=0.8745098114013672\n", + "argv[1]=--background_color_green=0.21176470816135406\n", + "argv[2]=--background_color_blue=0.1764705926179886\n", + "argv[0]=--background_color_red=0.8745098114013672\n", + "argv[1]=--background_color_green=0.21176470816135406\n", + "argv[2]=--background_color_blue=0.1764705926179886\n", + "argv[0]=--background_color_red=0.8745098114013672\n", + "argv[1]=--background_color_green=0.21176470816135406\n", + "argv[2]=--background_color_blue=0.1764705926179886\n" + ] + } + ], "source": [ "env = make_vec_env(env_id, n_envs=4)\n", "\n", "# Adding this wrapper to normalize the observation and the reward\n", - "env = # TODO: Add the wrapper" + "env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.) # wrapper that normalizes observations and rewards" ] }, { "cell_type": "markdown", - "source": [ - "#### Solution" - ], "metadata": { "id": "tF42HvI7-gs5" - } + }, + "source": [ + "#### Solution" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2O67mqgC-hol" + }, + "outputs": [], "source": [ "env = make_vec_env(env_id, n_envs=4)\n", "\n", "env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)" - ], - "metadata": { - "id": "2O67mqgC-hol" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "4JmEVU6z1ZA-" + }, "source": [ "### Create the A2C Model 🤖\n", "\n", "For more information about A2C implementation with StableBaselines3 check: 
https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html#notes\n", "\n", "To find the best parameters I checked the [official trained agents by Stable-Baselines3 team](https://huggingface.co/sb3)." - ], - "metadata": { - "id": "4JmEVU6z1ZA-" - } + ] }, { "cell_type": "code", - "source": [ - "model = # Create the A2C model and try to find the best parameters" - ], + "execution_count": 26, "metadata": { "id": "vR3T4qFt164I" }, - "execution_count": null, - "outputs": [] + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Using cuda device\n" + ] + } + ], + "source": [ + "model = A2C(\"MultiInputPolicy\", env, verbose=1) # Create the A2C model and try to find the best parameters" + ] }, { "cell_type": "markdown", - "source": [ - "#### Solution" - ], "metadata": { "id": "nWAuOOLh-oQf" - } + }, + "source": [ + "#### Solution" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "FKFLY54T-pU1" + }, + "outputs": [], "source": [ "model = A2C(policy = \"MultiInputPolicy\",\n", " env = env,\n", " verbose=1)" - ], - "metadata": { - "id": "FKFLY54T-pU1" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "opyK3mpJ1-m9" + }, "source": [ "### Train the A2C agent 🏃\n", "- Let's train our agent for 1,000,000 timesteps, don't forget to use GPU on Colab. 
It will take approximately ~25-40min" - ], - "metadata": { - "id": "opyK3mpJ1-m9" - } + ] }, { "cell_type": "code", - "source": [ - "model.learn(1_000_000)" - ], + "execution_count": 27, "metadata": { "id": "4TuGHZD7RF1G" }, - "execution_count": null, - "outputs": [] + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 44.5 |\n", + "| ep_rew_mean | -12.4 |\n", + "| time/ | |\n", + "| fps | 313 |\n", + "| iterations | 100 |\n", + "| time_elapsed | 6 |\n", + "| total_timesteps | 2000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.22 |\n", + "| explained_variance | 0.9545538 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 99 |\n", + "| policy_loss | -0.349 |\n", + "| std | 0.988 |\n", + "| value_loss | 0.322 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 45.4 |\n", + "| ep_rew_mean | -13 |\n", + "| time/ | |\n", + "| fps | 316 |\n", + "| iterations | 200 |\n", + "| time_elapsed | 12 |\n", + "| total_timesteps | 4000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.25 |\n", + "| explained_variance | 0.97950953 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 199 |\n", + "| policy_loss | -1.26 |\n", + "| std | 0.998 |\n", + "| value_loss | 0.118 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 44.8 |\n", + "| ep_rew_mean | -13.4 |\n", + "| time/ | |\n", + "| fps | 326 |\n", + "| iterations | 300 |\n", + "| time_elapsed | 18 |\n", + "| total_timesteps | 6000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.25 |\n", + "| explained_variance | 0.9330401 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 299 |\n", + "| policy_loss | 0.0686 |\n", + "| std | 0.998 |\n", + "| value_loss | 0.38 |\n", + "-------------------------------------\n", + 
"------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 43.2 |\n", + "| ep_rew_mean | -13.1 |\n", + "| time/ | |\n", + "| fps | 284 |\n", + "| iterations | 400 |\n", + "| time_elapsed | 28 |\n", + "| total_timesteps | 8000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.25 |\n", + "| explained_variance | 0.521664 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 399 |\n", + "| policy_loss | 0.192 |\n", + "| std | 0.999 |\n", + "| value_loss | 0.0753 |\n", + "------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 43.8 |\n", + "| ep_rew_mean | -12.6 |\n", + "| time/ | |\n", + "| fps | 295 |\n", + "| iterations | 500 |\n", + "| time_elapsed | 33 |\n", + "| total_timesteps | 10000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.26 |\n", + "| explained_variance | 0.9645154 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 499 |\n", + "| policy_loss | 0.284 |\n", + "| std | 1 |\n", + "| value_loss | 0.0268 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 44.8 |\n", + "| ep_rew_mean | -12.5 |\n", + "| time/ | |\n", + "| fps | 299 |\n", + "| iterations | 600 |\n", + "| time_elapsed | 40 |\n", + "| total_timesteps | 12000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.27 |\n", + "| explained_variance | 0.9733915 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 599 |\n", + "| policy_loss | -0.31 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.0512 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 43.1 |\n", + "| ep_rew_mean | -11.9 |\n", + "| time/ | |\n", + "| fps | 301 |\n", + "| iterations | 700 |\n", + "| time_elapsed | 46 |\n", + "| total_timesteps | 14000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.27 |\n", + "| explained_variance | 0.9599183 |\n", + "| learning_rate | 
0.0007 |\n", + "| n_updates | 699 |\n", + "| policy_loss | -0.199 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.0319 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 38.5 |\n", + "| ep_rew_mean | -9.69 |\n", + "| time/ | |\n", + "| fps | 306 |\n", + "| iterations | 800 |\n", + "| time_elapsed | 52 |\n", + "| total_timesteps | 16000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.26 |\n", + "| explained_variance | 0.9946942 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 799 |\n", + "| policy_loss | -0.0755 |\n", + "| std | 1 |\n", + "| value_loss | 0.0182 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 36.8 |\n", + "| ep_rew_mean | -8.98 |\n", + "| time/ | |\n", + "| fps | 289 |\n", + "| iterations | 900 |\n", + "| time_elapsed | 62 |\n", + "| total_timesteps | 18000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.29 |\n", + "| explained_variance | 0.9358697 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 899 |\n", + "| policy_loss | 0.139 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.0267 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 39 |\n", + "| ep_rew_mean | -9.08 |\n", + "| time/ | |\n", + "| fps | 294 |\n", + "| iterations | 1000 |\n", + "| time_elapsed | 68 |\n", + "| total_timesteps | 20000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.3 |\n", + "| explained_variance | 0.6885923 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 999 |\n", + "| policy_loss | 0.508 |\n", + "| std | 1.02 |\n", + "| value_loss | 0.108 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 37 |\n", + "| ep_rew_mean | -8.26 |\n", + "| time/ | |\n", + "| fps | 295 |\n", + "| iterations | 1100 |\n", + "| 
time_elapsed | 74 |\n", + "| total_timesteps | 22000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.32 |\n", + "| explained_variance | 0.97540057 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1099 |\n", + "| policy_loss | -0.747 |\n", + "| std | 1.02 |\n", + "| value_loss | 0.0566 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 38.1 |\n", + "| ep_rew_mean | -8.9 |\n", + "| time/ | |\n", + "| fps | 299 |\n", + "| iterations | 1200 |\n", + "| time_elapsed | 80 |\n", + "| total_timesteps | 24000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.32 |\n", + "| explained_variance | 0.9357179 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1199 |\n", + "| policy_loss | -0.0341 |\n", + "| std | 1.02 |\n", + "| value_loss | 0.0184 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 40.9 |\n", + "| ep_rew_mean | -10.1 |\n", + "| time/ | |\n", + "| fps | 303 |\n", + "| iterations | 1300 |\n", + "| time_elapsed | 85 |\n", + "| total_timesteps | 26000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.33 |\n", + "| explained_variance | 0.9974439 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1299 |\n", + "| policy_loss | -0.35 |\n", + "| std | 1.03 |\n", + "| value_loss | 0.0159 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 39.5 |\n", + "| ep_rew_mean | -9.84 |\n", + "| time/ | |\n", + "| fps | 306 |\n", + "| iterations | 1400 |\n", + "| time_elapsed | 91 |\n", + "| total_timesteps | 28000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.33 |\n", + "| explained_variance | 0.99562174 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1399 |\n", + "| policy_loss | -0.294 |\n", + "| std | 1.03 |\n", + "| value_loss | 0.0101 |\n", + "--------------------------------------\n", + 
"-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 37.8 |\n", + "| ep_rew_mean | -8.64 |\n", + "| time/ | |\n", + "| fps | 298 |\n", + "| iterations | 1500 |\n", + "| time_elapsed | 100 |\n", + "| total_timesteps | 30000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.34 |\n", + "| explained_variance | 0.9823143 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1499 |\n", + "| policy_loss | 0.104 |\n", + "| std | 1.03 |\n", + "| value_loss | 0.0163 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 33.4 |\n", + "| ep_rew_mean | -6.76 |\n", + "| time/ | |\n", + "| fps | 302 |\n", + "| iterations | 1600 |\n", + "| time_elapsed | 105 |\n", + "| total_timesteps | 32000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.34 |\n", + "| explained_variance | 0.68713135 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1599 |\n", + "| policy_loss | -0.544 |\n", + "| std | 1.03 |\n", + "| value_loss | 0.0491 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 32.2 |\n", + "| ep_rew_mean | -5.76 |\n", + "| time/ | |\n", + "| fps | 307 |\n", + "| iterations | 1700 |\n", + "| time_elapsed | 110 |\n", + "| total_timesteps | 34000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.33 |\n", + "| explained_variance | 0.8737136 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1699 |\n", + "| policy_loss | -0.729 |\n", + "| std | 1.02 |\n", + "| value_loss | 0.0799 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 22.7 |\n", + "| ep_rew_mean | -3.62 |\n", + "| time/ | |\n", + "| fps | 307 |\n", + "| iterations | 1800 |\n", + "| time_elapsed | 116 |\n", + "| total_timesteps | 36000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.33 |\n", + "| explained_variance | 0.9746877 |\n", 
+ "| learning_rate | 0.0007 |\n", + "| n_updates | 1799 |\n", + "| policy_loss | -0.063 |\n", + "| std | 1.03 |\n", + "| value_loss | 0.00838 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 20.6 |\n", + "| ep_rew_mean | -2.93 |\n", + "| time/ | |\n", + "| fps | 309 |\n", + "| iterations | 1900 |\n", + "| time_elapsed | 122 |\n", + "| total_timesteps | 38000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.32 |\n", + "| explained_variance | 0.87348247 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1899 |\n", + "| policy_loss | -0.449 |\n", + "| std | 1.02 |\n", + "| value_loss | 0.0365 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 18.6 |\n", + "| ep_rew_mean | -2.43 |\n", + "| time/ | |\n", + "| fps | 301 |\n", + "| iterations | 2000 |\n", + "| time_elapsed | 132 |\n", + "| total_timesteps | 40000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.3 |\n", + "| explained_variance | 0.9903151 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1999 |\n", + "| policy_loss | -0.0678 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.00914 |\n", + "-------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 14.8 |\n", + "| ep_rew_mean | -1.72 |\n", + "| time/ | |\n", + "| fps | 302 |\n", + "| iterations | 2100 |\n", + "| time_elapsed | 139 |\n", + "| total_timesteps | 42000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.28 |\n", + "| explained_variance | -1.61925 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2099 |\n", + "| policy_loss | 1.93 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.395 |\n", + "------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 8.37 |\n", + "| ep_rew_mean | -0.83 |\n", + "| time/ | |\n", + "| fps | 302 
|\n", + "| iterations | 2200 |\n", + "| time_elapsed | 145 |\n", + "| total_timesteps | 44000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.25 |\n", + "| explained_variance | 0.24723881 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2199 |\n", + "| policy_loss | -0.415 |\n", + "| std | 0.998 |\n", + "| value_loss | 0.0231 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 6.63 |\n", + "| ep_rew_mean | -0.601 |\n", + "| time/ | |\n", + "| fps | 303 |\n", + "| iterations | 2300 |\n", + "| time_elapsed | 151 |\n", + "| total_timesteps | 46000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.18 |\n", + "| explained_variance | 0.5324587 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2299 |\n", + "| policy_loss | 0.143 |\n", + "| std | 0.977 |\n", + "| value_loss | 0.00811 |\n", + "-------------------------------------\n", + "---------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 4.89 |\n", + "| ep_rew_mean | -0.413 |\n", + "| time/ | |\n", + "| fps | 304 |\n", + "| iterations | 2400 |\n", + "| time_elapsed | 157 |\n", + "| total_timesteps | 48000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.14 |\n", + "| explained_variance | -0.40441775 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2399 |\n", + "| policy_loss | 0.153 |\n", + "| std | 0.962 |\n", + "| value_loss | 0.00267 |\n", + "---------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.79 |\n", + "| ep_rew_mean | -0.294 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 2500 |\n", + "| time_elapsed | 168 |\n", + "| total_timesteps | 50000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.11 |\n", + "| explained_variance | -0.5201168 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2499 |\n", + "| policy_loss | 0.119 |\n", + "| std | 0.955 |\n", + "| value_loss | 0.0067 
|\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.5 |\n", + "| ep_rew_mean | -0.273 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 2600 |\n", + "| time_elapsed | 175 |\n", + "| total_timesteps | 52000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.06 |\n", + "| explained_variance | 0.44135898 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2599 |\n", + "| policy_loss | -0.151 |\n", + "| std | 0.938 |\n", + "| value_loss | 0.00198 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.36 |\n", + "| ep_rew_mean | -0.265 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 2700 |\n", + "| time_elapsed | 181 |\n", + "| total_timesteps | 54000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.04 |\n", + "| explained_variance | 0.22025794 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2699 |\n", + "| policy_loss | 0.0714 |\n", + "| std | 0.931 |\n", + "| value_loss | 0.00233 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.41 |\n", + "| ep_rew_mean | -0.266 |\n", + "| time/ | |\n", + "| fps | 296 |\n", + "| iterations | 2800 |\n", + "| time_elapsed | 188 |\n", + "| total_timesteps | 56000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.99 |\n", + "| explained_variance | 0.51552105 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2799 |\n", + "| policy_loss | -0.101 |\n", + "| std | 0.917 |\n", + "| value_loss | 0.0015 |\n", + "--------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.39 |\n", + "| ep_rew_mean | -0.275 |\n", + "| time/ | |\n", + "| fps | 296 |\n", + "| iterations | 2900 |\n", + "| time_elapsed | 195 |\n", + "| total_timesteps | 58000 |\n", + "| train/ | |\n", + "| 
entropy_loss | -3.97 |\n", + "| explained_variance | 0.455292 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2899 |\n", + "| policy_loss | -0.0549 |\n", + "| std | 0.908 |\n", + "| value_loss | 0.0009 |\n", + "------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.07 |\n", + "| ep_rew_mean | -0.243 |\n", + "| time/ | |\n", + "| fps | 292 |\n", + "| iterations | 3000 |\n", + "| time_elapsed | 205 |\n", + "| total_timesteps | 60000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.94 |\n", + "| explained_variance | 0.7100135 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2999 |\n", + "| policy_loss | 0.0327 |\n", + "| std | 0.9 |\n", + "| value_loss | 0.000537 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.12 |\n", + "| ep_rew_mean | -0.249 |\n", + "| time/ | |\n", + "| fps | 293 |\n", + "| iterations | 3100 |\n", + "| time_elapsed | 210 |\n", + "| total_timesteps | 62000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.9 |\n", + "| explained_variance | 0.5587412 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 3099 |\n", + "| policy_loss | -0.0118 |\n", + "| std | 0.889 |\n", + "| value_loss | 0.000493 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.08 |\n", + "| ep_rew_mean | -0.247 |\n", + "| time/ | |\n", + "| fps | 295 |\n", + "| iterations | 3200 |\n", + "| time_elapsed | 216 |\n", + "| total_timesteps | 64000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.87 |\n", + "| explained_variance | 0.12363589 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 3199 |\n", + "| policy_loss | 0.0663 |\n", + "| std | 0.878 |\n", + "| value_loss | 0.0016 |\n", + "--------------------------------------\n", + "---------------------------------------\n", + "| rollout/ | |\n", + "| 
ep_len_mean | 3.44 |\n", + "| ep_rew_mean | -0.273 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 3300 |\n", + "| time_elapsed | 222 |\n", + "| total_timesteps | 66000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.84 |\n", + "| explained_variance | -0.74497736 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 3299 |\n", + "| policy_loss | 0.111 |\n", + "| std | 0.873 |\n", + "| value_loss | 0.00291 |\n", + "---------------------------------------\n", + "---------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.25 |\n", + "| ep_rew_mean | -0.265 |\n", + "| time/ | |\n", + "| fps | 298 |\n", + "| iterations | 3400 |\n", + "| time_elapsed | 228 |\n", + "| total_timesteps | 68000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.8 |\n", + "| explained_variance | -0.30396366 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 3399 |\n", + "| policy_loss | 0.216 |\n", + "| std | 0.86 |\n", + "| value_loss | 0.00472 |\n", + "---------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.13 |\n", + "| ep_rew_mean | -0.247 |\n", + "| time/ | |\n", + "| fps | 298 |\n", + "| iterations | 3500 |\n", + "| time_elapsed | 234 |\n", + "| total_timesteps | 70000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.78 |\n", + "| explained_variance | 0.67658997 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 3499 |\n", + "| policy_loss | -0.0138 |\n", + "| std | 0.854 |\n", + "| value_loss | 0.00042 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.33 |\n", + "| ep_rew_mean | -0.263 |\n", + "| time/ | |\n", + "| fps | 294 |\n", + "| iterations | 3600 |\n", + "| time_elapsed | 244 |\n", + "| total_timesteps | 72000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.74 |\n", + "| explained_variance | -0.9163362 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates 
| 3599 |\n", + "| policy_loss | 0.134 |\n", + "| std | 0.842 |\n", + "| value_loss | 0.00447 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.85 |\n", + "| ep_rew_mean | -0.232 |\n", + "| time/ | |\n", + "| fps | 296 |\n", + "| iterations | 3700 |\n", + "| time_elapsed | 249 |\n", + "| total_timesteps | 74000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.7 |\n", + "| explained_variance | -0.2427007 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 3699 |\n", + "| policy_loss | -0.0458 |\n", + "| std | 0.832 |\n", + "| value_loss | 0.000685 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.78 |\n", + "| ep_rew_mean | -0.215 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 3800 |\n", + "| time_elapsed | 255 |\n", + "| total_timesteps | 76000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.69 |\n", + "| explained_variance | 0.70643455 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 3799 |\n", + "| policy_loss | 0.087 |\n", + "| std | 0.827 |\n", + "| value_loss | 0.00111 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.92 |\n", + "| ep_rew_mean | -0.237 |\n", + "| time/ | |\n", + "| fps | 298 |\n", + "| iterations | 3900 |\n", + "| time_elapsed | 260 |\n", + "| total_timesteps | 78000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.64 |\n", + "| explained_variance | 0.3595901 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 3899 |\n", + "| policy_loss | -0.207 |\n", + "| std | 0.815 |\n", + "| value_loss | 0.00451 |\n", + "-------------------------------------\n", + "---------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.01 |\n", + "| ep_rew_mean | -0.232 |\n", + "| time/ | |\n", + "| fps | 300 |\n", + "| iterations | 4000 
|\n", + "| time_elapsed | 266 |\n", + "| total_timesteps | 80000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.62 |\n", + "| explained_variance | -0.26341498 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 3999 |\n", + "| policy_loss | 0.0463 |\n", + "| std | 0.807 |\n", + "| value_loss | 0.00229 |\n", + "---------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.86 |\n", + "| ep_rew_mean | -0.218 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 4100 |\n", + "| time_elapsed | 275 |\n", + "| total_timesteps | 82000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.6 |\n", + "| explained_variance | 0.66822636 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 4099 |\n", + "| policy_loss | -0.00514 |\n", + "| std | 0.804 |\n", + "| value_loss | 0.000157 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.03 |\n", + "| ep_rew_mean | -0.23 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 4200 |\n", + "| time_elapsed | 282 |\n", + "| total_timesteps | 84000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.56 |\n", + "| explained_variance | 0.62520474 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 4199 |\n", + "| policy_loss | 0.0369 |\n", + "| std | 0.793 |\n", + "| value_loss | 0.00071 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.96 |\n", + "| ep_rew_mean | -0.233 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 4300 |\n", + "| time_elapsed | 288 |\n", + "| total_timesteps | 86000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.53 |\n", + "| explained_variance | -0.7739824 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 4299 |\n", + "| policy_loss | 0.0406 |\n", + "| std | 0.786 |\n", + "| value_loss | 0.00184 |\n", + 
"--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.13 |\n", + "| ep_rew_mean | -0.251 |\n", + "| time/ | |\n", + "| fps | 298 |\n", + "| iterations | 4400 |\n", + "| time_elapsed | 294 |\n", + "| total_timesteps | 88000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.52 |\n", + "| explained_variance | 0.36605334 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 4399 |\n", + "| policy_loss | -0.0104 |\n", + "| std | 0.784 |\n", + "| value_loss | 0.000911 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.94 |\n", + "| ep_rew_mean | -0.23 |\n", + "| time/ | |\n", + "| fps | 299 |\n", + "| iterations | 4500 |\n", + "| time_elapsed | 300 |\n", + "| total_timesteps | 90000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.5 |\n", + "| explained_variance | -1.494292 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 4499 |\n", + "| policy_loss | 0.166 |\n", + "| std | 0.776 |\n", + "| value_loss | 0.00448 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.87 |\n", + "| ep_rew_mean | -0.219 |\n", + "| time/ | |\n", + "| fps | 299 |\n", + "| iterations | 4600 |\n", + "| time_elapsed | 307 |\n", + "| total_timesteps | 92000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.5 |\n", + "| explained_variance | 0.86099774 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 4599 |\n", + "| policy_loss | -0.00686 |\n", + "| std | 0.776 |\n", + "| value_loss | 0.000183 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.96 |\n", + "| ep_rew_mean | -0.236 |\n", + "| time/ | |\n", + "| fps | 296 |\n", + "| iterations | 4700 |\n", + "| time_elapsed | 317 |\n", + "| total_timesteps | 94000 |\n", + "| train/ | |\n", + "| 
entropy_loss | -3.49 |\n", + "| explained_variance | 0.8097523 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 4699 |\n", + "| policy_loss | -0.0231 |\n", + "| std | 0.775 |\n", + "| value_loss | 0.000184 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3 |\n", + "| ep_rew_mean | -0.241 |\n", + "| time/ | |\n", + "| fps | 296 |\n", + "| iterations | 4800 |\n", + "| time_elapsed | 323 |\n", + "| total_timesteps | 96000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.49 |\n", + "| explained_variance | 0.85981035 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 4799 |\n", + "| policy_loss | 0.039 |\n", + "| std | 0.774 |\n", + "| value_loss | 0.000231 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.03 |\n", + "| ep_rew_mean | -0.232 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 4900 |\n", + "| time_elapsed | 328 |\n", + "| total_timesteps | 98000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.48 |\n", + "| explained_variance | 0.78069174 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 4899 |\n", + "| policy_loss | 0.0365 |\n", + "| std | 0.771 |\n", + "| value_loss | 0.000307 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.8 |\n", + "| ep_rew_mean | -0.219 |\n", + "| time/ | |\n", + "| fps | 298 |\n", + "| iterations | 5000 |\n", + "| time_elapsed | 334 |\n", + "| total_timesteps | 100000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.46 |\n", + "| explained_variance | 0.9213156 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 4999 |\n", + "| policy_loss | 0.0419 |\n", + "| std | 0.766 |\n", + "| value_loss | 0.000267 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| 
ep_len_mean | 2.91 |\n", + "| ep_rew_mean | -0.222 |\n", + "| time/ | |\n", + "| fps | 300 |\n", + "| iterations | 5100 |\n", + "| time_elapsed | 339 |\n", + "| total_timesteps | 102000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.43 |\n", + "| explained_variance | 0.39434808 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 5099 |\n", + "| policy_loss | -0.036 |\n", + "| std | 0.759 |\n", + "| value_loss | 0.000629 |\n", + "--------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.95 |\n", + "| ep_rew_mean | -0.231 |\n", + "| time/ | |\n", + "| fps | 296 |\n", + "| iterations | 5200 |\n", + "| time_elapsed | 350 |\n", + "| total_timesteps | 104000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.39 |\n", + "| explained_variance | 0.839466 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 5199 |\n", + "| policy_loss | 0.0243 |\n", + "| std | 0.749 |\n", + "| value_loss | 0.000199 |\n", + "------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.02 |\n", + "| ep_rew_mean | -0.238 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 5300 |\n", + "| time_elapsed | 356 |\n", + "| total_timesteps | 106000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.35 |\n", + "| explained_variance | -1.5323973 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 5299 |\n", + "| policy_loss | -0.0499 |\n", + "| std | 0.739 |\n", + "| value_loss | 0.0026 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.04 |\n", + "| ep_rew_mean | -0.24 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 5400 |\n", + "| time_elapsed | 362 |\n", + "| total_timesteps | 108000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.35 |\n", + "| explained_variance | 0.73881704 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 
5399 |\n", + "| policy_loss | 0.0459 |\n", + "| std | 0.739 |\n", + "| value_loss | 0.000478 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.97 |\n", + "| ep_rew_mean | -0.234 |\n", + "| time/ | |\n", + "| fps | 298 |\n", + "| iterations | 5500 |\n", + "| time_elapsed | 368 |\n", + "| total_timesteps | 110000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.32 |\n", + "| explained_variance | 0.8745833 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 5499 |\n", + "| policy_loss | 0.0212 |\n", + "| std | 0.732 |\n", + "| value_loss | 0.000189 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.91 |\n", + "| ep_rew_mean | -0.234 |\n", + "| time/ | |\n", + "| fps | 299 |\n", + "| iterations | 5600 |\n", + "| time_elapsed | 373 |\n", + "| total_timesteps | 112000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.31 |\n", + "| explained_variance | 0.44390965 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 5599 |\n", + "| policy_loss | 0.04 |\n", + "| std | 0.729 |\n", + "| value_loss | 0.000846 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.84 |\n", + "| ep_rew_mean | -0.219 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 5700 |\n", + "| time_elapsed | 383 |\n", + "| total_timesteps | 114000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.29 |\n", + "| explained_variance | 0.76370406 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 5699 |\n", + "| policy_loss | 0.042 |\n", + "| std | 0.726 |\n", + "| value_loss | 0.000353 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.97 |\n", + "| ep_rew_mean | -0.228 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 5800 
|\n", + "| time_elapsed | 389 |\n", + "| total_timesteps | 116000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.26 |\n", + "| explained_variance | 0.48743385 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 5799 |\n", + "| policy_loss | 0.0545 |\n", + "| std | 0.719 |\n", + "| value_loss | 0.000668 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.06 |\n", + "| ep_rew_mean | -0.24 |\n", + "| time/ | |\n", + "| fps | 298 |\n", + "| iterations | 5900 |\n", + "| time_elapsed | 395 |\n", + "| total_timesteps | 118000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.24 |\n", + "| explained_variance | 0.48620242 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 5899 |\n", + "| policy_loss | -0.00115 |\n", + "| std | 0.713 |\n", + "| value_loss | 0.0011 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.71 |\n", + "| ep_rew_mean | -0.213 |\n", + "| time/ | |\n", + "| fps | 298 |\n", + "| iterations | 6000 |\n", + "| time_elapsed | 402 |\n", + "| total_timesteps | 120000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.22 |\n", + "| explained_variance | 0.48468244 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 5999 |\n", + "| policy_loss | -0.0515 |\n", + "| std | 0.708 |\n", + "| value_loss | 0.000484 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.96 |\n", + "| ep_rew_mean | -0.23 |\n", + "| time/ | |\n", + "| fps | 298 |\n", + "| iterations | 6100 |\n", + "| time_elapsed | 408 |\n", + "| total_timesteps | 122000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.2 |\n", + "| explained_variance | 0.36996192 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 6099 |\n", + "| policy_loss | 0.0359 |\n", + "| std | 0.704 |\n", + "| value_loss | 0.000894 |\n", + 
"--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.77 |\n", + "| ep_rew_mean | -0.215 |\n", + "| time/ | |\n", + "| fps | 299 |\n", + "| iterations | 6200 |\n", + "| time_elapsed | 414 |\n", + "| total_timesteps | 124000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.17 |\n", + "| explained_variance | 0.96674925 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 6199 |\n", + "| policy_loss | 0.0178 |\n", + "| std | 0.696 |\n", + "| value_loss | 8.14e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.69 |\n", + "| ep_rew_mean | -0.208 |\n", + "| time/ | |\n", + "| fps | 296 |\n", + "| iterations | 6300 |\n", + "| time_elapsed | 424 |\n", + "| total_timesteps | 126000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.16 |\n", + "| explained_variance | 0.15048164 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 6299 |\n", + "| policy_loss | 0.0595 |\n", + "| std | 0.695 |\n", + "| value_loss | 0.000888 |\n", + "--------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3 |\n", + "| ep_rew_mean | -0.237 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 6400 |\n", + "| time_elapsed | 430 |\n", + "| total_timesteps | 128000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.15 |\n", + "| explained_variance | 0.840007 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 6399 |\n", + "| policy_loss | 0.0199 |\n", + "| std | 0.693 |\n", + "| value_loss | 0.000293 |\n", + "------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.86 |\n", + "| ep_rew_mean | -0.226 |\n", + "| time/ | |\n", + "| fps | 298 |\n", + "| iterations | 6500 |\n", + "| time_elapsed | 435 |\n", + "| total_timesteps | 130000 |\n", + "| train/ | |\n", + "| 
entropy_loss | -3.14 |\n", + "| explained_variance | 0.93121815 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 6499 |\n", + "| policy_loss | -0.00323 |\n", + "| std | 0.689 |\n", + "| value_loss | 8.66e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3 |\n", + "| ep_rew_mean | -0.233 |\n", + "| time/ | |\n", + "| fps | 299 |\n", + "| iterations | 6600 |\n", + "| time_elapsed | 441 |\n", + "| total_timesteps | 132000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.1 |\n", + "| explained_variance | 0.86104846 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 6599 |\n", + "| policy_loss | 0.0496 |\n", + "| std | 0.681 |\n", + "| value_loss | 0.000438 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.87 |\n", + "| ep_rew_mean | -0.231 |\n", + "| time/ | |\n", + "| fps | 299 |\n", + "| iterations | 6700 |\n", + "| time_elapsed | 446 |\n", + "| total_timesteps | 134000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.09 |\n", + "| explained_variance | 0.90795654 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 6699 |\n", + "| policy_loss | 0.017 |\n", + "| std | 0.678 |\n", + "| value_loss | 0.000259 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3 |\n", + "| ep_rew_mean | -0.231 |\n", + "| time/ | |\n", + "| fps | 297 |\n", + "| iterations | 6800 |\n", + "| time_elapsed | 456 |\n", + "| total_timesteps | 136000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.08 |\n", + "| explained_variance | 0.5615423 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 6799 |\n", + "| policy_loss | -0.0315 |\n", + "| std | 0.677 |\n", + "| value_loss | 0.000951 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| 
ep_len_mean | 3 |\n",
+ "| ep_rew_mean | -0.239 |\n",
+ "| time/ | |\n",
+ "| fps | 298 |\n",
+ "| iterations | 6900 |\n",
+ "| time_elapsed | 462 |\n",
+ "| total_timesteps | 138000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.07 |\n",
+ "| explained_variance | 0.53915024 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 6899 |\n",
+ "| policy_loss | -0.0635 |\n",
+ "| std | 0.673 |\n",
+ "| value_loss | 0.000951 |\n",
+ "--------------------------------------\n",
+ "[... identical progress tables for iterations 7000-14300 elided: total_timesteps 140000 -> 286000, ep_rew_mean steady around -0.22, entropy_loss -3.06 -> -2.23, std 0.672 -> 0.510 ...]\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.97 |\n",
+ "| ep_rew_mean | -0.241 |\n",
+ "| time/ | |\n",
+ "| fps | 302 |\n",
+ "| iterations | 14400 |\n",
+ "| time_elapsed | 950 |\n",
+ "| total_timesteps | 288000 |\n",
+ "| train/ | 
|\n", + "| entropy_loss | -2.23 |\n", + "| explained_variance | 0.8687136 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 14399 |\n", + "| policy_loss | -0.0248 |\n", + "| std | 0.511 |\n", + "| value_loss | 0.000259 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.79 |\n", + "| ep_rew_mean | -0.223 |\n", + "| time/ | |\n", + "| fps | 302 |\n", + "| iterations | 14500 |\n", + "| time_elapsed | 959 |\n", + "| total_timesteps | 290000 |\n", + "| train/ | |\n", + "| entropy_loss | -2.21 |\n", + "| explained_variance | 0.98206425 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 14499 |\n", + "| policy_loss | -0.0198 |\n", + "| std | 0.508 |\n", + "| value_loss | 0.000166 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.76 |\n", + "| ep_rew_mean | -0.211 |\n", + "| time/ | |\n", + "| fps | 302 |\n", + "| iterations | 14600 |\n", + "| time_elapsed | 965 |\n", + "| total_timesteps | 292000 |\n", + "| train/ | |\n", + "| entropy_loss | -2.19 |\n", + "| explained_variance | 0.98284197 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 14599 |\n", + "| policy_loss | 0.0095 |\n", + "| std | 0.505 |\n", + "| value_loss | 4.55e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.85 |\n", + "| ep_rew_mean | -0.231 |\n", + "| time/ | |\n", + "| fps | 302 |\n", + "| iterations | 14700 |\n", + "| time_elapsed | 970 |\n", + "| total_timesteps | 294000 |\n", + "| train/ | |\n", + "| entropy_loss | -2.16 |\n", + "| explained_variance | 0.7622324 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 14699 |\n", + "| policy_loss | -0.0373 |\n", + "| std | 0.499 |\n", + "| value_loss | 0.000854 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + 
"| rollout/ | |\n", + "| ep_len_mean | 2.65 |\n", + "| ep_rew_mean | -0.206 |\n", + "| time/ | |\n", + "| fps | 303 |\n", + "| iterations | 14800 |\n", + "| time_elapsed | 976 |\n", + "| total_timesteps | 296000 |\n", + "| train/ | |\n", + "| entropy_loss | -2.15 |\n", + "| explained_variance | 0.94090515 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 14799 |\n", + "| policy_loss | -0.0101 |\n", + "| std | 0.497 |\n", + "| value_loss | 0.000149 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.99 |\n", + "| ep_rew_mean | -0.242 |\n", + "| time/ | |\n", + "| fps | 303 |\n", + "| iterations | 14900 |\n", + "| time_elapsed | 982 |\n", + "| total_timesteps | 298000 |\n", + "| train/ | |\n", + "| entropy_loss | -2.15 |\n", + "| explained_variance | 0.94472414 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 14899 |\n", + "| policy_loss | 0.0115 |\n", + "| std | 0.498 |\n", + "| value_loss | 0.000171 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.04 |\n", + "| ep_rew_mean | -0.237 |\n", + "| time/ | |\n", + "| fps | 303 |\n", + "| iterations | 15000 |\n", + "| time_elapsed | 988 |\n", + "| total_timesteps | 300000 |\n", + "| train/ | |\n", + "| entropy_loss | -2.16 |\n", + "| explained_variance | 0.93526465 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 14999 |\n", + "| policy_loss | 0.0374 |\n", + "| std | 0.499 |\n", + "| value_loss | 0.000519 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.6 |\n", + "| ep_rew_mean | -0.21 |\n", + "| time/ | |\n", + "| fps | 302 |\n", + "| iterations | 15100 |\n", + "| time_elapsed | 997 |\n", + "| total_timesteps | 302000 |\n", + "| train/ | |\n", + "| entropy_loss | -2.16 |\n", + "| explained_variance | 0.9759287 |\n", + "| 
learning_rate | 0.0007 |\n", + "| n_updates | 15099 |\n", + "| policy_loss | 0.0122 |\n", + "| std | 0.499 |\n", + "| value_loss | 7.45e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.62 |\n", + "| ep_rew_mean | -0.195 |\n", + "| time/ | |\n", + "| fps | 303 |\n", + "| iterations | 15200 |\n", + "| time_elapsed | 1003 |\n", + "| total_timesteps | 304000 |\n", + "| train/ | |\n", + "| entropy_loss | -2.14 |\n", + "| explained_variance | 0.96417016 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 15199 |\n", + "| policy_loss | 0.0111 |\n", + "| std | 0.497 |\n", + "| value_loss | 7.27e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.84 |\n", + "| ep_rew_mean | -0.225 |\n", + "| time/ | |\n", + "| fps | 303 |\n", + "| iterations | 15300 |\n", + "| time_elapsed | 1009 |\n", + "| total_timesteps | 306000 |\n", + "| train/ | |\n", + "| entropy_loss | -2.15 |\n", + "| explained_variance | 0.96453744 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 15299 |\n", + "| policy_loss | 0.011 |\n", + "| std | 0.499 |\n", + "| value_loss | 0.00012 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.81 |\n", + "| ep_rew_mean | -0.219 |\n", + "| time/ | |\n", + "| fps | 303 |\n", + "| iterations | 15400 |\n", + "| time_elapsed | 1014 |\n", + "| total_timesteps | 308000 |\n", + "| train/ | |\n", + "| entropy_loss | -2.13 |\n", + "| explained_variance | 0.6340892 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 15399 |\n", + "| policy_loss | -0.0265 |\n", + "| std | 0.496 |\n", + "| value_loss | 0.000917 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.89 |\n", + "| ep_rew_mean | -0.23 |\n", + "| 
time/ | |\n", + "| fps | 303 |\n", + "| iterations | 15500 |\n", + "| time_elapsed | 1020 |\n", + "| total_timesteps | 310000 |\n", + "| train/ | |\n", + "| entropy_loss | -2.12 |\n", + "| explained_variance | 0.9757865 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 15499 |\n", + "| policy_loss | 0.00557 |\n", + "| std | 0.493 |\n", + "| value_loss | 5.41e-05 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.84 |\n", + "| ep_rew_mean | -0.234 |\n", + "| time/ | |\n", + "| fps | 303 |\n", + "| iterations | 15600 |\n", + "| time_elapsed | 1029 |\n", + "| total_timesteps | 312000 |\n", + "| train/ | |\n", + "| entropy_loss | -2.1 |\n", + "| explained_variance | 0.9816367 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 15599 |\n", + "| policy_loss | -0.00155 |\n", + "| std | 0.49 |\n", + "| value_loss | 4.53e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.87 |\n", + "| ep_rew_mean | -0.228 |\n", + "| time/ | |\n", + "| fps | 303 |\n", + "| iterations | 15700 |\n", + "| time_elapsed | 1034 |\n", + "| total_timesteps | 314000 |\n", + "| train/ | |\n", + "| entropy_loss | -2.07 |\n", + "| explained_variance | 0.59498584 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 15699 |\n", + "| policy_loss | 0.00891 |\n", + "| std | 0.484 |\n", + "| value_loss | 0.0011 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.74 |\n", + "| ep_rew_mean | -0.213 |\n", + "| time/ | |\n", + "| fps | 303 |\n", + "| iterations | 15800 |\n", + "| time_elapsed | 1039 |\n", + "| total_timesteps | 316000 |\n", + "| train/ | |\n", + "| entropy_loss | -2.04 |\n", + "| explained_variance | 0.9804266 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 15799 |\n", + "| policy_loss | -0.0119 |\n", + "| 
std | 0.48 |\n", + "| value_loss | 6.58e-05 |\n", + "-------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.69 |\n", + "| ep_rew_mean | -0.209 |\n", + "| time/ | |\n", + "| fps | 304 |\n", + "| iterations | 15900 |\n", + "| time_elapsed | 1045 |\n", + "| total_timesteps | 318000 |\n", + "| train/ | |\n", + "| entropy_loss | -2 |\n", + "| explained_variance | 0.974797 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 15899 |\n", + "| policy_loss | -0.0222 |\n", + "| std | 0.475 |\n", + "| value_loss | 0.000213 |\n", + "------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.8 |\n", + "| ep_rew_mean | -0.227 |\n", + "| time/ | |\n", + "| fps | 304 |\n", + "| iterations | 16000 |\n", + "| time_elapsed | 1050 |\n", + "| total_timesteps | 320000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.98 |\n", + "| explained_variance | 0.90541655 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 15999 |\n", + "| policy_loss | 0.0202 |\n", + "| std | 0.47 |\n", + "| value_loss | 0.000274 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.86 |\n", + "| ep_rew_mean | -0.232 |\n", + "| time/ | |\n", + "| fps | 304 |\n", + "| iterations | 16100 |\n", + "| time_elapsed | 1056 |\n", + "| total_timesteps | 322000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.98 |\n", + "| explained_variance | 0.9099645 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 16099 |\n", + "| policy_loss | 0.00779 |\n", + "| std | 0.471 |\n", + "| value_loss | 0.000138 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.77 |\n", + "| ep_rew_mean | -0.209 |\n", + "| time/ | |\n", + "| fps | 304 |\n", + "| iterations | 16200 |\n", + "| time_elapsed | 1065 |\n", + "| 
total_timesteps | 324000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.98 |\n", + "| explained_variance | 0.90357727 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 16199 |\n", + "| policy_loss | -0.0167 |\n", + "| std | 0.471 |\n", + "| value_loss | 0.000208 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.93 |\n", + "| ep_rew_mean | -0.232 |\n", + "| time/ | |\n", + "| fps | 304 |\n", + "| iterations | 16300 |\n", + "| time_elapsed | 1071 |\n", + "| total_timesteps | 326000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.97 |\n", + "| explained_variance | 0.87062764 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 16299 |\n", + "| policy_loss | -0.00104 |\n", + "| std | 0.469 |\n", + "| value_loss | 0.000104 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.78 |\n", + "| ep_rew_mean | -0.221 |\n", + "| time/ | |\n", + "| fps | 304 |\n", + "| iterations | 16400 |\n", + "| time_elapsed | 1076 |\n", + "| total_timesteps | 328000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.95 |\n", + "| explained_variance | 0.96559095 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 16399 |\n", + "| policy_loss | 0.00231 |\n", + "| std | 0.467 |\n", + "| value_loss | 5.08e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.81 |\n", + "| ep_rew_mean | -0.22 |\n", + "| time/ | |\n", + "| fps | 304 |\n", + "| iterations | 16500 |\n", + "| time_elapsed | 1082 |\n", + "| total_timesteps | 330000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.97 |\n", + "| explained_variance | 0.9584811 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 16499 |\n", + "| policy_loss | 0.00624 |\n", + "| std | 0.469 |\n", + "| value_loss | 0.000136 |\n", + 
"-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.71 |\n", + "| ep_rew_mean | -0.208 |\n", + "| time/ | |\n", + "| fps | 305 |\n", + "| iterations | 16600 |\n", + "| time_elapsed | 1087 |\n", + "| total_timesteps | 332000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.95 |\n", + "| explained_variance | 0.9770625 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 16599 |\n", + "| policy_loss | 0.00544 |\n", + "| std | 0.467 |\n", + "| value_loss | 6.65e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.68 |\n", + "| ep_rew_mean | -0.21 |\n", + "| time/ | |\n", + "| fps | 305 |\n", + "| iterations | 16700 |\n", + "| time_elapsed | 1093 |\n", + "| total_timesteps | 334000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.95 |\n", + "| explained_variance | 0.63326836 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 16699 |\n", + "| policy_loss | -0.0177 |\n", + "| std | 0.467 |\n", + "| value_loss | 0.00115 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.81 |\n", + "| ep_rew_mean | -0.221 |\n", + "| time/ | |\n", + "| fps | 304 |\n", + "| iterations | 16800 |\n", + "| time_elapsed | 1102 |\n", + "| total_timesteps | 336000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.93 |\n", + "| explained_variance | 0.98614395 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 16799 |\n", + "| policy_loss | 0.00795 |\n", + "| std | 0.463 |\n", + "| value_loss | 4.68e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.86 |\n", + "| ep_rew_mean | -0.231 |\n", + "| time/ | |\n", + "| fps | 305 |\n", + "| iterations | 16900 |\n", + "| time_elapsed | 1108 |\n", + "| total_timesteps | 338000 |\n", + "| train/ | 
|\n", + "| entropy_loss | -1.89 |\n", + "| explained_variance | 0.9440542 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 16899 |\n", + "| policy_loss | -0.0238 |\n", + "| std | 0.458 |\n", + "| value_loss | 0.000362 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.8 |\n", + "| ep_rew_mean | -0.219 |\n", + "| time/ | |\n", + "| fps | 305 |\n", + "| iterations | 17000 |\n", + "| time_elapsed | 1113 |\n", + "| total_timesteps | 340000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.88 |\n", + "| explained_variance | 0.9288571 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 16999 |\n", + "| policy_loss | 0.0118 |\n", + "| std | 0.456 |\n", + "| value_loss | 0.00026 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.63 |\n", + "| ep_rew_mean | -0.208 |\n", + "| time/ | |\n", + "| fps | 305 |\n", + "| iterations | 17100 |\n", + "| time_elapsed | 1119 |\n", + "| total_timesteps | 342000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.88 |\n", + "| explained_variance | 0.9744407 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 17099 |\n", + "| policy_loss | -0.0129 |\n", + "| std | 0.455 |\n", + "| value_loss | 7.61e-05 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.85 |\n", + "| ep_rew_mean | -0.234 |\n", + "| time/ | |\n", + "| fps | 305 |\n", + "| iterations | 17200 |\n", + "| time_elapsed | 1125 |\n", + "| total_timesteps | 344000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.87 |\n", + "| explained_variance | 0.9596539 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 17199 |\n", + "| policy_loss | -0.0125 |\n", + "| std | 0.455 |\n", + "| value_loss | 0.000129 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| 
rollout/ | |\n", + "| ep_len_mean | 2.75 |\n", + "| ep_rew_mean | -0.216 |\n", + "| time/ | |\n", + "| fps | 306 |\n", + "| iterations | 17300 |\n", + "| time_elapsed | 1130 |\n", + "| total_timesteps | 346000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.86 |\n", + "| explained_variance | 0.97371745 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 17299 |\n", + "| policy_loss | 0.0135 |\n", + "| std | 0.453 |\n", + "| value_loss | 0.000125 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.71 |\n", + "| ep_rew_mean | -0.211 |\n", + "| time/ | |\n", + "| fps | 305 |\n", + "| iterations | 17400 |\n", + "| time_elapsed | 1139 |\n", + "| total_timesteps | 348000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.84 |\n", + "| explained_variance | 0.97407156 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 17399 |\n", + "| policy_loss | -0.000319 |\n", + "| std | 0.451 |\n", + "| value_loss | 7.55e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.84 |\n", + "| ep_rew_mean | -0.227 |\n", + "| time/ | |\n", + "| fps | 305 |\n", + "| iterations | 17500 |\n", + "| time_elapsed | 1145 |\n", + "| total_timesteps | 350000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.83 |\n", + "| explained_variance | 0.9800369 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 17499 |\n", + "| policy_loss | -0.00272 |\n", + "| std | 0.449 |\n", + "| value_loss | 7.61e-05 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.57 |\n", + "| ep_rew_mean | -0.192 |\n", + "| time/ | |\n", + "| fps | 305 |\n", + "| iterations | 17600 |\n", + "| time_elapsed | 1151 |\n", + "| total_timesteps | 352000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.83 |\n", + "| explained_variance | 0.9593339 |\n", + "| 
learning_rate | 0.0007 |\n", + "| n_updates | 17599 |\n", + "| policy_loss | -0.0111 |\n", + "| std | 0.449 |\n", + "| value_loss | 9e-05 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.67 |\n", + "| ep_rew_mean | -0.214 |\n", + "| time/ | |\n", + "| fps | 306 |\n", + "| iterations | 17700 |\n", + "| time_elapsed | 1156 |\n", + "| total_timesteps | 354000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.81 |\n", + "| explained_variance | 0.9596555 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 17699 |\n", + "| policy_loss | -0.00942 |\n", + "| std | 0.447 |\n", + "| value_loss | 6.95e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.74 |\n", + "| ep_rew_mean | -0.217 |\n", + "| time/ | |\n", + "| fps | 306 |\n", + "| iterations | 17800 |\n", + "| time_elapsed | 1162 |\n", + "| total_timesteps | 356000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.81 |\n", + "| explained_variance | 0.98978764 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 17799 |\n", + "| policy_loss | -0.00202 |\n", + "| std | 0.446 |\n", + "| value_loss | 3.37e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.71 |\n", + "| ep_rew_mean | -0.209 |\n", + "| time/ | |\n", + "| fps | 305 |\n", + "| iterations | 17900 |\n", + "| time_elapsed | 1171 |\n", + "| total_timesteps | 358000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.81 |\n", + "| explained_variance | 0.7599305 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 17899 |\n", + "| policy_loss | -0.0287 |\n", + "| std | 0.446 |\n", + "| value_loss | 0.000693 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.84 |\n", + "| ep_rew_mean | -0.22 |\n", + "| 
time/ | |\n", + "| fps | 305 |\n", + "| iterations | 18000 |\n", + "| time_elapsed | 1177 |\n", + "| total_timesteps | 360000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.8 |\n", + "| explained_variance | 0.98177564 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 17999 |\n", + "| policy_loss | 0.0091 |\n", + "| std | 0.446 |\n", + "| value_loss | 0.00011 |\n", + "--------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.6 |\n", + "| ep_rew_mean | -0.202 |\n", + "| time/ | |\n", + "| fps | 306 |\n", + "| iterations | 18100 |\n", + "| time_elapsed | 1182 |\n", + "| total_timesteps | 362000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.79 |\n", + "| explained_variance | 0.934681 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 18099 |\n", + "| policy_loss | -0.0094 |\n", + "| std | 0.444 |\n", + "| value_loss | 0.000113 |\n", + "------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.68 |\n", + "| ep_rew_mean | -0.206 |\n", + "| time/ | |\n", + "| fps | 306 |\n", + "| iterations | 18200 |\n", + "| time_elapsed | 1188 |\n", + "| total_timesteps | 364000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.79 |\n", + "| explained_variance | 0.95457464 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 18199 |\n", + "| policy_loss | -0.00085 |\n", + "| std | 0.444 |\n", + "| value_loss | 6.5e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.72 |\n", + "| ep_rew_mean | -0.205 |\n", + "| time/ | |\n", + "| fps | 306 |\n", + "| iterations | 18300 |\n", + "| time_elapsed | 1194 |\n", + "| total_timesteps | 366000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.77 |\n", + "| explained_variance | 0.95278776 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 18299 |\n", + "| policy_loss | -0.00984 |\n", + "| 
std | 0.44 |\n", + "| value_loss | 8.43e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.76 |\n", + "| ep_rew_mean | -0.211 |\n", + "| time/ | |\n", + "| fps | 306 |\n", + "| iterations | 18400 |\n", + "| time_elapsed | 1199 |\n", + "| total_timesteps | 368000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.76 |\n", + "| explained_variance | 0.94221777 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 18399 |\n", + "| policy_loss | 0.000339 |\n", + "| std | 0.439 |\n", + "| value_loss | 0.000151 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.78 |\n", + "| ep_rew_mean | -0.214 |\n", + "| time/ | |\n", + "| fps | 306 |\n", + "| iterations | 18500 |\n", + "| time_elapsed | 1208 |\n", + "| total_timesteps | 370000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.75 |\n", + "| explained_variance | 0.80928534 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 18499 |\n", + "| policy_loss | 0.00918 |\n", + "| std | 0.438 |\n", + "| value_loss | 0.000224 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.79 |\n", + "| ep_rew_mean | -0.226 |\n", + "| time/ | |\n", + "| fps | 306 |\n", + "| iterations | 18600 |\n", + "| time_elapsed | 1214 |\n", + "| total_timesteps | 372000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.74 |\n", + "| explained_variance | 0.9688626 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 18599 |\n", + "| policy_loss | -8.22e-06 |\n", + "| std | 0.437 |\n", + "| value_loss | 0.000102 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.82 |\n", + "| ep_rew_mean | -0.226 |\n", + "| time/ | |\n", + "| fps | 306 |\n", + "| iterations | 18700 |\n", + "| time_elapsed | 1219 
|\n", + "| total_timesteps | 374000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.72 |\n", + "| explained_variance | 0.9825928 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 18699 |\n", + "| policy_loss | -0.00274 |\n", + "| std | 0.434 |\n", + "| value_loss | 5.74e-05 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.76 |\n", + "| ep_rew_mean | -0.212 |\n", + "| time/ | |\n", + "| fps | 306 |\n", + "| iterations | 18800 |\n", + "| time_elapsed | 1225 |\n", + "| total_timesteps | 376000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.7 |\n", + "| explained_variance | 0.9257292 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 18799 |\n", + "| policy_loss | 0.0254 |\n", + "| std | 0.431 |\n", + "| value_loss | 0.000312 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.83 |\n", + "| ep_rew_mean | -0.217 |\n", + "| time/ | |\n", + "| fps | 307 |\n", + "| iterations | 18900 |\n", + "| time_elapsed | 1230 |\n", + "| total_timesteps | 378000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.67 |\n", + "| explained_variance | 0.62272656 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 18899 |\n", + "| policy_loss | 0.0324 |\n", + "| std | 0.428 |\n", + "| value_loss | 0.00136 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.82 |\n", + "| ep_rew_mean | -0.222 |\n", + "| time/ | |\n", + "| fps | 307 |\n", + "| iterations | 19000 |\n", + "| time_elapsed | 1236 |\n", + "| total_timesteps | 380000 |\n", + "| train/ | |\n", + "| entropy_loss | -1.67 |\n", + "| explained_variance | 0.8762253 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 18999 |\n", + "| policy_loss | 0.0126 |\n", + "| std | 0.427 |\n", + "| value_loss | 0.000196 |\n", + 
"-------------------------------------\n",
+ "| rollout/           |           |\n",
+ "| ep_len_mean        | 2.73      |\n",
+ "| ep_rew_mean        | -0.209    |\n",
+ "| time/              |           |\n",
+ "| fps                | 306       |\n",
+ "| iterations         | 19100     |\n",
+ "| time_elapsed       | 1245      |\n",
+ "| total_timesteps    | 382000    |\n",
+ "| train/             |           |\n",
+ "| entropy_loss       | -1.67     |\n",
+ "| explained_variance | 0.9610008 |\n",
+ "| learning_rate      | 0.0007    |\n",
+ "| n_updates          | 19099     |\n",
+ "| policy_loss        | -0.00729  |\n",
+ "| std                | 0.428     |\n",
+ "| value_loss         | 9.97e-05  |\n",
+ "-------------------------------------\n",
+ "[... repeated log tables truncated: iterations 19200-26500, total_timesteps 384000-530000, ep_rew_mean ~ -0.20 to -0.24, fps 307-322, entropy_loss rising from -1.67 to -0.90, std falling from 0.428 to 0.339 ...]\n",
+ "-------------------------------------\n", +
"--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.74 |\n", + "| ep_rew_mean | -0.207 |\n", + "| time/ | |\n", + "| fps | 320 |\n", + "| iterations | 26600 |\n", + "| time_elapsed | 1657 |\n", + "| total_timesteps | 532000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.896 |\n", + "| explained_variance | 0.95472234 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 26599 |\n", + "| policy_loss | -0.00131 |\n", + "| std | 0.339 |\n", + "| value_loss | 0.000107 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.68 |\n", + "| ep_rew_mean | -0.204 |\n", + "| time/ | |\n", + "| fps | 320 |\n", + "| iterations | 26700 |\n", + "| time_elapsed | 1664 |\n", + "| total_timesteps | 534000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.898 |\n", + "| explained_variance | 0.97528225 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 26699 |\n", + "| policy_loss | -0.00439 |\n", + "| std | 0.34 |\n", + "| value_loss | 8e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.66 |\n", + "| ep_rew_mean | -0.21 |\n", + "| time/ | |\n", + "| fps | 320 |\n", + "| iterations | 26800 |\n", + "| time_elapsed | 1673 |\n", + "| total_timesteps | 536000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.896 |\n", + "| explained_variance | 0.9557003 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 26799 |\n", + "| policy_loss | -0.00705 |\n", + "| std | 0.339 |\n", + "| value_loss | 0.000122 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.7 |\n", + "| ep_rew_mean | -0.205 |\n", + "| time/ | |\n", + "| fps | 320 |\n", + "| iterations | 26900 |\n", + "| time_elapsed | 1677 |\n", + "| total_timesteps | 538000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.899 |\n", + "| 
explained_variance | 0.9643465 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 26899 |\n", + "| policy_loss | -0.00129 |\n", + "| std | 0.339 |\n", + "| value_loss | 0.000202 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.74 |\n", + "| ep_rew_mean | -0.207 |\n", + "| time/ | |\n", + "| fps | 320 |\n", + "| iterations | 27000 |\n", + "| time_elapsed | 1682 |\n", + "| total_timesteps | 540000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.878 |\n", + "| explained_variance | 0.9536327 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 26999 |\n", + "| policy_loss | 0.00474 |\n", + "| std | 0.337 |\n", + "| value_loss | 0.000155 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.77 |\n", + "| ep_rew_mean | -0.211 |\n", + "| time/ | |\n", + "| fps | 321 |\n", + "| iterations | 27100 |\n", + "| time_elapsed | 1686 |\n", + "| total_timesteps | 542000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.872 |\n", + "| explained_variance | 0.9857932 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 27099 |\n", + "| policy_loss | 0.00156 |\n", + "| std | 0.337 |\n", + "| value_loss | 4.18e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.68 |\n", + "| ep_rew_mean | -0.207 |\n", + "| time/ | |\n", + "| fps | 321 |\n", + "| iterations | 27200 |\n", + "| time_elapsed | 1690 |\n", + "| total_timesteps | 544000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.867 |\n", + "| explained_variance | 0.95144564 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 27199 |\n", + "| policy_loss | 0.00798 |\n", + "| std | 0.335 |\n", + "| value_loss | 0.000209 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 
2.8 |\n", + "| ep_rew_mean | -0.217 |\n", + "| time/ | |\n", + "| fps | 322 |\n", + "| iterations | 27300 |\n", + "| time_elapsed | 1695 |\n", + "| total_timesteps | 546000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.861 |\n", + "| explained_variance | 0.96456105 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 27299 |\n", + "| policy_loss | 0.00775 |\n", + "| std | 0.335 |\n", + "| value_loss | 0.000121 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.77 |\n", + "| ep_rew_mean | -0.219 |\n", + "| time/ | |\n", + "| fps | 322 |\n", + "| iterations | 27400 |\n", + "| time_elapsed | 1699 |\n", + "| total_timesteps | 548000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.85 |\n", + "| explained_variance | 0.9638762 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 27399 |\n", + "| policy_loss | 0.00218 |\n", + "| std | 0.334 |\n", + "| value_loss | 0.000101 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.58 |\n", + "| ep_rew_mean | -0.189 |\n", + "| time/ | |\n", + "| fps | 322 |\n", + "| iterations | 27500 |\n", + "| time_elapsed | 1703 |\n", + "| total_timesteps | 550000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.827 |\n", + "| explained_variance | 0.97292376 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 27499 |\n", + "| policy_loss | 0.000216 |\n", + "| std | 0.332 |\n", + "| value_loss | 0.000109 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.83 |\n", + "| ep_rew_mean | -0.221 |\n", + "| time/ | |\n", + "| fps | 322 |\n", + "| iterations | 27600 |\n", + "| time_elapsed | 1711 |\n", + "| total_timesteps | 552000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.817 |\n", + "| explained_variance | 0.9647702 |\n", + "| learning_rate | 0.0007 |\n", + "| 
n_updates | 27599 |\n", + "| policy_loss | -0.000127 |\n", + "| std | 0.331 |\n", + "| value_loss | 0.000103 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.75 |\n", + "| ep_rew_mean | -0.214 |\n", + "| time/ | |\n", + "| fps | 322 |\n", + "| iterations | 27700 |\n", + "| time_elapsed | 1716 |\n", + "| total_timesteps | 554000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.804 |\n", + "| explained_variance | 0.8263781 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 27699 |\n", + "| policy_loss | 0.0203 |\n", + "| std | 0.329 |\n", + "| value_loss | 0.00182 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.7 |\n", + "| ep_rew_mean | -0.21 |\n", + "| time/ | |\n", + "| fps | 323 |\n", + "| iterations | 27800 |\n", + "| time_elapsed | 1720 |\n", + "| total_timesteps | 556000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.797 |\n", + "| explained_variance | 0.97323596 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 27799 |\n", + "| policy_loss | 0.00182 |\n", + "| std | 0.328 |\n", + "| value_loss | 0.000173 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.81 |\n", + "| ep_rew_mean | -0.213 |\n", + "| time/ | |\n", + "| fps | 323 |\n", + "| iterations | 27900 |\n", + "| time_elapsed | 1725 |\n", + "| total_timesteps | 558000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.774 |\n", + "| explained_variance | 0.22033525 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 27899 |\n", + "| policy_loss | 0.00906 |\n", + "| std | 0.325 |\n", + "| value_loss | 0.00211 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.7 |\n", + "| ep_rew_mean | -0.214 |\n", + "| time/ | |\n", + "| fps | 323 
|\n", + "| iterations | 28000 |\n", + "| time_elapsed | 1730 |\n", + "| total_timesteps | 560000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.764 |\n", + "| explained_variance | 0.9667121 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 27999 |\n", + "| policy_loss | -0.00939 |\n", + "| std | 0.324 |\n", + "| value_loss | 0.000226 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.71 |\n", + "| ep_rew_mean | -0.206 |\n", + "| time/ | |\n", + "| fps | 323 |\n", + "| iterations | 28100 |\n", + "| time_elapsed | 1735 |\n", + "| total_timesteps | 562000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.76 |\n", + "| explained_variance | 0.9180963 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 28099 |\n", + "| policy_loss | 0.000922 |\n", + "| std | 0.324 |\n", + "| value_loss | 0.00013 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.73 |\n", + "| ep_rew_mean | -0.21 |\n", + "| time/ | |\n", + "| fps | 324 |\n", + "| iterations | 28200 |\n", + "| time_elapsed | 1740 |\n", + "| total_timesteps | 564000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.758 |\n", + "| explained_variance | 0.58245325 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 28199 |\n", + "| policy_loss | -0.0188 |\n", + "| std | 0.324 |\n", + "| value_loss | 0.00212 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.72 |\n", + "| ep_rew_mean | -0.212 |\n", + "| time/ | |\n", + "| fps | 323 |\n", + "| iterations | 28300 |\n", + "| time_elapsed | 1748 |\n", + "| total_timesteps | 566000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.747 |\n", + "| explained_variance | 0.98212785 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 28299 |\n", + "| policy_loss | -0.00883 |\n", + "| std | 0.322 |\n", + "| 
value_loss | 0.000125 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.68 |\n", + "| ep_rew_mean | -0.208 |\n", + "| time/ | |\n", + "| fps | 323 |\n", + "| iterations | 28400 |\n", + "| time_elapsed | 1753 |\n", + "| total_timesteps | 568000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.77 |\n", + "| explained_variance | 0.9311479 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 28399 |\n", + "| policy_loss | -0.00106 |\n", + "| std | 0.325 |\n", + "| value_loss | 8.91e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.63 |\n", + "| ep_rew_mean | -0.215 |\n", + "| time/ | |\n", + "| fps | 324 |\n", + "| iterations | 28500 |\n", + "| time_elapsed | 1757 |\n", + "| total_timesteps | 570000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.724 |\n", + "| explained_variance | 0.99005914 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 28499 |\n", + "| policy_loss | 0.000466 |\n", + "| std | 0.32 |\n", + "| value_loss | 2.75e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.65 |\n", + "| ep_rew_mean | -0.194 |\n", + "| time/ | |\n", + "| fps | 324 |\n", + "| iterations | 28600 |\n", + "| time_elapsed | 1762 |\n", + "| total_timesteps | 572000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.688 |\n", + "| explained_variance | 0.97127336 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 28599 |\n", + "| policy_loss | -0.00282 |\n", + "| std | 0.316 |\n", + "| value_loss | 0.000111 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.88 |\n", + "| ep_rew_mean | -0.229 |\n", + "| time/ | |\n", + "| fps | 324 |\n", + "| iterations | 28700 |\n", + "| time_elapsed | 1767 |\n", + "| 
total_timesteps | 574000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.673 |\n", + "| explained_variance | 0.97094136 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 28699 |\n", + "| policy_loss | -0.00301 |\n", + "| std | 0.315 |\n", + "| value_loss | 0.00014 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.76 |\n", + "| ep_rew_mean | -0.214 |\n", + "| time/ | |\n", + "| fps | 325 |\n", + "| iterations | 28800 |\n", + "| time_elapsed | 1772 |\n", + "| total_timesteps | 576000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.658 |\n", + "| explained_variance | 0.9512483 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 28799 |\n", + "| policy_loss | -0.00559 |\n", + "| std | 0.313 |\n", + "| value_loss | 0.000173 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.7 |\n", + "| ep_rew_mean | -0.212 |\n", + "| time/ | |\n", + "| fps | 324 |\n", + "| iterations | 28900 |\n", + "| time_elapsed | 1780 |\n", + "| total_timesteps | 578000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.65 |\n", + "| explained_variance | 0.97945994 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 28899 |\n", + "| policy_loss | -0.00472 |\n", + "| std | 0.312 |\n", + "| value_loss | 0.000126 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.81 |\n", + "| ep_rew_mean | -0.225 |\n", + "| time/ | |\n", + "| fps | 324 |\n", + "| iterations | 29000 |\n", + "| time_elapsed | 1785 |\n", + "| total_timesteps | 580000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.631 |\n", + "| explained_variance | 0.9418761 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 28999 |\n", + "| policy_loss | -2.69e-05 |\n", + "| std | 0.311 |\n", + "| value_loss | 0.000187 |\n", + 
"-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.77 |\n", + "| ep_rew_mean | -0.214 |\n", + "| time/ | |\n", + "| fps | 325 |\n", + "| iterations | 29100 |\n", + "| time_elapsed | 1790 |\n", + "| total_timesteps | 582000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.62 |\n", + "| explained_variance | 0.9561486 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 29099 |\n", + "| policy_loss | 0.0037 |\n", + "| std | 0.309 |\n", + "| value_loss | 0.0002 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.71 |\n", + "| ep_rew_mean | -0.211 |\n", + "| time/ | |\n", + "| fps | 325 |\n", + "| iterations | 29200 |\n", + "| time_elapsed | 1795 |\n", + "| total_timesteps | 584000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.618 |\n", + "| explained_variance | 0.9853928 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 29199 |\n", + "| policy_loss | -0.00218 |\n", + "| std | 0.309 |\n", + "| value_loss | 5.64e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.7 |\n", + "| ep_rew_mean | -0.199 |\n", + "| time/ | |\n", + "| fps | 325 |\n", + "| iterations | 29300 |\n", + "| time_elapsed | 1800 |\n", + "| total_timesteps | 586000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.611 |\n", + "| explained_variance | 0.97477955 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 29299 |\n", + "| policy_loss | 0.00242 |\n", + "| std | 0.309 |\n", + "| value_loss | 0.000135 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.73 |\n", + "| ep_rew_mean | -0.208 |\n", + "| time/ | |\n", + "| fps | 325 |\n", + "| iterations | 29400 |\n", + "| time_elapsed | 1804 |\n", + "| total_timesteps | 588000 |\n", + "| train/ | 
|\n", + "| entropy_loss | -0.613 |\n", + "| explained_variance | 0.6755737 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 29399 |\n", + "| policy_loss | -0.0265 |\n", + "| std | 0.309 |\n", + "| value_loss | 0.00202 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.65 |\n", + "| ep_rew_mean | -0.203 |\n", + "| time/ | |\n", + "| fps | 326 |\n", + "| iterations | 29500 |\n", + "| time_elapsed | 1809 |\n", + "| total_timesteps | 590000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.596 |\n", + "| explained_variance | 0.9880968 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 29499 |\n", + "| policy_loss | 0.00439 |\n", + "| std | 0.307 |\n", + "| value_loss | 5.99e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.67 |\n", + "| ep_rew_mean | -0.203 |\n", + "| time/ | |\n", + "| fps | 325 |\n", + "| iterations | 29600 |\n", + "| time_elapsed | 1818 |\n", + "| total_timesteps | 592000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.589 |\n", + "| explained_variance | 0.93901527 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 29599 |\n", + "| policy_loss | 0.00187 |\n", + "| std | 0.306 |\n", + "| value_loss | 0.000172 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.4 |\n", + "| ep_rew_mean | -0.275 |\n", + "| time/ | |\n", + "| fps | 325 |\n", + "| iterations | 29700 |\n", + "| time_elapsed | 1823 |\n", + "| total_timesteps | 594000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.584 |\n", + "| explained_variance | -1.5040169 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 29699 |\n", + "| policy_loss | -0.0121 |\n", + "| std | 0.306 |\n", + "| value_loss | 0.00608 |\n", + "--------------------------------------\n", + 
"--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.08 |\n", + "| ep_rew_mean | -0.234 |\n", + "| time/ | |\n", + "| fps | 325 |\n", + "| iterations | 29800 |\n", + "| time_elapsed | 1828 |\n", + "| total_timesteps | 596000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.602 |\n", + "| explained_variance | 0.35909313 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 29799 |\n", + "| policy_loss | -0.00559 |\n", + "| std | 0.308 |\n", + "| value_loss | 0.00806 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.95 |\n", + "| ep_rew_mean | -0.23 |\n", + "| time/ | |\n", + "| fps | 326 |\n", + "| iterations | 29900 |\n", + "| time_elapsed | 1833 |\n", + "| total_timesteps | 598000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.617 |\n", + "| explained_variance | -10.621065 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 29899 |\n", + "| policy_loss | -0.0245 |\n", + "| std | 0.309 |\n", + "| value_loss | 0.0672 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.85 |\n", + "| ep_rew_mean | -0.222 |\n", + "| time/ | |\n", + "| fps | 326 |\n", + "| iterations | 30000 |\n", + "| time_elapsed | 1838 |\n", + "| total_timesteps | 600000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.585 |\n", + "| explained_variance | 0.41773236 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 29999 |\n", + "| policy_loss | -0.0285 |\n", + "| std | 0.305 |\n", + "| value_loss | 0.00465 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.77 |\n", + "| ep_rew_mean | -0.216 |\n", + "| time/ | |\n", + "| fps | 326 |\n", + "| iterations | 30100 |\n", + "| time_elapsed | 1843 |\n", + "| total_timesteps | 602000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.596 |\n", + 
"| explained_variance | 0.9414502 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 30099 |\n", + "| policy_loss | 0.00102 |\n", + "| std | 0.307 |\n", + "| value_loss | 0.000178 |\n", + "-------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.88 |\n", + "| ep_rew_mean | -0.224 |\n", + "| time/ | |\n", + "| fps | 326 |\n", + "| iterations | 30200 |\n", + "| time_elapsed | 1852 |\n", + "| total_timesteps | 604000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.586 |\n", + "| explained_variance | 0.598702 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 30199 |\n", + "| policy_loss | -0.0509 |\n", + "| std | 0.306 |\n", + "| value_loss | 0.00411 |\n", + "------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.94 |\n", + "| ep_rew_mean | -0.233 |\n", + "| time/ | |\n", + "| fps | 326 |\n", + "| iterations | 30300 |\n", + "| time_elapsed | 1857 |\n", + "| total_timesteps | 606000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.568 |\n", + "| explained_variance | 0.9546901 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 30299 |\n", + "| policy_loss | 0.00376 |\n", + "| std | 0.304 |\n", + "| value_loss | 0.000196 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.68 |\n", + "| ep_rew_mean | -0.211 |\n", + "| time/ | |\n", + "| fps | 326 |\n", + "| iterations | 30400 |\n", + "| time_elapsed | 1861 |\n", + "| total_timesteps | 608000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.556 |\n", + "| explained_variance | 0.9634257 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 30399 |\n", + "| policy_loss | -0.00117 |\n", + "| std | 0.304 |\n", + "| value_loss | 0.000128 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.59 
|\n", + "| ep_rew_mean | -0.192 |\n", + "| time/ | |\n", + "| fps | 326 |\n", + "| iterations | 30500 |\n", + "| time_elapsed | 1866 |\n", + "| total_timesteps | 610000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.559 |\n", + "| explained_variance | 0.9729128 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 30499 |\n", + "| policy_loss | -0.0041 |\n", + "| std | 0.304 |\n", + "| value_loss | 8.32e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.77 |\n", + "| ep_rew_mean | -0.216 |\n", + "| time/ | |\n", + "| fps | 326 |\n", + "| iterations | 30600 |\n", + "| time_elapsed | 1871 |\n", + "| total_timesteps | 612000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.542 |\n", + "| explained_variance | 0.98225373 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 30599 |\n", + "| policy_loss | -0.00189 |\n", + "| std | 0.303 |\n", + "| value_loss | 9.61e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.8 |\n", + "| ep_rew_mean | -0.218 |\n", + "| time/ | |\n", + "| fps | 327 |\n", + "| iterations | 30700 |\n", + "| time_elapsed | 1876 |\n", + "| total_timesteps | 614000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.539 |\n", + "| explained_variance | 0.92065257 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 30699 |\n", + "| policy_loss | -0.00712 |\n", + "| std | 0.302 |\n", + "| value_loss | 0.000221 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.68 |\n", + "| ep_rew_mean | -0.204 |\n", + "| time/ | |\n", + "| fps | 327 |\n", + "| iterations | 30800 |\n", + "| time_elapsed | 1881 |\n", + "| total_timesteps | 616000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.539 |\n", + "| explained_variance | 0.9718508 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates 
| 30799 |\n", + "| policy_loss | -0.00467 |\n", + "| std | 0.303 |\n", + "| value_loss | 9.48e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.67 |\n", + "| ep_rew_mean | -0.206 |\n", + "| time/ | |\n", + "| fps | 327 |\n", + "| iterations | 30900 |\n", + "| time_elapsed | 1889 |\n", + "| total_timesteps | 618000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.526 |\n", + "| explained_variance | 0.97893435 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 30899 |\n", + "| policy_loss | 0.00621 |\n", + "| std | 0.302 |\n", + "| value_loss | 0.000222 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.82 |\n", + "| ep_rew_mean | -0.226 |\n", + "| time/ | |\n", + "| fps | 327 |\n", + "| iterations | 31000 |\n", + "| time_elapsed | 1893 |\n", + "| total_timesteps | 620000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.512 |\n", + "| explained_variance | 0.95006245 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 30999 |\n", + "| policy_loss | 0.00233 |\n", + "| std | 0.301 |\n", + "| value_loss | 0.000139 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.84 |\n", + "| ep_rew_mean | -0.224 |\n", + "| time/ | |\n", + "| fps | 327 |\n", + "| iterations | 31100 |\n", + "| time_elapsed | 1898 |\n", + "| total_timesteps | 622000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.512 |\n", + "| explained_variance | 0.93657434 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 31099 |\n", + "| policy_loss | -0.000242 |\n", + "| std | 0.301 |\n", + "| value_loss | 0.000198 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.65 |\n", + "| ep_rew_mean | -0.206 |\n", + "| time/ | |\n", + "| fps | 327 
|\n", + "| iterations | 31200 |\n", + "| time_elapsed | 1905 |\n", + "| total_timesteps | 624000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.466 |\n", + "| explained_variance | 0.9877893 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 31199 |\n", + "| policy_loss | 0.00217 |\n", + "| std | 0.297 |\n", + "| value_loss | 4.42e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.73 |\n", + "| ep_rew_mean | -0.216 |\n", + "| time/ | |\n", + "| fps | 327 |\n", + "| iterations | 31300 |\n", + "| time_elapsed | 1911 |\n", + "| total_timesteps | 626000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.437 |\n", + "| explained_variance | 0.98232895 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 31299 |\n", + "| policy_loss | -0.00259 |\n", + "| std | 0.294 |\n", + "| value_loss | 6.34e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.75 |\n", + "| ep_rew_mean | -0.214 |\n", + "| time/ | |\n", + "| fps | 327 |\n", + "| iterations | 31400 |\n", + "| time_elapsed | 1916 |\n", + "| total_timesteps | 628000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.447 |\n", + "| explained_variance | 0.97575915 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 31399 |\n", + "| policy_loss | 0.000886 |\n", + "| std | 0.295 |\n", + "| value_loss | 8.11e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.71 |\n", + "| ep_rew_mean | -0.208 |\n", + "| time/ | |\n", + "| fps | 327 |\n", + "| iterations | 31500 |\n", + "| time_elapsed | 1925 |\n", + "| total_timesteps | 630000 |\n", + "| train/ | |\n", + "| entropy_loss | -0.438 |\n", + "| explained_variance | 0.97809935 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 31499 |\n", + "| policy_loss | 0.00624 |\n", + "| std | 0.294 
|\n",
+ "| value_loss          | 7.33e-05    |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/            |             |\n",
+ "| ep_len_mean         | 2.73        |\n",
+ "| ep_rew_mean         | -0.209      |\n",
+ "| time/               |             |\n",
+ "| fps                 | 327         |\n",
+ "| iterations          | 31600       |\n",
+ "| time_elapsed        | 1930        |\n",
+ "| total_timesteps     | 632000      |\n",
+ "| train/              |             |\n",
+ "| entropy_loss        | -0.411      |\n",
+ "| explained_variance  | 0.95562655  |\n",
+ "| learning_rate       | 0.0007      |\n",
+ "| n_updates           | 31599       |\n",
+ "| policy_loss         | -0.00718    |\n",
+ "| std                 | 0.291       |\n",
+ "| value_loss          | 0.00027     |\n",
+ "--------------------------------------\n",
+ "...\n",
+ "--------------------------------------\n",
+ "| rollout/            |             |\n",
+ "| ep_len_mean         | 2.52        |\n",
+ "| ep_rew_mean         | -0.188      |\n",
+ "| time/               |             |\n",
+ "| fps                 | 337         |\n",
+ "| iterations          | 38600       |\n",
+ "| time_elapsed        | 2285        |\n",
+ "| total_timesteps     | 772000      |\n",
+ "| train/              |             |\n",
+ "| entropy_loss        | 0.379       |\n",
+ "| explained_variance  | 0.98056465  |\n",
+ "| learning_rate       | 0.0007      |\n",
+ "| n_updates           | 38599       |\n",
+ "| policy_loss         | -0.00449    |\n",
+ "| std                 | 0.226       |\n",
+ "| value_loss          | 0.000108    |\n",
+ "--------------------------------------\n",
+
"--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.78 |\n", + "| ep_rew_mean | -0.224 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 38700 |\n", + "| time_elapsed | 2289 |\n", + "| total_timesteps | 774000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.392 |\n", + "| explained_variance | 0.98470736 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 38699 |\n", + "| policy_loss | -0.00333 |\n", + "| std | 0.226 |\n", + "| value_loss | 3.93e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.72 |\n", + "| ep_rew_mean | -0.217 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 38800 |\n", + "| time_elapsed | 2293 |\n", + "| total_timesteps | 776000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.412 |\n", + "| explained_variance | 0.8823349 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 38799 |\n", + "| policy_loss | 0.00308 |\n", + "| std | 0.224 |\n", + "| value_loss | 0.000418 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.9 |\n", + "| ep_rew_mean | -0.229 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 38900 |\n", + "| time_elapsed | 2298 |\n", + "| total_timesteps | 778000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.418 |\n", + "| explained_variance | 0.73990154 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 38899 |\n", + "| policy_loss | 0.00874 |\n", + "| std | 0.223 |\n", + "| value_loss | 0.00106 |\n", + "--------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.64 |\n", + "| ep_rew_mean | -0.199 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 39000 |\n", + "| time_elapsed | 2302 |\n", + "| total_timesteps | 780000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.396 |\n", + "| 
explained_variance | 0.966514 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 38999 |\n", + "| policy_loss | -0.00503 |\n", + "| std | 0.225 |\n", + "| value_loss | 0.000231 |\n", + "------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.85 |\n", + "| ep_rew_mean | -0.234 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 39100 |\n", + "| time_elapsed | 2307 |\n", + "| total_timesteps | 782000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.402 |\n", + "| explained_variance | 0.97168076 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 39099 |\n", + "| policy_loss | 0.00411 |\n", + "| std | 0.225 |\n", + "| value_loss | 0.000162 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.55 |\n", + "| ep_rew_mean | -0.187 |\n", + "| time/ | |\n", + "| fps | 339 |\n", + "| iterations | 39200 |\n", + "| time_elapsed | 2312 |\n", + "| total_timesteps | 784000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.393 |\n", + "| explained_variance | 0.9620045 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 39199 |\n", + "| policy_loss | 0.00139 |\n", + "| std | 0.226 |\n", + "| value_loss | 0.000167 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.93 |\n", + "| ep_rew_mean | -0.228 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 39300 |\n", + "| time_elapsed | 2321 |\n", + "| total_timesteps | 786000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.386 |\n", + "| explained_variance | 0.90208817 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 39299 |\n", + "| policy_loss | -0.00257 |\n", + "| std | 0.226 |\n", + "| value_loss | 0.000814 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 
2.7 |\n", + "| ep_rew_mean | -0.205 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 39400 |\n", + "| time_elapsed | 2328 |\n", + "| total_timesteps | 788000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.402 |\n", + "| explained_variance | 0.9760422 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 39399 |\n", + "| policy_loss | 0.00142 |\n", + "| std | 0.225 |\n", + "| value_loss | 0.000173 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.6 |\n", + "| ep_rew_mean | -0.2 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 39500 |\n", + "| time_elapsed | 2334 |\n", + "| total_timesteps | 790000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.415 |\n", + "| explained_variance | 0.97800255 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 39499 |\n", + "| policy_loss | -0.00893 |\n", + "| std | 0.224 |\n", + "| value_loss | 0.000148 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.62 |\n", + "| ep_rew_mean | -0.197 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 39600 |\n", + "| time_elapsed | 2339 |\n", + "| total_timesteps | 792000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.409 |\n", + "| explained_variance | 0.5413128 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 39599 |\n", + "| policy_loss | 0.00744 |\n", + "| std | 0.224 |\n", + "| value_loss | 0.00194 |\n", + "-------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.69 |\n", + "| ep_rew_mean | -0.214 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 39700 |\n", + "| time_elapsed | 2344 |\n", + "| total_timesteps | 794000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.399 |\n", + "| explained_variance | 0.984776 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 39699 
|\n", + "| policy_loss | -0.00425 |\n", + "| std | 0.225 |\n", + "| value_loss | 7.3e-05 |\n", + "------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.79 |\n", + "| ep_rew_mean | -0.224 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 39800 |\n", + "| time_elapsed | 2349 |\n", + "| total_timesteps | 796000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.399 |\n", + "| explained_variance | 0.96791893 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 39799 |\n", + "| policy_loss | 0.00499 |\n", + "| std | 0.225 |\n", + "| value_loss | 0.000191 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.74 |\n", + "| ep_rew_mean | -0.218 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 39900 |\n", + "| time_elapsed | 2358 |\n", + "| total_timesteps | 798000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.383 |\n", + "| explained_variance | -2.428947 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 39899 |\n", + "| policy_loss | -0.0136 |\n", + "| std | 0.226 |\n", + "| value_loss | 0.0102 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.65 |\n", + "| ep_rew_mean | -0.206 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 40000 |\n", + "| time_elapsed | 2363 |\n", + "| total_timesteps | 800000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.369 |\n", + "| explained_variance | 0.98433495 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 39999 |\n", + "| policy_loss | -0.00361 |\n", + "| std | 0.228 |\n", + "| value_loss | 9.41e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.7 |\n", + "| ep_rew_mean | -0.209 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations 
| 40100 |\n", + "| time_elapsed | 2368 |\n", + "| total_timesteps | 802000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.365 |\n", + "| explained_variance | 0.9824995 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 40099 |\n", + "| policy_loss | 0.000342 |\n", + "| std | 0.228 |\n", + "| value_loss | 5.56e-05 |\n", + "-------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.55 |\n", + "| ep_rew_mean | -0.193 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 40200 |\n", + "| time_elapsed | 2373 |\n", + "| total_timesteps | 804000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.402 |\n", + "| explained_variance | 0.985323 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 40199 |\n", + "| policy_loss | 0.00106 |\n", + "| std | 0.226 |\n", + "| value_loss | 5.57e-05 |\n", + "------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 7.44 |\n", + "| ep_rew_mean | -0.644 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 40300 |\n", + "| time_elapsed | 2378 |\n", + "| total_timesteps | 806000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.396 |\n", + "| explained_variance | 0.8747574 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 40299 |\n", + "| policy_loss | -0.0966 |\n", + "| std | 0.225 |\n", + "| value_loss | 0.127 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.99 |\n", + "| ep_rew_mean | -0.314 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 40400 |\n", + "| time_elapsed | 2384 |\n", + "| total_timesteps | 808000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.4 |\n", + "| explained_variance | -3.9502196 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 40399 |\n", + "| policy_loss | -0.000899 |\n", + "| std | 0.224 |\n", + "| value_loss | 0.0228 |\n", + 
"--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.68 |\n", + "| ep_rew_mean | -0.293 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 40500 |\n", + "| time_elapsed | 2392 |\n", + "| total_timesteps | 810000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.379 |\n", + "| explained_variance | 0.84740096 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 40499 |\n", + "| policy_loss | -0.000565 |\n", + "| std | 0.226 |\n", + "| value_loss | 0.0476 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.7 |\n", + "| ep_rew_mean | -0.212 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 40600 |\n", + "| time_elapsed | 2398 |\n", + "| total_timesteps | 812000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.395 |\n", + "| explained_variance | 0.80924624 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 40599 |\n", + "| policy_loss | -0.00864 |\n", + "| std | 0.225 |\n", + "| value_loss | 0.00108 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 3.04 |\n", + "| ep_rew_mean | -0.236 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 40700 |\n", + "| time_elapsed | 2402 |\n", + "| total_timesteps | 814000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.417 |\n", + "| explained_variance | 0.73390794 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 40699 |\n", + "| policy_loss | 0.0243 |\n", + "| std | 0.224 |\n", + "| value_loss | 0.0164 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.61 |\n", + "| ep_rew_mean | -0.205 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 40800 |\n", + "| time_elapsed | 2407 |\n", + "| total_timesteps | 816000 |\n", + "| train/ 
| |\n", + "| entropy_loss | 0.406 |\n", + "| explained_variance | 0.97868705 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 40799 |\n", + "| policy_loss | -0.00146 |\n", + "| std | 0.225 |\n", + "| value_loss | 0.000133 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.71 |\n", + "| ep_rew_mean | -0.213 |\n", + "| time/ | |\n", + "| fps | 339 |\n", + "| iterations | 40900 |\n", + "| time_elapsed | 2412 |\n", + "| total_timesteps | 818000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.429 |\n", + "| explained_variance | 0.8363371 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 40899 |\n", + "| policy_loss | -0.00689 |\n", + "| std | 0.223 |\n", + "| value_loss | 0.000464 |\n", + "-------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.59 |\n", + "| ep_rew_mean | -0.198 |\n", + "| time/ | |\n", + "| fps | 339 |\n", + "| iterations | 41000 |\n", + "| time_elapsed | 2417 |\n", + "| total_timesteps | 820000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.445 |\n", + "| explained_variance | 0.977923 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 40999 |\n", + "| policy_loss | -0.00173 |\n", + "| std | 0.222 |\n", + "| value_loss | 0.000178 |\n", + "------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.73 |\n", + "| ep_rew_mean | -0.208 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 41100 |\n", + "| time_elapsed | 2426 |\n", + "| total_timesteps | 822000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.458 |\n", + "| explained_variance | 0.63355607 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 41099 |\n", + "| policy_loss | 0.0177 |\n", + "| std | 0.22 |\n", + "| value_loss | 0.00102 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + 
"| rollout/ | |\n", + "| ep_len_mean | 2.72 |\n", + "| ep_rew_mean | -0.214 |\n", + "| time/ | |\n", + "| fps | 338 |\n", + "| iterations | 41200 |\n", + "| time_elapsed | 2431 |\n", + "| total_timesteps | 824000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.453 |\n", + "| explained_variance | 0.9759229 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 41199 |\n", + "| policy_loss | -0.0158 |\n", + "| std | 0.22 |\n", + "| value_loss | 0.000228 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.68 |\n", + "| ep_rew_mean | -0.211 |\n", + "| time/ | |\n", + "| fps | 339 |\n", + "| iterations | 41300 |\n", + "| time_elapsed | 2436 |\n", + "| total_timesteps | 826000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.445 |\n", + "| explained_variance | 0.99040455 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 41299 |\n", + "| policy_loss | 0.00678 |\n", + "| std | 0.221 |\n", + "| value_loss | 7.98e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.62 |\n", + "| ep_rew_mean | -0.207 |\n", + "| time/ | |\n", + "| fps | 339 |\n", + "| iterations | 41400 |\n", + "| time_elapsed | 2441 |\n", + "| total_timesteps | 828000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.458 |\n", + "| explained_variance | 0.9926231 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 41399 |\n", + "| policy_loss | 0.00175 |\n", + "| std | 0.22 |\n", + "| value_loss | 2.89e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.58 |\n", + "| ep_rew_mean | -0.196 |\n", + "| time/ | |\n", + "| fps | 339 |\n", + "| iterations | 41500 |\n", + "| time_elapsed | 2447 |\n", + "| total_timesteps | 830000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.47 |\n", + "| explained_variance | 0.97897565 |\n", + "| 
learning_rate | 0.0007 |\n", + "| n_updates | 41499 |\n", + "| policy_loss | 0.0038 |\n", + "| std | 0.219 |\n", + "| value_loss | 9.45e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.61 |\n", + "| ep_rew_mean | -0.203 |\n", + "| time/ | |\n", + "| fps | 339 |\n", + "| iterations | 41600 |\n", + "| time_elapsed | 2452 |\n", + "| total_timesteps | 832000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.446 |\n", + "| explained_variance | 0.9452324 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 41599 |\n", + "| policy_loss | -0.011 |\n", + "| std | 0.22 |\n", + "| value_loss | 0.000302 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.62 |\n", + "| ep_rew_mean | -0.202 |\n", + "| time/ | |\n", + "| fps | 339 |\n", + "| iterations | 41700 |\n", + "| time_elapsed | 2457 |\n", + "| total_timesteps | 834000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.471 |\n", + "| explained_variance | 0.9743598 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 41699 |\n", + "| policy_loss | 0.00613 |\n", + "| std | 0.218 |\n", + "| value_loss | 0.000198 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.78 |\n", + "| ep_rew_mean | -0.212 |\n", + "| time/ | |\n", + "| fps | 339 |\n", + "| iterations | 41800 |\n", + "| time_elapsed | 2465 |\n", + "| total_timesteps | 836000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.465 |\n", + "| explained_variance | 0.6682483 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 41799 |\n", + "| policy_loss | -0.0067 |\n", + "| std | 0.219 |\n", + "| value_loss | 0.00284 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.72 |\n", + "| ep_rew_mean | -0.211 |\n", + "| time/ 
| |\n", + "| fps | 339 |\n", + "| iterations | 41900 |\n", + "| time_elapsed | 2469 |\n", + "| total_timesteps | 838000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.488 |\n", + "| explained_variance | 0.9824863 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 41899 |\n", + "| policy_loss | -0.00377 |\n", + "| std | 0.217 |\n", + "| value_loss | 9.89e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.63 |\n", + "| ep_rew_mean | -0.205 |\n", + "| time/ | |\n", + "| fps | 339 |\n", + "| iterations | 42000 |\n", + "| time_elapsed | 2473 |\n", + "| total_timesteps | 840000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.508 |\n", + "| explained_variance | 0.97226715 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 41999 |\n", + "| policy_loss | -0.0114 |\n", + "| std | 0.216 |\n", + "| value_loss | 0.000727 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.77 |\n", + "| ep_rew_mean | -0.218 |\n", + "| time/ | |\n", + "| fps | 339 |\n", + "| iterations | 42100 |\n", + "| time_elapsed | 2477 |\n", + "| total_timesteps | 842000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.504 |\n", + "| explained_variance | 0.98028255 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 42099 |\n", + "| policy_loss | 0.00354 |\n", + "| std | 0.217 |\n", + "| value_loss | 0.000129 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.57 |\n", + "| ep_rew_mean | -0.199 |\n", + "| time/ | |\n", + "| fps | 339 |\n", + "| iterations | 42200 |\n", + "| time_elapsed | 2482 |\n", + "| total_timesteps | 844000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.489 |\n", + "| explained_variance | 0.96648335 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 42199 |\n", + "| policy_loss | -0.00583 |\n", + 
"| std | 0.218 |\n", + "| value_loss | 0.00013 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.68 |\n", + "| ep_rew_mean | -0.198 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 42300 |\n", + "| time_elapsed | 2487 |\n", + "| total_timesteps | 846000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.503 |\n", + "| explained_variance | 0.9749611 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 42299 |\n", + "| policy_loss | 0.00564 |\n", + "| std | 0.218 |\n", + "| value_loss | 0.000109 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.75 |\n", + "| ep_rew_mean | -0.218 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 42400 |\n", + "| time_elapsed | 2491 |\n", + "| total_timesteps | 848000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.524 |\n", + "| explained_variance | 0.99248254 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 42399 |\n", + "| policy_loss | -0.000632 |\n", + "| std | 0.216 |\n", + "| value_loss | 4.3e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.61 |\n", + "| ep_rew_mean | -0.199 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 42500 |\n", + "| time_elapsed | 2499 |\n", + "| total_timesteps | 850000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.546 |\n", + "| explained_variance | 0.98732716 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 42499 |\n", + "| policy_loss | 0.000381 |\n", + "| std | 0.214 |\n", + "| value_loss | 4.16e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.67 |\n", + "| ep_rew_mean | -0.205 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 42600 |\n", + "| time_elapsed | 
2504 |\n", + "| total_timesteps | 852000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.548 |\n", + "| explained_variance | 0.9690981 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 42599 |\n", + "| policy_loss | -0.0125 |\n", + "| std | 0.213 |\n", + "| value_loss | 0.000401 |\n", + "-------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.58 |\n", + "| ep_rew_mean | -0.194 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 42700 |\n", + "| time_elapsed | 2510 |\n", + "| total_timesteps | 854000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.548 |\n", + "| explained_variance | 0.948852 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 42699 |\n", + "| policy_loss | 0.00354 |\n", + "| std | 0.214 |\n", + "| value_loss | 9.49e-05 |\n", + "------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.72 |\n", + "| ep_rew_mean | -0.21 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 42800 |\n", + "| time_elapsed | 2514 |\n", + "| total_timesteps | 856000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.541 |\n", + "| explained_variance | 0.9658161 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 42799 |\n", + "| policy_loss | -3.41e-05 |\n", + "| std | 0.214 |\n", + "| value_loss | 0.000158 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.7 |\n", + "| ep_rew_mean | -0.209 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 42900 |\n", + "| time_elapsed | 2519 |\n", + "| total_timesteps | 858000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.531 |\n", + "| explained_variance | 0.9794916 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 42899 |\n", + "| policy_loss | 0.00583 |\n", + "| std | 0.214 |\n", + "| value_loss | 9.42e-05 |\n", + 
"-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.47 |\n", + "| ep_rew_mean | -0.187 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 43000 |\n", + "| time_elapsed | 2524 |\n", + "| total_timesteps | 860000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.527 |\n", + "| explained_variance | 0.98890656 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 42999 |\n", + "| policy_loss | -0.00859 |\n", + "| std | 0.214 |\n", + "| value_loss | 0.000106 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.7 |\n", + "| ep_rew_mean | -0.204 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 43100 |\n", + "| time_elapsed | 2528 |\n", + "| total_timesteps | 862000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.519 |\n", + "| explained_variance | 0.97553414 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 43099 |\n", + "| policy_loss | -0.00251 |\n", + "| std | 0.215 |\n", + "| value_loss | 6.28e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.69 |\n", + "| ep_rew_mean | -0.218 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 43200 |\n", + "| time_elapsed | 2537 |\n", + "| total_timesteps | 864000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.553 |\n", + "| explained_variance | 0.9948936 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 43199 |\n", + "| policy_loss | 0.000498 |\n", + "| std | 0.212 |\n", + "| value_loss | 3.8e-05 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.7 |\n", + "| ep_rew_mean | -0.21 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 43300 |\n", + "| time_elapsed | 2542 |\n", + "| total_timesteps | 866000 |\n", + "| train/ | 
|\n", + "| entropy_loss | 0.561 |\n", + "| explained_variance | 0.9760311 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 43299 |\n", + "| policy_loss | -0.00212 |\n", + "| std | 0.212 |\n", + "| value_loss | 0.000129 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.85 |\n", + "| ep_rew_mean | -0.231 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 43400 |\n", + "| time_elapsed | 2547 |\n", + "| total_timesteps | 868000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.558 |\n", + "| explained_variance | 0.9611102 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 43399 |\n", + "| policy_loss | -0.000699 |\n", + "| std | 0.212 |\n", + "| value_loss | 0.000191 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.67 |\n", + "| ep_rew_mean | -0.2 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 43500 |\n", + "| time_elapsed | 2551 |\n", + "| total_timesteps | 870000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.573 |\n", + "| explained_variance | 0.98930174 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 43499 |\n", + "| policy_loss | -0.0037 |\n", + "| std | 0.211 |\n", + "| value_loss | 4.31e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.62 |\n", + "| ep_rew_mean | -0.201 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 43600 |\n", + "| time_elapsed | 2556 |\n", + "| total_timesteps | 872000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.586 |\n", + "| explained_variance | 0.98348564 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 43599 |\n", + "| policy_loss | 0.00287 |\n", + "| std | 0.21 |\n", + "| value_loss | 0.000106 |\n", + "--------------------------------------\n", + "--------------------------------------\n", 
+ "| rollout/ | |\n", + "| ep_len_mean | 2.72 |\n", + "| ep_rew_mean | -0.199 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 43700 |\n", + "| time_elapsed | 2562 |\n", + "| total_timesteps | 874000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.616 |\n", + "| explained_variance | 0.69001275 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 43699 |\n", + "| policy_loss | 0.0157 |\n", + "| std | 0.208 |\n", + "| value_loss | 0.00192 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.74 |\n", + "| ep_rew_mean | -0.208 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 43800 |\n", + "| time_elapsed | 2571 |\n", + "| total_timesteps | 876000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.615 |\n", + "| explained_variance | 0.97150284 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 43799 |\n", + "| policy_loss | 0.000559 |\n", + "| std | 0.208 |\n", + "| value_loss | 0.000119 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.85 |\n", + "| ep_rew_mean | -0.218 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 43900 |\n", + "| time_elapsed | 2576 |\n", + "| total_timesteps | 878000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.633 |\n", + "| explained_variance | 0.98416793 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 43899 |\n", + "| policy_loss | 0.00093 |\n", + "| std | 0.206 |\n", + "| value_loss | 0.000116 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.64 |\n", + "| ep_rew_mean | -0.206 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 44000 |\n", + "| time_elapsed | 2582 |\n", + "| total_timesteps | 880000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.618 |\n", + "| explained_variance | 0.9784064 |\n", + "| 
learning_rate | 0.0007 |\n", + "| n_updates | 43999 |\n", + "| policy_loss | 0.00364 |\n", + "| std | 0.208 |\n", + "| value_loss | 0.000147 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.74 |\n", + "| ep_rew_mean | -0.211 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 44100 |\n", + "| time_elapsed | 2586 |\n", + "| total_timesteps | 882000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.635 |\n", + "| explained_variance | 0.9662014 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 44099 |\n", + "| policy_loss | -0.00547 |\n", + "| std | 0.207 |\n", + "| value_loss | 0.000243 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.89 |\n", + "| ep_rew_mean | -0.241 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 44200 |\n", + "| time_elapsed | 2591 |\n", + "| total_timesteps | 884000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.656 |\n", + "| explained_variance | 0.7693075 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 44199 |\n", + "| policy_loss | 0.0143 |\n", + "| std | 0.206 |\n", + "| value_loss | 0.00191 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.75 |\n", + "| ep_rew_mean | -0.206 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 44300 |\n", + "| time_elapsed | 2595 |\n", + "| total_timesteps | 886000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.646 |\n", + "| explained_variance | 0.9649852 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 44299 |\n", + "| policy_loss | -0.00818 |\n", + "| std | 0.206 |\n", + "| value_loss | 0.000203 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.73 |\n", + "| ep_rew_mean | -0.213 |\n", + "| 
time/ | |\n", + "| fps | 341 |\n", + "| iterations | 44400 |\n", + "| time_elapsed | 2600 |\n", + "| total_timesteps | 888000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.677 |\n", + "| explained_variance | 0.9866615 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 44399 |\n", + "| policy_loss | -0.00452 |\n", + "| std | 0.204 |\n", + "| value_loss | 0.000119 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.75 |\n", + "| ep_rew_mean | -0.219 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 44500 |\n", + "| time_elapsed | 2608 |\n", + "| total_timesteps | 890000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.688 |\n", + "| explained_variance | 0.98133665 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 44499 |\n", + "| policy_loss | 0.00382 |\n", + "| std | 0.204 |\n", + "| value_loss | 0.000157 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.6 |\n", + "| ep_rew_mean | -0.189 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 44600 |\n", + "| time_elapsed | 2612 |\n", + "| total_timesteps | 892000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.697 |\n", + "| explained_variance | 0.9878949 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 44599 |\n", + "| policy_loss | -0.000211 |\n", + "| std | 0.203 |\n", + "| value_loss | 6.87e-05 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.7 |\n", + "| ep_rew_mean | -0.21 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 44700 |\n", + "| time_elapsed | 2617 |\n", + "| total_timesteps | 894000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.71 |\n", + "| explained_variance | 0.9808317 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 44699 |\n", + "| policy_loss | -0.000497 |\n", + 
"| std | 0.202 |\n", + "| value_loss | 0.000117 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.69 |\n", + "| ep_rew_mean | -0.207 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 44800 |\n", + "| time_elapsed | 2622 |\n", + "| total_timesteps | 896000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.737 |\n", + "| explained_variance | 0.9543187 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 44799 |\n", + "| policy_loss | -0.0002 |\n", + "| std | 0.201 |\n", + "| value_loss | 0.00014 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.67 |\n", + "| ep_rew_mean | -0.203 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 44900 |\n", + "| time_elapsed | 2626 |\n", + "| total_timesteps | 898000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.736 |\n", + "| explained_variance | 0.9573474 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 44899 |\n", + "| policy_loss | -0.016 |\n", + "| std | 0.2 |\n", + "| value_loss | 0.000362 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.47 |\n", + "| ep_rew_mean | -0.181 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 45000 |\n", + "| time_elapsed | 2631 |\n", + "| total_timesteps | 900000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.706 |\n", + "| explained_variance | 0.9849114 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 44999 |\n", + "| policy_loss | 0.00167 |\n", + "| std | 0.203 |\n", + "| value_loss | 0.000118 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.68 |\n", + "| ep_rew_mean | -0.207 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 45100 |\n", + "| time_elapsed | 2636 |\n", + 
"| total_timesteps | 902000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.696 |\n", + "| explained_variance | 0.80178624 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 45099 |\n", + "| policy_loss | 0.00337 |\n", + "| std | 0.203 |\n", + "| value_loss | 0.00101 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.64 |\n", + "| ep_rew_mean | -0.208 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 45200 |\n", + "| time_elapsed | 2644 |\n", + "| total_timesteps | 904000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.717 |\n", + "| explained_variance | 0.9752399 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 45199 |\n", + "| policy_loss | -0.00655 |\n", + "| std | 0.203 |\n", + "| value_loss | 0.00014 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.68 |\n", + "| ep_rew_mean | -0.204 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 45300 |\n", + "| time_elapsed | 2649 |\n", + "| total_timesteps | 906000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.709 |\n", + "| explained_variance | 0.98351824 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 45299 |\n", + "| policy_loss | -0.00476 |\n", + "| std | 0.203 |\n", + "| value_loss | 8.73e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.63 |\n", + "| ep_rew_mean | -0.197 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 45400 |\n", + "| time_elapsed | 2653 |\n", + "| total_timesteps | 908000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.71 |\n", + "| explained_variance | 0.99506396 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 45399 |\n", + "| policy_loss | -0.00952 |\n", + "| std | 0.203 |\n", + "| value_loss | 0.000135 |\n", + 
"--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.69 |\n", + "| ep_rew_mean | -0.208 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 45500 |\n", + "| time_elapsed | 2658 |\n", + "| total_timesteps | 910000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.711 |\n", + "| explained_variance | 0.9815719 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 45499 |\n", + "| policy_loss | 0.00707 |\n", + "| std | 0.203 |\n", + "| value_loss | 0.000139 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.45 |\n", + "| ep_rew_mean | -0.171 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 45600 |\n", + "| time_elapsed | 2665 |\n", + "| total_timesteps | 912000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.718 |\n", + "| explained_variance | 0.93521357 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 45599 |\n", + "| policy_loss | 0.016 |\n", + "| std | 0.203 |\n", + "| value_loss | 0.00045 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.93 |\n", + "| ep_rew_mean | -0.235 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 45700 |\n", + "| time_elapsed | 2676 |\n", + "| total_timesteps | 914000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.724 |\n", + "| explained_variance | 0.9420419 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 45699 |\n", + "| policy_loss | -0.00632 |\n", + "| std | 0.202 |\n", + "| value_loss | 0.000384 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.83 |\n", + "| ep_rew_mean | -0.22 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 45800 |\n", + "| time_elapsed | 2682 |\n", + "| total_timesteps | 916000 |\n", + "| train/ | 
|\n", + "| entropy_loss | 0.739 |\n", + "| explained_variance | 0.95885766 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 45799 |\n", + "| policy_loss | 0.00537 |\n", + "| std | 0.202 |\n", + "| value_loss | 0.000219 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.62 |\n", + "| ep_rew_mean | -0.2 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 45900 |\n", + "| time_elapsed | 2689 |\n", + "| total_timesteps | 918000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.743 |\n", + "| explained_variance | 0.94796073 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 45899 |\n", + "| policy_loss | -6.43e-05 |\n", + "| std | 0.202 |\n", + "| value_loss | 0.000202 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.59 |\n", + "| ep_rew_mean | -0.197 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 46000 |\n", + "| time_elapsed | 2695 |\n", + "| total_timesteps | 920000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.734 |\n", + "| explained_variance | 0.9870241 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 45999 |\n", + "| policy_loss | 0.00297 |\n", + "| std | 0.202 |\n", + "| value_loss | 6.71e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.74 |\n", + "| ep_rew_mean | -0.214 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 46100 |\n", + "| time_elapsed | 2701 |\n", + "| total_timesteps | 922000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.735 |\n", + "| explained_variance | 0.97447634 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46099 |\n", + "| policy_loss | -0.00677 |\n", + "| std | 0.202 |\n", + "| value_loss | 0.000132 |\n", + "--------------------------------------\n", + 
"-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.75 |\n", + "| ep_rew_mean | -0.22 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 46200 |\n", + "| time_elapsed | 2707 |\n", + "| total_timesteps | 924000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.748 |\n", + "| explained_variance | 0.9668383 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46199 |\n", + "| policy_loss | -0.00958 |\n", + "| std | 0.201 |\n", + "| value_loss | 0.00026 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.83 |\n", + "| ep_rew_mean | -0.22 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 46300 |\n", + "| time_elapsed | 2717 |\n", + "| total_timesteps | 926000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.753 |\n", + "| explained_variance | 0.9713844 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46299 |\n", + "| policy_loss | -0.00175 |\n", + "| std | 0.202 |\n", + "| value_loss | 0.000267 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.73 |\n", + "| ep_rew_mean | -0.216 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 46400 |\n", + "| time_elapsed | 2724 |\n", + "| total_timesteps | 928000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.744 |\n", + "| explained_variance | 0.9690941 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46399 |\n", + "| policy_loss | 0.00198 |\n", + "| std | 0.202 |\n", + "| value_loss | 9.71e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.71 |\n", + "| ep_rew_mean | -0.211 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 46500 |\n", + "| time_elapsed | 2730 |\n", + "| total_timesteps | 930000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.752 |\n", + "| 
explained_variance | 0.98169756 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46499 |\n", + "| policy_loss | -0.00182 |\n", + "| std | 0.201 |\n", + "| value_loss | 7.57e-05 |\n", + "--------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.73 |\n", + "| ep_rew_mean | -0.216 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 46600 |\n", + "| time_elapsed | 2736 |\n", + "| total_timesteps | 932000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.768 |\n", + "| explained_variance | 0.958521 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46599 |\n", + "| policy_loss | 0.00796 |\n", + "| std | 0.2 |\n", + "| value_loss | 0.000321 |\n", + "------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.54 |\n", + "| ep_rew_mean | -0.194 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 46700 |\n", + "| time_elapsed | 2743 |\n", + "| total_timesteps | 934000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.771 |\n", + "| explained_variance | 0.9603 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46699 |\n", + "| policy_loss | 0.00811 |\n", + "| std | 0.2 |\n", + "| value_loss | 0.000171 |\n", + "------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.6 |\n", + "| ep_rew_mean | -0.198 |\n", + "| time/ | |\n", + "| fps | 339 |\n", + "| iterations | 46800 |\n", + "| time_elapsed | 2753 |\n", + "| total_timesteps | 936000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.768 |\n", + "| explained_variance | 0.9908145 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46799 |\n", + "| policy_loss | 0.00219 |\n", + "| std | 0.199 |\n", + "| value_loss | 7.84e-05 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.66 |\n", + "| 
ep_rew_mean | -0.198 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 46900 |\n", + "| time_elapsed | 2757 |\n", + "| total_timesteps | 938000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.781 |\n", + "| explained_variance | 0.9639614 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46899 |\n", + "| policy_loss | 0.0111 |\n", + "| std | 0.199 |\n", + "| value_loss | 0.000552 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.69 |\n", + "| ep_rew_mean | -0.213 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 47000 |\n", + "| time_elapsed | 2762 |\n", + "| total_timesteps | 940000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.787 |\n", + "| explained_variance | 0.97391367 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46999 |\n", + "| policy_loss | 0.00372 |\n", + "| std | 0.199 |\n", + "| value_loss | 0.000137 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.81 |\n", + "| ep_rew_mean | -0.22 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 47100 |\n", + "| time_elapsed | 2766 |\n", + "| total_timesteps | 942000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.788 |\n", + "| explained_variance | 0.97501403 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47099 |\n", + "| policy_loss | -0.0168 |\n", + "| std | 0.198 |\n", + "| value_loss | 0.000357 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.59 |\n", + "| ep_rew_mean | -0.2 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 47200 |\n", + "| time_elapsed | 2771 |\n", + "| total_timesteps | 944000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.786 |\n", + "| explained_variance | 0.7917006 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47199 |\n", + "| 
policy_loss | 0.0273 |\n", + "| std | 0.199 |\n", + "| value_loss | 0.00183 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.75 |\n", + "| ep_rew_mean | -0.217 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 47300 |\n", + "| time_elapsed | 2775 |\n", + "| total_timesteps | 946000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.784 |\n", + "| explained_variance | 0.9474554 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47299 |\n", + "| policy_loss | 0.0125 |\n", + "| std | 0.199 |\n", + "| value_loss | 0.000405 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.74 |\n", + "| ep_rew_mean | -0.219 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 47400 |\n", + "| time_elapsed | 2779 |\n", + "| total_timesteps | 948000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.786 |\n", + "| explained_variance | 0.98800665 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47399 |\n", + "| policy_loss | 0.00237 |\n", + "| std | 0.198 |\n", + "| value_loss | 8.46e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.76 |\n", + "| ep_rew_mean | -0.211 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 47500 |\n", + "| time_elapsed | 2787 |\n", + "| total_timesteps | 950000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.781 |\n", + "| explained_variance | 0.9724678 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47499 |\n", + "| policy_loss | 0.0024 |\n", + "| std | 0.199 |\n", + "| value_loss | 0.000164 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.77 |\n", + "| ep_rew_mean | -0.216 |\n", + "| time/ | |\n", + "| fps | 340 |\n", + "| iterations | 47600 |\n", 
+ "| time_elapsed | 2792 |\n", + "| total_timesteps | 952000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.785 |\n", + "| explained_variance | 0.99027014 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47599 |\n", + "| policy_loss | -0.00857 |\n", + "| std | 0.198 |\n", + "| value_loss | 0.000118 |\n", + "--------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.8 |\n", + "| ep_rew_mean | -0.223 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 47700 |\n", + "| time_elapsed | 2796 |\n", + "| total_timesteps | 954000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.789 |\n", + "| explained_variance | 0.990915 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47699 |\n", + "| policy_loss | -0.00501 |\n", + "| std | 0.198 |\n", + "| value_loss | 6.49e-05 |\n", + "------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.69 |\n", + "| ep_rew_mean | -0.209 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 47800 |\n", + "| time_elapsed | 2800 |\n", + "| total_timesteps | 956000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.792 |\n", + "| explained_variance | 0.9743882 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47799 |\n", + "| policy_loss | -0.00831 |\n", + "| std | 0.198 |\n", + "| value_loss | 0.00032 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.68 |\n", + "| ep_rew_mean | -0.203 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 47900 |\n", + "| time_elapsed | 2806 |\n", + "| total_timesteps | 958000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.814 |\n", + "| explained_variance | 0.9837645 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47899 |\n", + "| policy_loss | 0.00121 |\n", + "| std | 0.196 |\n", + "| value_loss | 7.55e-05 |\n", + 
"-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.67 |\n", + "| ep_rew_mean | -0.203 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 48000 |\n", + "| time_elapsed | 2810 |\n", + "| total_timesteps | 960000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.816 |\n", + "| explained_variance | 0.9931013 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47999 |\n", + "| policy_loss | -0.00325 |\n", + "| std | 0.196 |\n", + "| value_loss | 5.91e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.58 |\n", + "| ep_rew_mean | -0.197 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 48100 |\n", + "| time_elapsed | 2815 |\n", + "| total_timesteps | 962000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.838 |\n", + "| explained_variance | 0.97392714 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48099 |\n", + "| policy_loss | -0.00442 |\n", + "| std | 0.195 |\n", + "| value_loss | 7.03e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.66 |\n", + "| ep_rew_mean | -0.205 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 48200 |\n", + "| time_elapsed | 2823 |\n", + "| total_timesteps | 964000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.853 |\n", + "| explained_variance | 0.9561405 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48199 |\n", + "| policy_loss | 0.0005 |\n", + "| std | 0.194 |\n", + "| value_loss | 0.000178 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.63 |\n", + "| ep_rew_mean | -0.195 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 48300 |\n", + "| time_elapsed | 2827 |\n", + "| total_timesteps | 966000 |\n", + "| train/ | 
|\n", + "| entropy_loss | 0.871 |\n", + "| explained_variance | 0.98603976 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48299 |\n", + "| policy_loss | 0.0108 |\n", + "| std | 0.193 |\n", + "| value_loss | 0.00013 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.66 |\n", + "| ep_rew_mean | -0.206 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 48400 |\n", + "| time_elapsed | 2832 |\n", + "| total_timesteps | 968000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.876 |\n", + "| explained_variance | 0.8844548 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48399 |\n", + "| policy_loss | -0.00342 |\n", + "| std | 0.192 |\n", + "| value_loss | 0.000601 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.61 |\n", + "| ep_rew_mean | -0.203 |\n", + "| time/ | |\n", + "| fps | 341 |\n", + "| iterations | 48500 |\n", + "| time_elapsed | 2837 |\n", + "| total_timesteps | 970000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.892 |\n", + "| explained_variance | 0.97415155 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48499 |\n", + "| policy_loss | 0.00535 |\n", + "| std | 0.191 |\n", + "| value_loss | 0.00015 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.75 |\n", + "| ep_rew_mean | -0.215 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 48600 |\n", + "| time_elapsed | 2841 |\n", + "| total_timesteps | 972000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.903 |\n", + "| explained_variance | 0.9617835 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48599 |\n", + "| policy_loss | 0.0132 |\n", + "| std | 0.191 |\n", + "| value_loss | 0.000383 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| 
rollout/ | |\n", + "| ep_len_mean | 2.72 |\n", + "| ep_rew_mean | -0.203 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 48700 |\n", + "| time_elapsed | 2846 |\n", + "| total_timesteps | 974000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.89 |\n", + "| explained_variance | 0.9830824 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48699 |\n", + "| policy_loss | 0.00573 |\n", + "| std | 0.191 |\n", + "| value_loss | 0.000103 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.73 |\n", + "| ep_rew_mean | -0.214 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 48800 |\n", + "| time_elapsed | 2850 |\n", + "| total_timesteps | 976000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.887 |\n", + "| explained_variance | 0.9614461 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48799 |\n", + "| policy_loss | -0.0139 |\n", + "| std | 0.192 |\n", + "| value_loss | 0.000635 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.76 |\n", + "| ep_rew_mean | -0.21 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 48900 |\n", + "| time_elapsed | 2858 |\n", + "| total_timesteps | 978000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.88 |\n", + "| explained_variance | 0.9425929 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48899 |\n", + "| policy_loss | 4.7e-05 |\n", + "| std | 0.192 |\n", + "| value_loss | 0.000399 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.62 |\n", + "| ep_rew_mean | -0.208 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 49000 |\n", + "| time_elapsed | 2862 |\n", + "| total_timesteps | 980000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.873 |\n", + "| explained_variance | 0.9772742 |\n", + "| learning_rate | 
0.0007 |\n", + "| n_updates | 48999 |\n", + "| policy_loss | 0.014 |\n", + "| std | 0.193 |\n", + "| value_loss | 0.000161 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.5 |\n", + "| ep_rew_mean | -0.185 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 49100 |\n", + "| time_elapsed | 2868 |\n", + "| total_timesteps | 982000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.878 |\n", + "| explained_variance | 0.97900677 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49099 |\n", + "| policy_loss | -0.00167 |\n", + "| std | 0.191 |\n", + "| value_loss | 0.000136 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.77 |\n", + "| ep_rew_mean | -0.216 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 49200 |\n", + "| time_elapsed | 2872 |\n", + "| total_timesteps | 984000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.883 |\n", + "| explained_variance | 0.96298355 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49199 |\n", + "| policy_loss | -0.000752 |\n", + "| std | 0.191 |\n", + "| value_loss | 9.23e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.79 |\n", + "| ep_rew_mean | -0.222 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 49300 |\n", + "| time_elapsed | 2876 |\n", + "| total_timesteps | 986000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.911 |\n", + "| explained_variance | 0.98365396 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49299 |\n", + "| policy_loss | -0.00456 |\n", + "| std | 0.189 |\n", + "| value_loss | 0.000266 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.68 |\n", + "| ep_rew_mean | -0.212 |\n", + "| time/ | 
|\n", + "| fps | 342 |\n", + "| iterations | 49400 |\n", + "| time_elapsed | 2881 |\n", + "| total_timesteps | 988000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.931 |\n", + "| explained_variance | 0.95916724 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49399 |\n", + "| policy_loss | -0.00675 |\n", + "| std | 0.187 |\n", + "| value_loss | 0.000311 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.73 |\n", + "| ep_rew_mean | -0.219 |\n", + "| time/ | |\n", + "| fps | 343 |\n", + "| iterations | 49500 |\n", + "| time_elapsed | 2885 |\n", + "| total_timesteps | 990000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.918 |\n", + "| explained_variance | 0.99461377 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49499 |\n", + "| policy_loss | 0.0052 |\n", + "| std | 0.188 |\n", + "| value_loss | 5.61e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.67 |\n", + "| ep_rew_mean | -0.203 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 49600 |\n", + "| time_elapsed | 2893 |\n", + "| total_timesteps | 992000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.942 |\n", + "| explained_variance | 0.9908923 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49599 |\n", + "| policy_loss | 0.00297 |\n", + "| std | 0.187 |\n", + "| value_loss | 9.09e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.51 |\n", + "| ep_rew_mean | -0.186 |\n", + "| time/ | |\n", + "| fps | 342 |\n", + "| iterations | 49700 |\n", + "| time_elapsed | 2898 |\n", + "| total_timesteps | 994000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.961 |\n", + "| explained_variance | 0.97241545 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49699 |\n", + "| policy_loss | -0.00635 |\n", + "| 
std | 0.186 |\n", + "| value_loss | 0.000121 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.71 |\n", + "| ep_rew_mean | -0.21 |\n", + "| time/ | |\n", + "| fps | 343 |\n", + "| iterations | 49800 |\n", + "| time_elapsed | 2902 |\n", + "| total_timesteps | 996000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.958 |\n", + "| explained_variance | 0.99666107 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49799 |\n", + "| policy_loss | -0.00314 |\n", + "| std | 0.186 |\n", + "| value_loss | 4e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.66 |\n", + "| ep_rew_mean | -0.2 |\n", + "| time/ | |\n", + "| fps | 343 |\n", + "| iterations | 49900 |\n", + "| time_elapsed | 2906 |\n", + "| total_timesteps | 998000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.943 |\n", + "| explained_variance | 0.9459344 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49899 |\n", + "| policy_loss | -0.00787 |\n", + "| std | 0.187 |\n", + "| value_loss | 0.000279 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 2.86 |\n", + "| ep_rew_mean | -0.22 |\n", + "| time/ | |\n", + "| fps | 343 |\n", + "| iterations | 50000 |\n", + "| time_elapsed | 2911 |\n", + "| total_timesteps | 1000000 |\n", + "| train/ | |\n", + "| entropy_loss | 0.965 |\n", + "| explained_variance | 0.93244386 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49999 |\n", + "| policy_loss | -0.0262 |\n", + "| std | 0.186 |\n", + "| value_loss | 0.00066 |\n", + "--------------------------------------\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model.learn(1_000_000)" + ] }, { "cell_type": "code", + 
"execution_count": 28, + "metadata": { + "id": "MfYtjj19cKFr" + }, + "outputs": [], "source": [ "# Save the model and VecNormalize statistics when saving the agent\n", "model.save(\"a2c-PandaReachDense-v3\")\n", "env.save(\"vec_normalize.pkl\")" - ], - "metadata": { - "id": "MfYtjj19cKFr" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "01M9GCd32Ig-" + }, "source": [ "### Evaluate the agent 📈\n", "- Now that's our agent is trained, we need to **check its performance**.\n", "- Stable-Baselines3 provides a method to do that: `evaluate_policy`" - ], - "metadata": { - "id": "01M9GCd32Ig-" - } + ] }, { "cell_type": "code", + "execution_count": 29, + "metadata": { + "id": "liirTVoDkHq3" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "argv[0]=--background_color_red=0.8745098114013672\n", + "argv[1]=--background_color_green=0.21176470816135406\n", + "argv[2]=--background_color_blue=0.1764705926179886\n", + "Mean reward = -0.26 +/- 0.09\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit6/venv-u6/lib/python3.10/site-packages/stable_baselines3/common/evaluation.py:67: UserWarning: Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. 
Consider wrapping environment first with ``Monitor`` wrapper.\n", + " warnings.warn(\n" + ] + } + ], "source": [ "from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize\n", "\n", @@ -570,27 +9670,25 @@ "mean_reward, std_reward = evaluate_policy(model, eval_env)\n", "\n", "print(f\"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}\")" - ], - "metadata": { - "id": "liirTVoDkHq3" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "44L9LVQaavR8" + }, "source": [ "### Publish your trained model on the Hub 🔥\n", "Now that we've seen good results from training, we can publish our trained model on the Hub with one line of code.\n", "\n", "📚 The library's documentation 👉 https://github.com/huggingface/huggingface_sb3/tree/main#hugging-face--x-stable-baselines3-v20\n" - ], - "metadata": { - "id": "44L9LVQaavR8" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "MkMk99m8bgaQ" + }, "source": [ "By using `package_to_hub`, as we already mentioned in the previous units, **you evaluate, record a replay, generate a model card for your agent, and push it to the Hub**.\n", "\n", @@ -599,10 +9697,7 @@ "- You can **visualize your agent playing** 👀\n", "- You can **share with the community an agent that others can use** 💾\n", "- You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉 https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard\n"
}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[38;5;4mℹ This function will save, evaluate, generate a video of your agent,\n", + "create a model card and push everything to the hub. It might take up to 1min.\n", + "This is a work in progress: if you encounter a bug, please open an issue.\u001b[0m\n", + "Saving video to /tmp/tmpdcvmxwip/-step-0-to-step-1000.mp4\n", + "MoviePy - Building video /tmp/tmpdcvmxwip/-step-0-to-step-1000.mp4.\n", + "MoviePy - Writing video /tmp/tmpdcvmxwip/-step-0-to-step-1000.mp4\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + " \r" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "MoviePy - Done !\n", + "MoviePy - video ready /tmp/tmpdcvmxwip/-step-0-to-step-1000.mp4\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "ffmpeg version 6.1.1-3ubuntu5 Copyright (c) 2000-2023 the FFmpeg developers\n", + " built with gcc 13 (Ubuntu 13.2.0-23ubuntu3)\n", + " configuration: --prefix=/usr --extra-version=3ubuntu5 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --disable-omx --enable-gnutls --enable-libaom --enable-libass --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-openal --enable-opencl --enable-opengl --disable-sndio --enable-libvpl --disable-libmfx 
--enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-ladspa --enable-libbluray --enable-libjack --enable-libpulse --enable-librabbitmq --enable-librist --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libx264 --enable-libzmq --enable-libzvbi --enable-lv2 --enable-sdl2 --enable-libplacebo --enable-librav1e --enable-pocketsphinx --enable-librsvg --enable-libjxl --enable-shared\n", + " libavutil 58. 29.100 / 58. 29.100\n", + " libavcodec 60. 31.102 / 60. 31.102\n", + " libavformat 60. 16.100 / 60. 16.100\n", + " libavdevice 60. 3.100 / 60. 3.100\n", + " libavfilter 9. 12.100 / 9. 12.100\n", + " libswscale 7. 5.100 / 7. 5.100\n", + " libswresample 4. 12.100 / 4. 12.100\n", + " libpostproc 57. 3.100 / 57. 3.100\n", + "Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/tmpdcvmxwip/-step-0-to-step-1000.mp4':\n", + " Metadata:\n", + " major_brand : isom\n", + " minor_version : 512\n", + " compatible_brands: isomiso2avc1mp41\n", + " encoder : Lavf61.1.100\n", + " Duration: 00:00:40.00, start: 0.000000, bitrate: 118 kb/s\n", + " Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 720x480, 116 kb/s, 25 fps, 25 tbr, 12800 tbn (default)\n", + " Metadata:\n", + " handler_name : VideoHandler\n", + " vendor_id : [0][0][0][0]\n", + " encoder : Lavc61.3.100 libx264\n", + "Stream mapping:\n", + " Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))\n", + "Press [q] to stop, [?] 
for help\n", + "[libx264 @ 0x55777995a9c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2\n", + "[libx264 @ 0x55777995a9c0] profile High, level 3.0, 4:2:0, 8-bit\n", + "[libx264 @ 0x55777995a9c0] 264 - core 164 r3108 31e19f9 - H.264/MPEG-4 AVC codec - Copyleft 2003-2023 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=15 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00\n", + "Output #0, mp4, to '/tmp/tmpo6y5pqyw/replay.mp4':\n", + " Metadata:\n", + " major_brand : isom\n", + " minor_version : 512\n", + " compatible_brands: isomiso2avc1mp41\n", + " encoder : Lavf60.16.100\n", + " Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(progressive), 720x480, q=2-31, 25 fps, 12800 tbn (default)\n", + " Metadata:\n", + " handler_name : VideoHandler\n", + " vendor_id : [0][0][0][0]\n", + " encoder : Lavc60.31.102 libx264\n", + " Side data:\n", + " cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A\n", + "[out#0/mp4 @ 0x5577798d6080] video:551kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.198598%\n", + "frame= 1000 fps=870 q=-1.0 Lsize= 564kB time=00:00:39.88 bitrate= 115.8kbits/s speed=34.7x \n", + "[libx264 @ 0x55777995a9c0] frame I:4 Avg QP:14.60 size: 7429\n", + "[libx264 @ 0x55777995a9c0] frame P:297 Avg QP:23.56 size: 727\n", + "[libx264 @ 0x55777995a9c0] frame B:699 Avg QP:23.15 size: 455\n", + "[libx264 @ 0x55777995a9c0] consecutive B-frames: 1.9% 9.2% 16.5% 72.4%\n", + "[libx264 @ 0x55777995a9c0] mb I I16..4: 
24.4% 58.9% 16.6%\n", + "[libx264 @ 0x55777995a9c0] mb P I16..4: 0.1% 0.5% 0.3% P16..4: 2.5% 1.1% 0.7% 0.0% 0.0% skip:94.7%\n", + "[libx264 @ 0x55777995a9c0] mb B I16..4: 0.1% 0.1% 0.2% B16..8: 3.2% 1.1% 0.4% direct: 0.1% skip:94.9% L0:55.3% L1:43.6% BI: 1.1%\n", + "[libx264 @ 0x55777995a9c0] 8x8 transform intra:47.8% inter:8.5%\n", + "[libx264 @ 0x55777995a9c0] coded y,uvDC,uvAC intra: 18.0% 5.8% 4.5% inter: 0.7% 0.0% 0.0%\n", + "[libx264 @ 0x55777995a9c0] i16 v,h,dc,p: 66% 15% 18% 0%\n", + "[libx264 @ 0x55777995a9c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 38% 11% 49% 0% 0% 0% 0% 0% 0%\n", + "[libx264 @ 0x55777995a9c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 23% 17% 34% 3% 5% 5% 6% 3% 5%\n", + "[libx264 @ 0x55777995a9c0] i8c dc,h,v,p: 94% 3% 3% 0%\n", + "[libx264 @ 0x55777995a9c0] Weighted P-Frames: Y:0.0% UV:0.0%\n", + "[libx264 @ 0x55777995a9c0] ref P L0: 44.0% 2.7% 36.4% 16.9%\n", + "[libx264 @ 0x55777995a9c0] ref B L0: 67.2% 24.2% 8.7%\n", + "[libx264 @ 0x55777995a9c0] ref B L1: 94.0% 6.0%\n", + "[libx264 @ 0x55777995a9c0] kb/s:112.80\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[38;5;4mℹ Pushing repo turbo-maikol/a2c-PandaReachDense-v3 to the Hugging Face\n", + "Hub\u001b[0m\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Processing Files (0 / 0) : | | 0.00B / 0.00B \n", + "\u001b[A\n", + "Processing Files (1 / 1) : 0%| | 1.26kB / 789kB, ???B/s \n", + "\u001b[A\n", + "\u001b[A\n", + "\u001b[A\n", + "\n", + "\u001b[A\u001b[A\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\u001b[A\n", + "\n", + "\u001b[A\u001b[A\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + 
"\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "Processing Files (1 / 6) : 70%|███████ | 553kB / 789kB, 690kB/s \n", + "\u001b[A\n", + "\n", + "\u001b[A\u001b[A\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\u001b[A\n", + "\n", + "\u001b[A\u001b[A\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\u001b[A\n", + "\n", + "\u001b[A\u001b[A\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "Processing Files (6 / 6) : 100%|██████████| 789kB / 789kB, 563kB/s \n", + "\u001b[A\n", + "\n", + "\u001b[A\u001b[A\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\u001b[A\n", + "\n", + "\u001b[A\u001b[A\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\u001b[A\n", + "\n", + "\u001b[A\u001b[A\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + 
"\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\u001b[A\n", + "\n", + "\u001b[A\u001b[A\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\u001b[A\u001b[A\u001b[A\u001b[A\u001b[A\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "Processing Files (6 / 6) : 100%|██████████| 789kB / 789kB, 394kB/s \n", + "New Data Upload : 100%|██████████| 788kB / 788kB, 394kB/s \n", + " ...ReachDense-v3/pytorch_variables.pth: 100%|██████████| 1.26kB / 1.26kB \n", + " ...aReachDense-v3/policy.optimizer.pth: 100%|██████████| 48.9kB / 48.9kB \n", + " ...w/a2c-PandaReachDense-v3/policy.pth: 100%|██████████| 46.8kB / 46.8kB \n", + " ...o6y5pqyw/a2c-PandaReachDense-v3.zip: 100%|██████████| 113kB / 113kB \n", + " /tmp/tmpo6y5pqyw/replay.mp4 : 100%|██████████| 577kB / 577kB \n", + " /tmp/tmpo6y5pqyw/vec_normalize.pkl : 100%|██████████| 2.61kB / 2.61kB \n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[38;5;4mℹ Your model is pushed to the Hub. 
You can view your model here:\n", + "https://huggingface.co/turbo-maikol/a2c-PandaReachDense-v3/tree/main/\u001b[0m\n" + ] + }, + { + "data": { + "text/plain": [ + "CommitInfo(commit_url='https://huggingface.co/turbo-maikol/a2c-PandaReachDense-v3/commit/ca3b9e054bb58644bb45ae278b3f9887e1f7081d', commit_message='Initial commit', commit_description='', oid='ca3b9e054bb58644bb45ae278b3f9887e1f7081d', pr_url=None, repo_url=RepoUrl('https://huggingface.co/turbo-maikol/a2c-PandaReachDense-v3', endpoint='https://huggingface.co', repo_type='model', repo_id='turbo-maikol/a2c-PandaReachDense-v3'), pr_revision=None, pr_num=None)" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "from huggingface_sb3 import package_to_hub\n", "\n", @@ -673,18 +10103,16 @@ " model_architecture=\"A2C\",\n", " env_id=env_id,\n", " eval_env=eval_env,\n", - " repo_id=f\"ThomasSimonini/a2c-{env_id}\", # Change the username\n", + " repo_id=f\"turbo-maikol/a2c-{env_id}\", # Change the username\n", " commit_message=\"Initial commit\",\n", ")" - ], - "metadata": { - "id": "V1N8r8QVwcCE" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "G3xy3Nf3c2O1" + }, "source": [ "## Some additional challenges 🏆\n", "The best way to learn **is to try things on your own**! Why not try `PandaPickAndPlace-v3`?\n", "\n", @@ -705,22 +10133,9436 @@ "6. Save the model and VecNormalize statistics when saving the agent\n", "7. Evaluate your agent\n", "8. 
Publish your trained model on the Hub 🔥 with `package_to_hub`\n" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "argv[0]=--background_color_red=0.8745098114013672\n", + "argv[1]=--background_color_green=0.21176470816135406\n", + "argv[2]=--background_color_blue=0.1764705926179886\n", + "argv[0]=--background_color_red=0.8745098114013672\n", + "argv[1]=--background_color_green=0.21176470816135406\n", + "argv[2]=--background_color_blue=0.1764705926179886\n", + "argv[0]=--background_color_red=0.8745098114013672\n", + "argv[1]=--background_color_green=0.21176470816135406\n", + "argv[2]=--background_color_blue=0.1764705926179886\n", + "Using cuda device\n", + "argv[0]=--background_color_red=0.8745098114013672\n", + "argv[1]=--background_color_green=0.21176470816135406\n", + "argv[2]=--background_color_blue=0.1764705926179886\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 247 |\n", + "| iterations | 100 |\n", + "| time_elapsed | 8 |\n", + "| total_timesteps | 2000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.68 |\n", + "| explained_variance | 0.92173636 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 99 |\n", + "| policy_loss | -0.453 |\n", + "| std | 1 |\n", + "| value_loss | 0.0769 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.8 |\n", + "| ep_rew_mean | -48.8 |\n", + "| time/ | |\n", + "| fps | 269 |\n", + "| iterations | 200 |\n", + "| time_elapsed | 14 |\n", + "| total_timesteps | 4000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.67 |\n", + "| explained_variance | 0.9866529 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 199 |\n", + "| policy_loss | -1.13 |\n", + "| std | 0.999 |\n", + "| value_loss | 0.0935 |\n", + 
"-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 50 |\n", + "| ep_rew_mean | -50 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 300 |\n", + "| time_elapsed | 21 |\n", + "| total_timesteps | 6000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.66 |\n", + "| explained_variance | 0.91406125 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 299 |\n", + "| policy_loss | -1.29 |\n", + "| std | 0.997 |\n", + "| value_loss | 0.106 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 50 |\n", + "| ep_rew_mean | -50 |\n", + "| time/ | |\n", + "| fps | 254 |\n", + "| iterations | 400 |\n", + "| time_elapsed | 31 |\n", + "| total_timesteps | 8000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.68 |\n", + "| explained_variance | 0.97533536 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 399 |\n", + "| policy_loss | 0.149 |\n", + "| std | 1 |\n", + "| value_loss | 0.0134 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 50 |\n", + "| ep_rew_mean | -50 |\n", + "| time/ | |\n", + "| fps | 264 |\n", + "| iterations | 500 |\n", + "| time_elapsed | 37 |\n", + "| total_timesteps | 10000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.71 |\n", + "| explained_variance | 0.97877157 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 499 |\n", + "| policy_loss | 0.671 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.0334 |\n", + "--------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 50 |\n", + "| ep_rew_mean | -50 |\n", + "| time/ | |\n", + "| fps | 270 |\n", + "| iterations | 600 |\n", + "| time_elapsed | 44 |\n", + "| total_timesteps | 12000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.72 |\n", + "| explained_variance | 
0.941841 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 599 |\n", + "| policy_loss | -0.656 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.0444 |\n", + "------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 50 |\n", + "| ep_rew_mean | -50 |\n", + "| time/ | |\n", + "| fps | 277 |\n", + "| iterations | 700 |\n", + "| time_elapsed | 50 |\n", + "| total_timesteps | 14000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.72 |\n", + "| explained_variance | 0.7981684 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 699 |\n", + "| policy_loss | -0.123 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.0345 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 50 |\n", + "| ep_rew_mean | -50 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 800 |\n", + "| time_elapsed | 57 |\n", + "| total_timesteps | 16000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.71 |\n", + "| explained_variance | 0.7265997 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 799 |\n", + "| policy_loss | 0.379 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.0249 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.5 |\n", + "| ep_rew_mean | -49.5 |\n", + "| time/ | |\n", + "| fps | 267 |\n", + "| iterations | 900 |\n", + "| time_elapsed | 67 |\n", + "| total_timesteps | 18000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.69 |\n", + "| explained_variance | 0.89900863 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 899 |\n", + "| policy_loss | -0.36 |\n", + "| std | 1 |\n", + "| value_loss | 0.0117 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 269 |\n", + "| 
iterations | 1000 |\n", + "| time_elapsed | 74 |\n", + "| total_timesteps | 20000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.7 |\n", + "| explained_variance | 0.9879093 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 999 |\n", + "| policy_loss | -0.238 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.0122 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 271 |\n", + "| iterations | 1100 |\n", + "| time_elapsed | 81 |\n", + "| total_timesteps | 22000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.67 |\n", + "| explained_variance | 0.96510875 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1099 |\n", + "| policy_loss | -0.0184 |\n", + "| std | 0.998 |\n", + "| value_loss | 0.0225 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 272 |\n", + "| iterations | 1200 |\n", + "| time_elapsed | 87 |\n", + "| total_timesteps | 24000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.68 |\n", + "| explained_variance | 0.98142165 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1199 |\n", + "| policy_loss | -0.408 |\n", + "| std | 1 |\n", + "| value_loss | 0.0161 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 275 |\n", + "| iterations | 1300 |\n", + "| time_elapsed | 94 |\n", + "| total_timesteps | 26000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.7 |\n", + "| explained_variance | 0.8481641 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1299 |\n", + "| policy_loss | -0.445 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.00926 |\n", + 
"-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 266 |\n", + "| iterations | 1400 |\n", + "| time_elapsed | 105 |\n", + "| total_timesteps | 28000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.7 |\n", + "| explained_variance | 0.24699801 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1399 |\n", + "| policy_loss | -0.0865 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.00277 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 266 |\n", + "| iterations | 1500 |\n", + "| time_elapsed | 112 |\n", + "| total_timesteps | 30000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.69 |\n", + "| explained_variance | 0.98543787 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1499 |\n", + "| policy_loss | -0.122 |\n", + "| std | 1 |\n", + "| value_loss | 0.00198 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 268 |\n", + "| iterations | 1600 |\n", + "| time_elapsed | 119 |\n", + "| total_timesteps | 32000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.7 |\n", + "| explained_variance | 0.97692937 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1599 |\n", + "| policy_loss | 0.102 |\n", + "| std | 1 |\n", + "| value_loss | 0.00283 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 269 |\n", + "| iterations | 1700 |\n", + "| time_elapsed | 126 |\n", + "| total_timesteps | 34000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.68 
|\n", + "| explained_variance | 0.9177654 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1699 |\n", + "| policy_loss | -0.247 |\n", + "| std | 1 |\n", + "| value_loss | 0.00606 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 264 |\n", + "| iterations | 1800 |\n", + "| time_elapsed | 135 |\n", + "| total_timesteps | 36000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.68 |\n", + "| explained_variance | 0.88942945 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1799 |\n", + "| policy_loss | -0.0544 |\n", + "| std | 1 |\n", + "| value_loss | 0.00655 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 268 |\n", + "| iterations | 1900 |\n", + "| time_elapsed | 141 |\n", + "| total_timesteps | 38000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.68 |\n", + "| explained_variance | 0.9895952 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1899 |\n", + "| policy_loss | 0.179 |\n", + "| std | 1 |\n", + "| value_loss | 0.00177 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 271 |\n", + "| iterations | 2000 |\n", + "| time_elapsed | 147 |\n", + "| total_timesteps | 40000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.69 |\n", + "| explained_variance | 0.7657582 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 1999 |\n", + "| policy_loss | 0.267 |\n", + "| std | 1 |\n", + "| value_loss | 0.00616 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| 
time/ | |\n", + "| fps | 273 |\n", + "| iterations | 2100 |\n", + "| time_elapsed | 153 |\n", + "| total_timesteps | 42000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.67 |\n", + "| explained_variance | 0.9649579 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2099 |\n", + "| policy_loss | -0.0232 |\n", + "| std | 0.998 |\n", + "| value_loss | 0.000706 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 275 |\n", + "| iterations | 2200 |\n", + "| time_elapsed | 159 |\n", + "| total_timesteps | 44000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.67 |\n", + "| explained_variance | 0.9855432 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2199 |\n", + "| policy_loss | -0.388 |\n", + "| std | 0.999 |\n", + "| value_loss | 0.0113 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 2300 |\n", + "| time_elapsed | 166 |\n", + "| total_timesteps | 46000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.69 |\n", + "| explained_variance | 0.7222178 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2299 |\n", + "| policy_loss | -0.0183 |\n", + "| std | 1 |\n", + "| value_loss | 0.00072 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 272 |\n", + "| iterations | 2400 |\n", + "| time_elapsed | 175 |\n", + "| total_timesteps | 48000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.68 |\n", + "| explained_variance | 0.98888546 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2399 |\n", + "| policy_loss | -0.238 |\n", + "| std | 1 |\n", + "| value_loss | 
0.00958 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.1 |\n", + "| ep_rew_mean | -47 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 2500 |\n", + "| time_elapsed | 182 |\n", + "| total_timesteps | 50000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.69 |\n", + "| explained_variance | 0.96954125 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2499 |\n", + "| policy_loss | -0.0431 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.000864 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 275 |\n", + "| iterations | 2600 |\n", + "| time_elapsed | 188 |\n", + "| total_timesteps | 52000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.69 |\n", + "| explained_variance | 0.96610194 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2599 |\n", + "| policy_loss | -0.105 |\n", + "| std | 1 |\n", + "| value_loss | 0.00381 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 46.6 |\n", + "| ep_rew_mean | -46.5 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 2700 |\n", + "| time_elapsed | 195 |\n", + "| total_timesteps | 54000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.7 |\n", + "| explained_variance | 0.9916272 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2699 |\n", + "| policy_loss | 0.0748 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.00139 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 2800 |\n", + "| time_elapsed | 201 |\n", + "| total_timesteps | 56000 |\n", + "| train/ | |\n", + "| 
entropy_loss | -5.7 |\n", + "| explained_variance | 0.96441084 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 2799 |\n", + "| policy_loss | 0.1 |\n", + "| std | 1.01 |\n", + "| value_loss | 0.00154 |\n", + "--------------------------------------\n", + "[... repetitive training tables for iterations 2900-9600 trimmed; ep_rew_mean stays near -48 throughout ...]\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.2 |\n", + "| ep_rew_mean | -47.2
|\n", + "| time/ | |\n", + "| fps | 272 |\n", + "| iterations | 9700 |\n", + "| time_elapsed | 711 |\n", + "| total_timesteps | 194000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.4 |\n", + "| explained_variance | 0.95177764 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 9699 |\n", + "| policy_loss | 0.0071 |\n", + "| std | 0.935 |\n", + "| value_loss | 2.33e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.1 |\n", + "| ep_rew_mean | -47 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 9800 |\n", + "| time_elapsed | 717 |\n", + "| total_timesteps | 196000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.37 |\n", + "| explained_variance | 0.9991041 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 9799 |\n", + "| policy_loss | -0.0138 |\n", + "| std | 0.928 |\n", + "| value_loss | 1.32e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 9900 |\n", + "| time_elapsed | 724 |\n", + "| total_timesteps | 198000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.39 |\n", + "| explained_variance | 0.98675823 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 9899 |\n", + "| policy_loss | -0.00244 |\n", + "| std | 0.933 |\n", + "| value_loss | 2.88e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 10000 |\n", + "| time_elapsed | 731 |\n", + "| total_timesteps | 200000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.39 |\n", + "| explained_variance | 0.8423076 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 9999 |\n", + "| policy_loss | 0.0111 |\n", + "| std 
| 0.933 |\n", + "| value_loss | 0.000101 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 10100 |\n", + "| time_elapsed | 737 |\n", + "| total_timesteps | 202000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.38 |\n", + "| explained_variance | 0.9848204 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 10099 |\n", + "| policy_loss | -0.0111 |\n", + "| std | 0.929 |\n", + "| value_loss | 8.33e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 272 |\n", + "| iterations | 10200 |\n", + "| time_elapsed | 748 |\n", + "| total_timesteps | 204000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.35 |\n", + "| explained_variance | 0.9231719 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 10199 |\n", + "| policy_loss | -0.00118 |\n", + "| std | 0.923 |\n", + "| value_loss | 1.12e-05 |\n", + "-------------------------------------\n", + "---------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.1 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 10300 |\n", + "| time_elapsed | 754 |\n", + "| total_timesteps | 206000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.35 |\n", + "| explained_variance | 0.056429803 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 10299 |\n", + "| policy_loss | 0.0131 |\n", + "| std | 0.923 |\n", + "| value_loss | 0.000336 |\n", + "---------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.6 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 10400 |\n", + "| time_elapsed | 761 |\n", + "| 
total_timesteps | 208000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.35 |\n", + "| explained_variance | 0.9514538 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 10399 |\n", + "| policy_loss | -0.00282 |\n", + "| std | 0.923 |\n", + "| value_loss | 4.15e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.6 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 10500 |\n", + "| time_elapsed | 768 |\n", + "| total_timesteps | 210000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.34 |\n", + "| explained_variance | 0.94718796 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 10499 |\n", + "| policy_loss | -0.00272 |\n", + "| std | 0.921 |\n", + "| value_loss | 5.42e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.1 |\n", + "| ep_rew_mean | -47 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 10600 |\n", + "| time_elapsed | 774 |\n", + "| total_timesteps | 212000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.35 |\n", + "| explained_variance | 0.8666384 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 10599 |\n", + "| policy_loss | -0.0275 |\n", + "| std | 0.923 |\n", + "| value_loss | 5.03e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 272 |\n", + "| iterations | 10700 |\n", + "| time_elapsed | 784 |\n", + "| total_timesteps | 214000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.35 |\n", + "| explained_variance | -0.4472072 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 10699 |\n", + "| policy_loss | -0.00202 |\n", + "| std | 0.923 |\n", + "| value_loss | 3e-05 |\n", + "--------------------------------------\n", + 
"-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 10800 |\n", + "| time_elapsed | 790 |\n", + "| total_timesteps | 216000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.36 |\n", + "| explained_variance | 0.5958526 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 10799 |\n", + "| policy_loss | 0.0985 |\n", + "| std | 0.923 |\n", + "| value_loss | 0.00219 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 10900 |\n", + "| time_elapsed | 797 |\n", + "| total_timesteps | 218000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.36 |\n", + "| explained_variance | 0.9833134 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 10899 |\n", + "| policy_loss | 0.0262 |\n", + "| std | 0.924 |\n", + "| value_loss | 3.51e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.4 |\n", + "| ep_rew_mean | -48.3 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 11000 |\n", + "| time_elapsed | 803 |\n", + "| total_timesteps | 220000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.37 |\n", + "| explained_variance | 0.80022764 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 10999 |\n", + "| policy_loss | 0.0101 |\n", + "| std | 0.927 |\n", + "| value_loss | 1.26e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.4 |\n", + "| ep_rew_mean | -48.4 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 11100 |\n", + "| time_elapsed | 809 |\n", + "| total_timesteps | 222000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.39 |\n", + "| 
explained_variance | 0.86710656 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 11099 |\n", + "| policy_loss | -0.00484 |\n", + "| std | 0.932 |\n", + "| value_loss | 5.19e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.6 |\n", + "| ep_rew_mean | -48.6 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 11200 |\n", + "| time_elapsed | 819 |\n", + "| total_timesteps | 224000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.38 |\n", + "| explained_variance | 0.9757535 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 11199 |\n", + "| policy_loss | 0.00393 |\n", + "| std | 0.928 |\n", + "| value_loss | 4.17e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.9 |\n", + "| ep_rew_mean | -47.8 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 11300 |\n", + "| time_elapsed | 826 |\n", + "| total_timesteps | 226000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.37 |\n", + "| explained_variance | 0.9639573 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 11299 |\n", + "| policy_loss | -0.00267 |\n", + "| std | 0.928 |\n", + "| value_loss | 2.23e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.7 |\n", + "| ep_rew_mean | -47.6 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 11400 |\n", + "| time_elapsed | 833 |\n", + "| total_timesteps | 228000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.38 |\n", + "| explained_variance | -2.9878726 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 11399 |\n", + "| policy_loss | -0.00966 |\n", + "| std | 0.929 |\n", + "| value_loss | 1.41e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.7 
|\n", + "| ep_rew_mean | -48.6 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 11500 |\n", + "| time_elapsed | 839 |\n", + "| total_timesteps | 230000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.37 |\n", + "| explained_variance | 0.61145973 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 11499 |\n", + "| policy_loss | 0.000605 |\n", + "| std | 0.927 |\n", + "| value_loss | 1.67e-06 |\n", + "--------------------------------------\n", + "---------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.2 |\n", + "| ep_rew_mean | -48.2 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 11600 |\n", + "| time_elapsed | 846 |\n", + "| total_timesteps | 232000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.34 |\n", + "| explained_variance | -0.48529482 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 11599 |\n", + "| policy_loss | -0.00852 |\n", + "| std | 0.92 |\n", + "| value_loss | 0.000263 |\n", + "---------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.1 |\n", + "| ep_rew_mean | -47 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 11700 |\n", + "| time_elapsed | 856 |\n", + "| total_timesteps | 234000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.32 |\n", + "| explained_variance | 0.5111707 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 11699 |\n", + "| policy_loss | 0.0447 |\n", + "| std | 0.916 |\n", + "| value_loss | 0.00013 |\n", + "-------------------------------------\n", + "---------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 46.1 |\n", + "| ep_rew_mean | -46.1 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 11800 |\n", + "| time_elapsed | 863 |\n", + "| total_timesteps | 236000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.33 |\n", + "| explained_variance | -0.15370154 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 11799 
|\n", + "| policy_loss | -0.0161 |\n", + "| std | 0.918 |\n", + "| value_loss | 2.98e-05 |\n", + "---------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.1 |\n", + "| ep_rew_mean | -48.1 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 11900 |\n", + "| time_elapsed | 869 |\n", + "| total_timesteps | 238000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.31 |\n", + "| explained_variance | 0.80145425 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 11899 |\n", + "| policy_loss | -0.00846 |\n", + "| std | 0.914 |\n", + "| value_loss | 1.04e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.1 |\n", + "| ep_rew_mean | -48.1 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 12000 |\n", + "| time_elapsed | 875 |\n", + "| total_timesteps | 240000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.31 |\n", + "| explained_variance | 0.7291146 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 11999 |\n", + "| policy_loss | 0.0131 |\n", + "| std | 0.914 |\n", + "| value_loss | 1.66e-05 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.1 |\n", + "| ep_rew_mean | -48.1 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 12100 |\n", + "| time_elapsed | 882 |\n", + "| total_timesteps | 242000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.31 |\n", + "| explained_variance | 0.9325699 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 12099 |\n", + "| policy_loss | -0.00958 |\n", + "| std | 0.913 |\n", + "| value_loss | 9.12e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.6 |\n", + "| ep_rew_mean | -47.6 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 
12200 |\n", + "| time_elapsed | 892 |\n", + "| total_timesteps | 244000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.29 |\n", + "| explained_variance | 0.9233826 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 12199 |\n", + "| policy_loss | 0.00747 |\n", + "| std | 0.908 |\n", + "| value_loss | 5.35e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.1 |\n", + "| ep_rew_mean | -48.1 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 12300 |\n", + "| time_elapsed | 898 |\n", + "| total_timesteps | 246000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.28 |\n", + "| explained_variance | 0.89778614 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 12299 |\n", + "| policy_loss | -0.00221 |\n", + "| std | 0.906 |\n", + "| value_loss | 4.25e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 12400 |\n", + "| time_elapsed | 905 |\n", + "| total_timesteps | 248000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.26 |\n", + "| explained_variance | 0.8887966 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 12399 |\n", + "| policy_loss | 0.00342 |\n", + "| std | 0.902 |\n", + "| value_loss | 9.93e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 46.9 |\n", + "| ep_rew_mean | -46.9 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 12500 |\n", + "| time_elapsed | 911 |\n", + "| total_timesteps | 250000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.22 |\n", + "| explained_variance | 0.66576827 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 12499 |\n", + "| policy_loss | -0.0226 |\n", + "| std | 0.894 |\n", + "| value_loss | 3.2e-05 |\n", + 
"--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 46.9 |\n", + "| ep_rew_mean | -46.9 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 12600 |\n", + "| time_elapsed | 917 |\n", + "| total_timesteps | 252000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.22 |\n", + "| explained_variance | 0.7417493 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 12599 |\n", + "| policy_loss | -0.00908 |\n", + "| std | 0.893 |\n", + "| value_loss | 0.000403 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 46.9 |\n", + "| ep_rew_mean | -46.9 |\n", + "| time/ | |\n", + "| fps | 273 |\n", + "| iterations | 12700 |\n", + "| time_elapsed | 927 |\n", + "| total_timesteps | 254000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.21 |\n", + "| explained_variance | -0.4401511 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 12699 |\n", + "| policy_loss | -0.00291 |\n", + "| std | 0.891 |\n", + "| value_loss | 0.000273 |\n", + "--------------------------------------\n", + "---------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.1 |\n", + "| ep_rew_mean | -47 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 12800 |\n", + "| time_elapsed | 933 |\n", + "| total_timesteps | 256000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.19 |\n", + "| explained_variance | 0.049697876 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 12799 |\n", + "| policy_loss | -0.0232 |\n", + "| std | 0.887 |\n", + "| value_loss | 0.00016 |\n", + "---------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 12900 |\n", + "| time_elapsed | 940 |\n", + "| total_timesteps | 258000 |\n", + "| train/ | |\n", + 
"| entropy_loss | -5.19 |\n", + "| explained_variance | -1.4899552 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 12899 |\n", + "| policy_loss | 0.0311 |\n", + "| std | 0.887 |\n", + "| value_loss | 8.18e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 13000 |\n", + "| time_elapsed | 947 |\n", + "| total_timesteps | 260000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.17 |\n", + "| explained_variance | 0.8485774 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 12999 |\n", + "| policy_loss | -0.0228 |\n", + "| std | 0.881 |\n", + "| value_loss | 0.000253 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 13100 |\n", + "| time_elapsed | 953 |\n", + "| total_timesteps | 262000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.17 |\n", + "| explained_variance | -19.859615 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 13099 |\n", + "| policy_loss | -0.113 |\n", + "| std | 0.882 |\n", + "| value_loss | 0.00415 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.5 |\n", + "| ep_rew_mean | -49.5 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 13200 |\n", + "| time_elapsed | 963 |\n", + "| total_timesteps | 264000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.2 |\n", + "| explained_variance | 0.9869409 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 13199 |\n", + "| policy_loss | -0.0141 |\n", + "| std | 0.888 |\n", + "| value_loss | 1.01e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", 
+ "| ep_len_mean | 50 |\n", + "| ep_rew_mean | -50 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 13300 |\n", + "| time_elapsed | 969 |\n", + "| total_timesteps | 266000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.2 |\n", + "| explained_variance | 0.91975826 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 13299 |\n", + "| policy_loss | 0.00825 |\n", + "| std | 0.889 |\n", + "| value_loss | 1.69e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.3 |\n", + "| ep_rew_mean | -48.3 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 13400 |\n", + "| time_elapsed | 976 |\n", + "| total_timesteps | 268000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.19 |\n", + "| explained_variance | 0.88386124 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 13399 |\n", + "| policy_loss | -0.0196 |\n", + "| std | 0.887 |\n", + "| value_loss | 2.02e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.8 |\n", + "| ep_rew_mean | -47.8 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 13500 |\n", + "| time_elapsed | 982 |\n", + "| total_timesteps | 270000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.21 |\n", + "| explained_variance | 0.88700855 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 13499 |\n", + "| policy_loss | -0.0174 |\n", + "| std | 0.892 |\n", + "| value_loss | 1.85e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.7 |\n", + "| ep_rew_mean | -47.6 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 13600 |\n", + "| time_elapsed | 989 |\n", + "| total_timesteps | 272000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.2 |\n", + "| explained_variance | 0.9246665 |\n", + "| learning_rate | 0.0007 |\n", + "| 
n_updates | 13599 |\n", + "| policy_loss | -0.0265 |\n", + "| std | 0.889 |\n", + "| value_loss | 3.59e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.7 |\n", + "| ep_rew_mean | -48.6 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 13700 |\n", + "| time_elapsed | 999 |\n", + "| total_timesteps | 274000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.2 |\n", + "| explained_variance | 0.90511894 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 13699 |\n", + "| policy_loss | -0.0152 |\n", + "| std | 0.889 |\n", + "| value_loss | 1.5e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 13800 |\n", + "| time_elapsed | 1006 |\n", + "| total_timesteps | 276000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.21 |\n", + "| explained_variance | 0.96453655 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 13799 |\n", + "| policy_loss | 0.00467 |\n", + "| std | 0.89 |\n", + "| value_loss | 6.43e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 13900 |\n", + "| time_elapsed | 1012 |\n", + "| total_timesteps | 278000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.21 |\n", + "| explained_variance | 0.9376099 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 13899 |\n", + "| policy_loss | -0.00328 |\n", + "| std | 0.892 |\n", + "| value_loss | 1e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| 
iterations | 14000 |\n", + "| time_elapsed | 1019 |\n", + "| total_timesteps | 280000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.2 |\n", + "| explained_variance | 0.9059786 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 13999 |\n", + "| policy_loss | 0.00602 |\n", + "| std | 0.889 |\n", + "| value_loss | 7.84e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 14100 |\n", + "| time_elapsed | 1025 |\n", + "| total_timesteps | 282000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.17 |\n", + "| explained_variance | 0.72370136 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 14099 |\n", + "| policy_loss | -0.016 |\n", + "| std | 0.883 |\n", + "| value_loss | 2.04e-05 |\n", + "--------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 14200 |\n", + "| time_elapsed | 1035 |\n", + "| total_timesteps | 284000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.18 |\n", + "| explained_variance | 0.92715 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 14199 |\n", + "| policy_loss | 0.00173 |\n", + "| std | 0.885 |\n", + "| value_loss | 3e-05 |\n", + "------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 14300 |\n", + "| time_elapsed | 1042 |\n", + "| total_timesteps | 286000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.19 |\n", + "| explained_variance | 0.86559874 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 14299 |\n", + "| policy_loss | -0.29 |\n", + "| std | 0.885 |\n", + "| value_loss | 0.00812 |\n", + 
"--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 14400 |\n", + "| time_elapsed | 1048 |\n", + "| total_timesteps | 288000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.18 |\n", + "| explained_variance | 0.95699525 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 14399 |\n", + "| policy_loss | -0.0155 |\n", + "| std | 0.884 |\n", + "| value_loss | 1.59e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 46.7 |\n", + "| ep_rew_mean | -46.6 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 14500 |\n", + "| time_elapsed | 1055 |\n", + "| total_timesteps | 290000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.15 |\n", + "| explained_variance | 0.27279484 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 14499 |\n", + "| policy_loss | -0.0854 |\n", + "| std | 0.878 |\n", + "| value_loss | 0.000576 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.7 |\n", + "| ep_rew_mean | -47.6 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 14600 |\n", + "| time_elapsed | 1062 |\n", + "| total_timesteps | 292000 |\n", + "| train/ | |\n", + "| entropy_loss | -5.15 |\n", + "| explained_variance | 0.96234167 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 14599 |\n", + "| policy_loss | -0.00485 |\n", + "| std | 0.878 |\n", + "| value_loss | 5.17e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.5 |\n", + "| ep_rew_mean | -49.5 |\n", + "| time/ | |\n", + "| fps | 274 |\n", + "| iterations | 14700 |\n", + "| time_elapsed | 1072 |\n", + "| total_timesteps | 294000 |\n", + "| train/ | 
|\n", + "| entropy_loss | -5.17 |\n", + "| explained_variance | 0.9157329 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 14699 |\n", + "| policy_loss | -0.000324 |\n", + "| std | 0.881 |\n", + "| value_loss | 6.58e-06 |\n", + "-------------------------------------\n", + "[... repeated rollout/train tables for iterations 14800-22100 (total_timesteps 296000-442000) omitted; ep_rew_mean plateaus between roughly -45 and -50, ep_len_mean between roughly 46 and 50, learning_rate 0.0007 throughout ...]\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.8 |\n", + "| ep_rew_mean | -47.7 |\n", + "| time/ | |\n", + "| fps | 275 |\n", + "| iterations | 22200 |\n", + "| time_elapsed | 1611 |\n", + "| total_timesteps | 444000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.8 |\n", + "| explained_variance | 0.59894145 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 22199 |\n", + "| policy_loss | 0.0302 |\n", + "| std | 0.804 |\n", + "| 
value_loss | 0.000191 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 275 |\n", + "| iterations | 22300 |\n", + "| time_elapsed | 1617 |\n", + "| total_timesteps | 446000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.78 |\n", + "| explained_variance | 0.9134196 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 22299 |\n", + "| policy_loss | 0.00497 |\n", + "| std | 0.802 |\n", + "| value_loss | 4.45e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.8 |\n", + "| ep_rew_mean | -48.8 |\n", + "| time/ | |\n", + "| fps | 275 |\n", + "| iterations | 22400 |\n", + "| time_elapsed | 1623 |\n", + "| total_timesteps | 448000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.79 |\n", + "| explained_variance | 0.9829938 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 22399 |\n", + "| policy_loss | -0.0215 |\n", + "| std | 0.803 |\n", + "| value_loss | 2.6e-05 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 22500 |\n", + "| time_elapsed | 1629 |\n", + "| total_timesteps | 450000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.78 |\n", + "| explained_variance | 0.9305882 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 22499 |\n", + "| policy_loss | -0.000147 |\n", + "| std | 0.802 |\n", + "| value_loss | 7.5e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.4 |\n", + "| ep_rew_mean | -48.3 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 22600 |\n", + "| time_elapsed | 1635 |\n", + "| total_timesteps | 452000 
|\n", + "| train/ | |\n", + "| entropy_loss | -4.77 |\n", + "| explained_variance | 0.35571432 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 22599 |\n", + "| policy_loss | -0.00716 |\n", + "| std | 0.799 |\n", + "| value_loss | 2.19e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.6 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 275 |\n", + "| iterations | 22700 |\n", + "| time_elapsed | 1645 |\n", + "| total_timesteps | 454000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.77 |\n", + "| explained_variance | 0.9946183 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 22699 |\n", + "| policy_loss | 0.00128 |\n", + "| std | 0.799 |\n", + "| value_loss | 1.1e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -47.9 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 22800 |\n", + "| time_elapsed | 1651 |\n", + "| total_timesteps | 456000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.76 |\n", + "| explained_variance | 0.9850599 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 22799 |\n", + "| policy_loss | 0.00015 |\n", + "| std | 0.798 |\n", + "| value_loss | 3.45e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.4 |\n", + "| ep_rew_mean | -49.4 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 22900 |\n", + "| time_elapsed | 1657 |\n", + "| total_timesteps | 458000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.75 |\n", + "| explained_variance | -1.0031595 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 22899 |\n", + "| policy_loss | 0.000758 |\n", + "| std | 0.796 |\n", + "| value_loss | 8.14e-06 |\n", + "--------------------------------------\n", + 
"-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.2 |\n", + "| ep_rew_mean | -49.1 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 23000 |\n", + "| time_elapsed | 1663 |\n", + "| total_timesteps | 460000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.75 |\n", + "| explained_variance | 0.1282289 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 22999 |\n", + "| policy_loss | -0.0544 |\n", + "| std | 0.795 |\n", + "| value_loss | 0.000587 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.7 |\n", + "| ep_rew_mean | -47.7 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 23100 |\n", + "| time_elapsed | 1670 |\n", + "| total_timesteps | 462000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.75 |\n", + "| explained_variance | 0.84313476 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 23099 |\n", + "| policy_loss | -0.0114 |\n", + "| std | 0.795 |\n", + "| value_loss | 8.02e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 46.5 |\n", + "| ep_rew_mean | -46.4 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 23200 |\n", + "| time_elapsed | 1679 |\n", + "| total_timesteps | 464000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.74 |\n", + "| explained_variance | 0.71710217 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 23199 |\n", + "| policy_loss | -0.0132 |\n", + "| std | 0.793 |\n", + "| value_loss | 0.000272 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 45.3 |\n", + "| ep_rew_mean | -45.2 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 23300 |\n", + "| time_elapsed | 1685 |\n", + "| total_timesteps | 466000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.73 |\n", + "| 
explained_variance | 0.9658966 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 23299 |\n", + "| policy_loss | -0.00875 |\n", + "| std | 0.792 |\n", + "| value_loss | 1.17e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.1 |\n", + "| ep_rew_mean | -47 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 23400 |\n", + "| time_elapsed | 1691 |\n", + "| total_timesteps | 468000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.72 |\n", + "| explained_variance | 0.98442066 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 23399 |\n", + "| policy_loss | 0.000132 |\n", + "| std | 0.791 |\n", + "| value_loss | 1.35e-06 |\n", + "--------------------------------------\n", + "---------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.1 |\n", + "| ep_rew_mean | -49.1 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 23500 |\n", + "| time_elapsed | 1697 |\n", + "| total_timesteps | 470000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.71 |\n", + "| explained_variance | -0.20414686 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 23499 |\n", + "| policy_loss | 0.0214 |\n", + "| std | 0.788 |\n", + "| value_loss | 4.31e-05 |\n", + "---------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.1 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 277 |\n", + "| iterations | 23600 |\n", + "| time_elapsed | 1703 |\n", + "| total_timesteps | 472000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.68 |\n", + "| explained_variance | 0.8207843 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 23599 |\n", + "| policy_loss | -0.0187 |\n", + "| std | 0.781 |\n", + "| value_loss | 4.95e-05 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.8 
|\n", + "| ep_rew_mean | -47.7 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 23700 |\n", + "| time_elapsed | 1713 |\n", + "| total_timesteps | 474000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.67 |\n", + "| explained_variance | 0.9646553 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 23699 |\n", + "| policy_loss | -0.00441 |\n", + "| std | 0.78 |\n", + "| value_loss | 4.12e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 46.6 |\n", + "| ep_rew_mean | -46.5 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 23800 |\n", + "| time_elapsed | 1719 |\n", + "| total_timesteps | 476000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.64 |\n", + "| explained_variance | 0.29265028 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 23799 |\n", + "| policy_loss | -0.0266 |\n", + "| std | 0.775 |\n", + "| value_loss | 8.52e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.8 |\n", + "| ep_rew_mean | -47.8 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 23900 |\n", + "| time_elapsed | 1725 |\n", + "| total_timesteps | 478000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.63 |\n", + "| explained_variance | 0.53161466 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 23899 |\n", + "| policy_loss | 0.0194 |\n", + "| std | 0.773 |\n", + "| value_loss | 0.00012 |\n", + "--------------------------------------\n", + "---------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.7 |\n", + "| ep_rew_mean | -48.7 |\n", + "| time/ | |\n", + "| fps | 277 |\n", + "| iterations | 24000 |\n", + "| time_elapsed | 1731 |\n", + "| total_timesteps | 480000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.61 |\n", + "| explained_variance | -0.45772743 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 
23999 |\n", + "| policy_loss | 0.0221 |\n", + "| std | 0.769 |\n", + "| value_loss | 0.000243 |\n", + "---------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -48.9 |\n", + "| time/ | |\n", + "| fps | 277 |\n", + "| iterations | 24100 |\n", + "| time_elapsed | 1737 |\n", + "| total_timesteps | 482000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.6 |\n", + "| explained_variance | 0.984889 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 24099 |\n", + "| policy_loss | 0.00232 |\n", + "| std | 0.767 |\n", + "| value_loss | 2.45e-06 |\n", + "------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.4 |\n", + "| ep_rew_mean | -49.4 |\n", + "| time/ | |\n", + "| fps | 276 |\n", + "| iterations | 24200 |\n", + "| time_elapsed | 1747 |\n", + "| total_timesteps | 484000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.6 |\n", + "| explained_variance | 0.38117242 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 24199 |\n", + "| policy_loss | -0.00147 |\n", + "| std | 0.766 |\n", + "| value_loss | 2.92e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.4 |\n", + "| ep_rew_mean | -49.4 |\n", + "| time/ | |\n", + "| fps | 277 |\n", + "| iterations | 24300 |\n", + "| time_elapsed | 1753 |\n", + "| total_timesteps | 486000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.57 |\n", + "| explained_variance | 0.8432429 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 24299 |\n", + "| policy_loss | -0.0173 |\n", + "| std | 0.761 |\n", + "| value_loss | 7.87e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.3 |\n", + "| ep_rew_mean | -49.3 |\n", + "| time/ | |\n", + "| fps | 277 |\n", + "| iterations | 
24400 |\n", + "| time_elapsed | 1759 |\n", + "| total_timesteps | 488000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.55 |\n", + "| explained_variance | 0.74071956 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 24399 |\n", + "| policy_loss | -0.000541 |\n", + "| std | 0.757 |\n", + "| value_loss | 5.11e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.4 |\n", + "| ep_rew_mean | -48.3 |\n", + "| time/ | |\n", + "| fps | 277 |\n", + "| iterations | 24500 |\n", + "| time_elapsed | 1765 |\n", + "| total_timesteps | 490000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.53 |\n", + "| explained_variance | 0.93212646 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 24499 |\n", + "| policy_loss | -0.0138 |\n", + "| std | 0.752 |\n", + "| value_loss | 1.72e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.1 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 277 |\n", + "| iterations | 24600 |\n", + "| time_elapsed | 1771 |\n", + "| total_timesteps | 492000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.49 |\n", + "| explained_variance | 0.83804965 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 24599 |\n", + "| policy_loss | -0.0364 |\n", + "| std | 0.746 |\n", + "| value_loss | 7.15e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.2 |\n", + "| ep_rew_mean | -48.1 |\n", + "| time/ | |\n", + "| fps | 277 |\n", + "| iterations | 24700 |\n", + "| time_elapsed | 1777 |\n", + "| total_timesteps | 494000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.5 |\n", + "| explained_variance | 0.99318516 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 24699 |\n", + "| policy_loss | -0.00357 |\n", + "| std | 0.746 |\n", + "| value_loss | 3.81e-06 
|\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 277 |\n", + "| iterations | 24800 |\n", + "| time_elapsed | 1787 |\n", + "| total_timesteps | 496000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.49 |\n", + "| explained_variance | 0.9355085 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 24799 |\n", + "| policy_loss | -0.000438 |\n", + "| std | 0.745 |\n", + "| value_loss | 1.37e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 50 |\n", + "| ep_rew_mean | -50 |\n", + "| time/ | |\n", + "| fps | 277 |\n", + "| iterations | 24900 |\n", + "| time_elapsed | 1793 |\n", + "| total_timesteps | 498000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.48 |\n", + "| explained_variance | 0.9021735 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 24899 |\n", + "| policy_loss | -0.000913 |\n", + "| std | 0.744 |\n", + "| value_loss | 2.17e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 50 |\n", + "| ep_rew_mean | -50 |\n", + "| time/ | |\n", + "| fps | 277 |\n", + "| iterations | 25000 |\n", + "| time_elapsed | 1799 |\n", + "| total_timesteps | 500000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.47 |\n", + "| explained_variance | 0.9590338 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 24999 |\n", + "| policy_loss | -0.000665 |\n", + "| std | 0.742 |\n", + "| value_loss | 8.6e-07 |\n", + "-------------------------------------\n", + "---------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 50 |\n", + "| ep_rew_mean | -50 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 25100 |\n", + "| time_elapsed | 1805 |\n", + "| total_timesteps | 502000 |\n", + "| train/ | |\n", 
+ "| entropy_loss | -4.46 |\n", + "| explained_variance | -0.09868252 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 25099 |\n", + "| policy_loss | 0.00935 |\n", + "| std | 0.74 |\n", + "| value_loss | 3.3e-05 |\n", + "---------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.8 |\n", + "| ep_rew_mean | -49.7 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 25200 |\n", + "| time_elapsed | 1811 |\n", + "| total_timesteps | 504000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.45 |\n", + "| explained_variance | 0.9393065 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 25199 |\n", + "| policy_loss | -0.00298 |\n", + "| std | 0.739 |\n", + "| value_loss | 2.64e-05 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.4 |\n", + "| time/ | |\n", + "| fps | 277 |\n", + "| iterations | 25300 |\n", + "| time_elapsed | 1820 |\n", + "| total_timesteps | 506000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.46 |\n", + "| explained_variance | 0.9661807 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 25299 |\n", + "| policy_loss | -0.00921 |\n", + "| std | 0.739 |\n", + "| value_loss | 5.25e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.2 |\n", + "| ep_rew_mean | -47.1 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 25400 |\n", + "| time_elapsed | 1826 |\n", + "| total_timesteps | 508000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.45 |\n", + "| explained_variance | 0.98033226 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 25399 |\n", + "| policy_loss | -0.0115 |\n", + "| std | 0.738 |\n", + "| value_loss | 1.33e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| 
rollout/ | |\n", + "| ep_len_mean | 47.9 |\n", + "| ep_rew_mean | -47.9 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 25500 |\n", + "| time_elapsed | 1832 |\n", + "| total_timesteps | 510000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.45 |\n", + "| explained_variance | 0.98172903 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 25499 |\n", + "| policy_loss | -0.00525 |\n", + "| std | 0.737 |\n", + "| value_loss | 4.09e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.5 |\n", + "| ep_rew_mean | -49.5 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 25600 |\n", + "| time_elapsed | 1838 |\n", + "| total_timesteps | 512000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.44 |\n", + "| explained_variance | 0.9630763 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 25599 |\n", + "| policy_loss | 0.00446 |\n", + "| std | 0.737 |\n", + "| value_loss | 3.84e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.5 |\n", + "| ep_rew_mean | -49.5 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 25700 |\n", + "| time_elapsed | 1844 |\n", + "| total_timesteps | 514000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.42 |\n", + "| explained_variance | 0.74551255 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 25699 |\n", + "| policy_loss | -0.00379 |\n", + "| std | 0.733 |\n", + "| value_loss | 4.06e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 25800 |\n", + "| time_elapsed | 1851 |\n", + "| total_timesteps | 516000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.42 |\n", + "| explained_variance | 0.88611174 |\n", + "| 
learning_rate | 0.0007 |\n", + "| n_updates | 25799 |\n", + "| policy_loss | -0.00438 |\n", + "| std | 0.733 |\n", + "| value_loss | 4.34e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 46.2 |\n", + "| ep_rew_mean | -46.2 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 25900 |\n", + "| time_elapsed | 1860 |\n", + "| total_timesteps | 518000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.4 |\n", + "| explained_variance | 0.97400296 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 25899 |\n", + "| policy_loss | -0.0016 |\n", + "| std | 0.73 |\n", + "| value_loss | 3.13e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.4 |\n", + "| ep_rew_mean | -47.4 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 26000 |\n", + "| time_elapsed | 1866 |\n", + "| total_timesteps | 520000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.43 |\n", + "| explained_variance | 0.9903519 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 25999 |\n", + "| policy_loss | -0.00792 |\n", + "| std | 0.734 |\n", + "| value_loss | 4.45e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.1 |\n", + "| ep_rew_mean | -47 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 26100 |\n", + "| time_elapsed | 1872 |\n", + "| total_timesteps | 522000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.41 |\n", + "| explained_variance | 0.96013033 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 26099 |\n", + "| policy_loss | -0.00215 |\n", + "| std | 0.733 |\n", + "| value_loss | 2.28e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| 
time/ | |\n", + "| fps | 278 |\n", + "| iterations | 26200 |\n", + "| time_elapsed | 1878 |\n", + "| total_timesteps | 524000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.42 |\n", + "| explained_variance | 0.9256018 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 26199 |\n", + "| policy_loss | 0.00556 |\n", + "| std | 0.733 |\n", + "| value_loss | 6.28e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 26300 |\n", + "| time_elapsed | 1884 |\n", + "| total_timesteps | 526000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.4 |\n", + "| explained_variance | 0.96353793 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 26299 |\n", + "| policy_loss | -0.00559 |\n", + "| std | 0.73 |\n", + "| value_loss | 2.37e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.7 |\n", + "| ep_rew_mean | -47.6 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 26400 |\n", + "| time_elapsed | 1894 |\n", + "| total_timesteps | 528000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.4 |\n", + "| explained_variance | 0.9796733 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 26399 |\n", + "| policy_loss | -0.00667 |\n", + "| std | 0.73 |\n", + "| value_loss | 1.02e-05 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.2 |\n", + "| ep_rew_mean | -47.1 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 26500 |\n", + "| time_elapsed | 1900 |\n", + "| total_timesteps | 530000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.4 |\n", + "| explained_variance | 0.9652436 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 26499 |\n", + "| policy_loss | -0.00465 |\n", + "| std 
| 0.73 |\n", + "| value_loss | 2.56e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.2 |\n", + "| ep_rew_mean | -48.1 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 26600 |\n", + "| time_elapsed | 1906 |\n", + "| total_timesteps | 532000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.39 |\n", + "| explained_variance | 0.85467994 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 26599 |\n", + "| policy_loss | -0.000304 |\n", + "| std | 0.728 |\n", + "| value_loss | 8.97e-06 |\n", + "--------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.4 |\n", + "| ep_rew_mean | -48.4 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 26700 |\n", + "| time_elapsed | 1912 |\n", + "| total_timesteps | 534000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.36 |\n", + "| explained_variance | 0.99323 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 26699 |\n", + "| policy_loss | 0.00249 |\n", + "| std | 0.724 |\n", + "| value_loss | 4.39e-07 |\n", + "------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.9 |\n", + "| ep_rew_mean | -47.9 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 26800 |\n", + "| time_elapsed | 1919 |\n", + "| total_timesteps | 536000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.36 |\n", + "| explained_variance | 0.95387536 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 26799 |\n", + "| policy_loss | 0.0109 |\n", + "| std | 0.723 |\n", + "| value_loss | 1.75e-05 |\n", + "--------------------------------------\n", + "-----------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -48.9 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 26900 |\n", + "| time_elapsed | 1928 |\n", + "| 
total_timesteps | 538000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.35 |\n", + "| explained_variance | -0.0017001629 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 26899 |\n", + "| policy_loss | 3.99 |\n", + "| std | 0.72 |\n", + "| value_loss | 11.5 |\n", + "-----------------------------------------\n", + "... [log truncated: iterations 27000-34000 repeat the same rollout/ and train/ tables at ~280 fps, with ep_rew_mean hovering between -46 and -50 through 680000 total timesteps] ...\n", + "-------------------------------------\n", + 
"| rollout/ | |\n", + "| ep_len_mean | 48.8 |\n", + "| ep_rew_mean | -48.7 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 34100 |\n", + "| time_elapsed | 2432 |\n", + "| total_timesteps | 682000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.04 |\n", + "| explained_variance | 0.9745406 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 34099 |\n", + "| policy_loss | -0.00209 |\n", + "| std | 0.666 |\n", + "| value_loss | 1.28e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 34200 |\n", + "| time_elapsed | 2438 |\n", + "| total_timesteps | 684000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.02 |\n", + "| explained_variance | 0.8974001 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 34199 |\n", + "| policy_loss | -0.00629 |\n", + "| std | 0.662 |\n", + "| value_loss | 6.24e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 34300 |\n", + "| time_elapsed | 2445 |\n", + "| total_timesteps | 686000 |\n", + "| train/ | |\n", + "| entropy_loss | -4.02 |\n", + "| explained_variance | 0.9367453 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 34299 |\n", + "| policy_loss | -0.000219 |\n", + "| std | 0.663 |\n", + "| value_loss | 3.84e-07 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 34400 |\n", + "| time_elapsed | 2452 |\n", + "| total_timesteps | 688000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.99 |\n", + "| explained_variance | 0.9830403 |\n", + "| learning_rate | 0.0007 
|\n", + "| n_updates | 34399 |\n", + "| policy_loss | -0.000379 |\n", + "| std | 0.658 |\n", + "| value_loss | 8.19e-07 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 34500 |\n", + "| time_elapsed | 2458 |\n", + "| total_timesteps | 690000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.96 |\n", + "| explained_variance | 0.7310099 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 34499 |\n", + "| policy_loss | -0.013 |\n", + "| std | 0.652 |\n", + "| value_loss | 2.52e-05 |\n", + "-------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.3 |\n", + "| ep_rew_mean | -48.2 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 34600 |\n", + "| time_elapsed | 2468 |\n", + "| total_timesteps | 692000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.94 |\n", + "| explained_variance | 0.975126 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 34599 |\n", + "| policy_loss | 0.00412 |\n", + "| std | 0.65 |\n", + "| value_loss | 3.43e-06 |\n", + "------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.9 |\n", + "| ep_rew_mean | -48.9 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 34700 |\n", + "| time_elapsed | 2475 |\n", + "| total_timesteps | 694000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.95 |\n", + "| explained_variance | -0.6563065 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 34699 |\n", + "| policy_loss | 0.00318 |\n", + "| std | 0.651 |\n", + "| value_loss | 5.93e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.4 |\n", + "| ep_rew_mean | -48.4 |\n", + "| time/ | |\n", + "| fps | 280 
|\n", + "| iterations | 34800 |\n", + "| time_elapsed | 2481 |\n", + "| total_timesteps | 696000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.92 |\n", + "| explained_variance | 0.97628003 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 34799 |\n", + "| policy_loss | 0.000984 |\n", + "| std | 0.647 |\n", + "| value_loss | 1.03e-06 |\n", + "--------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 34900 |\n", + "| time_elapsed | 2488 |\n", + "| total_timesteps | 698000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.92 |\n", + "| explained_variance | 0.857736 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 34899 |\n", + "| policy_loss | 0.000848 |\n", + "| std | 0.646 |\n", + "| value_loss | 8.62e-07 |\n", + "------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 35000 |\n", + "| time_elapsed | 2494 |\n", + "| total_timesteps | 700000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.9 |\n", + "| explained_variance | 0.9739769 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 34999 |\n", + "| policy_loss | 8.95e-05 |\n", + "| std | 0.643 |\n", + "| value_loss | 6.12e-07 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 35100 |\n", + "| time_elapsed | 2505 |\n", + "| total_timesteps | 702000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.91 |\n", + "| explained_variance | 0.9625768 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 35099 |\n", + "| policy_loss | 0.000541 |\n", + "| std | 0.645 |\n", + "| value_loss | 
2.43e-07 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 35200 |\n", + "| time_elapsed | 2511 |\n", + "| total_timesteps | 704000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.9 |\n", + "| explained_variance | 0.76877356 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 35199 |\n", + "| policy_loss | -0.00575 |\n", + "| std | 0.642 |\n", + "| value_loss | 1.08e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 35300 |\n", + "| time_elapsed | 2517 |\n", + "| total_timesteps | 706000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.89 |\n", + "| explained_variance | 0.84682435 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 35299 |\n", + "| policy_loss | -0.0021 |\n", + "| std | 0.641 |\n", + "| value_loss | 3.52e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 35400 |\n", + "| time_elapsed | 2524 |\n", + "| total_timesteps | 708000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.89 |\n", + "| explained_variance | 0.7140837 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 35399 |\n", + "| policy_loss | -0.00864 |\n", + "| std | 0.642 |\n", + "| value_loss | 1.03e-05 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.5 |\n", + "| ep_rew_mean | -49.5 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 35500 |\n", + "| time_elapsed | 2530 |\n", + "| total_timesteps | 710000 |\n", + 
"| train/ | |\n", + "| entropy_loss | -3.9 |\n", + "| explained_variance | 0.9013965 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 35499 |\n", + "| policy_loss | -0.00133 |\n", + "| std | 0.644 |\n", + "| value_loss | 5.98e-07 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.5 |\n", + "| ep_rew_mean | -49.5 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 35600 |\n", + "| time_elapsed | 2541 |\n", + "| total_timesteps | 712000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.91 |\n", + "| explained_variance | 0.91648865 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 35599 |\n", + "| policy_loss | -0.00166 |\n", + "| std | 0.644 |\n", + "| value_loss | 8.45e-07 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 35700 |\n", + "| time_elapsed | 2547 |\n", + "| total_timesteps | 714000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.89 |\n", + "| explained_variance | 0.78630555 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 35699 |\n", + "| policy_loss | -0.0023 |\n", + "| std | 0.642 |\n", + "| value_loss | 3.13e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 35800 |\n", + "| time_elapsed | 2554 |\n", + "| total_timesteps | 716000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.89 |\n", + "| explained_variance | 0.98644364 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 35799 |\n", + "| policy_loss | -0.00116 |\n", + "| std | 0.642 |\n", + "| value_loss | 5.94e-07 |\n", + "--------------------------------------\n", + 
"-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 35900 |\n", + "| time_elapsed | 2561 |\n", + "| total_timesteps | 718000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.88 |\n", + "| explained_variance | 0.9824021 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 35899 |\n", + "| policy_loss | -0.00393 |\n", + "| std | 0.639 |\n", + "| value_loss | 4.01e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 36000 |\n", + "| time_elapsed | 2572 |\n", + "| total_timesteps | 720000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.87 |\n", + "| explained_variance | 0.9410251 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 35999 |\n", + "| policy_loss | -0.000505 |\n", + "| std | 0.638 |\n", + "| value_loss | 2.79e-07 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.5 |\n", + "| ep_rew_mean | -49.5 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 36100 |\n", + "| time_elapsed | 2578 |\n", + "| total_timesteps | 722000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.87 |\n", + "| explained_variance | 0.9754824 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 36099 |\n", + "| policy_loss | 0.000673 |\n", + "| std | 0.638 |\n", + "| value_loss | 3.42e-07 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.5 |\n", + "| ep_rew_mean | -49.5 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 36200 |\n", + "| time_elapsed | 2585 |\n", + "| total_timesteps | 724000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.86 |\n", + "| 
explained_variance | 0.84805125 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 36199 |\n", + "| policy_loss | -0.00034 |\n", + "| std | 0.636 |\n", + "| value_loss | 2.15e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.3 |\n", + "| ep_rew_mean | -49.3 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 36300 |\n", + "| time_elapsed | 2592 |\n", + "| total_timesteps | 726000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.86 |\n", + "| explained_variance | 0.98801094 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 36299 |\n", + "| policy_loss | -0.00244 |\n", + "| std | 0.637 |\n", + "| value_loss | 7.71e-07 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.8 |\n", + "| ep_rew_mean | -48.8 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 36400 |\n", + "| time_elapsed | 2598 |\n", + "| total_timesteps | 728000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.86 |\n", + "| explained_variance | 0.64739573 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 36399 |\n", + "| policy_loss | -0.00118 |\n", + "| std | 0.636 |\n", + "| value_loss | 4.25e-07 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.8 |\n", + "| ep_rew_mean | -47.8 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 36500 |\n", + "| time_elapsed | 2608 |\n", + "| total_timesteps | 730000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.84 |\n", + "| explained_variance | 0.9897441 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 36499 |\n", + "| policy_loss | 0.00103 |\n", + "| std | 0.633 |\n", + "| value_loss | 8.12e-07 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 
47.9 |\n", + "| ep_rew_mean | -47.8 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 36600 |\n", + "| time_elapsed | 2615 |\n", + "| total_timesteps | 732000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.84 |\n", + "| explained_variance | 0.98654985 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 36599 |\n", + "| policy_loss | -0.00401 |\n", + "| std | 0.634 |\n", + "| value_loss | 2.09e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.9 |\n", + "| ep_rew_mean | -47.8 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 36700 |\n", + "| time_elapsed | 2622 |\n", + "| total_timesteps | 734000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.84 |\n", + "| explained_variance | 0.98241895 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 36699 |\n", + "| policy_loss | -0.00083 |\n", + "| std | 0.633 |\n", + "| value_loss | 1.19e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.4 |\n", + "| ep_rew_mean | -48.3 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 36800 |\n", + "| time_elapsed | 2629 |\n", + "| total_timesteps | 736000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.83 |\n", + "| explained_variance | 0.8045112 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 36799 |\n", + "| policy_loss | 0.00678 |\n", + "| std | 0.632 |\n", + "| value_loss | 8.17e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.5 |\n", + "| ep_rew_mean | -49.5 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 36900 |\n", + "| time_elapsed | 2636 |\n", + "| total_timesteps | 738000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.83 |\n", + "| explained_variance | 0.6432221 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 
36899 |\n", + "| policy_loss | -0.00117 |\n", + "| std | 0.632 |\n", + "| value_loss | 2.46e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 37000 |\n", + "| time_elapsed | 2646 |\n", + "| total_timesteps | 740000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.81 |\n", + "| explained_variance | 0.89308023 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 36999 |\n", + "| policy_loss | -0.00348 |\n", + "| std | 0.629 |\n", + "| value_loss | 2.19e-05 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.5 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 37100 |\n", + "| time_elapsed | 2653 |\n", + "| total_timesteps | 742000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.81 |\n", + "| explained_variance | 0.97850627 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 37099 |\n", + "| policy_loss | -0.000425 |\n", + "| std | 0.629 |\n", + "| value_loss | 4.43e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.3 |\n", + "| ep_rew_mean | -47.3 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 37200 |\n", + "| time_elapsed | 2660 |\n", + "| total_timesteps | 744000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.78 |\n", + "| explained_variance | 0.9655469 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 37199 |\n", + "| policy_loss | 7.71e-05 |\n", + "| std | 0.624 |\n", + "| value_loss | 1.54e-05 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 46.9 |\n", + "| ep_rew_mean | -46.8 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| 
iterations | 37300 |\n", + "| time_elapsed | 2666 |\n", + "| total_timesteps | 746000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.78 |\n", + "| explained_variance | 0.92692417 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 37299 |\n", + "| policy_loss | -0.00408 |\n", + "| std | 0.624 |\n", + "| value_loss | 5.12e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.6 |\n", + "| ep_rew_mean | -47.5 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 37400 |\n", + "| time_elapsed | 2673 |\n", + "| total_timesteps | 748000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.78 |\n", + "| explained_variance | 0.85534066 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 37399 |\n", + "| policy_loss | -0.00534 |\n", + "| std | 0.624 |\n", + "| value_loss | 6.73e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 37500 |\n", + "| time_elapsed | 2684 |\n", + "| total_timesteps | 750000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.77 |\n", + "| explained_variance | 0.91903675 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 37499 |\n", + "| policy_loss | -0.00187 |\n", + "| std | 0.623 |\n", + "| value_loss | 2.31e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 37600 |\n", + "| time_elapsed | 2690 |\n", + "| total_timesteps | 752000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.76 |\n", + "| explained_variance | 0.9927211 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 37599 |\n", + "| policy_loss | 0.00225 |\n", + "| std | 0.62 |\n", + "| value_loss | 
1.23e-06 |\n", + "-------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 37700 |\n", + "| time_elapsed | 2697 |\n", + "| total_timesteps | 754000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.76 |\n", + "| explained_variance | 0.961677 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 37699 |\n", + "| policy_loss | 0.00138 |\n", + "| std | 0.621 |\n", + "| value_loss | 1.04e-06 |\n", + "------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.4 |\n", + "| ep_rew_mean | -49.4 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 37800 |\n", + "| time_elapsed | 2703 |\n", + "| total_timesteps | 756000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.76 |\n", + "| explained_variance | 0.8840703 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 37799 |\n", + "| policy_loss | -0.00133 |\n", + "| std | 0.621 |\n", + "| value_loss | 5.31e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.4 |\n", + "| ep_rew_mean | -49.4 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 37900 |\n", + "| time_elapsed | 2710 |\n", + "| total_timesteps | 758000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.76 |\n", + "| explained_variance | 0.9751732 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 37899 |\n", + "| policy_loss | 0.0017 |\n", + "| std | 0.621 |\n", + "| value_loss | 8.42e-07 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.9 |\n", + "| ep_rew_mean | -48.9 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 38000 |\n", + "| time_elapsed | 2720 |\n", + "| total_timesteps | 760000 |\n", + "| 
train/ | |\n", + "| entropy_loss | -3.76 |\n", + "| explained_variance | 0.90713525 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 37999 |\n", + "| policy_loss | -0.000299 |\n", + "| std | 0.62 |\n", + "| value_loss | 2.18e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 38100 |\n", + "| time_elapsed | 2727 |\n", + "| total_timesteps | 762000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.75 |\n", + "| explained_variance | 0.97773933 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 38099 |\n", + "| policy_loss | 0.00071 |\n", + "| std | 0.62 |\n", + "| value_loss | 7.3e-07 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 46.8 |\n", + "| ep_rew_mean | -46.8 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 38200 |\n", + "| time_elapsed | 2733 |\n", + "| total_timesteps | 764000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.74 |\n", + "| explained_variance | 0.85500395 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 38199 |\n", + "| policy_loss | -0.0188 |\n", + "| std | 0.616 |\n", + "| value_loss | 0.000115 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.3 |\n", + "| ep_rew_mean | -47.3 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 38300 |\n", + "| time_elapsed | 2740 |\n", + "| total_timesteps | 766000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.73 |\n", + "| explained_variance | 0.9707148 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 38299 |\n", + "| policy_loss | -0.00259 |\n", + "| std | 0.616 |\n", + "| value_loss | 7.38e-06 |\n", + "-------------------------------------\n", + 
"-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.1 |\n", + "| ep_rew_mean | -47 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 38400 |\n", + "| time_elapsed | 2751 |\n", + "| total_timesteps | 768000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.73 |\n", + "| explained_variance | 0.9092056 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 38399 |\n", + "| policy_loss | -0.00333 |\n", + "| std | 0.615 |\n", + "| value_loss | 4.96e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 38500 |\n", + "| time_elapsed | 2757 |\n", + "| total_timesteps | 770000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.73 |\n", + "| explained_variance | 0.98466456 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 38499 |\n", + "| policy_loss | 0.0108 |\n", + "| std | 0.616 |\n", + "| value_loss | 1.65e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 38600 |\n", + "| time_elapsed | 2764 |\n", + "| total_timesteps | 772000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.73 |\n", + "| explained_variance | 0.4182393 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 38599 |\n", + "| policy_loss | 0.0304 |\n", + "| std | 0.615 |\n", + "| value_loss | 0.000218 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.1 |\n", + "| ep_rew_mean | -49.1 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 38700 |\n", + "| time_elapsed | 2771 |\n", + "| total_timesteps | 774000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.72 |\n", + "| 
explained_variance | 0.86738527 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 38699 |\n", + "| policy_loss | -0.00272 |\n", + "| std | 0.615 |\n", + "| value_loss | 3.13e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.7 |\n", + "| ep_rew_mean | -47.6 |\n", + "| time/ | |\n", + "| fps | 279 |\n", + "| iterations | 38800 |\n", + "| time_elapsed | 2777 |\n", + "| total_timesteps | 776000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.69 |\n", + "| explained_variance | 0.92607296 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 38799 |\n", + "| policy_loss | -0.00675 |\n", + "| std | 0.61 |\n", + "| value_loss | 1.55e-05 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.2 |\n", + "| ep_rew_mean | -47.2 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 38900 |\n", + "| time_elapsed | 2788 |\n", + "| total_timesteps | 778000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.68 |\n", + "| explained_variance | 0.3575865 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 38899 |\n", + "| policy_loss | 0.105 |\n", + "| std | 0.61 |\n", + "| value_loss | 0.00223 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.3 |\n", + "| ep_rew_mean | -47.2 |\n", + "| time/ | |\n", + "| fps | 278 |\n", + "| iterations | 39000 |\n", + "| time_elapsed | 2795 |\n", + "| total_timesteps | 780000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.69 |\n", + "| explained_variance | 0.9598239 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 38999 |\n", + "| policy_loss | -0.00267 |\n", + "| std | 0.61 |\n", + "| value_loss | 5.62e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.6 
|\n",
+        "| ep_rew_mean        | -48.6     |\n",
+        "| time/              |           |\n",
+        "| fps                | 279       |\n",
+        "| iterations         | 39100     |\n",
+        "| time_elapsed       | 2802      |\n",
+        "| total_timesteps    | 782000    |\n",
+        "| train/             |           |\n",
+        "| entropy_loss       | -3.69     |\n",
+        "| explained_variance | 0.8572078 |\n",
+        "| learning_rate      | 0.0007    |\n",
+        "| n_updates          | 39099     |\n",
+        "| policy_loss        | 0.00538   |\n",
+        "| std                | 0.61      |\n",
+        "| value_loss         | 5.86e-06  |\n",
+        "-------------------------------------\n",
+        "... [repeated A2C progress tables for iterations 39200-46500 omitted: fps ~278-280, ep_rew_mean ~ -45 to -50, learning_rate 0.0007] ...\n",
+        "-------------------------------------\n",
+        "| rollout/           |           |\n",
+        "| ep_len_mean        | 49.5      |\n",
+        "| ep_rew_mean        | -49.5     |\n",
+        "| time/              |           |\n",
+        "| fps                | 280       |\n",
+        "| iterations         | 46600
|\n", + "| time_elapsed | 3318 |\n", + "| total_timesteps | 932000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.35 |\n", + "| explained_variance | 0.9060283 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46599 |\n", + "| policy_loss | 0.000685 |\n", + "| std | 0.561 |\n", + "| value_loss | 9.11e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 50 |\n", + "| ep_rew_mean | -50 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 46700 |\n", + "| time_elapsed | 3327 |\n", + "| total_timesteps | 934000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.33 |\n", + "| explained_variance | 0.6672769 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46699 |\n", + "| policy_loss | -0.00052 |\n", + "| std | 0.559 |\n", + "| value_loss | 1.51e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 46800 |\n", + "| time_elapsed | 3334 |\n", + "| total_timesteps | 936000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.31 |\n", + "| explained_variance | 0.7833716 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46799 |\n", + "| policy_loss | 0.000618 |\n", + "| std | 0.556 |\n", + "| value_loss | 9.54e-07 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 46900 |\n", + "| time_elapsed | 3340 |\n", + "| total_timesteps | 938000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.29 |\n", + "| explained_variance | 0.8197125 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46899 |\n", + "| policy_loss | -0.00295 |\n", + "| std | 0.554 |\n", + "| value_loss | 2.3e-05 |\n", + 
"-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 47000 |\n", + "| time_elapsed | 3347 |\n", + "| total_timesteps | 940000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.29 |\n", + "| explained_variance | 0.98894083 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 46999 |\n", + "| policy_loss | -8.07e-05 |\n", + "| std | 0.553 |\n", + "| value_loss | 1.47e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 47100 |\n", + "| time_elapsed | 3353 |\n", + "| total_timesteps | 942000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.29 |\n", + "| explained_variance | 0.9561706 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47099 |\n", + "| policy_loss | -0.00176 |\n", + "| std | 0.553 |\n", + "| value_loss | 3.15e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.7 |\n", + "| ep_rew_mean | -49.7 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 47200 |\n", + "| time_elapsed | 3362 |\n", + "| total_timesteps | 944000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.3 |\n", + "| explained_variance | 0.97445196 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47199 |\n", + "| policy_loss | -0.00119 |\n", + "| std | 0.555 |\n", + "| value_loss | 2.77e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.3 |\n", + "| ep_rew_mean | -47.2 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 47300 |\n", + "| time_elapsed | 3369 |\n", + "| total_timesteps | 946000 |\n", + "| train/ | 
|\n", + "| entropy_loss | -3.27 |\n", + "| explained_variance | 0.75822085 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47299 |\n", + "| policy_loss | 0.00987 |\n", + "| std | 0.551 |\n", + "| value_loss | 0.000115 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 46.6 |\n", + "| ep_rew_mean | -46.6 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 47400 |\n", + "| time_elapsed | 3375 |\n", + "| total_timesteps | 948000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.26 |\n", + "| explained_variance | 0.9148364 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47399 |\n", + "| policy_loss | 0.00772 |\n", + "| std | 0.549 |\n", + "| value_loss | 8.81e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.1 |\n", + "| ep_rew_mean | -47 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 47500 |\n", + "| time_elapsed | 3381 |\n", + "| total_timesteps | 950000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.27 |\n", + "| explained_variance | 0.96930254 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47499 |\n", + "| policy_loss | -9.47e-06 |\n", + "| std | 0.551 |\n", + "| value_loss | 1.26e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 47600 |\n", + "| time_elapsed | 3387 |\n", + "| total_timesteps | 952000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.27 |\n", + "| explained_variance | 0.7950085 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47599 |\n", + "| policy_loss | -0.00454 |\n", + "| std | 0.55 |\n", + "| value_loss | 8.08e-06 |\n", + "-------------------------------------\n", + "------------------------------------\n", + "| 
rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 47700 |\n", + "| time_elapsed | 3397 |\n", + "| total_timesteps | 954000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.27 |\n", + "| explained_variance | 0.781541 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47699 |\n", + "| policy_loss | 0.000465 |\n", + "| std | 0.551 |\n", + "| value_loss | 2.91e-06 |\n", + "------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.5 |\n", + "| ep_rew_mean | -49.5 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 47800 |\n", + "| time_elapsed | 3403 |\n", + "| total_timesteps | 956000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.26 |\n", + "| explained_variance | 0.9763136 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47799 |\n", + "| policy_loss | 0.000196 |\n", + "| std | 0.55 |\n", + "| value_loss | 1.38e-06 |\n", + "-------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.5 |\n", + "| ep_rew_mean | -49.5 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 47900 |\n", + "| time_elapsed | 3410 |\n", + "| total_timesteps | 958000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.28 |\n", + "| explained_variance | 0.961331 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 47899 |\n", + "| policy_loss | -0.00369 |\n", + "| std | 0.552 |\n", + "| value_loss | 3.65e-06 |\n", + "------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 48000 |\n", + "| time_elapsed | 3416 |\n", + "| total_timesteps | 960000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.26 |\n", + "| explained_variance | 0.94155157 |\n", + "| learning_rate | 0.0007 
|\n", + "| n_updates | 47999 |\n", + "| policy_loss | -0.00508 |\n", + "| std | 0.55 |\n", + "| value_loss | 6.79e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 48100 |\n", + "| time_elapsed | 3422 |\n", + "| total_timesteps | 962000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.27 |\n", + "| explained_variance | 0.95676875 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48099 |\n", + "| policy_loss | 0.000654 |\n", + "| std | 0.551 |\n", + "| value_loss | 1.65e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 48200 |\n", + "| time_elapsed | 3431 |\n", + "| total_timesteps | 964000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.26 |\n", + "| explained_variance | 0.94901574 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48199 |\n", + "| policy_loss | 3.81e-06 |\n", + "| std | 0.55 |\n", + "| value_loss | 3.41e-07 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.1 |\n", + "| ep_rew_mean | -47 |\n", + "| time/ | |\n", + "| fps | 280 |\n", + "| iterations | 48300 |\n", + "| time_elapsed | 3437 |\n", + "| total_timesteps | 966000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.26 |\n", + "| explained_variance | 0.9897378 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48299 |\n", + "| policy_loss | 0.00123 |\n", + "| std | 0.55 |\n", + "| value_loss | 2.62e-07 |\n", + "-------------------------------------\n", + "------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 281 
|\n", + "| iterations | 48400 |\n", + "| time_elapsed | 3443 |\n", + "| total_timesteps | 968000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.27 |\n", + "| explained_variance | 0.967348 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48399 |\n", + "| policy_loss | 0.00125 |\n", + "| std | 0.552 |\n", + "| value_loss | 4.61e-07 |\n", + "------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 48500 |\n", + "| time_elapsed | 3450 |\n", + "| total_timesteps | 970000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.25 |\n", + "| explained_variance | 0.9456974 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48499 |\n", + "| policy_loss | -0.00673 |\n", + "| std | 0.548 |\n", + "| value_loss | 5.76e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 48600 |\n", + "| time_elapsed | 3456 |\n", + "| total_timesteps | 972000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.24 |\n", + "| explained_variance | 0.81393284 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48599 |\n", + "| policy_loss | 0.0018 |\n", + "| std | 0.548 |\n", + "| value_loss | 4.43e-07 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 47.9 |\n", + "| ep_rew_mean | -47.9 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 48700 |\n", + "| time_elapsed | 3462 |\n", + "| total_timesteps | 974000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.23 |\n", + "| explained_variance | 0.97745365 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48699 |\n", + "| policy_loss | 0.00484 |\n", + "| std | 0.547 |\n", + "| value_loss | 
3.19e-06 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.4 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 48800 |\n", + "| time_elapsed | 3472 |\n", + "| total_timesteps | 976000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.23 |\n", + "| explained_variance | 0.9833449 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48799 |\n", + "| policy_loss | 0.00261 |\n", + "| std | 0.546 |\n", + "| value_loss | 3.5e-06 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -48.9 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 48900 |\n", + "| time_elapsed | 3478 |\n", + "| total_timesteps | 978000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.23 |\n", + "| explained_variance | 0.96366274 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48899 |\n", + "| policy_loss | 0.000591 |\n", + "| std | 0.546 |\n", + "| value_loss | 4.17e-07 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.6 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 49000 |\n", + "| time_elapsed | 3483 |\n", + "| total_timesteps | 980000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.22 |\n", + "| explained_variance | 0.9828003 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 48999 |\n", + "| policy_loss | -0.00217 |\n", + "| std | 0.545 |\n", + "| value_loss | 1.52e-06 |\n", + "-------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 49100 |\n", + "| time_elapsed | 3489 |\n", + "| total_timesteps | 982000 |\n", + 
"| train/ | |\n", + "| entropy_loss | -3.23 |\n", + "| explained_variance | 0.9969808 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49099 |\n", + "| policy_loss | 7.52e-05 |\n", + "| std | 0.546 |\n", + "| value_loss | 1.92e-07 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 49200 |\n", + "| time_elapsed | 3495 |\n", + "| total_timesteps | 984000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.22 |\n", + "| explained_variance | 0.98948133 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49199 |\n", + "| policy_loss | -0.00204 |\n", + "| std | 0.545 |\n", + "| value_loss | 8.27e-07 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.1 |\n", + "| ep_rew_mean | -48.1 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 49300 |\n", + "| time_elapsed | 3505 |\n", + "| total_timesteps | 986000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.21 |\n", + "| explained_variance | 0.97018635 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49299 |\n", + "| policy_loss | -0.000292 |\n", + "| std | 0.544 |\n", + "| value_loss | 4.47e-07 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.1 |\n", + "| ep_rew_mean | -48.1 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 49400 |\n", + "| time_elapsed | 3511 |\n", + "| total_timesteps | 988000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.21 |\n", + "| explained_variance | 0.9661637 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49399 |\n", + "| policy_loss | -0.00206 |\n", + "| std | 0.544 |\n", + "| value_loss | 9.4e-06 |\n", + "-------------------------------------\n", + 
"--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.5 |\n", + "| ep_rew_mean | -49.5 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 49500 |\n", + "| time_elapsed | 3518 |\n", + "| total_timesteps | 990000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.2 |\n", + "| explained_variance | 0.82379425 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49499 |\n", + "| policy_loss | 0.00211 |\n", + "| std | 0.543 |\n", + "| value_loss | 2.47e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49 |\n", + "| ep_rew_mean | -49 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 49600 |\n", + "| time_elapsed | 3524 |\n", + "| total_timesteps | 992000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.19 |\n", + "| explained_variance | 0.99219644 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49599 |\n", + "| policy_loss | -0.00165 |\n", + "| std | 0.542 |\n", + "| value_loss | 4.1e-07 |\n", + "--------------------------------------\n", + "-------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 49.5 |\n", + "| ep_rew_mean | -49.5 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 49700 |\n", + "| time_elapsed | 3530 |\n", + "| total_timesteps | 994000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.2 |\n", + "| explained_variance | 0.9896941 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49699 |\n", + "| policy_loss | 0.000546 |\n", + "| std | 0.543 |\n", + "| value_loss | 1.62e-07 |\n", + "-------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48.5 |\n", + "| ep_rew_mean | -48.5 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 49800 |\n", + "| time_elapsed | 3540 |\n", + "| total_timesteps | 996000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.18 |\n", + "| 
explained_variance | 0.99164146 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49799 |\n", + "| policy_loss | 0.000225 |\n", + "| std | 0.54 |\n", + "| value_loss | 4.13e-07 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 49900 |\n", + "| time_elapsed | 3545 |\n", + "| total_timesteps | 998000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.18 |\n", + "| explained_variance | 0.92336273 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49899 |\n", + "| policy_loss | -0.00245 |\n", + "| std | 0.54 |\n", + "| value_loss | 8.85e-06 |\n", + "--------------------------------------\n", + "--------------------------------------\n", + "| rollout/ | |\n", + "| ep_len_mean | 48 |\n", + "| ep_rew_mean | -48 |\n", + "| time/ | |\n", + "| fps | 281 |\n", + "| iterations | 50000 |\n", + "| time_elapsed | 3551 |\n", + "| total_timesteps | 1000000 |\n", + "| train/ | |\n", + "| entropy_loss | -3.18 |\n", + "| explained_variance | 0.95652837 |\n", + "| learning_rate | 0.0007 |\n", + "| n_updates | 49999 |\n", + "| policy_loss | -0.00401 |\n", + "| std | 0.54 |\n", + "| value_loss | 3.22e-06 |\n", + "--------------------------------------\n", + "argv[0]=--background_color_red=0.8745098114013672\n", + "argv[1]=--background_color_green=0.21176470816135406\n", + "argv[2]=--background_color_blue=0.1764705926179886\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit6/venv-u6/lib/python3.10/site-packages/stable_baselines3/common/evaluation.py:67: UserWarning: Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. 
Consider wrapping environment first with ``Monitor`` wrapper.\n", + " warnings.warn(\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Mean reward = -45.00 +/- 15.00\n", + "\u001b[38;5;4mℹ This function will save, evaluate, generate a video of your agent,\n", + "create a model card and push everything to the hub. It might take up to 1min.\n", + "This is a work in progress: if you encounter a bug, please open an issue.\u001b[0m\n", + "Saving video to /tmp/tmppn3lzgfu/-step-0-to-step-1000.mp4\n", + "MoviePy - Building video /tmp/tmppn3lzgfu/-step-0-to-step-1000.mp4.\n", + "MoviePy - Writing video /tmp/tmppn3lzgfu/-step-0-to-step-1000.mp4\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + " \r" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "MoviePy - Done !\n", + "MoviePy - video ready /tmp/tmppn3lzgfu/-step-0-to-step-1000.mp4\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "ffmpeg version 6.1.1-3ubuntu5 Copyright (c) 2000-2023 the FFmpeg developers\n", + " built with gcc 13 (Ubuntu 13.2.0-23ubuntu3)\n", + " configuration: --prefix=/usr --extra-version=3ubuntu5 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --disable-omx --enable-gnutls --enable-libaom --enable-libass --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid 
--enable-libzimg --enable-openal --enable-opencl --enable-opengl --disable-sndio --enable-libvpl --disable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-ladspa --enable-libbluray --enable-libjack --enable-libpulse --enable-librabbitmq --enable-librist --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libx264 --enable-libzmq --enable-libzvbi --enable-lv2 --enable-sdl2 --enable-libplacebo --enable-librav1e --enable-pocketsphinx --enable-librsvg --enable-libjxl --enable-shared\n", + " libavutil 58. 29.100 / 58. 29.100\n", + " libavcodec 60. 31.102 / 60. 31.102\n", + " libavformat 60. 16.100 / 60. 16.100\n", + " libavdevice 60. 3.100 / 60. 3.100\n", + " libavfilter 9. 12.100 / 9. 12.100\n", + " libswscale 7. 5.100 / 7. 5.100\n", + " libswresample 4. 12.100 / 4. 12.100\n", + " libpostproc 57. 3.100 / 57. 3.100\n", + "Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/tmppn3lzgfu/-step-0-to-step-1000.mp4':\n", + " Metadata:\n", + " major_brand : isom\n", + " minor_version : 512\n", + " compatible_brands: isomiso2avc1mp41\n", + " encoder : Lavf61.1.100\n", + " Duration: 00:00:40.00, start: 0.000000, bitrate: 190 kb/s\n", + " Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 720x480, 187 kb/s, 25 fps, 25 tbr, 12800 tbn (default)\n", + " Metadata:\n", + " handler_name : VideoHandler\n", + " vendor_id : [0][0][0][0]\n", + " encoder : Lavc61.3.100 libx264\n", + "Stream mapping:\n", + " Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))\n", + "Press [q] to stop, [?] 
for help\n", + "[libx264 @ 0x5615de034a80] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2\n", + "[libx264 @ 0x5615de034a80] profile High, level 3.0, 4:2:0, 8-bit\n", + "[libx264 @ 0x5615de034a80] 264 - core 164 r3108 31e19f9 - H.264/MPEG-4 AVC codec - Copyleft 2003-2023 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=15 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00\n", + "Output #0, mp4, to '/tmp/tmp2wmkgvgp/replay.mp4':\n", + " Metadata:\n", + " major_brand : isom\n", + " minor_version : 512\n", + " compatible_brands: isomiso2avc1mp41\n", + " encoder : Lavf60.16.100\n", + " Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(progressive), 720x480, q=2-31, 25 fps, 12800 tbn (default)\n", + " Metadata:\n", + " handler_name : VideoHandler\n", + " vendor_id : [0][0][0][0]\n", + " encoder : Lavc60.31.102 libx264\n", + " Side data:\n", + " cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A\n", + "[out#0/mp4 @ 0x5615ddfb0140] video:896kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.371167%\n", + "frame= 1000 fps=740 q=-1.0 Lsize= 908kB time=00:00:39.88 bitrate= 186.5kbits/s speed=29.5x \n", + "[libx264 @ 0x5615de034a80] frame I:4 Avg QP:17.50 size: 7558\n", + "[libx264 @ 0x5615de034a80] frame P:287 Avg QP:25.06 size: 1464\n", + "[libx264 @ 0x5615de034a80] frame B:709 Avg QP:25.16 size: 657\n", + "[libx264 @ 0x5615de034a80] consecutive B-frames: 2.6% 5.0% 10.8% 81.6%\n", + "[libx264 @ 0x5615de034a80] mb I I16..4: 
3.1% 79.9% 17.0%\n", + "[libx264 @ 0x5615de034a80] mb P I16..4: 0.2% 1.6% 2.0% P16..4: 2.4% 1.4% 0.7% 0.0% 0.0% skip:91.7%\n", + "[libx264 @ 0x5615de034a80] mb B I16..4: 0.1% 0.2% 0.3% B16..8: 3.9% 1.3% 0.5% direct: 0.2% skip:93.5% L0:55.0% L1:42.9% BI: 2.2%\n", + "[libx264 @ 0x5615de034a80] 8x8 transform intra:46.3% inter:10.9%\n", + "[libx264 @ 0x5615de034a80] coded y,uvDC,uvAC intra: 32.2% 3.7% 0.9% inter: 0.9% 0.0% 0.0%\n", + "[libx264 @ 0x5615de034a80] i16 v,h,dc,p: 54% 24% 18% 4%\n", + "[libx264 @ 0x5615de034a80] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 41% 12% 44% 1% 1% 0% 1% 0% 1%\n", + "[libx264 @ 0x5615de034a80] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 25% 19% 28% 4% 5% 5% 7% 3% 5%\n", + "[libx264 @ 0x5615de034a80] i8c dc,h,v,p: 93% 3% 4% 0%\n", + "[libx264 @ 0x5615de034a80] Weighted P-Frames: Y:0.0% UV:0.0%\n", + "[libx264 @ 0x5615de034a80] ref P L0: 48.4% 5.0% 27.9% 18.8%\n", + "[libx264 @ 0x5615de034a80] ref B L0: 78.2% 14.5% 7.3%\n", + "[libx264 @ 0x5615de034a80] ref B L1: 96.4% 3.6%\n", + "[libx264 @ 0x5615de034a80] kb/s:183.28\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[38;5;4mℹ Pushing repo turbo-maikol/a2c-PandaPickAndPlace-v3 to the Hugging\n", + "Face Hub\u001b[0m\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Processing Files (0 / 0) : | | 0.00B / 0.00B \n", + "Processing Files (1 / 1) : 0%| | 1.26kB / 1.17MB, ???B/s \n", + "Processing Files (1 / 6) : 47%|████▋ | 545kB / 1.17MB, 680kB/s \n", + "Processing Files (1 / 6) : 93%|█████████▎| 1.09MB / 1.17MB, 1.09MB/s \n", + "Processing Files (6 / 6) : 100%|██████████| 1.17MB / 1.17MB, 586kB/s \n", + "New Data Upload : 100%|██████████| 1.17MB / 1.17MB, 586kB/s \n", + " ...ckAndPlace-v3/pytorch_variables.pth: 100%|██████████| 1.26kB / 1.26kB \n", + " ...ickAndPlace-v3/policy.optimizer.pth: 100%|██████████| 55.8kB / 55.8kB \n", + " ...a2c-PandaPickAndPlace-v3/policy.pth: 100%|██████████| 53.7kB / 53.7kB \n", + " ...mkgvgp/a2c-PandaPickAndPlace-v3.zip: 100%|██████████| 129kB / 129kB \n", + " /tmp/tmp2wmkgvgp/replay.mp4 : 100%|██████████| 930kB / 930kB \n", + " /tmp/tmp2wmkgvgp/vec_normalize.pkl : 100%|██████████| 2.95kB / 2.95kB \n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[38;5;4mℹ Your model is pushed to the Hub. 
You can view your model here:\n", + "https://huggingface.co/turbo-maikol/a2c-PandaPickAndPlace-v3/tree/main/\u001b[0m\n" + ] + }, + { + "data": { + "text/plain": [ + "CommitInfo(commit_url='https://huggingface.co/turbo-maikol/a2c-PandaPickAndPlace-v3/commit/457722bba273248332eadc56aa52d5aad99a7844', commit_message='Initial commit', commit_description='', oid='457722bba273248332eadc56aa52d5aad99a7844', pr_url=None, repo_url=RepoUrl('https://huggingface.co/turbo-maikol/a2c-PandaPickAndPlace-v3', endpoint='https://huggingface.co', repo_type='model', repo_id='turbo-maikol/a2c-PandaPickAndPlace-v3'), pr_revision=None, pr_num=None)" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } ], - "metadata": { - "id": "G3xy3Nf3c2O1" - } + "source": [ + "# 1 2 3\n", + "env_id_new = \"PandaPickAndPlace-v3\"\n", + "env_new = make_vec_env(env_id_new, n_envs=4)\n", + "env_new = VecNormalize(env_new, norm_obs=True, norm_reward=True, clip_obs=10)\n", + "# 4\n", + "model_new = A2C(\"MultiInputPolicy\", env_new, verbose=1) # Create the A2C model and try to find the best parameters\n", + "# 5\n", + "model_new.learn(1_000_000)\n", + "# 6\n", + "model_name_new = f\"new-{env_id_new}\"\n", + "model_new.save(model_name_new)\n", + "env_new.save(\"vec_normalize_new.pkl\")\n", + "\n", + "\n", + "# 7\n", + "from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize\n", + "# Load the saved statistics\n", + "eval_env_new = DummyVecEnv([lambda: gym.make(f\"{env_id_new}\")])\n", + "eval_env_new = VecNormalize.load(\"vec_normalize_new.pkl\", eval_env_new)\n", + "# We need to override the render_mode\n", + "eval_env_new.render_mode = \"rgb_array\"\n", + "# do not update them at test time\n", + "eval_env_new.training = False\n", + "# reward normalization is not needed at test time\n", + "eval_env_new.norm_reward = False\n", + "# Load the agent\n", + "model = A2C.load(model_name_new)\n", + 
"\n", + "mean_reward, std_reward = evaluate_policy(model, eval_env_new)\n", + "\n", + "print(f\"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}\")\n", + "\n", + "\n", + "# 8\n", + "package_to_hub(\n", + " model=model,\n", + " model_name=f\"a2c-{env_id_new}\",\n", + " model_architecture=\"A2C\",\n", + " env_id=env_id_new,\n", + " eval_env=eval_env_new,\n", + " repo_id=f\"turbo-maikol/a2c-{env_id_new}\", # Change the username\n", + " commit_message=\"Initial commit\",\n", + ")" + ] }, { "cell_type": "markdown", - "source": [ - "### Solution (optional)" - ], "metadata": { "id": "sKGbFXZq9ikN" - } + }, + "source": [ + "### Solution (optional)" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "J-cC-Feg9iMm" + }, + "outputs": [], "source": [ "# 1 - 2\n", "env_id = \"PandaPickAndPlace-v3\"\n", @@ -735,15 +19577,15 @@ " verbose=1)\n", "# 5\n", "model.learn(1_000_000)" - ], - "metadata": { - "id": "J-cC-Feg9iMm" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-UnlKLmpg80p" + }, + "outputs": [], "source": [ "# 6\n", "model_name = \"a2c-PandaPickAndPlace-v3\";\n", @@ -779,22 +19621,48 @@ " repo_id=f\"ThomasSimonini/a2c-{env_id}\", # TODO: Change the username\n", " commit_message=\"Initial commit\",\n", ")" - ], - "metadata": { - "id": "-UnlKLmpg80p" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "usatLaZ8dM4P" + }, "source": [ "See you on Unit 7! 
🔥\n", "## Keep learning, stay awesome 🤗" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "collapsed_sections": [ + "tF42HvI7-gs5" ], - "metadata": { - "id": "usatLaZ8dM4P" - } + "include_colab_link": true, + "private_outputs": true, + "provenance": [] + }, + "gpuClass": "standard", + "kernelspec": { + "display_name": "venv-u6", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.18" } - ] + }, + "nbformat": 4, + "nbformat_minor": 0 } diff --git a/notebooks/unit8/unit8_part1.ipynb b/notebooks/unit8/unit8_part1.ipynb index 653385b..3586798 100644 --- a/notebooks/unit8/unit8_part1.ipynb +++ b/notebooks/unit8/unit8_part1.ipynb @@ -3,8 +3,8 @@ { "cell_type": "markdown", "metadata": { - "id": "view-in-github", - "colab_type": "text" + "colab_type": "text", + "id": "view-in-github" }, "source": [ "\"Open" @@ -60,6 +60,9 @@ }, { "cell_type": "markdown", + "metadata": { + "id": "T6lIPYFghhYL" + }, "source": [ "## Objectives of this notebook 🏆\n", "\n", @@ -69,13 +72,13 @@ "- Be able to **push your trained agent and the code to the Hub** with a nice video replay and an evaluation score 🔥.\n", "\n", "\n" - ], - "metadata": { - "id": "T6lIPYFghhYL" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "Wp-rD6Fuhq31" + }, "source": [ "## This notebook is from the Deep Reinforcement Learning Course\n", "\"Deep\n", @@ -90,82 +93,79 @@ "\n", "\n", "The best way to keep in touch is to join our discord server to exchange with the community and with us 👉🏻 https://discord.gg/ydHrjt3WP5" - ], - "metadata": { - "id": "Wp-rD6Fuhq31" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "rasqqGQlhujA" + }, "source": [ "## Prerequisites 🏗️\n", "Before diving into the notebook, you need to:\n", "\n", "🔲 📚 Study 
[PPO by reading Unit 8](https://huggingface.co/deep-rl-course/unit8/introduction) 🤗 " - ], - "metadata": { - "id": "rasqqGQlhujA" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "PUFfMGOih3CW" + }, "source": [ "To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to push one model, we don't ask for a minimal result but we **advise you to try different hyperparameters settings to get better results**.\n", "\n", "If you don't find your model, **go to the bottom of the page and click on the refresh button**\n", "\n", "For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process" - ], - "metadata": { - "id": "PUFfMGOih3CW" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "PU4FVzaoM6fC" + }, "source": [ "## Set the GPU 💪\n", "- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n", "\n", "\"GPU" - ], - "metadata": { - "id": "PU4FVzaoM6fC" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "KV0NyFdQM9ZG" + }, "source": [ "- `Hardware Accelerator > GPU`\n", "\n", "\"GPU" - ], - "metadata": { - "id": "KV0NyFdQM9ZG" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "bTpYcVZVMzUI" + }, "source": [ "## Create a virtual display 🔽\n", "\n", "During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames). 
\n", "\n", "Hence the following cell will install the librairies and create and run a virtual screen 🖥" - ], - "metadata": { - "id": "bTpYcVZVMzUI" - } + ] }, { "cell_type": "code", - "source": [ - "!pip install setuptools==65.5.0" - ], + "execution_count": null, "metadata": { "id": "Fd731S8-NuJA" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!pip install setuptools==65.5.0" + ] }, { "cell_type": "code", @@ -186,18 +186,18 @@ }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ww5PQH1gNLI4" + }, + "outputs": [], "source": [ "# Virtual display\n", "from pyvirtualdisplay import Display\n", "\n", "virtual_display = Display(visible=0, size=(1400, 900))\n", "virtual_display.start()" - ], - "metadata": { - "id": "ww5PQH1gNLI4" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", @@ -211,17 +211,14 @@ }, { "cell_type": "code", - "source": [ - "!pip install gym==0.22\n", - "!pip install imageio-ffmpeg\n", - "!pip install huggingface_hub\n", - "!pip install gym[box2d]==0.22" - ], + "execution_count": null, "metadata": { "id": "9xZQFTPcsKUK" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "pip install gym==0.22 imageio-ffmpeg huggingface_hub gym[box2d]==0.22" + ] }, { "cell_type": "markdown", @@ -266,7 +263,17 @@ }, "outputs": [], "source": [ - "### Your code here:" + "### Your code here:\n", + "from ppo import " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "# EXECUTED CELLS TO UPLOAD MY MODEL TO HUGGING FACE" ] }, { @@ -307,7 +314,10 @@ "import imageio\n", "\n", "from wasabi import Printer\n", - "msg = Printer()" + "msg = Printer()\n", + "\n", + "%load_ext autoreload\n", + "%autoreload 2" ] }, { @@ -319,18 +329,6 @@ "- Add new argument in `parse_args()` function to define the repo-id where we want to push the model." 
] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "iHQiqQEFn0QH" - }, - "outputs": [], - "source": [ - "# Adding HuggingFace argument\n", - "parser.add_argument(\"--repo-id\", type=str, default=\"ThomasSimonini/ppo-CartPole-v1\", help=\"id of the model repository from the Hugging Face Hub {username/repo_name}\")" - ] - }, { "cell_type": "markdown", "metadata": { @@ -452,17 +450,17 @@ " \"\"\"\n", " episode_rewards = []\n", " for episode in range(n_eval_episodes):\n", - " state = env.reset()\n", + " state, _ = env.reset()\n", " step = 0\n", " done = False\n", " total_rewards_ep = 0\n", " \n", " while done is False:\n", " state = torch.Tensor(state).to(device)\n", - " action, _, _, _ = policy.get_action_and_value(state)\n", - " new_state, reward, done, info = env.step(action.cpu().numpy())\n", + " action, _, _, _ = policy.get_action_value(state)\n", + " new_state, reward, term, trunc, info = env.step(action.cpu().numpy())\n", " total_rewards_ep += reward \n", - " if done:\n", + " if term or trunc:\n", " break\n", " state = new_state\n", " episode_rewards.append(total_rewards_ep)\n", @@ -474,16 +472,16 @@ "\n", "def record_video(env, policy, out_directory, fps=30):\n", " images = [] \n", - " done = False\n", - " state = env.reset()\n", - " img = env.render(mode='rgb_array')\n", + " term, trunc = False, False\n", + " state, _ = env.reset()\n", + " img = env.render()\n", " images.append(img)\n", - " while not done:\n", + " while not (term or trunc):\n", " state = torch.Tensor(state).to(device)\n", " # Take the action (index) that have the maximum expected future reward given that state\n", - " action, _, _, _ = policy.get_action_and_value(state)\n", - " state, reward, done, info = env.step(action.cpu().numpy()) # We directly put next_state = state for recording logic\n", - " img = env.render(mode='rgb_array')\n", + " action, _, _, _ = policy.get_action_value(state)\n", + " state, reward, term, trunc, info = env.step(action.cpu().numpy()) # 
We directly put next_state = state for recording logic\n", + " img = env.render()\n", " images.append(img)\n", " imageio.mimsave(out_directory, [np.array(img) for i, img in enumerate(images)], fps=fps)\n", "\n", @@ -603,6 +601,36 @@ "- Finally, we call this function at the end of the PPO training" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "args_repo_id = \"turbo-maikol/rl-course-unit8-ppo-LunarLander-v2\"\n", + "args_env_id = \"LunarLander-v3\"\n", + "run_name = \"LunarLander-HF\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from src.utils.model_utils import load_agent\n", + "from src.config import Configuration\n", + "\n", + "CONFIG = Configuration(\n", + " MODELS=\"../../rl-module/models\",\n", + " exp_name=\"lunar-lander-hf-V2\",\n", + " env_id = args_env_id\n", + ")\n", + "agent = load_agent(CONFIG)\n", + "\n", + "device = CONFIG.device" + ] + }, { "cell_type": "code", "execution_count": null, @@ -611,17 +639,26 @@ }, "outputs": [], "source": [ + "import gymnasium as gym\n", + "import torch\n", "# Create the evaluation environment\n", - "eval_env = gym.make(args.env_id)\n", + "eval_env = gym.make(args_env_id, render_mode=\"rgb_array\")\n", "\n", - "package_to_hub(repo_id = args.repo_id,\n", + "package_to_hub(repo_id = args_repo_id,\n", " model = agent, # The model we want to save\n", - " hyperparameters = args,\n", - " eval_env = gym.make(args.env_id),\n", + " hyperparameters = CONFIG,\n", + " eval_env = eval_env,\n", " logs= f\"runs/{run_name}\",\n", " )" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "----" + ] + }, { "cell_type": "markdown", "metadata": { @@ -647,7 +684,7 @@ "import time\n", "from distutils.util import strtobool\n", "\n", - "import gym\n", + "import gymnasium as gym\n", "import numpy as np\n", "import torch\n", "import torch.nn as nn\n", @@ -840,7 +877,7 @@ " \n", " while done is 
False:\n", " state = torch.Tensor(state).to(device)\n", - " action, _, _, _ = policy.get_action_and_value(state)\n", + " action, _, _, _ = policy.get_action_value(state)\n", " new_state, reward, done, info = env.step(action.cpu().numpy())\n", " total_rewards_ep += reward \n", " if done:\n", @@ -862,7 +899,7 @@ " while not done:\n", " state = torch.Tensor(state).to(device)\n", " # Take the action (index) that have the maximum expected future reward given that state\n", - " action, _, _, _ = policy.get_action_and_value(state)\n", + " action, _, _, _ = policy.get_action_value(state)\n", " state, reward, done, info = env.step(action.cpu().numpy()) # We directly put next_state = state for recording logic\n", " img = env.render(mode='rgb_array')\n", " images.append(img)\n", @@ -1013,7 +1050,7 @@ " def get_value(self, x):\n", " return self.critic(x)\n", "\n", - " def get_action_and_value(self, x, action=None):\n", + " def get_action_value(self, x, action=None):\n", " logits = self.actor(x)\n", " probs = Categorical(logits=logits)\n", " if action is None:\n", @@ -1023,7 +1060,7 @@ "\n", "if __name__ == \"__main__\":\n", " args = parse_args()\n", - " run_name = f\"{args.env_id}__{args.exp_name}__{args.seed}__{int(time.time())}\"\n", + " run_name = f\"{args_env_id}__{args.exp_name}__{args.seed}__{int(time.time())}\"\n", " if args.track:\n", " import wandb\n", "\n", @@ -1052,7 +1089,7 @@ "\n", " # env setup\n", " envs = gym.vector.SyncVectorEnv(\n", - " [make_env(args.env_id, args.seed + i, i, args.capture_video, run_name) for i in range(args.num_envs)]\n", + " [make_env(args_env_id, args.seed + i, i, args.capture_video, run_name) for i in range(args.num_envs)]\n", " )\n", " assert isinstance(envs.single_action_space, gym.spaces.Discrete), \"only discrete action space is supported\"\n", "\n", @@ -1088,7 +1125,7 @@ "\n", " # ALGO LOGIC: action logic\n", " with torch.no_grad():\n", - " action, logprob, _, value = agent.get_action_and_value(next_obs)\n", + " action, logprob, _, 
value = agent.get_action_value(next_obs)\n", " values[step] = value.flatten()\n", " actions[step] = action\n", " logprobs[step] = logprob\n", @@ -1150,7 +1187,7 @@ " end = start + args.minibatch_size\n", " mb_inds = b_inds[start:end]\n", "\n", - " _, newlogprob, entropy, newvalue = agent.get_action_and_value(b_obs[mb_inds], b_actions.long()[mb_inds])\n", + " _, newlogprob, entropy, newvalue = agent.get_action_value(b_obs[mb_inds], b_actions.long()[mb_inds])\n", " logratio = newlogprob - b_logprobs[mb_inds]\n", " ratio = logratio.exp()\n", "\n", @@ -1216,12 +1253,12 @@ " writer.close()\n", "\n", " # Create the evaluation environment\n", - " eval_env = gym.make(args.env_id)\n", + " eval_env = gym.make(args_env_id)\n", "\n", - " package_to_hub(repo_id = args.repo_id,\n", + " package_to_hub(repo_id = args_repo_id,\n", " model = agent, # The model we want to save\n", " hyperparameters = args,\n", - " eval_env = gym.make(args.env_id),\n", + " eval_env = gym.make(args_env_id),\n", " logs= f\"runs/{run_name}\",\n", " )\n", " " @@ -1290,21 +1327,21 @@ }, { "cell_type": "markdown", - "source": [ - "\"PPO\"/" - ], "metadata": { "id": "Sq0My0LOjPYR" - } + }, + "source": [ + "\"PPO\"/" + ] }, { "cell_type": "markdown", - "source": [ - "\"PPO\"/" - ], "metadata": { "id": "A8C-Q5ZyjUe3" - } + }, + "source": [ + "\"PPO\"/" + ] }, { "cell_type": "markdown", @@ -1319,14 +1356,14 @@ }, { "cell_type": "code", - "source": [ - "!python ppo.py --env-id=\"LunarLander-v2\" --repo-id=\"YOUR_REPO_ID\" --total-timesteps=50000" - ], + "execution_count": null, "metadata": { "id": "KXLih6mKseBs" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "!python ppo.py --env-id=\"LunarLander-v2\" --repo-id=\"YOUR_REPO_ID\" --total-timesteps=50000" + ] }, { "cell_type": "markdown", @@ -1350,22 +1387,32 @@ } ], "metadata": { + "accelerator": "GPU", "colab": { - "private_outputs": true, - "provenance": [], "history_visible": true, - "include_colab_link": true + 
"include_colab_link": true, + "private_outputs": true, + "provenance": [] }, "gpuClass": "standard", "kernelspec": { - "display_name": "Python 3", + "display_name": "venv", + "language": "python", "name": "python3" }, "language_info": { - "name": "python" - }, - "accelerator": "GPU" + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.18" + } }, "nbformat": 4, "nbformat_minor": 0 -} \ No newline at end of file +} diff --git a/notebooks/unit8/unit8_part2.ipynb b/notebooks/unit8/unit8_part2.ipynb index 7c38b10..59eb35b 100644 --- a/notebooks/unit8/unit8_part2.ipynb +++ b/notebooks/unit8/unit8_part2.ipynb @@ -3,8 +3,8 @@ { "cell_type": "markdown", "metadata": { - "id": "view-in-github", - "colab_type": "text" + "colab_type": "text", + "id": "view-in-github" }, "source": [ "\"Open" @@ -244,21 +244,9 @@ "source": [ "# install python libraries\n", "# thanks toinsson\n", - "!pip install faster-fifo==1.4.2\n", - "!pip install vizdoom" + "!pip install faster-fifo==1.4.2 vizdoom sample-factory==2.1.1" ] }, - { - "cell_type": "code", - "source": [ - "!pip install sample-factory==2.1.1" - ], - "metadata": { - "id": "alxUt7Au-O8e" - }, - "execution_count": null, - "outputs": [] - }, { "cell_type": "markdown", "metadata": { @@ -270,7 +258,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": { "id": "bCgZbeiavcDU" }, @@ -358,11 +346,210 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "metadata": { "id": "y_TeicMvyKHP" }, - "outputs": [], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001b[33m[2025-08-29 19:52:59,093][32845] Environment doom_basic already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,095][32845] Environment doom_two_colors_easy already registered, 
overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,096][32845] Environment doom_two_colors_hard already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,098][32845] Environment doom_dm already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,098][32845] Environment doom_dwango5 already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,099][32845] Environment doom_my_way_home_flat_actions already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,100][32845] Environment doom_defend_the_center_flat_actions already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,100][32845] Environment doom_my_way_home already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,101][32845] Environment doom_deadly_corridor already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,102][32845] Environment doom_defend_the_center already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,103][32845] Environment doom_defend_the_line already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,104][32845] Environment doom_health_gathering already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,104][32845] Environment doom_health_gathering_supreme already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,105][32845] Environment doom_battle already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,106][32845] Environment doom_battle2 already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,106][32845] Environment doom_duel_bots already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,107][32845] Environment doom_deathmatch_bots already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,107][32845] Environment doom_duel already 
registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,108][32845] Environment doom_deathmatch_full already registered, overwriting...\u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,109][32845] Environment doom_benchmark already registered, overwriting...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:52:59,109][32845] register_encoder_factory: \u001b[0m\n", + "\u001b[33m[2025-08-29 19:52:59,191][32845] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json\u001b[0m\n", + "\u001b[36m[2025-08-29 19:52:59,209][32845] Experiment dir /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment already exists!\u001b[0m\n", + "\u001b[36m[2025-08-29 19:52:59,224][32845] Resuming existing experiment from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:52:59,225][32845] Weights and Biases integration disabled\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:52:59,235][32845] Environment var CUDA_VISIBLE_DEVICES is 0\n", + "\u001b[0m\n", + "\u001b[36m[2025-08-29 19:53:01,426][43033] Doom resolution: 160x120, resize resolution: (128, 72)\u001b[0m\n", + "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/gymnasium/core.py:311: UserWarning: \u001b[33mWARN: env.num_agents to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.num_agents` for environment variables or `env.get_wrapper_attr('num_agents')` that will search the reminding wrappers.\u001b[0m\n", + " logger.warn(\n", + "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/gymnasium/core.py:311: UserWarning: \u001b[33mWARN: env.is_multiagent to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do 
`env.unwrapped.is_multiagent` for environment variables or `env.get_wrapper_attr('is_multiagent')` that will search the reminding wrappers.\u001b[0m\n", + " logger.warn(\n", + "\u001b[36m[2025-08-29 19:53:01,428][43033] Env info: EnvInfo(obs_space=Dict('obs': Box(0, 255, (3, 72, 128), uint8)), action_space=Discrete(5), num_agents=1, gpu_actions=False, gpu_observations=True, action_splits=None, all_discrete=None, frameskip=4, reward_shaping_scheme=None, env_info_protocol_version=1)\u001b[0m\n", + "\u001b[36m[2025-08-29 19:53:01,760][32845] Starting experiment with the following configuration:\n", + "help=False\n", + "algo=APPO\n", + "env=doom_health_gathering_supreme\n", + "experiment=default_experiment\n", + "train_dir=/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir\n", + "restart_behavior=resume\n", + "device=gpu\n", + "seed=None\n", + "num_policies=1\n", + "async_rl=True\n", + "serial_mode=False\n", + "batched_sampling=False\n", + "num_batches_to_accumulate=2\n", + "worker_num_splits=2\n", + "policy_workers_per_policy=1\n", + "max_policy_lag=1000\n", + "num_workers=10\n", + "num_envs_per_worker=8\n", + "batch_size=16384\n", + "num_batches_per_epoch=1\n", + "num_epochs=1\n", + "rollout=64\n", + "recurrence=32\n", + "shuffle_minibatches=False\n", + "gamma=0.99\n", + "reward_scale=1.0\n", + "reward_clip=1000.0\n", + "value_bootstrap=False\n", + "normalize_returns=True\n", + "exploration_loss_coeff=0.001\n", + "value_loss_coeff=0.5\n", + "kl_loss_coeff=0.0\n", + "exploration_loss=symmetric_kl\n", + "gae_lambda=0.95\n", + "ppo_clip_ratio=0.2\n", + "ppo_clip_value=0.2\n", + "with_vtrace=False\n", + "vtrace_rho=1.0\n", + "vtrace_c=1.0\n", + "optimizer=adam\n", + "adam_eps=1e-06\n", + "adam_beta1=0.9\n", + "adam_beta2=0.999\n", + "max_grad_norm=4.0\n", + "learning_rate=0.0002\n", + "lr_schedule=constant\n", + "lr_schedule_kl_threshold=0.008\n", + "lr_adaptive_min=1e-06\n", + "lr_adaptive_max=0.01\n", + "obs_subtract_mean=0.0\n", + "obs_scale=255.0\n", + 
"normalize_input=True\n", + "normalize_input_keys=None\n", + "decorrelate_experience_max_seconds=0\n", + "decorrelate_envs_on_one_worker=True\n", + "actor_worker_gpus=[]\n", + "set_workers_cpu_affinity=True\n", + "force_envs_single_thread=False\n", + "default_niceness=0\n", + "log_to_file=True\n", + "experiment_summaries_interval=10\n", + "flush_summaries_interval=30\n", + "stats_avg=100\n", + "summaries_use_frameskip=True\n", + "heartbeat_interval=20\n", + "heartbeat_reporting_interval=600\n", + "train_for_env_steps=30000000\n", + "train_for_seconds=10000000000\n", + "save_every_sec=120\n", + "keep_checkpoints=2\n", + "load_checkpoint_kind=latest\n", + "save_milestones_sec=-1\n", + "save_best_every_sec=5\n", + "save_best_metric=reward\n", + "save_best_after=100000\n", + "benchmark=False\n", + "encoder_mlp_layers=[512, 512]\n", + "encoder_conv_architecture=convnet_simple\n", + "encoder_conv_mlp_layers=[512]\n", + "use_rnn=True\n", + "rnn_size=512\n", + "rnn_type=gru\n", + "rnn_num_layers=1\n", + "decoder_mlp_layers=[]\n", + "nonlinearity=elu\n", + "policy_initialization=orthogonal\n", + "policy_init_gain=1.0\n", + "actor_critic_share_weights=True\n", + "adaptive_stddev=True\n", + "continuous_tanh_scale=0.0\n", + "initial_stddev=1.0\n", + "use_env_info_cache=False\n", + "env_gpu_actions=False\n", + "env_gpu_observations=True\n", + "env_frameskip=4\n", + "env_framestack=1\n", + "pixel_format=CHW\n", + "use_record_episode_statistics=False\n", + "with_wandb=False\n", + "wandb_user=None\n", + "wandb_project=sample_factory\n", + "wandb_group=None\n", + "wandb_job_type=SF\n", + "wandb_tags=[]\n", + "with_pbt=False\n", + "pbt_mix_policies_in_one_env=True\n", + "pbt_period_env_steps=5000000\n", + "pbt_start_mutation=20000000\n", + "pbt_replace_fraction=0.3\n", + "pbt_mutation_rate=0.15\n", + "pbt_replace_reward_gap=0.1\n", + "pbt_replace_reward_gap_absolute=1e-06\n", + "pbt_optimize_gamma=False\n", + "pbt_target_objective=true_objective\n", + "pbt_perturb_min=1.1\n", + 
"pbt_perturb_max=1.5\n", + "num_agents=-1\n", + "num_humans=0\n", + "num_bots=-1\n", + "start_bot_difficulty=None\n", + "timelimit=None\n", + "res_w=128\n", + "res_h=72\n", + "wide_aspect_ratio=False\n", + "eval_env_frameskip=1\n", + "fps=35\n", + "command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000\n", + "cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}\n", + "git_hash=f8ed470f837e96d11b86d84cc03d9d0be1dc0042\n", + "git_repo_name=git@github.com:huggingface/deep-rl-class.git\u001b[0m\n", + "\u001b[36m[2025-08-29 19:53:01,762][32845] Saving configuration to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:53:01,831][32845] Rollout worker 0 uses device cpu\u001b[0m\n", + "\u001b[36m[2025-08-29 19:53:01,832][32845] Rollout worker 1 uses device cpu\u001b[0m\n", + "\u001b[36m[2025-08-29 19:53:01,832][32845] Rollout worker 2 uses device cpu\u001b[0m\n", + "\u001b[36m[2025-08-29 19:53:01,833][32845] Rollout worker 3 uses device cpu\u001b[0m\n", + "\u001b[36m[2025-08-29 19:53:01,833][32845] Rollout worker 4 uses device cpu\u001b[0m\n", + "\u001b[36m[2025-08-29 19:53:01,834][32845] Rollout worker 5 uses device cpu\u001b[0m\n", + "\u001b[36m[2025-08-29 19:53:01,836][32845] Rollout worker 6 uses device cpu\u001b[0m\n", + "\u001b[36m[2025-08-29 19:53:01,836][32845] Rollout worker 7 uses device cpu\u001b[0m\n", + "\u001b[36m[2025-08-29 19:53:01,837][32845] Rollout worker 8 uses device cpu\u001b[0m\n", + "\u001b[36m[2025-08-29 19:53:01,837][32845] Rollout worker 9 uses device cpu\u001b[0m\n" + ] + }, + { + "ename": "KeyboardInterrupt", + "evalue": "", + "output_type": "error", + "traceback": [ + "\u001b[31m---------------------------------------------------------------------------\u001b[39m", + "\u001b[31mKeyboardInterrupt\u001b[39m Traceback (most 
recent call last)", + "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[3]\u001b[39m\u001b[32m, line 31\u001b[39m\n\u001b[32m 6\u001b[39m env = \u001b[33m\"\u001b[39m\u001b[33mdoom_health_gathering_supreme\u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m 7\u001b[39m cfg = parse_vizdoom_cfg(argv=[\n\u001b[32m 8\u001b[39m \u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33m--env=\u001b[39m\u001b[38;5;132;01m{\u001b[39;00menv\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m\"\u001b[39m,\n\u001b[32m 9\u001b[39m \n\u001b[32m (...)\u001b[39m\u001b[32m 28\u001b[39m \n\u001b[32m 29\u001b[39m ])\n\u001b[32m---> \u001b[39m\u001b[32m31\u001b[39m status = \u001b[43mrun_rl\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcfg\u001b[49m\u001b[43m)\u001b[49m\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/train.py:37\u001b[39m, in \u001b[36mrun_rl\u001b[39m\u001b[34m(cfg)\u001b[39m\n\u001b[32m 32\u001b[39m cfg, runner = make_runner(cfg)\n\u001b[32m 34\u001b[39m \u001b[38;5;66;03m# here we can register additional message or summary handlers\u001b[39;00m\n\u001b[32m 35\u001b[39m \u001b[38;5;66;03m# see sf_examples/dmlab/train_dmlab.py for example\u001b[39;00m\n\u001b[32m---> \u001b[39m\u001b[32m37\u001b[39m status = \u001b[43mrunner\u001b[49m\u001b[43m.\u001b[49m\u001b[43minit\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 38\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m status == ExperimentStatus.SUCCESS:\n\u001b[32m 39\u001b[39m status = runner.run()\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/runners/runner_parallel.py:21\u001b[39m, in \u001b[36mParallelRunner.init\u001b[39m\u001b[34m(self)\u001b[39m\n\u001b[32m 20\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34minit\u001b[39m(\u001b[38;5;28mself\u001b[39m) -> StatusCode:\n\u001b[32m---> 
\u001b[39m\u001b[32m21\u001b[39m status = \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m.\u001b[49m\u001b[43minit\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 22\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m status != ExperimentStatus.SUCCESS:\n\u001b[32m 23\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m status\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/runners/runner.py:557\u001b[39m, in \u001b[36mRunner.init\u001b[39m\u001b[34m(self)\u001b[39m\n\u001b[32m 554\u001b[39m \u001b[38;5;28mself\u001b[39m._save_cfg()\n\u001b[32m 555\u001b[39m save_git_diff(experiment_dir(\u001b[38;5;28mself\u001b[39m.cfg))\n\u001b[32m--> \u001b[39m\u001b[32m557\u001b[39m \u001b[38;5;28mself\u001b[39m.buffer_mgr = \u001b[43mBufferMgr\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mcfg\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43menv_info\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 559\u001b[39m \u001b[38;5;28mself\u001b[39m._observers_call(AlgoObserver.on_init, \u001b[38;5;28mself\u001b[39m)\n\u001b[32m 561\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m ExperimentStatus.SUCCESS\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/utils/shared_buffers.py:215\u001b[39m, in \u001b[36mBufferMgr.__init__\u001b[39m\u001b[34m(self, cfg, env_info)\u001b[39m\n\u001b[32m 208\u001b[39m num_buffers = \u001b[38;5;28mmax\u001b[39m(\n\u001b[32m 209\u001b[39m num_buffers,\n\u001b[32m 210\u001b[39m \u001b[38;5;28mself\u001b[39m.max_batches_to_accumulate * \u001b[38;5;28mself\u001b[39m.trajectories_per_training_iteration * cfg.num_policies,\n\u001b[32m 211\u001b[39m )\n\u001b[32m 213\u001b[39m 
\u001b[38;5;28mself\u001b[39m.traj_buffer_queues[device] = get_queue(cfg.serial_mode)\n\u001b[32m--> \u001b[39m\u001b[32m215\u001b[39m \u001b[38;5;28mself\u001b[39m.traj_tensors_torch[device] = \u001b[43malloc_trajectory_tensors\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m 216\u001b[39m \u001b[43m \u001b[49m\u001b[43menv_info\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 217\u001b[39m \u001b[43m \u001b[49m\u001b[43mnum_buffers\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 218\u001b[39m \u001b[43m \u001b[49m\u001b[43mcfg\u001b[49m\u001b[43m.\u001b[49m\u001b[43mrollout\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 219\u001b[39m \u001b[43m \u001b[49m\u001b[43mrnn_size\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 220\u001b[39m \u001b[43m \u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 221\u001b[39m \u001b[43m \u001b[49m\u001b[43mshare\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 222\u001b[39m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 223\u001b[39m \u001b[38;5;28mself\u001b[39m.policy_output_tensors_torch[device], output_names, output_sizes = alloc_policy_output_tensors(\n\u001b[32m 224\u001b[39m cfg, env_info, rnn_size, device, share\n\u001b[32m 225\u001b[39m )\n\u001b[32m 226\u001b[39m \u001b[38;5;28mself\u001b[39m.output_names, \u001b[38;5;28mself\u001b[39m.output_sizes = output_names, output_sizes\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/utils/shared_buffers.py:91\u001b[39m, in \u001b[36malloc_trajectory_tensors\u001b[39m\u001b[34m(env_info, num_traj, rollout, rnn_size, device, share)\u001b[39m\n\u001b[32m 89\u001b[39m \u001b[38;5;66;03m# we need to allocate an extra rollout step here to calculate the value estimates for the last step\u001b[39;00m\n\u001b[32m 90\u001b[39m \u001b[38;5;28;01mfor\u001b[39;00m space_name, space \u001b[38;5;129;01min\u001b[39;00m obs_space.spaces.items():\n\u001b[32m---> \u001b[39m\u001b[32m91\u001b[39m 
tensors[\u001b[33m\"\u001b[39m\u001b[33mobs\u001b[39m\u001b[33m\"\u001b[39m][space_name] = \u001b[43minit_tensor\u001b[49m\u001b[43m(\u001b[49m\u001b[43m[\u001b[49m\u001b[43mnum_traj\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mrollout\u001b[49m\u001b[43m \u001b[49m\u001b[43m+\u001b[49m\u001b[43m \u001b[49m\u001b[32;43m1\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mspace\u001b[49m\u001b[43m.\u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mspace\u001b[49m\u001b[43m.\u001b[49m\u001b[43mshape\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mshare\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 92\u001b[39m tensors[\u001b[33m\"\u001b[39m\u001b[33mrnn_states\u001b[39m\u001b[33m\"\u001b[39m] = init_tensor([num_traj, rollout + \u001b[32m1\u001b[39m], torch.float32, [rnn_size], device, share)\n\u001b[32m 94\u001b[39m num_actions, num_action_distribution_parameters = action_info(env_info)\n", + "\u001b[36mFile \u001b[39m\u001b[32m~/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/utils/shared_buffers.py:43\u001b[39m, in \u001b[36minit_tensor\u001b[39m\u001b[34m(leading_dimensions, tensor_type, tensor_shape, device, share)\u001b[39m\n\u001b[32m 40\u001b[39m tensor_shape = [x \u001b[38;5;28;01mfor\u001b[39;00m x \u001b[38;5;129;01min\u001b[39;00m tensor_shape \u001b[38;5;28;01mif\u001b[39;00m x]\n\u001b[32m 42\u001b[39m final_shape = leading_dimensions + \u001b[38;5;28mlist\u001b[39m(tensor_shape)\n\u001b[32m---> \u001b[39m\u001b[32m43\u001b[39m t = \u001b[43mtorch\u001b[49m\u001b[43m.\u001b[49m\u001b[43mzeros\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfinal_shape\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m=\u001b[49m\u001b[43mtensor_type\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 45\u001b[39m \u001b[38;5;66;03m# fill 
with magic values to make it easy to spot if we ever use unintialized data\u001b[39;00m\n\u001b[32m 46\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m t.is_floating_point():\n", + "\u001b[31mKeyboardInterrupt\u001b[39m: " + ] + } + ], "source": [ "## Start the training, this should take around 15 minutes\n", "register_vizdoom_components()\n", @@ -370,7 +557,29 @@ "# The scenario we train on today is health gathering\n", "# other scenarios include \"doom_basic\", \"doom_two_colors_easy\", \"doom_dm\", \"doom_dwango5\", \"doom_my_way_home\", \"doom_deadly_corridor\", \"doom_defend_the_center\", \"doom_defend_the_line\"\n", "env = \"doom_health_gathering_supreme\"\n", - "cfg = parse_vizdoom_cfg(argv=[f\"--env={env}\", \"--num_workers=8\", \"--num_envs_per_worker=4\", \"--train_for_env_steps=4000000\"])\n", + "cfg = parse_vizdoom_cfg(argv=[\n", + " f\"--env={env}\",\n", + "\n", + " # Parallelism / speed\n", + " \"--num_workers=10\", # more CPU workers if you have cores\n", + " \"--num_envs_per_worker=8\", # more envs per worker (GPU permitting)\n", + "\n", + " # Training length\n", + " \"--train_for_env_steps=30000000\", # 30M steps → better convergence\n", + "\n", + " # Rollouts\n", + " \"--rollout=64\", # longer rollouts = better advantage estimates\n", + "\n", + " # PPO / optimizer\n", + " \"--batch_size=16384\", # bigger batch for more stable updates\n", + " \"--learning_rate=0.0002\", # slightly higher than the doom default\n", + " \"--ppo_clip_ratio=0.2\", # PPO clip range\n", + "\n", + " # Model / memory\n", + " \"--recurrence=32\", # BPTT length for the RNN (memory matters in Doom)\n", + " \"--use_rnn=True\", # enable the LSTM core\n", + "\n", + "])\n", "\n", "status = run_rl(cfg)" ] @@ -386,11 +595,184 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "import numpy\n", + "torch.serialization.add_safe_globals([\n", + " numpy.core.multiarray.scalar,\n", + " numpy.dtype,\n", + "
numpy.dtypes.Float64DType\n", + "])" + ] + }, + { + "cell_type": "code", + "execution_count": 12, "metadata": { "id": "MGSA4Kg5_i0j" }, - "outputs": [], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001b[33m[2025-08-29 19:09:28,003][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,004][15827] Overriding arg 'num_workers' with value 1 passed from command line\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,004][15827] Adding new argument 'no_render'=True that is not in the saved config file!\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,005][15827] Adding new argument 'save_video'=True that is not in the saved config file!\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,006][15827] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,006][15827] Adding new argument 'video_name'=None that is not in the saved config file!\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,007][15827] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,007][15827] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,008][15827] Adding new argument 'push_to_hub'=False that is not in the saved config file!\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,008][15827] Adding new argument 'hf_repository'=None that is not in the saved config file!\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,009][15827] Adding new argument 'policy_index'=0 that is not in the saved config file!\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,010][15827] Adding new argument 'eval_deterministic'=False that is not in the saved config file!\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,011][15827] Adding 
new argument 'train_script'=None that is not in the saved config file!\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,011][15827] Adding new argument 'enjoy_script'=None that is not in the saved config file!\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,012][15827] Using frameskip 1 and render_action_repeat=4 for evaluation\u001b[0m\n", + "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/gymnasium/core.py:311: UserWarning: \u001b[33mWARN: env.num_agents to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.num_agents` for environment variables or `env.get_wrapper_attr('num_agents')` that will search the reminding wrappers.\u001b[0m\n", + " logger.warn(\n", + "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/gymnasium/core.py:311: UserWarning: \u001b[33mWARN: env.is_multiagent to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.is_multiagent` for environment variables or `env.get_wrapper_attr('is_multiagent')` that will search the reminding wrappers.\u001b[0m\n", + " logger.warn(\n", + "\u001b[36m[2025-08-29 19:09:28,068][15827] RunningMeanStd input shape: (3, 72, 128)\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,070][15827] RunningMeanStd input shape: (1,)\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,078][15827] ConvEncoder: input_channels=3\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,110][15827] Conv encoder output size: 512\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,111][15827] Policy head output size: 512\u001b[0m\n", + "\u001b[33m[2025-08-29 19:09:28,147][15827] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...\u001b[0m\n", + "[W][05112.308343] pw.conf | [ conf.c: 1031 try_load_conf()] can't load config 
client-rt.conf: No such file or directory\n", + "[E][05112.308453] pw.conf | [ conf.c: 1060 pw_conf_load_conf_for_context()] can't load config client-rt.conf: No such file or directory\n", + "[ALSOFT] (EE) Failed to create PipeWire event context (errno: 2)\n", + "\u001b[36m[2025-08-29 19:09:28,678][15827] Num frames 100...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:28,901][15827] Num frames 200...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:29,082][15827] Num frames 300...\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:29,271][15827] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:29,273][15827] Avg episode reward: 3.840, avg true_objective: 3.840\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:29,303][15827] Num frames 400...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:29,488][15827] Num frames 500...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:29,669][15827] Num frames 600...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:29,866][15827] Num frames 700...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:30,068][15827] Num frames 800...\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:30,279][15827] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:30,281][15827] Avg episode reward: 5.320, avg true_objective: 4.320\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:30,350][15827] Num frames 900...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:30,519][15827] Num frames 1000...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:30,721][15827] Num frames 1100...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:30,932][15827] Num frames 1200...\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:31,091][15827] Avg episode rewards: #0: 4.827, true rewards: #0: 4.160\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:31,093][15827] Avg episode reward: 4.827, avg true_objective: 4.160\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:31,207][15827] Num frames 
1300...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:31,409][15827] Num frames 1400...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:31,677][15827] Num frames 1500...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:31,882][15827] Num frames 1600...\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:32,000][15827] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:32,002][15827] Avg episode reward: 4.580, avg true_objective: 4.080\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:32,154][15827] Num frames 1700...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:32,364][15827] Num frames 1800...\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:32,596][15827] Avg episode rewards: #0: 4.176, true rewards: #0: 3.776\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:32,597][15827] Avg episode reward: 4.176, avg true_objective: 3.776\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:32,628][15827] Num frames 1900...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:32,853][15827] Num frames 2000...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:33,054][15827] Num frames 2100...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:33,264][15827] Num frames 2200...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:33,433][15827] Num frames 2300...\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:33,602][15827] Avg episode rewards: #0: 4.393, true rewards: #0: 3.893\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:33,603][15827] Avg episode reward: 4.393, avg true_objective: 3.893\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:33,741][15827] Num frames 2400...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:33,951][15827] Num frames 2500...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:34,199][15827] Num frames 2600...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:34,376][15827] Num frames 2700...\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:34,540][15827] Avg episode rewards: #0: 4.549, true rewards: #0: 3.977\u001b[0m\n", + 
"\u001b[37m\u001b[1m[2025-08-29 19:09:34,541][15827] Avg episode reward: 4.549, avg true_objective: 3.977\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:34,566][15827] Num frames 2800...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:34,788][15827] Num frames 2900...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:34,990][15827] Num frames 3000...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:35,103][15827] Num frames 3100...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:35,292][15827] Num frames 3200...\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:35,394][15827] Avg episode rewards: #0: 4.665, true rewards: #0: 4.040\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:35,396][15827] Avg episode reward: 4.665, avg true_objective: 4.040\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:35,502][15827] Num frames 3300...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:35,645][15827] Num frames 3400...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:35,752][15827] Num frames 3500...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:35,878][15827] Num frames 3600...\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:35,951][15827] Avg episode rewards: #0: 4.573, true rewards: #0: 4.018\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:35,952][15827] Avg episode reward: 4.573, avg true_objective: 4.018\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:36,061][15827] Num frames 3700...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:36,168][15827] Num frames 3800...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:36,298][15827] Num frames 3900...\u001b[0m\n", + "\u001b[36m[2025-08-29 19:09:36,417][15827] Num frames 4000...\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:36,468][15827] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000\u001b[0m\n", + "\u001b[37m\u001b[1m[2025-08-29 19:09:36,469][15827] Avg episode reward: 4.500, avg true_objective: 4.000\u001b[0m\n", + "ffmpeg version 6.1.1-3ubuntu5 Copyright (c) 2000-2023 the FFmpeg developers\n", + " built with gcc 13 (Ubuntu 
13.2.0-23ubuntu3)\n", + " configuration: --prefix=/usr --extra-version=3ubuntu5 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --disable-omx --enable-gnutls --enable-libaom --enable-libass --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-openal --enable-opencl --enable-opengl --disable-sndio --enable-libvpl --disable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-ladspa --enable-libbluray --enable-libjack --enable-libpulse --enable-librabbitmq --enable-librist --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libx264 --enable-libzmq --enable-libzvbi --enable-lv2 --enable-sdl2 --enable-libplacebo --enable-librav1e --enable-pocketsphinx --enable-librsvg --enable-libjxl --enable-shared\n", + " libavutil 58. 29.100 / 58. 29.100\n", + " libavcodec 60. 31.102 / 60. 31.102\n", + " libavformat 60. 16.100 / 60. 16.100\n", + " libavdevice 60. 3.100 / 60. 3.100\n", + " libavfilter 9. 12.100 / 9. 12.100\n", + " libswscale 7. 5.100 / 7. 5.100\n", + " libswresample 4. 12.100 / 4. 12.100\n", + " libpostproc 57. 3.100 / 57. 
3.100\n", + "Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/sf2_mique/replay.mp4':\n", + " Metadata:\n", + " major_brand : isom\n", + " minor_version : 512\n", + " compatible_brands: isomiso2mp41\n", + " encoder : Lavf59.27.100\n", + " Duration: 00:01:54.57, start: 0.000000, bitrate: 1373 kb/s\n", + " Stream #0:0[0x1](und): Video: mpeg4 (Simple Profile) (mp4v / 0x7634706D), yuv420p, 240x180 [SAR 1:1 DAR 4:3], 1372 kb/s, 35 fps, 35 tbr, 17920 tbn (default)\n", + " Metadata:\n", + " handler_name : VideoHandler\n", + " vendor_id : [0][0][0][0]\n", + "Stream mapping:\n", + " Stream #0:0 -> #0:0 (mpeg4 (native) -> h264 (libx264))\n", + "Press [q] to stop, [?] for help\n", + "[libx264 @ 0x55ba4d6002c0] using SAR=1/1\n", + "[libx264 @ 0x55ba4d6002c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2\n", + "[libx264 @ 0x55ba4d6002c0] profile High, level 1.3, 4:2:0, 8-bit\n", + "[libx264 @ 0x55ba4d6002c0] 264 - core 164 r3108 31e19f9 - H.264/MPEG-4 AVC codec - Copyleft 2003-2023 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00\n", + "Output #0, mp4, to '/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4':\n", + " Metadata:\n", + " major_brand : isom\n", + " minor_version : 512\n", + " compatible_brands: isomiso2mp41\n", + " encoder : Lavf60.16.100\n", + " Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(progressive), 240x180 [SAR 1:1 DAR 4:3], q=2-31, 35 fps, 17920 tbn 
(default)\n", + " Metadata:\n", + " handler_name : VideoHandler\n", + " vendor_id : [0][0][0][0]\n", + " encoder : Lavc60.31.102 libx264\n", + " Side data:\n", + " cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A\n", + "[out#0/mp4 @ 0x55ba4d5e8500] video:5518kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.778290%\n", + "frame= 4010 fps=1218 q=-1.0 Lsize= 5561kB time=00:01:54.48 bitrate= 397.9kbits/s speed=34.8x \n", + "[libx264 @ 0x55ba4d6002c0] frame I:27 Avg QP:22.74 size: 5791\n", + "[libx264 @ 0x55ba4d6002c0] frame P:1674 Avg QP:25.95 size: 1916\n", + "[libx264 @ 0x55ba4d6002c0] frame B:2309 Avg QP:28.24 size: 990\n", + "[libx264 @ 0x55ba4d6002c0] consecutive B-frames: 19.7% 7.2% 9.8% 63.2%\n", + "[libx264 @ 0x55ba4d6002c0] mb I I16..4: 13.1% 76.4% 10.4%\n", + "[libx264 @ 0x55ba4d6002c0] mb P I16..4: 2.7% 9.7% 3.2% P16..4: 41.7% 24.5% 10.6% 0.0% 0.0% skip: 7.7%\n", + "[libx264 @ 0x55ba4d6002c0] mb B I16..4: 0.2% 1.8% 1.2% B16..8: 45.8% 14.1% 3.3% direct: 6.6% skip:27.1% L0:51.0% L1:38.8% BI:10.2%\n", + "[libx264 @ 0x55ba4d6002c0] 8x8 transform intra:62.0% inter:65.3%\n", + "[libx264 @ 0x55ba4d6002c0] coded y,uvDC,uvAC intra: 60.9% 71.5% 40.1% inter: 35.6% 12.3% 2.4%\n", + "[libx264 @ 0x55ba4d6002c0] i16 v,h,dc,p: 62% 5% 32% 1%\n", + "[libx264 @ 0x55ba4d6002c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 27% 9% 34% 4% 4% 4% 6% 5% 7%\n", + "[libx264 @ 0x55ba4d6002c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 66% 5% 10% 3% 4% 3% 4% 2% 3%\n", + "[libx264 @ 0x55ba4d6002c0] i8c dc,h,v,p: 56% 18% 23% 2%\n", + "[libx264 @ 0x55ba4d6002c0] Weighted P-Frames: Y:9.7% UV:0.7%\n", + "[libx264 @ 0x55ba4d6002c0] ref P L0: 61.8% 15.0% 13.9% 8.1% 1.2%\n", + "[libx264 @ 0x55ba4d6002c0] ref B L0: 85.0% 11.8% 3.2%\n", + "[libx264 @ 0x55ba4d6002c0] ref B L1: 95.5% 4.5%\n", + "[libx264 @ 0x55ba4d6002c0] kb/s:394.53\n", + "\u001b[36m[2025-08-29 19:09:41,440][15827] Replay video saved to 
/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!\u001b[0m\n" + ] + } + ], "source": [ "from sample_factory.enjoy import enjoy\n", "cfg = parse_vizdoom_cfg(argv=[f\"--env={env}\", \"--num_workers=1\", \"--save_video\", \"--no_render\", \"--max_num_episodes=10\"], evaluation=True)\n", @@ -417,7 +799,7 @@ "from base64 import b64encode\n", "from IPython.display import HTML\n", "\n", - "mp4 = open('/content/train_dir/default_experiment/replay.mp4','rb').read()\n", + "mp4 = open('./train_dir/default_experiment/replay.mp4','rb').read()\n", "data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n", "HTML(\"\"\"\n", "