diff --git a/notebooks/bonus-unit1/bonus-unit1.ipynb b/notebooks/bonus-unit1/bonus-unit1.ipynb
index 93db85a..5725765 100644
--- a/notebooks/bonus-unit1/bonus-unit1.ipynb
+++ b/notebooks/bonus-unit1/bonus-unit1.ipynb
@@ -199,9 +199,17 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 1,
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Python 3.10.11\n"
+ ]
+ }
+ ],
"source": [
"# Colab's Current Python Version (Incompatible with ML-Agents)\n",
"!python --version"
@@ -600,7 +608,7 @@
},
"outputs": [],
"source": [
- "!mlagents-push-to-hf --run-id=\"HuggyTraining\" --local-dir=\"./results/Huggy2\" --repo-id=\"ThomasSimonini/ppo-Huggy\" --commit-message=\"Huggy\""
+ "!mlagents-push-to-hf --run-id=\"HuggyTraining\" --local-dir=\"./results/Huggy\" --repo-id=\"turbo-maikol/rl-course-bu1\" --commit-message=\"Huggy\""
]
},
{
@@ -691,11 +699,21 @@
},
"gpuClass": "standard",
"kernelspec": {
- "display_name": "Python 3",
+ "display_name": "rl-env-bu1",
+ "language": "python",
"name": "python3"
},
"language_info": {
- "name": "python"
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.11"
}
},
"nbformat": 4,
diff --git a/notebooks/bonus-unit1/bonus_unit1.ipynb b/notebooks/bonus-unit1/bonus_unit1.ipynb
deleted file mode 100644
index a85452b..0000000
--- a/notebooks/bonus-unit1/bonus_unit1.ipynb
+++ /dev/null
@@ -1,695 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "view-in-github"
- },
- "source": [
- "
"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "2D3NL_e4crQv"
- },
- "source": [
- "# Bonus Unit 1: Let's train Huggy the Dog 🐶 to fetch a stick"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "FMYrDriDujzX"
- },
- "source": [
- "
\n",
- "\n",
- "In this notebook, we'll reinforce what we learned in the first Unit by **teaching Huggy the Dog to fetch the stick and then play with it directly in your browser**\n",
- "\n",
- "⬇️ Here is an example of what **you will achieve at the end of the unit.** ⬇️ (launch ▶ to see)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "PnVhs1yYNyUF"
- },
- "outputs": [],
- "source": [
- "%%html\n",
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "x7oR6R-ZIbeS"
- },
- "source": [
- "### The environment 🎮\n",
- "\n",
- "- Huggy the Dog, an environment created by [Thomas Simonini](https://twitter.com/ThomasSimonini) based on [Puppo The Corgi](https://blog.unity.com/technology/puppo-the-corgi-cuteness-overload-with-the-unity-ml-agents-toolkit)\n",
- "\n",
- "### The library used 📚\n",
- "\n",
- "- [MLAgents](https://github.com/Unity-Technologies/ml-agents)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "60yACvZwO0Cy"
- },
- "source": [
- "We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the Github Repo](https://github.com/huggingface/deep-rl-class/issues)."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "Oks-ETYdO2Dc"
- },
- "source": [
- "## Objectives of this notebook 🏆\n",
- "\n",
- "At the end of the notebook, you will:\n",
- "\n",
- "- Understand **the state space, action space and reward function used to train Huggy**.\n",
- "- **Train your own Huggy** to fetch the stick.\n",
- "- Be able to play **with your trained Huggy directly in your browser**.\n",
- "\n",
- "\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "mUlVrqnBv2o1"
- },
- "source": [
- "## This notebook is from Deep Reinforcement Learning Course\n",
- "
"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "pAMjaQpHwB_s"
- },
- "source": [
- "In this free course, you will:\n",
- "\n",
- "- 📖 Study Deep Reinforcement Learning in **theory and practice**.\n",
- "- 🧑💻 Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, CleanRL and Sample Factory 2.0.\n",
- "- 🤖 Train **agents in unique environments**\n",
- "\n",
- "And more check 📚 the syllabus 👉 https://simoninithomas.github.io/deep-rl-course\n",
- "\n",
- "Don’t forget to **sign up to the course** (we are collecting your email to be able to **send you the links when each Unit is published and give you information about the challenges and updates).**\n",
- "\n",
- "\n",
- "The best way to keep in touch is to join our discord server to exchange with the community and with us 👉🏻 https://discord.gg/ydHrjt3WP5"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "6r7Hl0uywFSO"
- },
- "source": [
- "## Prerequisites 🏗️\n",
- "\n",
- "Before diving into the notebook, you need to:\n",
- "\n",
- "🔲 📚 **Develop an understanding of the foundations of Reinforcement learning** (MC, TD, Rewards hypothesis...) by doing Unit 1\n",
- "\n",
- "🔲 📚 **Read the introduction to Huggy** by doing Bonus Unit 1"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "DssdIjk_8vZE"
- },
- "source": [
- "## Set the GPU 💪\n",
- "- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n",
- "\n",
- "
"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "sTfCXHy68xBv"
- },
- "source": [
- "- `Hardware Accelerator > GPU`\n",
- "\n",
- "
"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Clone the repository 🔽\n",
- "\n",
- "- We need to clone the repository, that contains **ML-Agents.**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "%%capture\n",
- "# Clone the repository (can take 3min)\n",
- "!git clone --depth 1 https://github.com/Unity-Technologies/ml-agents"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Setup the Virtual Environment 🔽\n",
- "- In order for the **ML-Agents** to run successfully in Colab, Colab's Python version must meet the library's Python requirements.\n",
- "\n",
- "- We can check for the supported Python version under the `python_requires` parameter in the `setup.py` files. These files are required to set up the **ML-Agents** library for use and can be found in the following locations:\n",
- " - `/content/ml-agents/ml-agents/setup.py`\n",
- " - `/content/ml-agents/ml-agents-envs/setup.py`\n",
- "\n",
- "- Colab's Current Python version(can be checked using `!python --version`) doesn't match the library's `python_requires` parameter, as a result installation may silently fail and lead to errors like these, when executing the same commands later:\n",
- " - `/bin/bash: line 1: mlagents-learn: command not found`\n",
- " - `/bin/bash: line 1: mlagents-push-to-hf: command not found`\n",
- "\n",
- "- To resolve this, we'll create a virtual environment with a Python version compatible with the **ML-Agents** library.\n",
- "\n",
- "`Note:` *For future compatibility, always check the `python_requires` parameter in the installation files and set your virtual environment to the maximum supported Python version in the given below script if the Colab's Python version is not compatible*"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Colab's Current Python Version (Incompatible with ML-Agents)\n",
- "!python --version"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Install virtualenv and create a virtual environment\n",
- "!pip install virtualenv\n",
- "!virtualenv myenv\n",
- "\n",
- "# Download and install Miniconda\n",
- "!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\n",
- "!chmod +x Miniconda3-latest-Linux-x86_64.sh\n",
- "!./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local\n",
- "\n",
- "# Activate Miniconda and install Python ver 3.10.12\n",
- "!source /usr/local/bin/activate\n",
- "!conda install -q -y --prefix /usr/local python=3.10.12 ujson # Specify the version here\n",
- "\n",
- "# Set environment variables for Python and conda paths\n",
- "!export PYTHONPATH=/usr/local/lib/python3.10/site-packages/\n",
- "!export CONDA_PREFIX=/usr/local/envs/myenv"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Python Version in New Virtual Environment (Compatible with ML-Agents)\n",
- "!python --version"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Installing the dependencies 🔽"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "%%capture\n",
- "# Go inside the repository and install the package (can take 3min)\n",
- "%cd ml-agents\n",
- "!pip3 install -e ./ml-agents-envs\n",
- "!pip3 install -e ./ml-agents"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "HRY5ufKUKfhI"
- },
- "source": [
- "## Download and move the environment zip file in `./trained-envs-executables/linux/`\n",
- "\n",
- "- Our environment executable is in a zip file.\n",
- "- We need to download it and place it to `./trained-envs-executables/linux/`"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "C9Ls6_6eOKiA"
- },
- "outputs": [],
- "source": [
- "!mkdir ./trained-envs-executables\n",
- "!mkdir ./trained-envs-executables/linux"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "IHh_LXsRrrbM"
- },
- "source": [
- "We downloaded the file Huggy.zip from https://github.com/huggingface/Huggy using `wget`"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "8xNAD1tRpy0_"
- },
- "outputs": [],
- "source": [
- "!wget \"https://github.com/huggingface/Huggy/raw/main/Huggy.zip\" -O ./trained-envs-executables/linux/Huggy.zip"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "8FPx0an9IAwO"
- },
- "outputs": [],
- "source": [
- "%%capture\n",
- "!unzip -d ./trained-envs-executables/linux/ ./trained-envs-executables/linux/Huggy.zip"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "nyumV5XfPKzu"
- },
- "source": [
- "Make sure your file is accessible"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "EdFsLJ11JvQf"
- },
- "outputs": [],
- "source": [
- "!chmod -R 755 ./trained-envs-executables/linux/Huggy"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "dYKVj8yUvj55"
- },
- "source": [
- "## Let's recap how this environment works\n",
- "\n",
- "### The State Space: what Huggy \"perceives.\"\n",
- "\n",
- "Huggy doesn't \"see\" his environment. Instead, we provide him information about the environment:\n",
- "\n",
- "- The target (stick) position\n",
- "- The relative position between himself and the target\n",
- "- The orientation of his legs.\n",
- "\n",
- "Given all this information, Huggy **can decide which action to take next to fulfill his goal**.\n",
- "\n",
- "
\n",
- "\n",
- "\n",
- "### The Action Space: what moves Huggy can do\n",
- "
\n",
- "\n",
- "**Joint motors drive huggy legs**. It means that to get the target, Huggy needs to **learn to rotate the joint motors of each of his legs correctly so he can move**.\n",
- "\n",
- "### The Reward Function\n",
- "\n",
- "The reward function is designed so that **Huggy will fulfill his goal** : fetch the stick.\n",
- "\n",
- "Remember that one of the foundations of Reinforcement Learning is the *reward hypothesis*: a goal can be described as the **maximization of the expected cumulative reward**.\n",
- "\n",
- "Here, our goal is that Huggy **goes towards the stick but without spinning too much**. Hence, our reward function must translate this goal.\n",
- "\n",
- "Our reward function:\n",
- "\n",
- "
\n",
- "\n",
- "- *Orientation bonus*: we **reward him for getting close to the target**.\n",
- "- *Time penalty*: a fixed-time penalty given at every action to **force him to get to the stick as fast as possible**.\n",
- "- *Rotation penalty*: we penalize Huggy if **he spins too much and turns too quickly**.\n",
- "- *Getting to the target reward*: we reward Huggy for **reaching the target**."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "NAuEq32Mwvtz"
- },
- "source": [
- "## Create the Huggy config file\n",
- "\n",
- "- In ML-Agents, you define the **training hyperparameters into config.yaml files.**\n",
- "\n",
- "- For the scope of this notebook, we're not going to modify the hyperparameters, but if you want to try as an experiment, you should also try to modify some other hyperparameters, Unity provides very [good documentation explaining each of them here](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md).\n",
- "\n",
- "- But we need to create a config file for Huggy.\n",
- "\n",
- " - To do that click on Folder logo on the left of your screen.\n",
- "\n",
- "
\n",
- "\n",
- " - Go to `/content/ml-agents/config/ppo`\n",
- " - Right mouse click and create a new file called `Huggy.yaml`\n",
- "\n",
- "
\n",
- "\n",
- "- Copy and paste the content below 🔽"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "loQ0N5jhXW71"
- },
- "outputs": [],
- "source": [
- "behaviors:\n",
- " Huggy:\n",
- " trainer_type: ppo\n",
- " hyperparameters:\n",
- " batch_size: 2048\n",
- " buffer_size: 20480\n",
- " learning_rate: 0.0003\n",
- " beta: 0.005\n",
- " epsilon: 0.2\n",
- " lambd: 0.95\n",
- " num_epoch: 3\n",
- " learning_rate_schedule: linear\n",
- " network_settings:\n",
- " normalize: true\n",
- " hidden_units: 512\n",
- " num_layers: 3\n",
- " vis_encode_type: simple\n",
- " reward_signals:\n",
- " extrinsic:\n",
- " gamma: 0.995\n",
- " strength: 1.0\n",
- " checkpoint_interval: 200000\n",
- " keep_checkpoints: 15\n",
- " max_steps: 2e6\n",
- " time_horizon: 1000\n",
- " summary_freq: 50000"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "oakN7UHwXdCX"
- },
- "source": [
- "- Don't forget to save the file!"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "r9wv5NYGw-05"
- },
- "source": [
- "- **In the case you want to modify the hyperparameters**, in Google Colab notebook, you can click here to open the config.yaml: `/content/ml-agents/config/ppo/Huggy.yaml`\n",
- "\n",
- "- For instance **if you want to save more models during the training** (for now, we save every 200,000 training timesteps). You need to modify:\n",
- " - `checkpoint_interval`: The number of training timesteps collected between each checkpoint.\n",
- " - `keep_checkpoints`: The maximum number of model checkpoints to keep.\n",
- "\n",
- "=> Just keep in mind that **decreasing the `checkpoint_interval` means more models to upload to the Hub and so a longer uploading time**\n",
- "We’re now ready to train our agent 🔥."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "f9fI555bO12v"
- },
- "source": [
- "## Train our agent\n",
- "\n",
- "To train our agent, we just need to **launch mlagents-learn and select the executable containing the environment.**\n",
- "\n",
- "
\n",
- "\n",
- "With ML Agents, we run a training script. We define four parameters:\n",
- "\n",
- "1. `mlagents-learn `: the path where the hyperparameter config file is.\n",
- "2. `--env`: where the environment executable is.\n",
- "3. `--run-id`: the name you want to give to your training run id.\n",
- "4. `--no-graphics`: to not launch the visualization during the training.\n",
- "\n",
- "Train the model and use the `--resume` flag to continue training in case of interruption.\n",
- "\n",
- "> It will fail first time when you use `--resume`, try running the block again to bypass the error.\n",
- "\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "lN32oWF8zPjs"
- },
- "source": [
- "The training will take 30 to 45min depending on your machine (don't forget to **set up a GPU**), go take a ☕️you deserve it 🤗."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "bS-Yh1UdHfzy"
- },
- "outputs": [],
- "source": [
- "!mlagents-learn ./config/ppo/Huggy.yaml --env=./trained-envs-executables/linux/Huggy/Huggy --run-id=\"Huggy2\" --no-graphics"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "5Vue94AzPy1t"
- },
- "source": [
- "## Push the agent to the 🤗 Hub\n",
- "\n",
- "- Now that we trained our agent, we’re **ready to push it to the Hub to be able to play with Huggy on your browser🔥.**"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "izT6FpgNzZ6R"
- },
- "source": [
- "To be able to share your model with the community there are three more steps to follow:\n",
- "\n",
- "1️⃣ (If it's not already done) create an account to HF ➡ https://huggingface.co/join\n",
- "\n",
- "2️⃣ Sign in and then, you need to store your authentication token from the Hugging Face website.\n",
- "- Create a new token (https://huggingface.co/settings/tokens) **with write role**\n",
- "\n",
- "
\n",
- "\n",
- "- Copy the token\n",
- "- Run the cell below and paste the token"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "rKt2vsYoK56o"
- },
- "outputs": [],
- "source": [
- "from huggingface_hub import notebook_login\n",
- "notebook_login()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "ew59mK19zjtN"
- },
- "source": [
- "If you don't want to use a Google Colab or a Jupyter Notebook, you need to use this command instead: `huggingface-cli login`"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "Xi0y_VASRzJU"
- },
- "source": [
- "Then, we simply need to run `mlagents-push-to-hf`.\n",
- "\n",
- "
"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "KK4fPfnczunT"
- },
- "source": [
- "And we define 4 parameters:\n",
- "\n",
- "1. `--run-id`: the name of the training run id.\n",
- "2. `--local-dir`: where the agent was saved, it’s results/, so in my case results/First Training.\n",
- "3. `--repo-id`: the name of the Hugging Face repo you want to create or update. It’s always /\n",
- "If the repo does not exist **it will be created automatically**\n",
- "4. `--commit-message`: since HF repos are git repository you need to define a commit message."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "dGEFAIboLVc6"
- },
- "outputs": [],
- "source": [
- "!mlagents-push-to-hf --run-id=\"HuggyTraining\" --local-dir=\"./results/Huggy2\" --repo-id=\"ThomasSimonini/ppo-Huggy\" --commit-message=\"Huggy\""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "yborB0850FTM"
- },
- "source": [
- "Else, if everything worked you should have this at the end of the process(but with a different url 😆) :\n",
- "\n",
- "\n",
- "\n",
- "```\n",
- "Your model is pushed to the hub. You can view your model here: https://huggingface.co/ThomasSimonini/ppo-Huggy\n",
- "```\n",
- "\n",
- "It’s the link to your model repository. The repository contains a model card that explains how to use the model, your Tensorboard logs and your config file. **What’s awesome is that it’s a git repository, which means you can have different commits, update your repository with a new push, open Pull Requests, etc.**\n",
- "\n",
- "
"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "5Uaon2cg0NrL"
- },
- "source": [
- "But now comes the best: **being able to play with Huggy online 👀.**"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "VMc4oOsE0QiZ"
- },
- "source": [
- "## Play with your Huggy 🐕\n",
- "\n",
- "This step is the simplest:\n",
- "\n",
- "- Open the game Huggy in your browser: https://huggingface.co/spaces/ThomasSimonini/Huggy\n",
- "\n",
- "- Click on Play with my Huggy model\n",
- "\n",
- "
"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "Djs8c5rR0Z8a"
- },
- "source": [
- "1. In step 1, choose your model repository which is the model id (in my case ThomasSimonini/ppo-Huggy).\n",
- "\n",
- "2. In step 2, **choose what model you want to replay**:\n",
- " - I have multiple ones, since we saved a model every 500000 timesteps.\n",
- " - But since I want the more recent, I choose `Huggy.onnx`\n",
- "\n",
- "👉 What’s nice **is to try with different models steps to see the improvement of the agent.**"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "PI6dPWmh064H"
- },
- "source": [
- "Congrats on finishing this bonus unit!\n",
- "\n",
- "You can now sit and enjoy playing with your Huggy 🐶. And don't **forget to spread the love by sharing Huggy with your friends 🤗**. And if you share about it on social media, **please tag us @huggingface and me @simoninithomas**\n",
- "\n",
- "
\n",
- "\n",
- "\n",
- "## Keep Learning, Stay awesome 🤗"
- ]
- }
- ],
- "metadata": {
- "accelerator": "GPU",
- "colab": {
- "include_colab_link": true,
- "private_outputs": true,
- "provenance": []
- },
- "gpuClass": "standard",
- "kernelspec": {
- "display_name": "Python 3",
- "name": "python3"
- },
- "language_info": {
- "name": "python"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}
diff --git a/notebooks/unit1/unit1.ipynb b/notebooks/unit1/unit1.ipynb
index 06d62b0..3605d63 100644
--- a/notebooks/unit1/unit1.ipynb
+++ b/notebooks/unit1/unit1.ipynb
@@ -284,11 +284,21 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 1,
"metadata": {
"id": "BE5JWP5rQIKf"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "KeyboardInterrupt\n",
+ "\n"
+ ]
+ }
+ ],
"source": [
"# Virtual display\n",
"from pyvirtualdisplay import Display\n",
@@ -316,11 +326,24 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 2,
"metadata": {
"id": "cygWLPGsEQ0m"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.\n",
+ "Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.\n",
+ "Users of this version of Gym should be able to simply replace 'import gym' with 'import gymnasium as gym' in the vast majority of cases.\n",
+ "See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.\n",
+ "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit1/venv-u1/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+ " from .autonotebook import tqdm as notebook_tqdm\n"
+ ]
+ }
+ ],
"source": [
"import gymnasium\n",
"\n",
@@ -353,7 +376,7 @@
"\n",
"Let's look at an example, but first let's recall the RL loop.\n",
"\n",
- "
"
+ "
"
]
},
{
@@ -396,11 +419,59 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 3,
"metadata": {
"id": "w7vOFlpA_ONz"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Action taken: 2\n",
+ " - reward: 3.0692112001439513\n",
+ "Action taken: 3\n",
+ " - reward: -2.0283326535021318\n",
+ "Action taken: 0\n",
+ " - reward: -2.013109629392062\n",
+ "Action taken: 1\n",
+ " - reward: -1.8387614642986694\n",
+ "Action taken: 3\n",
+ " - reward: -1.9646071472228346\n",
+ "Action taken: 2\n",
+ " - reward: 1.724712789874087\n",
+ "Action taken: 3\n",
+ " - reward: -2.0772821745045733\n",
+ "Action taken: 3\n",
+ " - reward: -2.263394443942046\n",
+ "Action taken: 2\n",
+ " - reward: 1.03422570110298\n",
+ "Action taken: 1\n",
+ " - reward: -1.9686919634781634\n",
+ "Action taken: 0\n",
+ " - reward: -1.880365204866706\n",
+ "Action taken: 3\n",
+ " - reward: -2.1378125038369533\n",
+ "Action taken: 2\n",
+ " - reward: 0.23407670781683693\n",
+ "Action taken: 0\n",
+ " - reward: -2.0440816329574147\n",
+ "Action taken: 0\n",
+ " - reward: -1.9836184981424765\n",
+ "Action taken: 2\n",
+ " - reward: 1.1548347711850055\n",
+ "Action taken: 1\n",
+ " - reward: -1.7956347801317054\n",
+ "Action taken: 0\n",
+ " - reward: -1.7729850216231284\n",
+ "Action taken: 2\n",
+ " - reward: 1.9191545079788284\n",
+ "Action taken: 3\n",
+ " - reward: -2.0884827451743875\n",
+ "Total reward: -18.720944184971565\n"
+ ]
+ }
+ ],
"source": [
"import gymnasium as gym\n",
"\n",
@@ -410,6 +481,7 @@
"# Then we reset this environment\n",
"observation, info = env.reset()\n",
"\n",
+ "total_reward = 0\n",
"for _ in range(20):\n",
" # Take a random action\n",
" action = env.action_space.sample()\n",
@@ -418,13 +490,16 @@
" # Do this action in the environment and get\n",
" # next_state, reward, terminated, truncated and info\n",
" observation, reward, terminated, truncated, info = env.step(action)\n",
- "\n",
+ " print(f\" - reward: {reward}\")\n",
+ " total_reward += reward\n",
" # If the game is terminated (in our case we land, crashed) or truncated (timeout)\n",
+ " \n",
" if terminated or truncated:\n",
" # Reset the environment\n",
" print(\"Environment is reset\")\n",
" observation, info = env.reset()\n",
"\n",
+ "print(\"Total reward:\", total_reward)\n",
"env.close()"
]
},
@@ -450,6 +525,29 @@
"---\n"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The state is an 8-dimensional vector: the coordinates of the lander in x & y, its linear velocities in x & y, its angle, its angular velocity, and two booleans that represent whether each leg is in contact with the ground or not.\n",
+ "\n",
+ "```\n",
+ "Box([ -2.5 -2.5 -10. -10. -6.2831855 -10. -0. -0. ], [ 2.5 2.5 10. 10. 6.2831855 10. 1. 1. ], (8,), float32)\n",
+ "Box(            # annotated: low .. high per dimension\n",
+ "[\n",
+ "  x   -2.5 .. 2.5     y   -2.5 .. 2.5\n",
+ "  vx -10.  .. 10.     vy -10.  .. 10.\n",
+ "  angle        -6.2831855 .. 6.2831855\n",
+ "  angular vel. -10.       .. 10.\n",
+ "  left leg contact   -0. .. 1.\n",
+ "  right leg contact  -0. .. 1.\n",
+ "],\n",
+ "shape (8,)\n",
+ ", float32)\n",
+ "```\n",
+ "\n"
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {
@@ -461,11 +559,23 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 4,
"metadata": {
"id": "ZNPG0g_UGCfh"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "_____OBSERVATION SPACE_____ \n",
+ "\n",
+ "Observation Space Shape (8,)\n",
+ "Sample observation [ 53.21532 -87.118256 -0.84611297 3.4404945 0.7532178\n",
+ " 2.645675 0.9980984 0.40649492]\n"
+ ]
+ }
+ ],
"source": [
"# We create our environment with gym.make(\"\")\n",
"env = gym.make(\"LunarLander-v2\")\n",
@@ -494,11 +604,23 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 5,
"metadata": {
"id": "We5WqOBGLoSm"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ " _____ACTION SPACE_____ \n",
+ "\n",
+ "Action Space Shape 4\n",
+ "Action Space Sample 3\n"
+ ]
+ }
+ ],
"source": [
"print(\"\\n _____ACTION SPACE_____ \\n\")\n",
"print(\"Action Space Shape\", env.action_space.n)\n",
@@ -549,7 +671,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 6,
"metadata": {
"id": "99hqQ_etEy1N"
},
@@ -629,16 +751,24 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 7,
"metadata": {
"id": "nxI6hT1GE4-A"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Using cuda device\n"
+ ]
+ }
+ ],
"source": [
"# TODO: Define a PPO MlpPolicy architecture\n",
"# We use MultiLayerPerceptron (MLPPolicy) because the input is a vector,\n",
"# if we had frames as input we would use CnnPolicy\n",
- "model ="
+ "model = PPO(\"MlpPolicy\", env=env, verbose=1)"
]
},
{
@@ -652,11 +782,19 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 8,
"metadata": {
"id": "543OHYDfcjK4"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Using cuda device\n"
+ ]
+ }
+ ],
"source": [
"# SOLUTION\n",
"# We added some parameters to accelerate the training\n",
@@ -669,7 +807,9 @@
" gamma = 0.999,\n",
" gae_lambda = 0.98,\n",
" ent_coef = 0.01,\n",
- " verbose=1)"
+ " verbose=1\n",
+ ")\n",
+ "model_name = \"ppo-LunarLander-v2\"\n"
]
},
{
@@ -685,16 +825,16 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 14,
"metadata": {
"id": "qKnYkNiVp89p"
},
"outputs": [],
"source": [
"# TODO: Train it for 1,000,000 timesteps\n",
- "\n",
+ "model.learn(total_timesteps=1_000_000)\n",
"# TODO: Specify file name for model and save the model to file\n",
- "model_name = \"ppo-LunarLander-v2\"\n"
+ "model.save(model_name)"
]
},
{
@@ -741,21 +881,238 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 9,
"metadata": {
"id": "yRpno0glsADy"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Mean reward: 252.266642 +/- 34.661009936952354\n"
+ ]
+ }
+ ],
"source": [
"# TODO: Evaluate the agent\n",
"# Create a new environment for evaluation\n",
- "eval_env =\n",
+ "eval_env = Monitor(gym.make(\"LunarLander-v2\", render_mode='rgb_array'))\n",
"\n",
+ "# Load model\n",
+ "model = PPO.load(model_name, env=eval_env)\n",
"# Evaluate the model with 10 evaluation episodes and deterministic=True\n",
- "mean_reward, std_reward =\n",
+ "mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10, deterministic=True)\n",
"\n",
"# Print the results\n",
- "\n"
+ "print(f\"Mean reward: {mean_reward} +/- {std_reward}\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n",
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n",
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n",
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " - frame: 250\n",
+ " - frame: 500\n",
+ " - frame: 750\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " - frame: 1000\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " - frame: 250\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " - frame: 250\n",
+ " - frame: 500\n",
+ " - frame: 750\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " - frame: 1000\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n",
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " - frame: 250\n",
+ " - frame: 250\n",
+ " - frame: 500\n",
+ " - frame: 750\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " - frame: 1000\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " - frame: 250\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n"
+ ]
+ }
+ ],
+ "source": [
+ "from stable_baselines3.common.vec_env import DummyVecEnv\n",
+ "from stable_baselines3.common.monitor import Monitor\n",
+ "import imageio\n",
+ "import gym\n",
+ "import numpy as np\n",
+ "np.bool8 = np.bool_  # compat shim: older gym code references np.bool8, which newer NumPy releases removed\n",
+ "\n",
+ "for i in range(30):\n",
+ " eval_env = DummyVecEnv([lambda: Monitor(gym.make(\"LunarLander-v2\", render_mode=\"rgb_array\"))])\n",
+ "\n",
+ " frames = []\n",
+ " obs = eval_env.reset()\n",
+ " done = False\n",
+ " while not done:\n",
+ " action, _ = model.predict(obs, deterministic=False)\n",
+ " obs, reward, done, info = eval_env.step(action)\n",
+ " done = done[0] # VecEnv returns an array of dones, one per env\n",
+ " img = eval_env.envs[0].render() # returns RGB array\n",
+ " frames.append(img)\n",
+ " if len(frames) % 250 == 0:\n",
+ " print(f\" - frame: {len(frames)}\")\n",
+ "\n",
+ " imageio.mimsave(f'lunarlander_run-{i}.mp4', frames, fps=30)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (600, 400) to (608, 400) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n"
+ ]
+ },
+ {
+ "ename": "",
+ "evalue": "",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[1;31mThe Kernel crashed while executing code in the current cell or a previous cell. \n",
+ "\u001b[1;31mPlease review the code in the cell(s) to identify a possible cause of the failure. \n",
+ "\u001b[1;31mClick here for more info. \n",
+ "\u001b[1;31mView Jupyter log for further details."
+ ]
+ }
+ ],
+ "source": [
+ "imageio.mimsave('lunarlander_run.mp4', frames, fps=30)\n"
]
},
{
@@ -777,7 +1134,7 @@
"source": [
"#@title\n",
"eval_env = Monitor(gym.make(\"LunarLander-v2\", render_mode='rgb_array'))\n",
- "mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)\n",
+ "mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)\n",
"print(f\"mean_reward={mean_reward:.2f} +/- {std_reward}\")"
]
},
@@ -889,12 +1246,248 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 19,
"metadata": {
"id": "JPG7ofdGIHN8"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[38;5;4mℹ This function will save, evaluate, generate a video of your agent,\n",
+ "create a model card and push everything to the hub. It might take up to 1min.\n",
+ "This is a work in progress: if you encounter a bug, please open an issue.\u001b[0m\n",
+ "Saving video to /tmp/tmpztlifguu/-step-0-to-step-1000.mp4\n",
+ "MoviePy - Building video /tmp/tmpztlifguu/-step-0-to-step-1000.mp4.\n",
+ "MoviePy - Writing video /tmp/tmpztlifguu/-step-0-to-step-1000.mp4\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "ffmpeg version 6.1.1-3ubuntu5 Copyright (c) 2000-2023 the FFmpeg developers\n",
+ " built with gcc 13 (Ubuntu 13.2.0-23ubuntu3)\n",
+ " configuration: --prefix=/usr --extra-version=3ubuntu5 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --disable-omx --enable-gnutls --enable-libaom --enable-libass --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-openal --enable-opencl --enable-opengl --disable-sndio --enable-libvpl --disable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-ladspa --enable-libbluray --enable-libjack --enable-libpulse --enable-librabbitmq --enable-librist --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libx264 --enable-libzmq --enable-libzvbi --enable-lv2 --enable-sdl2 --enable-libplacebo --enable-librav1e --enable-pocketsphinx --enable-librsvg --enable-libjxl --enable-shared\n",
+ " libavutil 58. 29.100 / 58. 29.100\n",
+ " libavcodec 60. 31.102 / 60. 31.102\n",
+ " libavformat 60. 16.100 / 60. 16.100\n",
+ " libavdevice 60. 3.100 / 60. 3.100\n",
+ " libavfilter 9. 12.100 / 9. 12.100\n",
+ " libswscale 7. 5.100 / 7. 5.100\n",
+ " libswresample 4. 12.100 / 4. 12.100\n",
+ " libpostproc 57. 3.100 / 57. 3.100\n",
+ "Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/tmpztlifguu/-step-0-to-step-1000.mp4':\n",
+ " Metadata:\n",
+ " major_brand : isom\n",
+ " minor_version : 512\n",
+ " compatible_brands: isomiso2avc1mp41\n",
+ " encoder : Lavf61.1.100\n",
+ " Duration: 00:00:20.00, start: 0.000000, bitrate: 51 kb/s\n",
+ " Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 600x400, 46 kb/s, 50 fps, 50 tbr, 12800 tbn (default)\n",
+ " Metadata:\n",
+ " handler_name : VideoHandler\n",
+ " vendor_id : [0][0][0][0]\n",
+ " encoder : Lavc61.3.100 libx264\n",
+ "Stream mapping:\n",
+ " Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))\n",
+ "Press [q] to stop, [?] for help\n",
+ "[libx264 @ 0x55b3ab1fa980] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2\n",
+ "[libx264 @ 0x55b3ab1fa980] profile High, level 3.1, 4:2:0, 8-bit\n",
+ "[libx264 @ 0x55b3ab1fa980] 264 - core 164 r3108 31e19f9 - H.264/MPEG-4 AVC codec - Copyleft 2003-2023 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00\n",
+ "Output #0, mp4, to '/tmp/tmp2etu86el/replay.mp4':\n",
+ " Metadata:\n",
+ " major_brand : isom\n",
+ " minor_version : 512\n",
+ " compatible_brands: isomiso2avc1mp41\n",
+ " encoder : Lavf60.16.100\n",
+ " Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(progressive), 600x400, q=2-31, 50 fps, 12800 tbn (default)\n",
+ " Metadata:\n",
+ " handler_name : VideoHandler\n",
+ " vendor_id : [0][0][0][0]\n",
+ " encoder : Lavc60.31.102 libx264\n",
+ " Side data:\n",
+ " cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A\n",
+ "frame= 0 fps=0.0 q=0.0 size= 0kB time=N/A bitrate=N/A speed=N/A \r"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "MoviePy - Done !\n",
+ "MoviePy - video ready /tmp/tmpztlifguu/-step-0-to-step-1000.mp4\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "[out#0/mp4 @ 0x55b3ab12b880] video:110kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 11.302154%\n",
+ "frame= 1000 fps=0.0 q=-1.0 Lsize= 122kB time=00:00:19.94 bitrate= 50.1kbits/s speed=22.1x \n",
+ "[libx264 @ 0x55b3ab1fa980] frame I:4 Avg QP: 9.42 size: 2199\n",
+ "[libx264 @ 0x55b3ab1fa980] frame P:268 Avg QP:18.69 size: 158\n",
+ "[libx264 @ 0x55b3ab1fa980] frame B:728 Avg QP:20.11 size: 83\n",
+ "[libx264 @ 0x55b3ab1fa980] consecutive B-frames: 0.8% 4.2% 6.6% 88.4%\n",
+ "[libx264 @ 0x55b3ab1fa980] mb I I16..4: 92.1% 1.7% 6.2%\n",
+ "[libx264 @ 0x55b3ab1fa980] mb P I16..4: 0.1% 0.3% 0.1% P16..4: 1.1% 0.2% 0.1% 0.0% 0.0% skip:98.1%\n",
+ "[libx264 @ 0x55b3ab1fa980] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 1.7% 0.2% 0.0% direct: 0.0% skip:98.0% L0:55.7% L1:43.8% BI: 0.6%\n",
+ "[libx264 @ 0x55b3ab1fa980] 8x8 transform intra:15.7% inter:16.2%\n",
+ "[libx264 @ 0x55b3ab1fa980] coded y,uvDC,uvAC intra: 7.0% 9.7% 8.7% inter: 0.1% 0.2% 0.1%\n",
+ "[libx264 @ 0x55b3ab1fa980] i16 v,h,dc,p: 90% 5% 5% 0%\n",
+ "[libx264 @ 0x55b3ab1fa980] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 11% 4% 84% 0% 0% 0% 0% 0% 0%\n",
+ "[libx264 @ 0x55b3ab1fa980] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 15% 58% 2% 3% 1% 3% 1% 3%\n",
+ "[libx264 @ 0x55b3ab1fa980] i8c dc,h,v,p: 93% 3% 3% 0%\n",
+ "[libx264 @ 0x55b3ab1fa980] Weighted P-Frames: Y:0.0% UV:0.0%\n",
+ "[libx264 @ 0x55b3ab1fa980] ref P L0: 66.6% 1.8% 20.4% 11.1%\n",
+ "[libx264 @ 0x55b3ab1fa980] ref B L0: 68.4% 27.3% 4.2%\n",
+ "[libx264 @ 0x55b3ab1fa980] ref B L1: 93.3% 6.7%\n",
+ "[libx264 @ 0x55b3ab1fa980] kb/s:44.60\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[38;5;4mℹ Pushing repo turbo-maikol/rl-course-unit1 to the Hugging Face Hub\u001b[0m\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Processing Files (0 / 0) : | | 0.00B / 0.00B \n",
+ "Processing Files (5 / 5) : 100%|██████████| 408kB / 408kB, 185kB/s \n",
+ "New Data Upload : 100%|██████████| 406kB / 406kB, 185kB/s \n",
+ " ...unarLander-v2/pytorch_variables.pth: 100%|██████████| 1.26kB / 1.26kB \n",
+ " ...LunarLander-v2/policy.optimizer.pth: 100%|██████████| 88.9kB / 88.9kB \n",
+ " ...u86el/ppo-LunarLander-v2/policy.pth: 100%|██████████| 44.1kB / 44.1kB \n",
+ " .../tmp2etu86el/ppo-LunarLander-v2.zip: 100%|██████████| 149kB / 149kB \n",
+ " /tmp/tmp2etu86el/replay.mp4 : 100%|██████████| 125kB / 125kB \n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[38;5;4mℹ Your model is pushed to the Hub. You can view your model here:\n",
+ "https://huggingface.co/turbo-maikol/rl-course-unit1/tree/main/\u001b[0m\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "CommitInfo(commit_url='https://huggingface.co/turbo-maikol/rl-course-unit1/commit/3de80d180623b404c50319ea857ba782dccad4c9', commit_message='Model trained with PPO on LunarLander-v2 for the DEEP RL huggingface course', commit_description='', oid='3de80d180623b404c50319ea857ba782dccad4c9', pr_url=None, repo_url=RepoUrl('https://huggingface.co/turbo-maikol/rl-course-unit1', endpoint='https://huggingface.co', repo_type='model', repo_id='turbo-maikol/rl-course-unit1'), pr_revision=None, pr_num=None)"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
+ "import os\n",
+ "from dotenv import load_dotenv\n",
+ "load_dotenv()\n",
+ "\n",
"import gymnasium as gym\n",
"from stable_baselines3.common.vec_env import DummyVecEnv\n",
"from stable_baselines3.common.env_util import make_vec_env\n",
@@ -903,29 +1496,32 @@
"\n",
"## TODO: Define a repo_id\n",
"## repo_id is the id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2\n",
- "repo_id =\n",
+ "repo_id = \"turbo-maikol/rl-course-unit1\"\n",
"\n",
"# TODO: Define the name of the environment\n",
- "env_id =\n",
+ "env_id = \"LunarLander-v2\"\n",
"\n",
"# Create the evaluation env and set the render_mode=\"rgb_array\"\n",
"eval_env = DummyVecEnv([lambda: Monitor(gym.make(env_id, render_mode=\"rgb_array\"))])\n",
"\n",
"\n",
"# TODO: Define the model architecture we used\n",
- "model_architecture = \"\"\n",
+ "model_architecture = \"PPO\"\n",
"\n",
"## TODO: Define the commit message\n",
- "commit_message = \"\"\n",
+ "commit_message = \"Model trained with PPO on LunarLander-v2 for the DEEP RL huggingface course\"\n",
"\n",
"# method save, evaluate, generate a model card and record a replay video of your agent before pushing the repo to the hub\n",
- "package_to_hub(model=model, # Our trained model\n",
- " model_name=model_name, # The name of our trained model\n",
- " model_architecture=model_architecture, # The model architecture we used: in our case PPO\n",
- " env_id=env_id, # Name of the environment\n",
- " eval_env=eval_env, # Evaluation Environment\n",
- " repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2\n",
- " commit_message=commit_message)"
+ "package_to_hub(\n",
+ " model=model, # Our trained model\n",
+ " model_name=model_name, # The name of our trained model\n",
+ " model_architecture=model_architecture, # The model architecture we used: in our case PPO\n",
+ " env_id=env_id, # Name of the environment\n",
+ " eval_env=eval_env, # Evaluation Environment\n",
+ " repo_id=repo_id, # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name} for instance ThomasSimonini/ppo-LunarLander-v2\n",
+ " commit_message=commit_message,\n",
+ " token=os.getenv(\"HF_HUB_TOKEN\")\n",
+ ")"
]
},
{
@@ -1066,9 +1662,9 @@
"# 1. Install pickle5 (we done it at the beginning of the colab)\n",
"# 2. Create a custom empty object we pass as parameter to PPO.load()\n",
"custom_objects = {\n",
- " \"learning_rate\": 0.0,\n",
- " \"lr_schedule\": lambda _: 0.0,\n",
- " \"clip_range\": lambda _: 0.0,\n",
+ " \"learning_rate\": 0.0,\n",
+ " \"lr_schedule\": lambda _: 0.0, \n",
+ " \"clip_range\": lambda _: 0.0,\n",
"}\n",
"\n",
"checkpoint = load_from_hub(repo_id, filename)\n",
@@ -1163,18 +1759,21 @@
},
"gpuClass": "standard",
"kernelspec": {
- "display_name": "Python 3.9.7",
+ "display_name": "venv-u1",
"language": "python",
"name": "python3"
},
"language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
"name": "python",
- "version": "3.9.7"
- },
- "vscode": {
- "interpreter": {
- "hash": "ed7f8024e43d3b8f5ca3c5e1a8151ab4d136b3ecee1e3fd59e0766ccc55e1b10"
- }
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.3"
}
},
"nbformat": 4,
diff --git a/notebooks/unit2/unit2.ipynb b/notebooks/unit2/unit2.ipynb
index e9ae624..5df36f4 100644
--- a/notebooks/unit2/unit2.ipynb
+++ b/notebooks/unit2/unit2.ipynb
@@ -3,8 +3,8 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "view-in-github",
- "colab_type": "text"
+ "colab_type": "text",
+ "id": "view-in-github"
},
"source": [
"
"
@@ -36,6 +36,9 @@
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "DPTBOv9HYLZ2"
+ },
"source": [
"###🎮 Environments:\n",
"\n",
@@ -48,10 +51,7 @@
"- [Gymnasium](https://gymnasium.farama.org/)\n",
"\n",
"We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues)."
- ],
- "metadata": {
- "id": "DPTBOv9HYLZ2"
- }
+ ]
},
{
"cell_type": "markdown",
@@ -72,14 +72,14 @@
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "viNzVbVaYvY3"
+ },
"source": [
"## This notebook is from the Deep Reinforcement Learning Course\n",
"\n",
"
"
- ],
- "metadata": {
- "id": "viNzVbVaYvY3"
- }
+ ]
},
{
"cell_type": "markdown",
@@ -156,28 +156,31 @@
},
{
"cell_type": "markdown",
- "source": [
- "# Let's code our first Reinforcement Learning algorithm 🚀"
- ],
"metadata": {
"id": "HEtx8Y8MqKfH"
- }
+ },
+ "source": [
+ "# Let's code our first Reinforcement Learning algorithm 🚀"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "Kdxb1IhzTn0v"
+ },
"source": [
"To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to push your trained Taxi model to the Hub and **get a result of >= 4.5**.\n",
"\n",
"To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model, **the result = mean_reward - std of reward**\n",
"\n",
"For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process"
- ],
- "metadata": {
- "id": "Kdxb1IhzTn0v"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "4gpxC1_kqUYe"
+ },
"source": [
"## Install dependencies and create a virtual display 🔽\n",
"\n",
@@ -194,10 +197,7 @@
"The Hugging Face Hub 🤗 works as a central place where anyone can share and explore models and datasets. It has versioning, metrics, visualizations and other features that will allow you to easily collaborate with others.\n",
"\n",
"You can see here all the Deep RL models available (if they use Q Learning) here 👉 https://huggingface.co/models?other=q-learning"
- ],
- "metadata": {
- "id": "4gpxC1_kqUYe"
- }
+ ]
},
{
"cell_type": "code",
@@ -212,53 +212,53 @@
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "n71uTX7qqzz2"
+ },
+ "outputs": [],
"source": [
"!sudo apt-get update\n",
"!sudo apt-get install -y python3-opengl\n",
"!apt install ffmpeg xvfb\n",
"!pip3 install pyvirtualdisplay"
- ],
- "metadata": {
- "id": "n71uTX7qqzz2"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "To make sure the new installed libraries are used, **sometimes it's required to restart the notebook runtime**. The next cell will force the **runtime to crash, so you'll need to connect again and run the code starting from here**. Thanks to this trick, **we will be able to run our virtual screen.**"
- ],
"metadata": {
"id": "K6XC13pTfFiD"
- }
+ },
+ "source": [
+ "To make sure the newly installed libraries are used, **sometimes it's required to restart the notebook runtime**. The next cell will force the **runtime to crash, so you'll need to connect again and run the code starting from here**. Thanks to this trick, **we will be able to run our virtual screen.**"
+ ]
},
{
"cell_type": "code",
- "source": [
- "import os\n",
- "os.kill(os.getpid(), 9)"
- ],
+ "execution_count": null,
"metadata": {
"id": "3kuZbWAkfHdg"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "os.kill(os.getpid(), 9)"
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "DaY1N4dBrabi"
+ },
+ "outputs": [],
"source": [
"# Virtual display\n",
"from pyvirtualdisplay import Display\n",
"\n",
"virtual_display = Display(visible=0, size=(1400, 900))\n",
"virtual_display.start()"
- ],
- "metadata": {
- "id": "DaY1N4dBrabi"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
@@ -276,7 +276,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 7,
"metadata": {
"id": "VcNvOAQlysBJ"
},
@@ -287,10 +287,8 @@
"import random\n",
"import imageio\n",
"import os\n",
- "import tqdm\n",
"\n",
- "import pickle5 as pickle\n",
- "from tqdm.notebook import tqdm"
+ "import pickle5 as pickle"
]
},
{
@@ -354,14 +352,21 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 35,
"metadata": {
"id": "IzJnb8O3y8up"
},
"outputs": [],
"source": [
"# Create the FrozenLake-v1 environment using 4x4 map and non-slippery version and render_mode=\"rgb_array\"\n",
- "env = gym.make() # TODO use the correct parameters"
+ "\n",
+ "# desc spells out the default 4x4 map (S=start, F=frozen, H=hole, G=goal)\n",
+ "desc = [\n",
+ " \"SFFF\",\n",
+ " \"FHFH\",\n",
+ " \"FFFH\",\n",
+ " \"HFFG\"\n",
+ "]\n",
+ "env = gym.make(\"FrozenLake-v1\", map_name=\"4x4\", desc=desc, is_slippery=False, render_mode=\"rgb_array\")"
]
},
{
@@ -411,11 +416,22 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 34,
"metadata": {
"id": "ZNPG0g_UGCfh"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "_____OBSERVATION SPACE_____ \n",
+ "\n",
+ "Observation Space Discrete(16)\n",
+ "Sample observation 0\n"
+ ]
+ }
+ ],
"source": [
"# We create our environment with gym.make(\"\")- `is_slippery=False`: The agent always moves in the intended direction due to the non-slippery nature of the frozen lake (deterministic).\n",
"print(\"_____OBSERVATION SPACE_____ \\n\")\n",
@@ -441,11 +457,23 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 10,
"metadata": {
"id": "We5WqOBGLoSm"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ " _____ACTION SPACE_____ \n",
+ "\n",
+ "Action Space Shape 4\n",
+ "Action Space Sample 2\n"
+ ]
+ }
+ ],
"source": [
"print(\"\\n _____ACTION SPACE_____ \\n\")\n",
"print(\"Action Space Shape\", env.action_space.n)\n",
@@ -488,22 +516,31 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 11,
"metadata": {
"id": "y3ZCdluj3k0l"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "There are 16 possible states\n",
+ "There are 4 possible actions\n"
+ ]
+ }
+ ],
"source": [
- "state_space =\n",
+ "state_space = env.observation_space.n\n",
"print(\"There are \", state_space, \" possible states\")\n",
"\n",
- "action_space =\n",
+ "action_space = env.action_space.n\n",
"print(\"There are \", action_space, \" possible actions\")"
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 12,
"metadata": {
"id": "rCddoOXM3UQH"
},
@@ -511,19 +548,47 @@
"source": [
"# Let's create our Qtable of size (state_space, action_space) and initialized each values at 0 using np.zeros. np.zeros needs a tuple (a,b)\n",
"def initialize_q_table(state_space, action_space):\n",
- " Qtable =\n",
+ " \"\"\"Is not a matrix, is an array and we can locate each game cell later with `current_row * ncols + current_col`\"\"\"\n",
+ " Qtable = np.zeros((state_space, action_space))\n",
" return Qtable"
]
},
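The flat state encoding mentioned in the docstring above can be sketched outside the notebook (the grid values here are illustrative):

```python
import numpy as np

def initialize_q_table(state_space, action_space):
    # One row per state, one column per action, all values start at 0
    return np.zeros((state_space, action_space))

# FrozenLake 4x4: the discrete state index is current_row * ncols + current_col
ncols = 4
state = 2 * ncols + 3  # row 2, col 3 -> state 11

qtable = initialize_q_table(16, 4)
print(qtable.shape)  # (16, 4)
print(state)         # 11
```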
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 13,
"metadata": {
"id": "9YfvrqRt3jdR"
},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.]])"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
- "Qtable_frozenlake = initialize_q_table(state_space, action_space)"
+ "Qtable_frozenlake = initialize_q_table(state_space, action_space)\n",
+ "Qtable_frozenlake"
]
},
{
@@ -595,17 +660,30 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 14,
"metadata": {
"id": "E3SCLmLX5bWG"
},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "np.int64(0)"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"def greedy_policy(Qtable, state):\n",
" # Exploitation: take the action with the highest state, action value\n",
- " action =\n",
+ " action = np.argmax(Qtable[state])\n",
"\n",
- " return action"
+ " return action\n",
+ "\n",
+ "greedy_policy(Qtable_frozenlake, 2)"
]
},
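As a quick sanity check of the `greedy_policy` defined above: `np.argmax` returns the first index on ties, which is why an all-zero Q-table yields action 0.

```python
import numpy as np

def greedy_policy(Qtable, state):
    # Exploitation: take the action with the highest Q-value for this state
    return np.argmax(Qtable[state])

Qtable = np.zeros((16, 4))
print(greedy_policy(Qtable, 2))  # 0: all values tie, argmax returns the first index

Qtable[2, 3] = 1.0
print(greedy_policy(Qtable, 2))  # 3: action 3 now dominates in state 2
```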
{
@@ -638,7 +716,7 @@
"id": "flILKhBU3yZ7"
},
"source": [
- "##Define the epsilon-greedy policy 🤖\n",
+ "## Define the epsilon-greedy policy 🤖\n",
"\n",
"Epsilon-greedy is the training policy that handles the exploration/exploitation trade-off.\n",
"\n",
@@ -655,7 +733,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 15,
"metadata": {
"id": "6Bj7x3in3_Pq"
},
@@ -663,15 +741,15 @@
"source": [
"def epsilon_greedy_policy(Qtable, state, epsilon):\n",
" # Randomly generate a number between 0 and 1\n",
- " random_num =\n",
+ " random_num = np.random.random()\n",
" # if random_num > greater than epsilon --> exploitation\n",
" if random_num > epsilon:\n",
" # Take the action with the highest value given a state\n",
" # np.argmax can be useful here\n",
- " action =\n",
+ " action = greedy_policy(Qtable, state)\n",
" # else --> exploration\n",
" else:\n",
- " action = # Take a random action\n",
+ " action = env.action_space.sample() # np.random.randint(0, Qtable[state].size) # Take a random action\n",
"\n",
" return action"
]
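A self-contained variant of the same epsilon-greedy logic, using a NumPy generator instead of `env.action_space.sample()` so it runs without Gymnasium (a sketch; the fixed seed is only for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy_policy(Qtable, state, epsilon):
    if rng.random() > epsilon:
        # Exploitation: best known action for this state
        return int(np.argmax(Qtable[state]))
    # Exploration: uniform random action
    return int(rng.integers(0, Qtable.shape[1]))

Qtable = np.zeros((16, 4))
Qtable[0, 1] = 1.0
# epsilon=0 -> always exploit; epsilon=1 -> always explore
print(epsilon_greedy_policy(Qtable, 0, epsilon=0.0))  # 1
actions = {epsilon_greedy_policy(Qtable, 0, epsilon=1.0) for _ in range(200)}
print(sorted(actions))  # all four actions get sampled under pure exploration
```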
@@ -724,7 +802,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 16,
"metadata": {
"id": "Y1tWn0tycWZ1"
},
@@ -778,12 +856,13 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 17,
"metadata": {
"id": "paOynXy3aoJW"
},
"outputs": [],
"source": [
+ "from tqdm import tqdm\n",
"def train(n_training_episodes, min_epsilon, max_epsilon, decay_rate, env, max_steps, Qtable):\n",
" for episode in tqdm(range(n_training_episodes)):\n",
" # Reduce epsilon (because we need less and less exploration)\n",
@@ -796,15 +875,16 @@
"\n",
" # repeat\n",
" for step in range(max_steps):\n",
- " # Choose the action At using epsilon greedy policy\n",
- " action =\n",
+ " # TODO: Choose the action At using epsilon greedy policy\n",
+ " action = epsilon_greedy_policy(Qtable, state, epsilon)\n",
"\n",
- " # Take action At and observe Rt+1 and St+1\n",
- " # Take the action (a) and observe the outcome state(s') and reward (r)\n",
- " new_state, reward, terminated, truncated, info =\n",
+ " # TODO: Take action At and observe Rt+1 and St+1\n",
+ " # TODO: Take the action (a) and observe the outcome state(s') and reward (r)\n",
+ " new_state, reward, terminated, truncated, info = env.step(action)\n",
"\n",
- " # Update Q(s,a):= Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]\n",
- " Qtable[state][action] =\n",
+ " # TODO: Update Q(s,a):= Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]\n",
+ " old_Qsa = Qtable[state][action]\n",
+ " Qtable[state][action] = old_Qsa + learning_rate * (reward + gamma * np.max(Qtable[new_state]) - old_Qsa)\n",
"\n",
" # If terminated or truncated finish the episode\n",
" if terminated or truncated:\n",
@@ -874,11 +954,19 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 18,
"metadata": {
"id": "DPBxfjJdTCOH"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 10000/10000 [00:00<00:00, 11230.14it/s]\n"
+ ]
+ }
+ ],
"source": [
"Qtable_frozenlake = train(n_training_episodes, min_epsilon, max_epsilon, decay_rate, env, max_steps, Qtable_frozenlake)"
]
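The TD update filled in inside `train` can be verified on a single hand-computed step (a sketch with illustrative state/action numbers):

```python
import numpy as np

learning_rate, gamma = 0.7, 0.95

Qtable = np.zeros((16, 4))
state, action, reward, new_state = 14, 2, 1.0, 15

# Q(s,a) <- Q(s,a) + lr * (r + gamma * max_a' Q(s',a') - Q(s,a))
old_Qsa = Qtable[state][action]
Qtable[state][action] = old_Qsa + learning_rate * (
    reward + gamma * np.max(Qtable[new_state]) - old_Qsa
)

# With Q(s',.) = 0 and Q(s,a) = 0: new value = 0.7 * 1.0 = 0.7
print(Qtable[state][action])  # 0.7
```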
@@ -894,11 +982,37 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 19,
"metadata": {
"id": "nmfchsTITw4q"
},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[0.73509189, 0.77378094, 0.77378094, 0.73509189],\n",
+ " [0.73509189, 0. , 0.81450625, 0.77378094],\n",
+ " [0.77378094, 0.857375 , 0.77378094, 0.81450625],\n",
+ " [0.81450625, 0. , 0.77378094, 0.77378094],\n",
+ " [0.77378094, 0.81450625, 0. , 0.73509189],\n",
+ " [0. , 0. , 0. , 0. ],\n",
+ " [0. , 0.9025 , 0. , 0.81450625],\n",
+ " [0. , 0. , 0. , 0. ],\n",
+ " [0.81450625, 0. , 0.857375 , 0.77378094],\n",
+ " [0.81450625, 0.9025 , 0.9025 , 0. ],\n",
+ " [0.857375 , 0.95 , 0. , 0.857375 ],\n",
+ " [0. , 0. , 0. , 0. ],\n",
+ " [0. , 0. , 0. , 0. ],\n",
+ " [0. , 0.9025 , 0.95 , 0.857375 ],\n",
+ " [0.9025 , 0.95 , 1. , 0.9025 ],\n",
+ " [0. , 0. , 0. , 0. ]])"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"Qtable_frozenlake"
]
@@ -916,7 +1030,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 20,
"metadata": {
"id": "jNl0_JO2cbkm"
},
@@ -972,11 +1086,33 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 21,
"metadata": {
"id": "fAgB7s0HEFMm"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 100/100 [00:00<00:00, 12881.37it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Mean_reward=1.00 +/- 0.00\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ }
+ ],
"source": [
"# Evaluate our Agent\n",
"mean_reward, std_reward = evaluate_agent(env, max_steps, n_eval_episodes, Qtable_frozenlake, eval_seed)\n",
@@ -1018,7 +1154,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 23,
"metadata": {
"id": "Jex3i9lZ8ksX"
},
@@ -1034,7 +1170,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 24,
"metadata": {
"id": "Qo57HBn3W74O"
},
@@ -1065,6 +1201,11 @@
},
{
"cell_type": "code",
+ "execution_count": 26,
+ "metadata": {
+ "id": "U4mdUTKkGnUd"
+ },
+ "outputs": [],
"source": [
"def push_to_hub(\n",
" repo_id, model, env, video_fps=1, local_repo_path=\"hub\"\n",
@@ -1194,12 +1335,7 @@
" )\n",
"\n",
" print(\"Your model is pushed to the Hub. You can view your model here: \", repo_url)"
- ],
- "metadata": {
- "id": "U4mdUTKkGnUd"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
@@ -1269,7 +1405,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 27,
"metadata": {
"id": "FiMqxqVHg0I4"
},
@@ -1311,24 +1447,153 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 28,
"metadata": {
"id": "5sBo2umnXpPd"
},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'env_id': 'FrozenLake-v1',\n",
+ " 'max_steps': 99,\n",
+ " 'n_training_episodes': 10000,\n",
+ " 'n_eval_episodes': 100,\n",
+ " 'eval_seed': [],\n",
+ " 'learning_rate': 0.7,\n",
+ " 'gamma': 0.95,\n",
+ " 'max_epsilon': 1.0,\n",
+ " 'min_epsilon': 0.05,\n",
+ " 'decay_rate': 0.0005,\n",
+ " 'qtable': array([[0.73509189, 0.77378094, 0.77378094, 0.73509189],\n",
+ " [0.73509189, 0. , 0.81450625, 0.77378094],\n",
+ " [0.77378094, 0.857375 , 0.77378094, 0.81450625],\n",
+ " [0.81450625, 0. , 0.77378094, 0.77378094],\n",
+ " [0.77378094, 0.81450625, 0. , 0.73509189],\n",
+ " [0. , 0. , 0. , 0. ],\n",
+ " [0. , 0.9025 , 0. , 0.81450625],\n",
+ " [0. , 0. , 0. , 0. ],\n",
+ " [0.81450625, 0. , 0.857375 , 0.77378094],\n",
+ " [0.81450625, 0.9025 , 0.9025 , 0. ],\n",
+ " [0.857375 , 0.95 , 0. , 0.857375 ],\n",
+ " [0. , 0. , 0. , 0. ],\n",
+ " [0. , 0. , 0. , 0. ],\n",
+ " [0. , 0.9025 , 0.95 , 0.857375 ],\n",
+ " [0.9025 , 0.95 , 1. , 0.9025 ],\n",
+ " [0. , 0. , 0. , 0. ]])}"
+ ]
+ },
+ "execution_count": 28,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"model"
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 29,
"metadata": {
"id": "RpOTtSt83kPZ"
},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "e4d5c292dab14baa940d2ed46f0dd484",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Fetching 1 files: 0%| | 0/1 [00:00, ?it/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "d936b75185204c77bf43f7e60d55d730",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ ".gitattributes: 0.00B [00:00, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 100/100 [00:00<00:00, 15035.50it/s]\n",
+ "100%|██████████| 100/100 [00:00<00:00, 20026.28it/s]\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "False\n"
+ ]
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "b7cec9324f194da5a02d6aa4ff9f8bf0",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Processing Files (0 / 0) : | | 0.00B / 0.00B "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "735207614ab44f32b930ea8f15c2196e",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "New Data Upload : | | 0.00B / 0.00B "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "939cd7ae71654dd0b4ef8a2be93050a9",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ " ...d1efee86aa38377f5cc6/q-learning.pkl: 100%|##########| 915B / 915B "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Your model is pushed to the Hub. You can view your model here: https://huggingface.co/turbo-maikol/q-FrozenLake-v1-4x4-noSlippery\n"
+ ]
+ }
+ ],
"source": [
- "username = \"\" # FILL THIS\n",
+ "username = \"turbo-maikol\" # FILL THIS\n",
"repo_name = \"q-FrozenLake-v1-4x4-noSlippery\"\n",
"push_to_hub(\n",
" repo_id=f\"{username}/{repo_name}\",\n",
@@ -1373,7 +1638,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 56,
"metadata": {
"id": "gL0wpeO8gpej"
},
@@ -1393,11 +1658,19 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 37,
"metadata": {
"id": "_TPNaGSZrgqA"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "There are 500 possible states\n"
+ ]
+ }
+ ],
"source": [
"state_space = env.observation_space.n\n",
"print(\"There are \", state_space, \" possible states\")"
@@ -1405,11 +1678,19 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 38,
"metadata": {
"id": "CdeeZuokrhit"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "There are 6 possible actions\n"
+ ]
+ }
+ ],
"source": [
"action_space = env.action_space.n\n",
"print(\"There are \", action_space, \" possible actions\")"
@@ -1439,11 +1720,26 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 39,
"metadata": {
"id": "US3yDXnEtY9I"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[0. 0. 0. 0. 0. 0.]\n",
+ " [0. 0. 0. 0. 0. 0.]\n",
+ " [0. 0. 0. 0. 0. 0.]\n",
+ " ...\n",
+ " [0. 0. 0. 0. 0. 0.]\n",
+ " [0. 0. 0. 0. 0. 0.]\n",
+ " [0. 0. 0. 0. 0. 0.]]\n",
+ "Q-table shape: (500, 6)\n"
+ ]
+ }
+ ],
"source": [
"# Create our Q table with state_size rows and action_size columns (500x6)\n",
"Qtable_taxi = initialize_q_table(state_space, action_space)\n",
@@ -1464,14 +1760,14 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 46,
"metadata": {
"id": "AB6n__hhg7YS"
},
"outputs": [],
"source": [
"# Training parameters\n",
- "n_training_episodes = 25000 # Total training episodes\n",
+ "n_training_episodes = 1_000_000 # Total training episodes\n",
"learning_rate = 0.7 # Learning rate\n",
"\n",
"# Evaluation parameters\n",
@@ -1484,14 +1780,14 @@
" # Each seed has a specific starting state\n",
"\n",
"# Environment parameters\n",
- "env_id = \"Taxi-v3\" # Name of the environment\n",
- "max_steps = 99 # Max steps per episode\n",
- "gamma = 0.95 # Discounting rate\n",
+ "env_id = \"Taxi-v3\" # Name of the environment\n",
+ "max_steps = 1000 # Max steps per episode\n",
+ "gamma = 0.90 # Discounting rate\n",
"\n",
"# Exploration parameters\n",
"max_epsilon = 1.0 # Exploration probability at start\n",
- "min_epsilon = 0.05 # Minimum exploration probability\n",
- "decay_rate = 0.005 # Exponential decay rate for exploration prob\n"
+ "min_epsilon = 0.05 # Minimum exploration probability\n",
+ "decay_rate = 0.001 # Exponential decay rate for exploration prob\n"
]
},
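The exploration schedule implied by these parameters decays epsilon exponentially from `max_epsilon` toward `min_epsilon`; this sketch assumes the usual exponential-decay formula from the course's `train` loop:

```python
import numpy as np

max_epsilon, min_epsilon, decay_rate = 1.0, 0.05, 0.001

def epsilon_at(episode):
    # Exponential decay from max_epsilon toward min_epsilon
    return min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)

print(round(epsilon_at(0), 3))       # 1.0
print(round(epsilon_at(10_000), 3))  # 0.05: essentially pure exploitation by then
```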
{
@@ -1505,11 +1801,41 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 47,
"metadata": {
"id": "WwP3Y2z2eS-K"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 1000000/1000000 [04:31<00:00, 3685.46it/s]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "array([[ 0. , 0. , 0. , 0. , 0. ,\n",
+ " 0. ],\n",
+ " [-0.58568212, 0.4603532 , -0.58568212, 0.4603532 , 1.62261467,\n",
+ " -8.5396468 ],\n",
+ " [ 4.348907 , 5.94323 , 4.348907 , 5.94323 , 7.7147 ,\n",
+ " -3.05677 ],\n",
+ " ...,\n",
+ " [ 7.7147 , 9.683 , 7.7147 , 5.94323 , -1.28529994,\n",
+ " -1.28529997],\n",
+ " [ 1.62261485, 2.9140163 , 1.62261467, 2.9140163 , -7.37738434,\n",
+ " -7.37738533],\n",
+ " [14.3 , 11.87 , 14.3 , 17. , 5.3 ,\n",
+ " 5.3 ]], shape=(500, 6))"
+ ]
+ },
+ "execution_count": 47,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"Qtable_taxi = train(n_training_episodes, min_epsilon, max_epsilon, decay_rate, env, max_steps, Qtable_taxi)\n",
"Qtable_taxi"
@@ -1528,7 +1854,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 62,
"metadata": {
"id": "0a1FpE_3hNYr"
},
@@ -1554,14 +1880,108 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 57,
"metadata": {
"id": "dhQtiQozhOn1"
},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "9af9e29c591e4d18b2095650f25420f1",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Fetching 5 files: 0%| | 0/5 [00:00, ?it/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 100/100 [00:00<00:00, 5536.60it/s]\n",
+ "100%|██████████| 100/100 [00:00<00:00, 3498.34it/s]\n",
+ "IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (550, 350) to (560, 352) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "True\n"
+ ]
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "086883edd0094686bbdc96d55b16e223",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Processing Files (0 / 0) : | | 0.00B / 0.00B "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "f7558465d2ba49ac921286f9b8bf3488",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "New Data Upload : | | 0.00B / 0.00B "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "e66c844d130a4596a3998d30f0af6754",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ " ...01fbfd9e2ed75397e4c7/q-learning.pkl: 100%|##########| 24.6kB / 24.6kB "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "b8c2768b25794da980f7d1264b32ba1f",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ " ...59e101fbfd9e2ed75397e4c7/replay.mp4: 100%|##########| 117kB / 117kB "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Your model is pushed to the Hub. You can view your model here: https://huggingface.co/turbo-maikol/rl-course-unit2\n"
+ ]
+ }
+ ],
"source": [
- "username = \"\" # FILL THIS\n",
- "repo_name = \"\" # FILL THIS\n",
+ "username = \"turbo-maikol\" # FILL THIS\n",
+ "repo_name = \"rl-course-unit2\" # FILL THIS\n",
"push_to_hub(\n",
" repo_id=f\"{username}/{repo_name}\",\n",
" model=model,\n",
@@ -1620,7 +2040,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 60,
"metadata": {
"id": "Eo8qEzNtCaVI"
},
@@ -1660,13 +2080,50 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 61,
"metadata": {
"id": "JUm9lz2gCQcU"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'env_id': 'Taxi-v3', 'max_steps': 1000, 'n_training_episodes': 1000000, 'n_eval_episodes': 100, 'eval_seed': [16, 54, 165, 177, 191, 191, 120, 80, 149, 178, 48, 38, 6, 125, 174, 73, 50, 172, 100, 148, 146, 6, 25, 40, 68, 148, 49, 167, 9, 97, 164, 176, 61, 7, 54, 55, 161, 131, 184, 51, 170, 12, 120, 113, 95, 126, 51, 98, 36, 135, 54, 82, 45, 95, 89, 59, 95, 124, 9, 113, 58, 85, 51, 134, 121, 169, 105, 21, 30, 11, 50, 65, 12, 43, 82, 145, 152, 97, 106, 55, 31, 85, 38, 112, 102, 168, 123, 97, 21, 83, 158, 26, 80, 63, 5, 81, 32, 11, 28, 148], 'learning_rate': 0.7, 'gamma': 0.9, 'max_epsilon': 1.0, 'min_epsilon': 0.05, 'decay_rate': 0.001, 'qtable': array([[ 0. , 0. , 0. , 0. , 0. ,\n",
+ " 0. ],\n",
+ " [-0.58568212, 0.4603532 , -0.58568212, 0.4603532 , 1.62261467,\n",
+ " -8.5396468 ],\n",
+ " [ 4.348907 , 5.94323 , 4.348907 , 5.94323 , 7.7147 ,\n",
+ " -3.05677 ],\n",
+ " ...,\n",
+ " [ 7.7147 , 9.683 , 7.7147 , 5.94323 , -1.28529994,\n",
+ " -1.28529997],\n",
+ " [ 1.62261485, 2.9140163 , 1.62261467, 2.9140163 , -7.37738434,\n",
+ " -7.37738533],\n",
+ " [14.3 , 11.87 , 14.3 , 17. , 5.3 ,\n",
+ " 5.3 ]], shape=(500, 6))}\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 100/100 [00:00<00:00, 2779.93it/s]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "(np.float64(7.56), np.float64(2.706732347314747))"
+ ]
+ },
+ "execution_count": 61,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
- "model = load_from_hub(repo_id=\"ThomasSimonini/q-Taxi-v3\", filename=\"q-learning.pkl\") # Try to use another model\n",
+ "model = load_from_hub(repo_id=\"turbo-maikol/rl-course-unit2\", filename=\"q-learning.pkl\") # Try to use another model\n",
"\n",
"print(model)\n",
"env = gym.make(model[\"env_id\"])\n",
@@ -1675,18 +2132,10 @@
]
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "O7pL8rg1MulN"
- },
- "outputs": [],
+ "cell_type": "markdown",
+ "metadata": {},
"source": [
- "model = load_from_hub(repo_id=\"ThomasSimonini/q-FrozenLake-v1-no-slippery\", filename=\"q-learning.pkl\") # Try to use another model\n",
- "\n",
- "env = gym.make(model[\"env_id\"], is_slippery=False)\n",
- "\n",
- "evaluate_agent(env, model[\"max_steps\"], model[\"n_eval_episodes\"], model[\"qtable\"], model[\"eval_seed\"])"
+ "np.float64(7.56), np.float64(2.706732347314747)"
]
},
{
@@ -1748,25 +2197,35 @@
],
"metadata": {
"colab": {
- "private_outputs": true,
- "provenance": [],
"collapsed_sections": [
"67OdoKL63eDD",
"B2_-8b8z5k54",
"8R5ej1fS4P2V",
"Pnpk2ePoem3r"
],
- "include_colab_link": true
+ "include_colab_link": true,
+ "private_outputs": true,
+ "provenance": []
},
"gpuClass": "standard",
"kernelspec": {
- "display_name": "Python 3",
- "name": "python3"
+ "display_name": "Python (venv-u2)",
+ "language": "python",
+ "name": "venv-u2"
},
"language_info": {
- "name": "python"
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.18"
}
},
"nbformat": 4,
"nbformat_minor": 0
-}
\ No newline at end of file
+}
diff --git a/notebooks/unit3/unit3.ipynb b/notebooks/unit3/unit3.ipynb
index bcd3410..43265af 100644
--- a/notebooks/unit3/unit3.ipynb
+++ b/notebooks/unit3/unit3.ipynb
@@ -3,8 +3,8 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "view-in-github",
- "colab_type": "text"
+ "colab_type": "text",
+ "id": "view-in-github"
},
"source": [
"
"
@@ -41,6 +41,9 @@
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "ykJiGevCMVc5"
+ },
"source": [
"### 🎮 Environments:\n",
"\n",
@@ -51,10 +54,7 @@
"### 📚 RL-Library:\n",
"\n",
"- [RL-Baselines3-Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)"
- ],
- "metadata": {
- "id": "ykJiGevCMVc5"
- }
+ ]
},
{
"cell_type": "markdown",
@@ -72,13 +72,13 @@
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "TsnP0rjxMn1e"
+ },
"source": [
"## This notebook is from Deep Reinforcement Learning Course\n",
"
"
- ],
- "metadata": {
- "id": "TsnP0rjxMn1e"
- }
+ ]
},
{
"cell_type": "markdown",
@@ -114,12 +114,12 @@
},
{
"cell_type": "markdown",
- "source": [
- "We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the Github Repo](https://github.com/huggingface/deep-rl-class/issues)."
- ],
"metadata": {
"id": "7kszpGFaRVhq"
- }
+ },
+ "source": [
+ "We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the Github Repo](https://github.com/huggingface/deep-rl-class/issues)."
+ ]
},
{
"cell_type": "markdown",
@@ -142,6 +142,9 @@
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "Nc8BnyVEc3Ys"
+ },
"source": [
"## An advice 💡\n",
"It's better to run this colab in a copy on your Google Drive, so that **if it timeouts** you still have the saved notebook on your Google Drive and do not need to fill everything from scratch.\n",
@@ -151,66 +154,63 @@
"Also, we're going to **train it for 90 minutes with 1M timesteps**. By typing `!nvidia-smi` will tell you what GPU you're using.\n",
"\n",
"And if you want to train more such 10 million steps, this will take about 9 hours, potentially resulting in Colab timing out. In that case, I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`."
- ],
- "metadata": {
- "id": "Nc8BnyVEc3Ys"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "PU4FVzaoM6fC"
+ },
"source": [
"## Set the GPU 💪\n",
"- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n",
"\n",
"
"
- ],
- "metadata": {
- "id": "PU4FVzaoM6fC"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "KV0NyFdQM9ZG"
+ },
"source": [
"- `Hardware Accelerator > GPU`\n",
"\n",
"
"
- ],
- "metadata": {
- "id": "KV0NyFdQM9ZG"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "wS_cVefO-aYg"
+ },
"source": [
"# Install RL-Baselines3 Zoo and its dependencies 📚\n",
"\n",
"If you see `ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.` **this is normal and it's not a critical error** there's a conflict of version. But the packages we need are installed."
- ],
- "metadata": {
- "id": "wS_cVefO-aYg"
- }
+ ]
},
{
"cell_type": "code",
- "source": [
- "!pip install git+https://github.com/DLR-RM/rl-baselines3-zoo"
- ],
+ "execution_count": null,
"metadata": {
"id": "S1A_E4z3awa_"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "!pip install git+https://github.com/DLR-RM/rl-baselines3-zoo"
+ ]
},
{
"cell_type": "code",
- "source": [
- "!apt-get install swig cmake ffmpeg"
- ],
+ "execution_count": null,
"metadata": {
"id": "8_MllY6Om1eI"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "!apt-get install swig cmake ffmpeg"
+ ]
},
{
"cell_type": "markdown",
@@ -223,28 +223,28 @@
},
{
"cell_type": "code",
- "source": [
- "!pip install gymnasium[atari]\n",
- "!pip install gymnasium[accept-rom-license]"
- ],
+ "execution_count": null,
"metadata": {
"id": "NsRP-lX1_2fC"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "!pip install gymnasium[atari]\n",
+ "!pip install gymnasium[accept-rom-license]"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "bTpYcVZVMzUI"
+ },
"source": [
"## Create a virtual display 🔽\n",
"\n",
"During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames).\n",
"\n",
"Hence the following cell will install the librairies and create and run a virtual screen 🖥"
- ],
- "metadata": {
- "id": "bTpYcVZVMzUI"
- }
+ ]
},
{
"cell_type": "code",
@@ -262,18 +262,18 @@
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "BE5JWP5rQIKf"
+ },
+ "outputs": [],
"source": [
"# Virtual display\n",
"from pyvirtualdisplay import Display\n",
"\n",
"virtual_display = Display(visible=0, size=(1400, 900))\n",
"virtual_display.start()"
- ],
- "metadata": {
- "id": "BE5JWP5rQIKf"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
@@ -360,7 +360,7 @@
},
"outputs": [],
"source": [
- "!python -m rl_zoo3.train --algo ________ --env SpaceInvadersNoFrameskip-v4 -f _________ -c _________"
+ "!python -m rl_zoo3.train --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/ -c dqn.yml"
]
},
{
@@ -396,13 +396,185 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 1,
"metadata": {
"id": "co5um_KeKbBJ"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Loading latest experiment, id=2\n",
+ "Loading logs/dqn/SpaceInvadersNoFrameskip-v4_2/SpaceInvadersNoFrameskip-v4.zip\n",
+ "A.L.E: Arcade Learning Environment (version 0.11.2+ecc1138)\n",
+ "[Powered by Stella]\n",
+ "Stacking 4 frames\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 1973\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2771\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 25.00\n",
+ "Atari Episode Length 1973\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2709\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2709\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 1943\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 35.00\n",
+ "Atari Episode Length 1891\n",
+ "Atari Episode Score: 15.00\n",
+ "Atari Episode Length 2727\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2749\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 1985\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 15.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 30.00\n",
+ "Atari Episode Length 2727\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2709\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2787\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2787\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 1927\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 30.00\n",
+ "Atari Episode Length 2077\n",
+ "Atari Episode Score: 35.00\n",
+ "Atari Episode Length 1973\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2709\n",
+ "Atari Episode Score: 30.00\n",
+ "Atari Episode Length 2749\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 15.00\n",
+ "Atari Episode Length 2699\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2709\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 1973\n",
+ "Atari Episode Score: 25.00\n",
+ "Atari Episode Length 2001\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2069\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2863\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 1973\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 1943\n",
+ "Atari Episode Score: 10.00\n",
+ "Atari Episode Length 2675\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2775\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2025\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2709\n",
+ "Atari Episode Score: 15.00\n",
+ "Atari Episode Length 2709\n",
+ "Atari Episode Score: 35.00\n",
+ "Atari Episode Length 2787\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 1925\n",
+ "Atari Episode Score: 15.00\n",
+ "Atari Episode Length 2699\n",
+ "Atari Episode Score: 10.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 35.00\n",
+ "Atari Episode Length 2709\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 1973\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2771\n",
+ "Atari Episode Score: 15.00\n",
+ "Atari Episode Length 2709\n",
+ "Atari Episode Score: 20.00\n",
+ "Atari Episode Length 1943\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 1973\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 1973\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 15.00\n",
+ "Atari Episode Length 2749\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 1973\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 1943\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 15.00\n",
+ "Atari Episode Length 2025\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2749\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 0.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 30.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 30.00\n",
+ "Atari Episode Length 2769\n",
+ "Atari Episode Score: 5.00\n",
+ "Atari Episode Length 2769\n"
+ ]
+ }
+ ],
"source": [
- "!python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps _________ --folder logs/"
+ "!python -m rl_zoo3.enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 --no-render --n-timesteps 50000 --folder logs/"
]
},
{
@@ -534,7 +706,7 @@
},
"outputs": [],
"source": [
- "!python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 --repo-name _____________________ -orga _____________________ -f logs/"
+ "!python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 --repo-name rl-course-unit3 -orga turbo-maikol -f logs/"
]
},
{
@@ -627,11 +799,26 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 1,
"metadata": {
"id": "OdBNZHy0NGTR"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Downloading from https://huggingface.co/sb3/dqn-BeamRiderNoFrameskip-v4\n",
+ "dqn-BeamRiderNoFrameskip-v4.zip: 100%|█████| 27.2M/27.2M [00:02<00:00, 12.6MB/s]\n",
+ "config.yml: 100%|██████████████████████████████| 548/548 [00:00<00:00, 4.99MB/s]\n",
+ "No normalization file\n",
+ "args.yml: 100%|████████████████████████████████| 887/887 [00:00<00:00, 4.13MB/s]\n",
+ "env_kwargs.yml: 100%|████████████████████████| 3.00/3.00 [00:00<00:00, 9.20kB/s]\n",
+ "train_eval_metrics.zip: 100%|████████████████| 244k/244k [00:00<00:00, 12.7MB/s]\n",
+ "Saving to rl_trained/dqn/BeamRiderNoFrameskip-v4_1\n"
+ ]
+ }
+ ],
"source": [
"# Download model and save it into the logs/ folder\n",
"!python -m rl_zoo3.load_from_hub --algo dqn --env BeamRiderNoFrameskip-v4 -orga sb3 -f rl_trained/"
@@ -648,11 +835,35 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 3,
"metadata": {
"id": "aOxs0rNuN0uS"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Gym has been unmaintained since 2022 and does not support NumPy 2.0 amongst other critical functionality.\n",
+ "Please upgrade to Gymnasium, the maintained drop-in replacement of Gym, or contact the authors of your software and request that they upgrade.\n",
+ "Users of this version of Gym should be able to simply replace 'import gym' with 'import gymnasium as gym' in the vast majority of cases.\n",
+ "See the migration guide at https://gymnasium.farama.org/introduction/migration_guide/ for additional information.\n",
+ "Loading latest experiment, id=1\n",
+ "Loading rl_trained/dqn/BeamRiderNoFrameskip-v4_1/BeamRiderNoFrameskip-v4.zip\n",
+ "A.L.E: Arcade Learning Environment (version 0.11.2+ecc1138)\n",
+ "[Powered by Stella]\n",
+ "Stacking 4 frames\n",
+ "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit2/venv-u2/lib/python3.10/site-packages/stable_baselines3/common/save_util.py:167: UserWarning: Could not deserialize object exploration_schedule. Consider using `custom_objects` argument to replace this object.\n",
+ "Exception: 'bytes' object cannot be interpreted as an integer\n",
+ " warnings.warn(\n",
+ "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit2/venv-u2/lib/python3.10/site-packages/stable_baselines3/common/vec_env/patch_gym.py:95: UserWarning: You loaded a model that was trained using OpenAI Gym. We strongly recommend transitioning to Gymnasium by saving that model again.\n",
+ " warnings.warn(\n",
+ "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit2/venv-u2/lib/python3.10/site-packages/stable_baselines3/common/base_class.py:773: UserWarning: You are probably loading a DQN model saved with SB3 < 2.4.0, we truncated the optimizer state so you can save the model again to avoid issues in the future (see https://github.com/DLR-RM/stable-baselines3/pull/1963 for more info). Original error: loaded state dict contains a parameter group that doesn't match the size of optimizer's group \n",
+ "Note: the model should still work fine, this only a warning.\n",
+ " warnings.warn(\n"
+ ]
+ }
+ ],
"source": [
"!python -m rl_zoo3.enjoy --algo dqn --env BeamRiderNoFrameskip-v4 -n 5000 -f rl_trained/ --no-render"
]
@@ -734,12 +945,12 @@
},
{
"cell_type": "markdown",
- "source": [
- "See you on Bonus unit 2! 🔥"
- ],
"metadata": {
"id": "Kc3udPT-RcXc"
- }
+ },
+ "source": [
+ "See you on Bonus unit 2! 🔥"
+ ]
},
{
"cell_type": "markdown",
@@ -752,13 +963,15 @@
}
],
"metadata": {
+ "accelerator": "GPU",
"colab": {
+ "include_colab_link": true,
"private_outputs": true,
- "provenance": [],
- "include_colab_link": true
+ "provenance": []
},
+ "gpuClass": "standard",
"kernelspec": {
- "display_name": "Python 3 (ipykernel)",
+ "display_name": "venv-u2",
"language": "python",
"name": "python3"
},
@@ -772,7 +985,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.10.6"
+ "version": "3.10.18"
},
"varInspector": {
"cols": {
@@ -802,10 +1015,8 @@
"_Feature"
],
"window_display": false
- },
- "accelerator": "GPU",
- "gpuClass": "standard"
+ }
},
"nbformat": 4,
"nbformat_minor": 0
-}
\ No newline at end of file
+}
diff --git a/notebooks/unit4/unit4.ipynb b/notebooks/unit4/unit4.ipynb
index 884eddd..afa1d5c 100644
--- a/notebooks/unit4/unit4.ipynb
+++ b/notebooks/unit4/unit4.ipynb
@@ -3,8 +3,8 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "view-in-github",
- "colab_type": "text"
+ "colab_type": "text",
+ "id": "view-in-github"
},
"source": [
"
"
@@ -36,15 +36,18 @@
},
{
"cell_type": "markdown",
- "source": [
- "
\n"
- ],
"metadata": {
"id": "s4rBom2sbo7S"
- }
+ },
+ "source": [
+ "
\n"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "BPLwsPajb1f8"
+ },
"source": [
"### 🎮 Environments: \n",
"\n",
@@ -58,10 +61,7 @@
"\n",
"\n",
"We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues)."
- ],
- "metadata": {
- "id": "BPLwsPajb1f8"
- }
+ ]
},
{
"cell_type": "markdown",
@@ -120,6 +120,9 @@
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "Bsh4ZAamchSl"
+ },
"source": [
"# Let's code Reinforce algorithm from scratch 🔥\n",
"\n",
@@ -132,58 +135,55 @@
"To find your result, go to the leaderboard and find your model, **the result = mean_reward - std of reward**. **If you don't see your model on the leaderboard, go at the bottom of the leaderboard page and click on the refresh button**.\n",
"\n",
"For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process\n"
- ],
- "metadata": {
- "id": "Bsh4ZAamchSl"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "JoTC9o2SczNn"
+ },
"source": [
"## An advice 💡\n",
"It's better to run this colab in a copy on your Google Drive, so that **if it timeouts** you still have the saved notebook on your Google Drive and do not need to fill everything from scratch.\n",
"\n",
"To do that you can either do `Ctrl + S` or `File > Save a copy in Google Drive.`"
- ],
- "metadata": {
- "id": "JoTC9o2SczNn"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "PU4FVzaoM6fC"
+ },
"source": [
"## Set the GPU 💪\n",
"- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n",
"\n",
"
"
- ],
- "metadata": {
- "id": "PU4FVzaoM6fC"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "KV0NyFdQM9ZG"
+ },
"source": [
"- `Hardware Accelerator > GPU`\n",
"\n",
"
"
- ],
- "metadata": {
- "id": "KV0NyFdQM9ZG"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "bTpYcVZVMzUI"
+ },
"source": [
"## Create a virtual display 🖥\n",
"\n",
"During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames). \n",
"\n",
"Hence the following cell will install the librairies and create and run a virtual screen 🖥"
- ],
- "metadata": {
- "id": "bTpYcVZVMzUI"
- }
+ ]
},
{
"cell_type": "code",
@@ -203,18 +203,18 @@
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Sr-Nuyb1dBm0"
+ },
+ "outputs": [],
"source": [
"# Virtual display\n",
"from pyvirtualdisplay import Display\n",
"\n",
"virtual_display = Display(visible=0, size=(1400, 900))\n",
"virtual_display.start()"
- ],
- "metadata": {
- "id": "Sr-Nuyb1dBm0"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
@@ -245,14 +245,14 @@
},
{
"cell_type": "code",
- "source": [
- "!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit4/requirements-unit4.txt"
- ],
+ "execution_count": null,
"metadata": {
"id": "e8ZVi-uydpgL"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit4/requirements-unit4.txt"
+ ]
},
{
"cell_type": "markdown",
@@ -269,7 +269,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 25,
"metadata": {
"id": "V8oadoJSWp7C"
},
@@ -290,46 +290,47 @@
"from torch.distributions import Categorical\n",
"\n",
"# Gym\n",
- "import gym\n",
- "import gym_pygame\n",
+ "import gymnasium as gym\n",
+ "# import gym_pygame\n",
"\n",
"# Hugging Face Hub\n",
"from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub.\n",
- "import imageio"
+ "import imageio\n",
+ "\n",
+ "%load_ext autoreload\n",
+ "%autoreload 2"
]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "RfxJYdMeeVgv"
+ },
"source": [
"## Check if we have a GPU\n",
"\n",
"- Let's check if we have a GPU\n",
"- If it's the case you should see `device:cuda0`"
- ],
- "metadata": {
- "id": "RfxJYdMeeVgv"
- }
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "kaJu5FeZxXGY"
- },
- "outputs": [],
- "source": [
- "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")"
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 5,
"metadata": {
- "id": "U5TNYa14aRav"
+ "id": "kaJu5FeZxXGY"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "cuda:0\n"
+ ]
+ }
+ ],
"source": [
- "print(device)"
+ "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
+ "print(device)\n"
]
},
{
@@ -393,7 +394,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 6,
"metadata": {
"id": "POOOk15_K6KA"
},
@@ -404,7 +405,7 @@
"env = gym.make(env_id)\n",
"\n",
"# Create the evaluation env\n",
- "eval_env = gym.make(env_id)\n",
+ "eval_env = gym.make(env_id, render_mode=\"rgb_array\")\n",
"\n",
"# Get the state space and action space\n",
"s_size = env.observation_space.shape[0]\n",
@@ -413,11 +414,22 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 7,
"metadata": {
"id": "FMLFrjiBNLYJ"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "_____OBSERVATION SPACE_____ \n",
+ "\n",
+ "The State Space is: 4\n",
+ "Sample observation [-0.92062986 -0.65902454 0.2579916 -0.6175645 ]\n"
+ ]
+ }
+ ],
"source": [
"print(\"_____OBSERVATION SPACE_____ \\n\")\n",
"print(\"The State Space is: \", s_size)\n",
@@ -426,11 +438,23 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 8,
"metadata": {
"id": "Lu6t4sRNNWkN"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ " _____ACTION SPACE_____ \n",
+ "\n",
+ "The Action Space is: 2\n",
+ "Action Space Sample 1\n"
+ ]
+ }
+ ],
"source": [
"print(\"\\n _____ACTION SPACE_____ \\n\")\n",
"print(\"The Action Space is: \", a_size)\n",
@@ -466,27 +490,43 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 21,
"metadata": {
"id": "w2LHcHhVZvPZ"
},
- "outputs": [],
+ "outputs": [
+ {
+ "ename": "NameError",
+ "evalue": "name 'nn' is not defined",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
+ "Cell \u001b[0;32mIn[21], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mclass\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mPolicy\u001b[39;00m(\u001b[43mnn\u001b[49m\u001b[38;5;241m.\u001b[39mModule):\n\u001b[1;32m 2\u001b[0m \u001b[38;5;66;03m# State # Action # hidden\u001b[39;00m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21m__init__\u001b[39m(\u001b[38;5;28mself\u001b[39m, s_size, a_size, h_size):\n\u001b[1;32m 4\u001b[0m \u001b[38;5;28msuper\u001b[39m(Policy, \u001b[38;5;28mself\u001b[39m)\u001b[38;5;241m.\u001b[39m\u001b[38;5;21m__init__\u001b[39m()\n",
+ "\u001b[0;31mNameError\u001b[0m: name 'nn' is not defined"
+ ]
+ }
+ ],
"source": [
"class Policy(nn.Module):\n",
+ " # State # Action # hidden\n",
" def __init__(self, s_size, a_size, h_size):\n",
" super(Policy, self).__init__()\n",
" # Create two fully connected layers\n",
- "\n",
- "\n",
+ " self.fc1 = nn.Linear(s_size, h_size)\n",
+ " self.fc2 = nn.Linear(h_size, a_size)\n",
+ " self.relu = nn.ReLU()\n",
"\n",
" def forward(self, x):\n",
" # Define the forward pass\n",
" # state goes to fc1 then we apply ReLU activation function\n",
- "\n",
+ " x = self.relu(self.fc1(x))\n",
" # fc1 outputs goes to fc2\n",
+ " x = self.fc2(x)\n",
"\n",
" # We output the softmax\n",
- " \n",
+ " return F.softmax(x, dim=1)\n",
+ "\n",
" def act(self, state):\n",
" \"\"\"\n",
" Given a state, take action\n",
@@ -494,7 +534,7 @@
" state = torch.from_numpy(state).float().unsqueeze(0).to(device)\n",
" probs = self.forward(state).cpu()\n",
" m = Categorical(probs)\n",
- " action = np.argmax(m)\n",
+ " action = m.sample() # sample from the distribution instead of taking the argmax\n",
" return action.item(), m.log_prob(action)"
]
},
@@ -554,7 +594,7 @@
"outputs": [],
"source": [
"debug_policy = Policy(s_size, a_size, 64).to(device)\n",
- "debug_policy.act(env.reset())"
+ "debug_policy.act(env.reset()[0])"
]
},
{
@@ -619,14 +659,14 @@
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "c-20i7Pk0l1T"
+ },
"source": [
"- Since **we want to sample an action from the probability distribution over actions**, we can't use `action = np.argmax(m)` since it will always output the action that have the highest probability.\n",
"\n",
"- We need to replace with `action = m.sample()` that will sample an action from the probability distribution P(.|s)"
- ],
- "metadata": {
- "id": "c-20i7Pk0l1T"
- }
+ ]
},
{
"cell_type": "markdown",
@@ -643,6 +683,9 @@
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "QmcXG-9i2Qu2"
+ },
"source": [
"- When we calculate the return Gt (line 6) we see that we calculate the sum of discounted rewards **starting at timestep t**.\n",
"\n",
@@ -652,10 +695,7 @@
"\n",
"We use an interesting technique coded by [Chris1nexus](https://github.com/Chris1nexus) to **compute the return at each timestep efficiently**. The comments explained the procedure. Don't hesitate also [to check the PR explanation](https://github.com/huggingface/deep-rl-class/pull/95)\n",
"But overall the idea is to **compute the return at each timestep efficiently**."
- ],
- "metadata": {
- "id": "QmcXG-9i2Qu2"
- }
+ ]
},
{
"cell_type": "markdown",
@@ -676,38 +716,72 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "state = array([-0.01473051, 0.02841404, 0.0272485 , -0.03844116], dtype=float32)\n",
+ "state = array([-0.01416223, 0.22313488, 0.02647968, -0.3224039 ], dtype=float32)\n",
+ "reward = 1.0\n",
+ "terminated = False\n",
+ "truncated = False\n",
+ "_ = {}\n"
+ ]
+ }
+ ],
+ "source": [
+ "state, _ = env.reset()\n",
+ "print(f\"{state = }\")\n",
+ "state, reward, terminated, truncated, _ = env.step(1)\n",
+ "\n",
+ "print(f\"{state = }\")\n",
+ "print(f\"{reward = }\")\n",
+ "print(f\"{terminated = }\")\n",
+ "print(f\"{truncated = }\")\n",
+ "print(f\"{_ = }\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
"metadata": {
"id": "iOdv8Q9NfLK7"
},
"outputs": [],
"source": [
- "def reinforce(policy, optimizer, n_training_episodes, max_t, gamma, print_every):\n",
+ "def reinforce(policy, optimizer, n_training_episodes, max_t, gamma, print_every, max_pacience = 50):\n",
" # Help us to calculate the score during the training\n",
- " scores_deque = deque(maxlen=100)\n",
- " scores = []\n",
+ " scores = deque(maxlen=100)\n",
+ "\n",
+ " last_max = -np.inf # best rolling average so far (np.mean of an empty deque is NaN)\n",
+ " pacience = 0\n",
" # Line 3 of pseudocode\n",
" for i_episode in range(1, n_training_episodes+1):\n",
- " saved_log_probs = []\n",
- " rewards = []\n",
- " state = # TODO: reset the environment\n",
- " # Line 4 of pseudocode\n",
+ " rewards, saved_log_probs = [], []\n",
+ " state, _ = env.reset()\n",
+ "\n",
+ " # ========= Line 4 of pseudocode =========\n",
" for t in range(max_t):\n",
- " action, log_prob = # TODO get the action\n",
+ " action, log_prob = policy.act(state)\n",
" saved_log_probs.append(log_prob)\n",
- " state, reward, done, _ = # TODO: take an env step\n",
+ " state, reward, terminated, truncated, _ = env.step(action)\n",
" rewards.append(reward)\n",
- " if done:\n",
+ " if terminated or truncated:\n",
" break \n",
- " scores_deque.append(sum(rewards))\n",
+ "\n",
" scores.append(sum(rewards))\n",
" \n",
- " # Line 6 of pseudocode: calculate the return\n",
+ " # ========= Line 6 of pseudocode: calculate the return =========\n",
" returns = deque(maxlen=max_t) \n",
" n_steps = len(rewards) \n",
+ " \n",
+ " \"\"\"# ================ EXPLANATION ================\n",
" # Compute the discounted returns at each timestep,\n",
" # as the sum of the gamma-discounted return at time t (G_t) + the reward at time t\n",
- " \n",
+ "\n",
" # In O(N) time, where N is the number of time steps\n",
" # (this definition of the discounted return G_t follows the definition of this quantity \n",
" # shown at page 44 of Sutton&Barto 2017 2nd draft)\n",
@@ -723,7 +797,6 @@
" # This is correct since the above is equivalent to (see also page 46 of Sutton&Barto 2017 2nd draft)\n",
" # G_(t-1) = r_t + gamma*r_(t+1) + gamma*gamma*r_(t+2) + ...\n",
" \n",
- " \n",
" ## Given the above, we calculate the returns at timestep t as: \n",
" # gamma[t] * return[t] + reward[t]\n",
" #\n",
@@ -733,10 +806,11 @@
" \n",
" ## Hence, the queue \"returns\" will hold the returns in chronological order, from t=0 to t=n_steps\n",
" ## thanks to the appendleft() function which allows to append to the position 0 in constant time O(1)\n",
- " ## a normal python list would instead require O(N) to do this.\n",
+ " ## a normal python list would instead require O(N) to do this.\"\"\"\n",
+ " disc_return_t = 0\n",
" for t in range(n_steps)[::-1]:\n",
- " disc_return_t = (returns[0] if len(returns)>0 else 0)\n",
- " returns.appendleft( ) # TODO: complete here \n",
+ " returns.appendleft(disc_return_t * gamma + rewards[t]) \n",
+ " disc_return_t = returns[0]\n",
" \n",
" ## standardization of the returns is employed to make training more stable\n",
" eps = np.finfo(np.float32).eps.item()\n",
@@ -746,21 +820,30 @@
" returns = torch.tensor(returns)\n",
" returns = (returns - returns.mean()) / (returns.std() + eps)\n",
" \n",
- " # Line 7:\n",
+ " # ========= Line 7=========\n",
" policy_loss = []\n",
" for log_prob, disc_return in zip(saved_log_probs, returns):\n",
" policy_loss.append(-log_prob * disc_return)\n",
" policy_loss = torch.cat(policy_loss).sum()\n",
" \n",
- " # Line 8: PyTorch prefers gradient descent \n",
+ " # ========= Line 8: PyTorch prefers gradient descent =========\n",
" optimizer.zero_grad()\n",
" policy_loss.backward()\n",
" optimizer.step()\n",
" \n",
+ " mean = np.mean(scores)\n",
" if i_episode % print_every == 0:\n",
- " print('Episode {}\\tAverage Score: {:.2f}'.format(i_episode, np.mean(scores_deque)))\n",
+ " print('Episode {}\\tAverage Score: {:.2f}'.format(i_episode, mean))\n",
+ "\n",
+ " if last_max >= mean:\n",
+ " pacience += 1\n",
+ " if pacience >= max_pacience:\n",
+ " print(' - Breaking at Episode {}\\t with average Score: {:.2f} (patience exhausted, best rolling average {:.2f})'.format(i_episode, mean, last_max))\n",
+ " break\n",
+ " else:\n",
+ " last_max, pacience = mean, 0\n",
" \n",
- " return scores"
+ " return list(scores)"
]
},
{
@@ -788,7 +871,7 @@
" for i_episode in range(1, n_training_episodes+1):\n",
" saved_log_probs = []\n",
" rewards = []\n",
- " state = env.reset()\n",
+ " state, _ = env.reset()\n",
" # Line 4 of pseudocode\n",
" for t in range(max_t):\n",
" action, log_prob = policy.act(state)\n",
@@ -875,7 +958,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 12,
"metadata": {
"id": "utRe1NgtVBYF"
},
@@ -886,17 +969,17 @@
" \"n_training_episodes\": 1000,\n",
" \"n_evaluation_episodes\": 10,\n",
" \"max_t\": 1000,\n",
- " \"gamma\": 1.0,\n",
+ " \"gamma\": 0.99,\n",
" \"lr\": 1e-2,\n",
" \"env_id\": env_id,\n",
- " \"state_space\": s_size,\n",
- " \"action_space\": a_size,\n",
+ " \"state_space\": int(s_size),\n",
+ " \"action_space\": int(a_size),\n",
"}"
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 13,
"metadata": {
"id": "D3lWyVXBVfl6"
},
@@ -909,18 +992,31 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 32,
"metadata": {
"id": "uGf-hQCnfouB"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Episode 25\tAverage Score: 500.00\n",
+ "Episode 50\tAverage Score: 500.00\n",
+ " - Breaking at Episode 51\t with average Score: 500.00 (patience exhausted, best rolling average 500.00)\n"
+ ]
+ }
+ ],
"source": [
- "scores = reinforce(cartpole_policy,\n",
- " cartpole_optimizer,\n",
- " cartpole_hyperparameters[\"n_training_episodes\"], \n",
- " cartpole_hyperparameters[\"max_t\"],\n",
- " cartpole_hyperparameters[\"gamma\"], \n",
- " 100)"
+ "scores = reinforce(\n",
+ " cartpole_policy,\n",
+ " cartpole_optimizer,\n",
+ " cartpole_hyperparameters[\"n_training_episodes\"], \n",
+ " cartpole_hyperparameters[\"max_t\"],\n",
+ " cartpole_hyperparameters[\"gamma\"], \n",
+ " print_every=25,\n",
+ " max_pacience=50\n",
+ ")"
]
},
{
@@ -935,7 +1031,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 20,
"metadata": {
"id": "3FamHmxyhBEU"
},
@@ -950,20 +1046,60 @@
" \"\"\"\n",
" episode_rewards = []\n",
" for episode in range(n_eval_episodes):\n",
- " state = env.reset()\n",
- " step = 0\n",
- " done = False\n",
+ " state, _ = env.reset()\n",
" total_rewards_ep = 0\n",
" \n",
- " for step in range(max_steps):\n",
+ " for _ in range(max_steps):\n",
" action, _ = policy.act(state)\n",
- " new_state, reward, done, info = env.step(action)\n",
+ " new_state, reward, terminated, truncated, _ = env.step(action)\n",
" total_rewards_ep += reward\n",
" \n",
- " if done:\n",
+ " if terminated or truncated:\n",
+ " break\n",
+ "\n",
+ " state = new_state\n",
+ " episode_rewards.append(total_rewards_ep)\n",
+ "\n",
+ " if episode % 100 == 0:\n",
+ " print(f\"Episode: {episode}, mean reward: {np.mean(episode_rewards):.4f}\")\n",
+ "\n",
+ " mean_reward = np.mean(episode_rewards)\n",
+ " std_reward = np.std(episode_rewards)\n",
+ "\n",
+ " return mean_reward, std_reward\n",
+ "\n",
+ "\n",
+ "def evaluate_agent_pygame(env, max_steps, n_eval_episodes, policy, game_p):\n",
+ " \"\"\"\n",
+ " Evaluate the agent for ``n_eval_episodes`` episodes and return the average reward and std of reward.\n",
+ " :param env: The evaluation environment\n",
+ " :param n_eval_episodes: Number of episode to evaluate the agent\n",
+ " :param policy: The Reinforce agent\n",
+ " \"\"\"\n",
+ " episode_rewards = []\n",
+ " game_p.init()\n",
+ " actions_set = game_p.getActionSet()\n",
+ " for episode in range(n_eval_episodes):\n",
+ " game_p.reset_game()\n",
+ " state = np.array(list(game_p.getGameState().values()), dtype=np.float32)\n",
+ " \n",
+ " total_rewards_ep = 0\n",
+ " \n",
+ " for _ in range(max_steps):\n",
+ " action, _ = policy.act(state)\n",
+ " action = actions_set[action]\n",
+ " reward = game_p.act(action)\n",
+ " total_rewards_ep += reward\n",
+ " new_state = np.array(list(game_p.getGameState().values()), dtype=np.float32) \n",
+ " if game_p.game_over():\n",
" break\n",
+ "\n",
" state = new_state\n",
" episode_rewards.append(total_rewards_ep)\n",
+ "\n",
+ " if episode % 100 == 0:\n",
+ " print(f\"Episode: {episode}, mean reward: {np.mean(episode_rewards):.4f}\")\n",
+ "\n",
" mean_reward = np.mean(episode_rewards)\n",
" std_reward = np.std(episode_rewards)\n",
"\n",
@@ -981,16 +1117,36 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 34,
"metadata": {
"id": "ohGSXDyHh0xx"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Episode: 0.0000, mean reward: 500.0000\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "(np.float64(500.0), np.float64(0.0))"
+ ]
+ },
+ "execution_count": 34,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
- "evaluate_agent(eval_env, \n",
- " cartpole_hyperparameters[\"max_t\"], \n",
- " cartpole_hyperparameters[\"n_evaluation_episodes\"],\n",
- " cartpole_policy)"
+ "evaluate_agent(\n",
+ " eval_env, \n",
+ " cartpole_hyperparameters[\"max_t\"], \n",
+ " cartpole_hyperparameters[\"n_evaluation_episodes\"],\n",
+ " cartpole_policy\n",
+ ")"
]
},
{
@@ -1019,6 +1175,11 @@
},
{
"cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "id": "LIVsvlW_8tcw"
+ },
+ "outputs": [],
"source": [
"from huggingface_hub import HfApi, snapshot_download\n",
"from huggingface_hub.repocard import metadata_eval_result, metadata_save\n",
@@ -1031,21 +1192,18 @@
"import tempfile\n",
"\n",
"import os"
- ],
- "metadata": {
- "id": "LIVsvlW_8tcw"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 17,
"metadata": {
"id": "Lo4JH45if81z"
},
"outputs": [],
"source": [
+ "import pygame\n",
+ "\n",
"def record_video(env, policy, out_directory, fps=30):\n",
" \"\"\"\n",
" Generate a replay video of the agent\n",
@@ -1056,27 +1214,75 @@
" \"\"\"\n",
" images = [] \n",
" done = False\n",
- " state = env.reset()\n",
- " img = env.render(mode='rgb_array')\n",
+ " state, _ = env.reset()\n",
+ " img = env.render()\n",
" images.append(img)\n",
- " while not done:\n",
+ " for frame in range(fps*100):\n",
" # Take the action (index) that have the maximum expected future reward given that state\n",
" action, _ = policy.act(state)\n",
- " state, reward, done, info = env.step(action) # We directly put next_state = state for recording logic\n",
- " img = env.render(mode='rgb_array')\n",
+ " state, reward, terminated, truncated, _ = env.step(action) # We directly put next_state = state for recording logic\n",
+ " img = env.render()\n",
" images.append(img)\n",
+ "\n",
+ " if terminated or truncated:\n",
+ " break\n",
+ "\n",
+ "\n",
+ " print(\" - Terminated video loop, mimsave...\")\n",
+ " imageio.mimsave(out_directory, [np.array(img) for i, img in enumerate(images)], fps=fps)\n",
+ "\n",
+ "\n",
+ "def record_video_pygame(env, policy, out_directory, game_p, fps=30):\n",
+ " \"\"\"\n",
+ " Generate a replay video of the agent\n",
+ " :param env\n",
+ " :param Qtable: Qtable of our agent\n",
+ " :param out_directory\n",
+ " :param fps: how many frame per seconds (with taxi-v3 and frozenlake-v1 we use 1)\n",
+ " \"\"\"\n",
+ " images = [] \n",
+ " game_p.init()\n",
+ " actions_set = game_p.getActionSet()\n",
+ " game_p.reset_game()\n",
+ " state = np.array(list(game_p.getGameState().values()), dtype=np.float32) # build the state vector from the game state\n",
+ " \n",
+ " for frame in range(fps*100):\n",
+ " # Take the action (index) that have the maximum expected future reward given that state\n",
+ " action, _ = policy.act(state)\n",
+ " action = actions_set[action]\n",
+ " reward = game_p.act(action) # We directly put next_state = state for recording logic\n",
+ "\n",
+ "\n",
+ " surface = pygame.display.get_surface()\n",
+ " if surface is not None:\n",
+ " img = pygame.surfarray.array3d(surface) # shape (W,H,3)\n",
+ " img = np.transpose(img, (1, 0, 2)) # (H,W,3)\n",
+ " images.append(img)\n",
+ "\n",
+ " state = np.array(list(game_p.getGameState().values()), dtype=np.float32) \n",
+ " if game_p.game_over():\n",
+ " break\n",
+ "\n",
+ "\n",
+ " print(\" - Terminated video loop, mimsave...\")\n",
" imageio.mimsave(out_directory, [np.array(img) for i, img in enumerate(images)], fps=fps)"
]
},
{
"cell_type": "code",
+ "execution_count": 18,
+ "metadata": {
+ "id": "_TPdq47D7_f_"
+ },
+ "outputs": [],
"source": [
- "def push_to_hub(repo_id, \n",
- " model,\n",
- " hyperparameters,\n",
- " eval_env,\n",
- " video_fps=30\n",
- " ):\n",
+ "def push_to_hub(\n",
+ " repo_id, \n",
+ " model,\n",
+ " hyperparameters,\n",
+ " eval_env,\n",
+ " video_fps=30\n",
+ "):\n",
" \"\"\"\n",
" Evaluate, Generate a video and Upload a model to Hugging Face Hub.\n",
" This method does the complete pipeline:\n",
@@ -1180,9 +1386,11 @@
"\n",
" # Step 6: Record a video\n",
" video_path = local_directory / \"replay.mp4\"\n",
- " record_video(env, model, video_path, video_fps)\n",
+ " print(\"Recording replay video...\")\n",
+ " record_video(eval_env, model, video_path, video_fps)\n",
"\n",
" # Step 7. Push everything to the Hub\n",
+ " print(\"Pushing files to the Hub...\")\n",
" api.upload_folder(\n",
" repo_id=repo_id,\n",
" folder_path=local_directory,\n",
@@ -1190,26 +1398,155 @@
" )\n",
"\n",
" print(f\"Your model is pushed to the Hub. You can view your model here: {repo_url}\")"
- ],
- "metadata": {
- "id": "_TPdq47D7_f_"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
- "cell_type": "markdown",
- "metadata": {
- "id": "w17w8CxzoURM"
- },
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [],
"source": [
- "### .\n",
- "\n",
- "By using `push_to_hub` **you evaluate, record a replay, generate a model card of your agent and push it to the Hub**.\n",
+ "def push_to_hub_pygame(\n",
+ " repo_id, \n",
+ " model,\n",
+ " hyperparameters,\n",
+ " eval_env,\n",
+ " game_p,\n",
+ " video_fps=30,\n",
+ "):\n",
+ " \"\"\"\n",
+ " Evaluate, Generate a video and Upload a model to Hugging Face Hub.\n",
+ " This method does the complete pipeline:\n",
+ " - It evaluates the model\n",
+ " - It generates the model card\n",
+ " - It generates a replay video of the agent\n",
+ " - It pushes everything to the Hub\n",
"\n",
- "This way:\n",
- "- You can **showcase our work** 🔥\n",
- "- You can **visualize your agent playing** 👀\n",
+ " :param repo_id: repo_id: id of the model repository from the Hugging Face Hub\n",
+ " :param model: the pytorch model we want to save\n",
+ " :param hyperparameters: training hyperparameters\n",
+ " :param eval_env: evaluation environment\n",
+ " :param video_fps: how many frame per seconds to record our video replay \n",
+ " \"\"\"\n",
+ "\n",
+ " _, repo_name = repo_id.split(\"/\")\n",
+ " api = HfApi()\n",
+ " \n",
+ " # Step 1: Create the repo\n",
+ " repo_url = api.create_repo(\n",
+ " repo_id=repo_id,\n",
+ " exist_ok=True,\n",
+ " )\n",
+ "\n",
+ " with tempfile.TemporaryDirectory() as tmpdirname:\n",
+ " local_directory = Path(tmpdirname)\n",
+ " \n",
+ " # Step 2: Save the model\n",
+ " torch.save(model, local_directory / \"model.pt\")\n",
+ "\n",
+ " # Step 3: Save the hyperparameters to JSON\n",
+ " with open(local_directory / \"hyperparameters.json\", \"w\") as outfile:\n",
+ " json.dump(hyperparameters, outfile)\n",
+ " \n",
+ " # Step 4: Evaluate the model and build JSON\n",
+ " mean_reward, std_reward = evaluate_agent_pygame(\n",
+ " eval_env, \n",
+ " hyperparameters[\"max_t\"],\n",
+ " hyperparameters[\"n_evaluation_episodes\"], \n",
+ " model,\n",
+ " game_p\n",
+ " )\n",
+ " # Get datetime\n",
+ " eval_datetime = datetime.datetime.now()\n",
+ " eval_form_datetime = eval_datetime.isoformat()\n",
+ "\n",
+ " evaluate_data = {\n",
+ " \"env_id\": hyperparameters[\"env_id\"], \n",
+ " \"mean_reward\": mean_reward,\n",
+ " \"n_evaluation_episodes\": hyperparameters[\"n_evaluation_episodes\"],\n",
+ " \"eval_datetime\": eval_form_datetime,\n",
+ " }\n",
+ "\n",
+ " # Write a JSON file\n",
+ " with open(local_directory / \"results.json\", \"w\") as outfile:\n",
+ " json.dump(evaluate_data, outfile)\n",
+ "\n",
+ " # Step 5: Create the model card\n",
+ " env_name = hyperparameters[\"env_id\"]\n",
+ " \n",
+ " metadata = {}\n",
+ " metadata[\"tags\"] = [\n",
+ " env_name,\n",
+ " \"reinforce\",\n",
+ " \"reinforcement-learning\",\n",
+ " \"custom-implementation\",\n",
+ " \"deep-rl-class\"\n",
+ " ]\n",
+ "\n",
+ " # Add metrics\n",
+ " eval = metadata_eval_result(\n",
+ " model_pretty_name=repo_name,\n",
+ " task_pretty_name=\"reinforcement-learning\",\n",
+ " task_id=\"reinforcement-learning\",\n",
+ " metrics_pretty_name=\"mean_reward\",\n",
+ " metrics_id=\"mean_reward\",\n",
+ " metrics_value=f\"{mean_reward:.2f} +/- {std_reward:.2f}\",\n",
+ " dataset_pretty_name=env_name,\n",
+ " dataset_id=env_name,\n",
+ " )\n",
+ "\n",
+ " # Merges both dictionaries\n",
+ " metadata = {**metadata, **eval}\n",
+ "\n",
+ " model_card = f\"\"\"\n",
+ "    # **Reinforce** Agent playing **{env_name}**\n",
+ "    This is a trained model of a **Reinforce** agent playing **{env_name}**.\n",
+ "    To learn how to use this model and train your own, check Unit 4 of the Deep Reinforcement Learning Course: https://huggingface.co/deep-rl-course/unit4/introduction\n",
+ " \"\"\"\n",
+ "\n",
+ " readme_path = local_directory / \"README.md\"\n",
+ " readme = \"\"\n",
+ " if readme_path.exists():\n",
+ " with readme_path.open(\"r\", encoding=\"utf8\") as f:\n",
+ " readme = f.read()\n",
+ " else:\n",
+ " readme = model_card\n",
+ "\n",
+ " with readme_path.open(\"w\", encoding=\"utf-8\") as f:\n",
+ " f.write(readme)\n",
+ "\n",
+ " # Save our metrics to Readme metadata\n",
+ " metadata_save(readme_path, metadata)\n",
+ "\n",
+ " # Step 6: Record a video\n",
+ " video_path = local_directory / \"replay.mp4\"\n",
+ " print(\"VIDEO\")\n",
+ " record_video_pygame(eval_env, model, video_path, game_p, video_fps)\n",
+ "\n",
+ " # Step 7. Push everything to the Hub\n",
+ " print(\"PUSH\")\n",
+ " api.upload_folder(\n",
+ " repo_id=repo_id,\n",
+ " folder_path=local_directory,\n",
+ " path_in_repo=\".\",\n",
+ " )\n",
+ "\n",
+ " print(f\"Your model is pushed to the Hub. You can view your model here: {repo_url}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "w17w8CxzoURM"
+ },
+ "source": [
+ "### .\n",
+ "\n",
+ "By using `push_to_hub` **you evaluate, record a replay, generate a model card of your agent and push it to the Hub**.\n",
+ "\n",
+ "This way:\n",
+ "- You can **showcase your work** 🔥\n",
+ "- You can **visualize your agent playing** 👀\n",
"- You can **share with the community an agent that others can use** 💾\n",
"- You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉 https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard\n"
]
@@ -1262,19 +1599,32 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 37,
"metadata": {
"id": "UNwkTS65Uq3Q"
},
- "outputs": [],
+ "outputs": [
+ {
+ "ename": "NameError",
+ "evalue": "name 'cartpole_policy' is not defined",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
+ "Cell \u001b[0;32mIn[37], line 4\u001b[0m\n\u001b[1;32m 1\u001b[0m repo_id \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mturbo-maikol/Reinforce-rl-course-unit4-cartpole\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;66;03m#TODO Define your repo id {username/Reinforce-{model-id}}\u001b[39;00m\n\u001b[1;32m 2\u001b[0m push_to_hub(\n\u001b[1;32m 3\u001b[0m repo_id,\n\u001b[0;32m----> 4\u001b[0m \u001b[43mcartpole_policy\u001b[49m, \u001b[38;5;66;03m# The model we want to save\u001b[39;00m\n\u001b[1;32m 5\u001b[0m cartpole_hyperparameters, \u001b[38;5;66;03m# Hyperparameters\u001b[39;00m\n\u001b[1;32m 6\u001b[0m eval_env, \u001b[38;5;66;03m# Evaluation environment\u001b[39;00m\n\u001b[1;32m 7\u001b[0m video_fps\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m30\u001b[39m\n\u001b[1;32m 8\u001b[0m )\n",
+ "\u001b[0;31mNameError\u001b[0m: name 'cartpole_policy' is not defined"
+ ]
+ }
+ ],
"source": [
- "repo_id = \"\" #TODO Define your repo id {username/Reinforce-{model-id}}\n",
- "push_to_hub(repo_id,\n",
- " cartpole_policy, # The model we want to save\n",
- " cartpole_hyperparameters, # Hyperparameters\n",
- " eval_env, # Evaluation environment\n",
- " video_fps=30\n",
- " )"
+ "repo_id = \"turbo-maikol/Reinforce-rl-course-unit4-cartpole\" #TODO Define your repo id {username/Reinforce-{model-id}}\n",
+ "push_to_hub(\n",
+ " repo_id,\n",
+ " cartpole_policy, # The model we want to save\n",
+ " cartpole_hyperparameters, # Hyperparameters\n",
+ " eval_env, # Evaluation environment\n",
+ " video_fps=30\n",
+ ")"
]
},
{
@@ -1290,56 +1640,145 @@
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "JNLVmKKVKA6j"
+ },
"source": [
"## Second agent: PixelCopter 🚁\n",
"\n",
"### Study the PixelCopter environment 👀\n",
"- [The Environment documentation](https://pygame-learning-environment.readthedocs.io/en/latest/user/games/pixelcopter.html)\n"
- ],
- "metadata": {
- "id": "JNLVmKKVKA6j"
- }
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from ple.games.pixelcopter import Pixelcopter\n",
+ "from ple import PLE\n",
+ "import numpy as np"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
"metadata": {
"id": "JBSc8mlfyin3"
},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[119, None]"
+ ]
+ },
+ "execution_count": 35,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
- "env_id = \"Pixelcopter-PLE-v0\"\n",
- "env = gym.make(env_id)\n",
- "eval_env = gym.make(env_id)\n",
- "s_size = env.observation_space.shape[0]\n",
- "a_size = env.action_space.n"
+ "# env_id = \"Pixelcopter-PLE-v0\"\n",
+ "# env = gym.make(env_id)\n",
+ "env = Pixelcopter()\n",
+ "p = PLE(env, fps=30, display_screen=True)\n",
+ "\n",
+ "p.init()\n",
+ "reward = 0.0\n",
+ "\n",
+ "actions = p.getActionSet()\n",
+ "# for i in range(10_000):\n",
+ " # if p.game_over():\n",
+ " # print(f\"{p.reset_game()}\")\n",
+ " # print(f\"{i: }\")\n",
+ "\n",
+ " # print(f\" - {np.array(list(env.getGameState().values()), dtype=np.float32)}\")\n",
+ " # reward = p.act(np.random.randint(0,2))\n",
+ " # print(f\" - {reward = }\")\n",
+ "actions"
]
},
{
"cell_type": "code",
- "source": [
- "print(\"_____OBSERVATION SPACE_____ \\n\")\n",
- "print(\"The State Space is: \", s_size)\n",
- "print(\"Sample observation\", env.observation_space.sample()) # Get a random observation"
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " - [24. 0. 7. 17. 48. 22. 31.]\n",
+ " - reward = 0.0\n"
+ ]
+ }
],
+ "source": [
+ "if p.game_over():\n",
+ " print(f\"{p.reset_game()}\")\n",
+ "\n",
+ "print(f\" - {np.array(list(env.getGameState().values()), dtype=np.float32)}\")\n",
+ "reward = p.act(actions[0])\n",
+ "print(f\" - {reward = }\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "s_size = len(env.getGameState())\n",
+ "a_size = 2  # getActionSet() returned [119, None]: press \"up\" or do nothing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
"metadata": {
"id": "L5u_zAHsKBy7"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "_____OBSERVATION SPACE_____ \n",
+ "\n",
+ "The State Space is: 7\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(\"_____OBSERVATION SPACE_____ \\n\")\n",
+ "print(\"The State Space is: \", s_size)\n",
+ "# print(\"Sample observation\", env.observation_space.sample()) # Get a random observation"
+ ]
},
{
"cell_type": "code",
- "source": [
- "print(\"\\n _____ACTION SPACE_____ \\n\")\n",
- "print(\"The Action Space is: \", a_size)\n",
- "print(\"Action Space Sample\", env.action_space.sample()) # Take a random action"
- ],
+ "execution_count": 11,
"metadata": {
"id": "D7yJM9YXKNbq"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ " _____ACTION SPACE_____ \n",
+ "\n",
+ "The Action Space is: 2\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(\"\\n _____ACTION SPACE_____ \\n\")\n",
+ "print(\"The Action Space is: \", a_size)\n",
+ "# print(\"Action Space Sample\", env.action_space.sample()) # Take a random action"
+ ]
},
{
"cell_type": "markdown",
@@ -1366,17 +1805,17 @@
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "aV1466QP8crz"
+ },
"source": [
"### Define the new Policy 🧠\n",
"- We need to have a deeper neural network since the environment is more complex"
- ],
- "metadata": {
- "id": "aV1466QP8crz"
- }
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 27,
"metadata": {
"id": "I1eBkCiX2X_S"
},
@@ -1386,9 +1825,20 @@
" def __init__(self, s_size, a_size, h_size):\n",
" super(Policy, self).__init__()\n",
" # Define the three layers here\n",
+ " self.fc1 = nn.Linear(s_size, h_size)\n",
+ " self.fc2 = nn.Linear(h_size, h_size*2)\n",
+ " self.fc3 = nn.Linear(h_size*2, a_size)\n",
+ " self.relu = nn.ReLU()\n",
"\n",
" def forward(self, x):\n",
" # Define the forward process here\n",
+ " x = self.relu(self.fc1(x))\n",
+ " x = self.relu(self.fc2(x))\n",
+ "        x = self.fc3(x)\n",
+ "\n",
" return F.softmax(x, dim=1)\n",
" \n",
" def act(self, state):\n",
@@ -1401,15 +1851,20 @@
},
{
"cell_type": "markdown",
- "source": [
- "#### Solution"
- ],
"metadata": {
"id": "47iuAFqV8Ws-"
- }
+ },
+ "source": [
+ "#### Solution"
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "wrNuVcHC8Xu7"
+ },
+ "outputs": [],
"source": [
"class Policy(nn.Module):\n",
" def __init__(self, s_size, a_size, h_size):\n",
@@ -1430,12 +1885,7 @@
" m = Categorical(probs)\n",
" action = m.sample()\n",
" return action.item(), m.log_prob(action)"
- ],
- "metadata": {
- "id": "wrNuVcHC8Xu7"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
@@ -1449,16 +1899,135 @@
]
},
{
- "cell_type": "code",
- "execution_count": null,
+ "cell_type": "markdown",
"metadata": {
- "id": "y0uujOR_ypB6"
+ "id": "wyvXTJWm9GJG"
},
+ "source": [
+ "### Train it\n",
+ "- We're now ready to train our agent 🔥."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 359,
+ "metadata": {},
"outputs": [],
"source": [
+ "from collections import Counter\n",
+ "\n",
+ "def reinforce_pygame(game_p, policy, optimizer, n_training_episodes, max_t, gamma, print_every, max_patience=None):\n",
+ "    # Rolling window of the last 100 episode scores\n",
+ "    scores = deque(maxlen=100)\n",
+ "    game_p.init()\n",
+ "    actions_set = game_p.getActionSet()\n",
+ "\n",
+ "    last_max = -np.inf  # best rolling average so far (np.mean of the empty deque would be NaN)\n",
+ "    patience = 0\n",
+ "    # Line 3 of pseudocode\n",
+ "    for i_episode in range(1, n_training_episodes+1):\n",
+ "        rewards, saved_log_probs, actions = [], [], []\n",
+ "        game_p.reset_game()\n",
+ "        state = np.array(list(game_p.getGameState().values()), dtype=np.float32)  # reset the environment\n",
+ "\n",
+ "        # ========= Line 4 of pseudocode =========\n",
+ "        for t in range(max_t):\n",
+ "            action, log_prob = policy.act(state)  # get the action\n",
+ "            action = actions_set[action]\n",
+ "            actions.append(action)\n",
+ "\n",
+ "            saved_log_probs.append(log_prob)\n",
+ "            reward = game_p.act(action)  # take an env step\n",
+ "            rewards.append(reward)\n",
+ "\n",
+ "            state = np.array(list(game_p.getGameState().values()), dtype=np.float32)\n",
+ "            if game_p.game_over():\n",
+ "                break\n",
+ "\n",
+ "        scores.append(sum(rewards))\n",
+ "\n",
+ "        # ========= Line 6 of pseudocode: calculate the return =========\n",
+ "        returns = deque(maxlen=max_t)\n",
+ "        n_steps = len(rewards)\n",
+ "\n",
+ "        \"\"\"# ================ EXPLANATION ================\n",
+ "        # Compute the discounted returns at each timestep,\n",
+ "        # as the sum of the gamma-discounted return at time t (G_t) + the reward at time t\n",
+ "\n",
+ "        # In O(N) time, where N is the number of time steps\n",
+ "        # (this definition of the discounted return G_t follows the definition of this quantity\n",
+ "        # shown at page 44 of Sutton&Barto 2017 2nd draft)\n",
+ "        # G_t = r_(t+1) + gamma*r_(t+2) + gamma^2*r_(t+3) + ...\n",
+ "\n",
+ "        # Given this formulation, the returns at each timestep t can be computed\n",
+ "        # by re-using the computed future returns G_(t+1) to compute the current return G_t\n",
+ "        # G_t = r_(t+1) + gamma*G_(t+1)\n",
+ "        # G_(t-1) = r_t + gamma*G_t\n",
+ "        # (this follows a dynamic programming approach, with which we memorize solutions in order\n",
+ "        # to avoid computing them multiple times)\n",
+ "\n",
+ "        # This is correct since the above is equivalent to (see also page 46 of Sutton&Barto 2017 2nd draft)\n",
+ "        # G_(t-1) = r_t + gamma*r_(t+1) + gamma*gamma*r_(t+2) + ...\n",
+ "\n",
+ "        ## Given the above, we calculate the returns at timestep t as:\n",
+ "        #    gamma[t] * return[t] + reward[t]\n",
+ "        #\n",
+ "        ## We compute this starting from the last timestep to the first, in order\n",
+ "        ## to employ the formula presented above and avoid redundant computations that would be needed\n",
+ "        ## if we were to do it from first to last.\n",
+ "\n",
+ "        ## Hence, the deque \"returns\" will hold the returns in chronological order, from t=0 to t=n_steps,\n",
+ "        ## thanks to the appendleft() function, which appends to position 0 in constant time O(1);\n",
+ "        ## a normal python list would instead require O(N) to do this.\"\"\"\n",
+ "        disc_return_t = 0\n",
+ "        for t in range(n_steps)[::-1]:\n",
+ "            returns.appendleft(disc_return_t * gamma + rewards[t])\n",
+ "            disc_return_t = returns[0]\n",
+ "\n",
+ "        ## standardization of the returns is employed to make training more stable\n",
+ "        eps = np.finfo(np.float32).eps.item()\n",
+ "\n",
+ "        ## eps (machine epsilon) is added to the standard deviation\n",
+ "        # of the returns to avoid numerical instabilities\n",
+ "        returns = torch.tensor(returns)\n",
+ "        returns = (returns - returns.mean()) / (returns.std() + eps)\n",
+ "\n",
+ "        # ========= Line 7 =========\n",
+ "        policy_loss = []\n",
+ "        for log_prob, disc_return in zip(saved_log_probs, returns):\n",
+ "            policy_loss.append(-log_prob * disc_return)\n",
+ "        policy_loss = torch.cat(policy_loss).sum()\n",
+ "\n",
+ "        # ========= Line 8: PyTorch prefers gradient descent =========\n",
+ "        optimizer.zero_grad()\n",
+ "        policy_loss.backward()\n",
+ "        optimizer.step()\n",
+ "\n",
+ "        mean = np.mean(scores)\n",
+ "        if i_episode % print_every == 0:\n",
+ "            print(f'Episode {i_episode} Average Score: {mean:.2f}. Action counts: {Counter(actions)}')\n",
+ "\n",
+ "        if last_max >= mean:\n",
+ "            patience += 1\n",
+ "            if max_patience is not None and patience >= max_patience:\n",
+ "                print(f' - Breaking at Episode {i_episode} with average Score: {mean:.2f}; no improvement over best {last_max:.2f}')\n",
+ "                break\n",
+ "        else:\n",
+ "            last_max, patience = mean, 0\n",
+ "\n",
+ "    return list(scores)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "env_id = \"Pixelcopter-PLE-v0\"\n",
"pixelcopter_hyperparameters = {\n",
" \"h_size\": 64,\n",
- " \"n_training_episodes\": 50000,\n",
+ " \"n_training_episodes\": 100_000,\n",
" \"n_evaluation_episodes\": 10,\n",
" \"max_t\": 10000,\n",
" \"gamma\": 0.99,\n",
@@ -1466,74 +2035,135 @@
" \"env_id\": env_id,\n",
" \"state_space\": s_size,\n",
" \"action_space\": a_size,\n",
- "}"
+ "}\n",
+ "\n",
+ "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
+ "# Create policy and place it to the device\n",
+ "# torch.manual_seed(50)\n",
+ "pixelcopter_policy = Policy(pixelcopter_hyperparameters[\"state_space\"], pixelcopter_hyperparameters[\"action_space\"], pixelcopter_hyperparameters[\"h_size\"]).to(device)\n",
+ "pixelcopter_optimizer = optim.Adam(pixelcopter_policy.parameters(), lr=pixelcopter_hyperparameters[\"lr\"])\n",
+ "\n",
+ "env = Pixelcopter()\n",
+ "game_p = PLE(env, fps=30, display_screen=True)\n",
+ "\n",
+ "# scores = reinforce_pygame(\n",
+ "# game_p,\n",
+ "# pixelcopter_policy,\n",
+ "# pixelcopter_optimizer,\n",
+ "# pixelcopter_hyperparameters[\"n_training_episodes\"], \n",
+ "# pixelcopter_hyperparameters[\"max_t\"],\n",
+ "# pixelcopter_hyperparameters[\"gamma\"], \n",
+ "# 100,\n",
+ "# )"
]
},
{
"cell_type": "markdown",
- "source": [
- "### Train it\n",
- "- We're now ready to train our agent 🔥."
- ],
"metadata": {
- "id": "wyvXTJWm9GJG"
- }
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "7mM2P_ckysFE"
+ "id": "8kwFQ-Ip85BE"
},
- "outputs": [],
"source": [
- "# Create policy and place it to the device\n",
- "# torch.manual_seed(50)\n",
- "pixelcopter_policy = Policy(pixelcopter_hyperparameters[\"state_space\"], pixelcopter_hyperparameters[\"action_space\"], pixelcopter_hyperparameters[\"h_size\"]).to(device)\n",
- "pixelcopter_optimizer = optim.Adam(pixelcopter_policy.parameters(), lr=pixelcopter_hyperparameters[\"lr\"])"
+ "### Publish our trained model on the Hub 🔥"
]
},
{
"cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "v1HEqP-fy-Rf"
- },
+ "execution_count": 363,
+ "metadata": {},
"outputs": [],
"source": [
- "scores = reinforce(pixelcopter_policy,\n",
- " pixelcopter_optimizer,\n",
- " pixelcopter_hyperparameters[\"n_training_episodes\"], \n",
- " pixelcopter_hyperparameters[\"max_t\"],\n",
- " pixelcopter_hyperparameters[\"gamma\"], \n",
- " 1000)"
+ "torch.save(pixelcopter_policy.state_dict(), \"./saved_model.pth\")"
]
},
{
- "cell_type": "markdown",
- "source": [
- "### Publish our trained model on the Hub 🔥"
+ "cell_type": "code",
+ "execution_count": 39,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<All keys matched successfully>"
+ ]
+ },
+ "execution_count": 39,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
],
- "metadata": {
- "id": "8kwFQ-Ip85BE"
- }
+ "source": [
+ "pixelcopter_hyperparameters = {\n",
+ " \"h_size\": 64,\n",
+ " \"n_training_episodes\": 100_000,\n",
+ " \"n_evaluation_episodes\": 10,\n",
+ " \"max_t\": 10000,\n",
+ " \"gamma\": 0.99,\n",
+ " \"lr\": 1e-4,\n",
+ " \"env_id\": env_id,\n",
+ " \"state_space\": s_size,\n",
+ " \"action_space\": a_size,\n",
+ "}\n",
+ "\n",
+ "# Create policy and place it to the device\n",
+ "# torch.manual_seed(50)\n",
+ "pixelcopter_policy = Policy(pixelcopter_hyperparameters[\"state_space\"], pixelcopter_hyperparameters[\"action_space\"], pixelcopter_hyperparameters[\"h_size\"]).to(device)\n",
+ "\n",
+ "pixelcopter_policy.load_state_dict(torch.load(\"./saved_model.pth\", map_location=device))\n"
+ ]
},
{
"cell_type": "code",
- "source": [
- "repo_id = \"\" #TODO Define your repo id {username/Reinforce-{model-id}}\n",
- "push_to_hub(repo_id,\n",
- " pixelcopter_policy, # The model we want to save\n",
- " pixelcopter_hyperparameters, # Hyperparameters\n",
- " eval_env, # Evaluation environment\n",
- " video_fps=30\n",
- " )"
- ],
+ "execution_count": 40,
"metadata": {
"id": "6PtB7LRbTKWK"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Episode: 0.0000, mean reward: 18.0000\n",
+ "VIDEO\n",
+ " - Teminated video loop, mimsave...\n",
+ "PUSH\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Processing Files (0 / 0) : | | 0.00B / 0.00B \n",
+ "\u001b[A\n",
+ "Processing Files (1 / 1) : 100%|██████████| 40.3kB / 40.3kB, ???B/s \n",
+ "\u001b[A\n",
+ "\u001b[A\n",
+ "Processing Files (1 / 1) : 100%|██████████| 40.3kB / 40.3kB, 0.00B/s \n",
+ "New Data Upload : | | 0.00B / 0.00B, 0.00B/s \n",
+ " /tmp/tmpec45byqc/model.pt : 100%|██████████| 40.3kB / 40.3kB \n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Your model is pushed to the Hub. You can view your model here: https://huggingface.co/turbo-maikol/Reinforce-rl-course-unit4-pixelcopter\n"
+ ]
+ }
+ ],
+ "source": [
+ "repo_id = \"turbo-maikol/Reinforce-rl-course-unit4-pixelcopter\" #TODO Define your repo id {username/Reinforce-{model-id}}\n",
+ "\n",
+ "env = Pixelcopter()\n",
+ "game_p = PLE(env, fps=30, display_screen=True)\n",
+ "push_to_hub_pygame(\n",
+ " repo_id,\n",
+ " pixelcopter_policy, # The model we want to save\n",
+ " pixelcopter_hyperparameters, # Hyperparameters\n",
+ " env, # Evaluation environment\n",
+ " game_p,\n",
+ " video_fps=30,\n",
+ ")"
+ ]
},
{
"cell_type": "markdown",
@@ -1585,8 +2215,6 @@
"metadata": {
"accelerator": "GPU",
"colab": {
- "private_outputs": true,
- "provenance": [],
"collapsed_sections": [
"BPLwsPajb1f8",
"L_WSo0VUV99t",
@@ -1597,11 +2225,13 @@
"47iuAFqV8Ws-",
"x62pP0PHdA-y"
],
- "include_colab_link": true
+ "include_colab_link": true,
+ "private_outputs": true,
+ "provenance": []
},
"gpuClass": "standard",
"kernelspec": {
- "display_name": "Python 3 (ipykernel)",
+ "display_name": ".venv",
"language": "python",
"name": "python3"
},
@@ -1615,7 +2245,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.8.10"
+ "version": "3.10.18"
}
},
"nbformat": 4,
diff --git a/notebooks/unit5/unit5.ipynb b/notebooks/unit5/unit5.ipynb
index cb9ec8b..580dffc 100644
--- a/notebooks/unit5/unit5.ipynb
+++ b/notebooks/unit5/unit5.ipynb
@@ -277,7 +277,10 @@
"# Go inside the repository and install the package (can take 3min)\n",
"%cd ml-agents\n",
"!pip3 install -e ./ml-agents-envs\n",
- "!pip3 install -e ./ml-agents"
+ "!pip3 install -e ./ml-agents\n",
+ "\n",
+ "# Alternatively, with uv (run from the directory containing the ml-agents checkout):\n",
+ "!uv pip install -e ./ml-agents/ml-agents-envs\n",
+ "!uv pip install -e ./ml-agents/ml-agents\n"
]
},
{
@@ -584,7 +587,7 @@
},
"outputs": [],
"source": [
- "!mlagents-push-to-hf --run-id= # Add your run id --local-dir= # Your local dir --repo-id= # Your repo id --commit-message= # Your commit message"
+ "!mlagents-push-to-hf --run-id=\"SnowballTarget1\" --local-dir=\"./results/SnowballTarget1\" --repo-id=\"turbo-maikol/rl-course-unit5-snowball\" --commit-message=\"First Push\""
]
},
{
@@ -796,7 +799,9 @@
},
"outputs": [],
"source": [
- "!mlagents-push-to-hf --run-id= # Add your run id --local-dir= # Your local dir --repo-id= # Your repo id --commit-message= # Your commit message"
+ "# Template: mlagents-push-to-hf --run-id= # Add your run id --local-dir= # Your local dir --repo-id= # Your repo id --commit-message= # Your commit message\n",
+ "\n",
+ "!mlagents-push-to-hf --run-id=\"Pyramids Training\" --local-dir=\"./results/Pyramids Training\" --repo-id=\"turbo-maikol/rl-course-unit5-pyramids\" --commit-message=\"First Push\"\n"
]
},
{
diff --git a/notebooks/unit6/unit6.ipynb b/notebooks/unit6/unit6.ipynb
index e5d0081..2584c5a 100644
--- a/notebooks/unit6/unit6.ipynb
+++ b/notebooks/unit6/unit6.ipynb
@@ -1,31 +1,10 @@
{
- "nbformat": 4,
- "nbformat_minor": 0,
- "metadata": {
- "colab": {
- "provenance": [],
- "private_outputs": true,
- "collapsed_sections": [
- "tF42HvI7-gs5"
- ],
- "include_colab_link": true
- },
- "kernelspec": {
- "name": "python3",
- "display_name": "Python 3"
- },
- "language_info": {
- "name": "python"
- },
- "accelerator": "GPU",
- "gpuClass": "standard"
- },
"cells": [
{
"cell_type": "markdown",
"metadata": {
- "id": "view-in-github",
- "colab_type": "text"
+ "colab_type": "text",
+ "id": "view-in-github"
},
"source": [
"
"
@@ -33,6 +12,9 @@
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "-PTReiOw-RAN"
+ },
"source": [
"# Unit 6: Advantage Actor Critic (A2C) using Robotics Simulations with Panda-Gym 🤖\n",
"\n",
@@ -43,37 +25,37 @@
"- `Reach`: the robot must place its end-effector at a target position.\n",
"\n",
"After that, you'll be able **to train in other robotics tasks**.\n"
- ],
- "metadata": {
- "id": "-PTReiOw-RAN"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "QInFitfWno1Q"
+ },
"source": [
"### 🎮 Environments:\n",
"\n",
"- [Panda-Gym](https://github.com/qgallouedec/panda-gym)\n",
"\n",
- "###📚 RL-Library:\n",
+ "### 📚 RL-Library:\n",
"\n",
"- [Stable-Baselines3](https://stable-baselines3.readthedocs.io/)"
- ],
- "metadata": {
- "id": "QInFitfWno1Q"
- }
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues)."
- ],
"metadata": {
"id": "2CcdX4g3oFlp"
- }
+ },
+ "source": [
+ "We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues)."
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "MoubJX20oKaQ"
+ },
"source": [
"## Objectives of this notebook 🏆\n",
"\n",
@@ -85,13 +67,13 @@
"- Be able to **push your trained agent and the code to the Hub** with a nice video replay and an evaluation score 🔥.\n",
"\n",
"\n"
- ],
- "metadata": {
- "id": "MoubJX20oKaQ"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "DoUNkTExoUED"
+ },
"source": [
"## This notebook is from the Deep Reinforcement Learning Course\n",
"
\n",
@@ -108,34 +90,34 @@
"\n",
"\n",
"The best way to keep in touch is to join our discord server to exchange with the community and with us 👉🏻 https://discord.gg/ydHrjt3WP5"
- ],
- "metadata": {
- "id": "DoUNkTExoUED"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "BTuQAUAPoa5E"
+ },
"source": [
"## Prerequisites 🏗️\n",
"Before diving into the notebook, you need to:\n",
"\n",
"🔲 📚 Study [Actor-Critic methods by reading Unit 6](https://huggingface.co/deep-rl-course/unit6/introduction) 🤗 "
- ],
- "metadata": {
- "id": "BTuQAUAPoa5E"
- }
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "# Let's train our first robots 🤖"
- ],
"metadata": {
"id": "iajHvVDWoo01"
- }
+ },
+ "source": [
+ "# Let's train our first robots 🤖"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "zbOENTE2os_D"
+ },
"source": [
"To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to push your trained model to the Hub and get the following results:\n",
"\n",
@@ -144,46 +126,43 @@
"To find your result, go to the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) and find your model, **the result = mean_reward - std of reward**\n",
"\n",
"For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process"
- ],
- "metadata": {
- "id": "zbOENTE2os_D"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "PU4FVzaoM6fC"
+ },
"source": [
"## Set the GPU 💪\n",
"- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n",
"\n",
"
"
- ],
- "metadata": {
- "id": "PU4FVzaoM6fC"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "KV0NyFdQM9ZG"
+ },
"source": [
"- `Hardware Accelerator > GPU`\n",
"\n",
"
"
- ],
- "metadata": {
- "id": "KV0NyFdQM9ZG"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "bTpYcVZVMzUI"
+ },
"source": [
"## Create a virtual display 🔽\n",
"\n",
"During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames).\n",
"\n",
"Hence the following cell will install the librairies and create and run a virtual screen 🖥"
- ],
- "metadata": {
- "id": "bTpYcVZVMzUI"
- }
+ ]
},
{
"cell_type": "code",
@@ -202,21 +181,24 @@
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ww5PQH1gNLI4"
+ },
+ "outputs": [],
"source": [
"# Virtual display\n",
"from pyvirtualdisplay import Display\n",
"\n",
"virtual_display = Display(visible=0, size=(1400, 900))\n",
"virtual_display.start()"
- ],
- "metadata": {
- "id": "ww5PQH1gNLI4"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "e1obkbdJ_KnG"
+ },
"source": [
"### Install dependencies 🔽\n",
"\n",
@@ -228,48 +210,60 @@
"- `huggingface_hub`: Library allowing anyone to work with the Hub repositories.\n",
"\n",
"⏲ The installation can **take 10 minutes**."
- ],
- "metadata": {
- "id": "e1obkbdJ_KnG"
- }
+ ]
},
{
"cell_type": "code",
- "source": [
- "!pip install stable-baselines3[extra]\n",
- "!pip install gymnasium"
- ],
+ "execution_count": null,
"metadata": {
"id": "TgZUkjKYSgvn"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "!pip install stable-baselines3[extra]\n",
+ "!pip install gymnasium"
+ ]
},
{
"cell_type": "code",
- "source": [
- "!pip install huggingface_sb3\n",
- "!pip install huggingface_hub\n",
- "!pip install panda_gym"
- ],
+ "execution_count": null,
"metadata": {
"id": "ABneW6tOSpyU"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "!uv pip install 'stable-baselines3[extra]' gymnasium huggingface_sb3 huggingface_hub panda_gym"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "## Import the packages 📦"
- ],
"metadata": {
"id": "QTep3PQQABLr"
- }
+ },
+ "source": [
+ "## Import the packages 📦"
+ ]
},
{
"cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "id": "HpiB8VdnQ7Bk"
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit6/venv-u6/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+ " from .autonotebook import tqdm as notebook_tqdm\n"
+ ]
+ }
+ ],
"source": [
+ "%load_ext autoreload\n",
+ "%autoreload 2\n",
+ "\n",
"import os\n",
"\n",
"import gymnasium as gym\n",
@@ -283,15 +277,13 @@
"from stable_baselines3.common.env_util import make_vec_env\n",
"\n",
"from huggingface_hub import notebook_login"
- ],
- "metadata": {
- "id": "HpiB8VdnQ7Bk"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "lfBwIS_oAVXI"
+ },
"source": [
"## PandaReachDense-v3 🦾\n",
"\n",
@@ -310,26 +302,38 @@
"\n",
"This way **the training will be easier**.\n",
"\n"
- ],
- "metadata": {
- "id": "lfBwIS_oAVXI"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "frVXOrnlBerQ"
+ },
"source": [
"### Create the environment\n",
"\n",
"#### The environment 🎮\n",
"\n",
"In `PandaReachDense-v3` the robotic arm must place its end-effector at a target position (green ball)."
- ],
- "metadata": {
- "id": "frVXOrnlBerQ"
- }
+ ]
},
{
"cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "id": "zXzAu3HYF1WD"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "argv[0]=--background_color_red=0.8745098114013672\n",
+ "argv[1]=--background_color_green=0.21176470816135406\n",
+ "argv[2]=--background_color_blue=0.1764705926179886\n"
+ ]
+ }
+ ],
"source": [
"env_id = \"PandaReachDense-v3\"\n",
"\n",
@@ -339,28 +343,47 @@
"# Get the state space and action space\n",
"s_size = env.observation_space.shape\n",
"a_size = env.action_space"
- ],
- "metadata": {
- "id": "zXzAu3HYF1WD"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [],
"source": [
- "print(\"_____OBSERVATION SPACE_____ \\n\")\n",
- "print(\"The State Space is: \", s_size)\n",
- "print(\"Sample observation\", env.observation_space.sample()) # Get a random observation"
- ],
+ "s_size = env.observation_space[\"observation\"].shape[0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
"metadata": {
"id": "E-U9dexcF-FB"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "_____OBSERVATION SPACE_____ \n",
+ "\n",
+ "The State Space is: 6\n",
+ "Sample observation OrderedDict([('achieved_goal', array([-5.6249957, 3.2377138, 9.631121 ], dtype=float32)), ('desired_goal', array([-5.9595466, 4.739131 , -3.3849702], dtype=float32)), ('observation', array([-3.4746149 , -1.6921669 , -9.1196995 , 1.4088092 , 0.84349155,\n",
+ " -9.425635 ], dtype=float32))])\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(\"_____OBSERVATION SPACE_____ \\n\")\n",
+ "print(\"The State Space is: \", s_size)\n",
+ "print(\"Sample observation\", env.observation_space.sample()) # Get a random observation"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "g_JClfElGFnF"
+ },
"source": [
"The observation space **is a dictionary with 3 different elements**:\n",
"- `achieved_goal`: (x,y,z) the current position of the end-effector.\n",
@@ -368,45 +391,57 @@
"- `observation`: position (x,y,z) and velocity of the end-effector (vx, vy, vz).\n",
"\n",
"Given it's a dictionary as observation, **we will need to use a MultiInputPolicy policy instead of MlpPolicy**."
- ],
- "metadata": {
- "id": "g_JClfElGFnF"
- }
+ ]
},
{
"cell_type": "code",
+ "execution_count": 18,
+ "metadata": {
+ "id": "ib1Kxy4AF-FC"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ " _____ACTION SPACE_____ \n",
+ "\n",
+ "The Action Space is: Box(-1.0, 1.0, (3,), float32)\n",
+ "Action Space Sample [-0.28385562 -0.9789819 -0.80975497]\n"
+ ]
+ }
+ ],
"source": [
"print(\"\\n _____ACTION SPACE_____ \\n\")\n",
"print(\"The Action Space is: \", a_size)\n",
"print(\"Action Space Sample\", env.action_space.sample()) # Take a random action"
- ],
- "metadata": {
- "id": "ib1Kxy4AF-FC"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "5MHTHEHZS4yp"
+ },
"source": [
"The action space is a vector with 3 values:\n",
"- Control x, y, z movement"
- ],
- "metadata": {
- "id": "5MHTHEHZS4yp"
- }
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "### Normalize observation and rewards"
- ],
"metadata": {
"id": "S5sXcg469ysB"
- }
+ },
+ "source": [
+ "### Normalize observation and rewards"
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "1ZyX6qf3Zva9"
+ },
"source": [
"A good practice in reinforcement learning is to [normalize input features](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html).\n",
"\n",
@@ -415,140 +450,9205 @@
"We also normalize rewards with this same wrapper by adding `norm_reward = True`\n",
"\n",
"[You should check the documentation to fill this cell](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize)"
- ],
- "metadata": {
- "id": "1ZyX6qf3Zva9"
- }
+ ]
},
{
"cell_type": "code",
+ "execution_count": 20,
+ "metadata": {
+ "id": "1RsDtHHAQ9Ie"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "argv[0]=--background_color_red=0.8745098114013672\n",
+ "argv[1]=--background_color_green=0.21176470816135406\n",
+ "argv[2]=--background_color_blue=0.1764705926179886\n",
+ "argv[0]=--background_color_red=0.8745098114013672\n",
+ "argv[1]=--background_color_green=0.21176470816135406\n",
+ "argv[2]=--background_color_blue=0.1764705926179886\n",
+ "argv[0]=--background_color_red=0.8745098114013672\n",
+ "argv[1]=--background_color_green=0.21176470816135406\n",
+ "argv[2]=--background_color_blue=0.1764705926179886\n",
+ "argv[0]=--background_color_red=0.8745098114013672\n",
+ "argv[1]=--background_color_green=0.21176470816135406\n",
+ "argv[2]=--background_color_blue=0.1764705926179886\n"
+ ]
+ }
+ ],
"source": [
"env = make_vec_env(env_id, n_envs=4)\n",
"\n",
"# Adding this wrapper to normalize the observation and the reward\n",
- "env = # TODO: Add the wrapper"
- ],
- "metadata": {
- "id": "1RsDtHHAQ9Ie"
- },
- "execution_count": null,
- "outputs": []
+ "env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "#### Solution"
- ],
"metadata": {
"id": "tF42HvI7-gs5"
- }
+ },
+ "source": [
+ "#### Solution"
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "2O67mqgC-hol"
+ },
+ "outputs": [],
"source": [
"env = make_vec_env(env_id, n_envs=4)\n",
"\n",
"env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)"
- ],
- "metadata": {
- "id": "2O67mqgC-hol"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "4JmEVU6z1ZA-"
+ },
"source": [
"### Create the A2C Model 🤖\n",
"\n",
"For more information about A2C implementation with StableBaselines3 check: https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html#notes\n",
"\n",
"To find the best parameters I checked the [official trained agents by Stable-Baselines3 team](https://huggingface.co/sb3)."
- ],
- "metadata": {
- "id": "4JmEVU6z1ZA-"
- }
+ ]
},
{
"cell_type": "code",
- "source": [
- "model = # Create the A2C model and try to find the best parameters"
- ],
+ "execution_count": 26,
"metadata": {
"id": "vR3T4qFt164I"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Using cuda device\n"
+ ]
+ }
+ ],
+ "source": [
+ "model = A2C(\"MultiInputPolicy\", env, verbose=1)"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "#### Solution"
- ],
"metadata": {
"id": "nWAuOOLh-oQf"
- }
+ },
+ "source": [
+ "#### Solution"
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "FKFLY54T-pU1"
+ },
+ "outputs": [],
"source": [
"model = A2C(policy = \"MultiInputPolicy\",\n",
" env = env,\n",
" verbose=1)"
- ],
- "metadata": {
- "id": "FKFLY54T-pU1"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "opyK3mpJ1-m9"
+ },
"source": [
"### Train the A2C agent 🏃\n",
"- Let's train our agent for 1,000,000 timesteps, don't forget to use GPU on Colab. It will take approximately ~25-40min"
- ],
- "metadata": {
- "id": "opyK3mpJ1-m9"
- }
+ ]
},
{
"cell_type": "code",
- "source": [
- "model.learn(1_000_000)"
- ],
+ "execution_count": 27,
"metadata": {
"id": "4TuGHZD7RF1G"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 44.5 |\n",
+ "| ep_rew_mean | -12.4 |\n",
+ "| time/ | |\n",
+ "| fps | 313 |\n",
+ "| iterations | 100 |\n",
+ "| time_elapsed | 6 |\n",
+ "| total_timesteps | 2000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.22 |\n",
+ "| explained_variance | 0.9545538 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 99 |\n",
+ "| policy_loss | -0.349 |\n",
+ "| std | 0.988 |\n",
+ "| value_loss | 0.322 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 45.4 |\n",
+ "| ep_rew_mean | -13 |\n",
+ "| time/ | |\n",
+ "| fps | 316 |\n",
+ "| iterations | 200 |\n",
+ "| time_elapsed | 12 |\n",
+ "| total_timesteps | 4000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.25 |\n",
+ "| explained_variance | 0.97950953 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 199 |\n",
+ "| policy_loss | -1.26 |\n",
+ "| std | 0.998 |\n",
+ "| value_loss | 0.118 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 44.8 |\n",
+ "| ep_rew_mean | -13.4 |\n",
+ "| time/ | |\n",
+ "| fps | 326 |\n",
+ "| iterations | 300 |\n",
+ "| time_elapsed | 18 |\n",
+ "| total_timesteps | 6000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.25 |\n",
+ "| explained_variance | 0.9330401 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 299 |\n",
+ "| policy_loss | 0.0686 |\n",
+ "| std | 0.998 |\n",
+ "| value_loss | 0.38 |\n",
+ "-------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 43.2 |\n",
+ "| ep_rew_mean | -13.1 |\n",
+ "| time/ | |\n",
+ "| fps | 284 |\n",
+ "| iterations | 400 |\n",
+ "| time_elapsed | 28 |\n",
+ "| total_timesteps | 8000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.25 |\n",
+ "| explained_variance | 0.521664 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 399 |\n",
+ "| policy_loss | 0.192 |\n",
+ "| std | 0.999 |\n",
+ "| value_loss | 0.0753 |\n",
+ "------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 43.8 |\n",
+ "| ep_rew_mean | -12.6 |\n",
+ "| time/ | |\n",
+ "| fps | 295 |\n",
+ "| iterations | 500 |\n",
+ "| time_elapsed | 33 |\n",
+ "| total_timesteps | 10000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.26 |\n",
+ "| explained_variance | 0.9645154 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 499 |\n",
+ "| policy_loss | 0.284 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.0268 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 44.8 |\n",
+ "| ep_rew_mean | -12.5 |\n",
+ "| time/ | |\n",
+ "| fps | 299 |\n",
+ "| iterations | 600 |\n",
+ "| time_elapsed | 40 |\n",
+ "| total_timesteps | 12000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.27 |\n",
+ "| explained_variance | 0.9733915 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 599 |\n",
+ "| policy_loss | -0.31 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.0512 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 43.1 |\n",
+ "| ep_rew_mean | -11.9 |\n",
+ "| time/ | |\n",
+ "| fps | 301 |\n",
+ "| iterations | 700 |\n",
+ "| time_elapsed | 46 |\n",
+ "| total_timesteps | 14000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.27 |\n",
+ "| explained_variance | 0.9599183 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 699 |\n",
+ "| policy_loss | -0.199 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.0319 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 38.5 |\n",
+ "| ep_rew_mean | -9.69 |\n",
+ "| time/ | |\n",
+ "| fps | 306 |\n",
+ "| iterations | 800 |\n",
+ "| time_elapsed | 52 |\n",
+ "| total_timesteps | 16000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.26 |\n",
+ "| explained_variance | 0.9946942 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 799 |\n",
+ "| policy_loss | -0.0755 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.0182 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 36.8 |\n",
+ "| ep_rew_mean | -8.98 |\n",
+ "| time/ | |\n",
+ "| fps | 289 |\n",
+ "| iterations | 900 |\n",
+ "| time_elapsed | 62 |\n",
+ "| total_timesteps | 18000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.29 |\n",
+ "| explained_variance | 0.9358697 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 899 |\n",
+ "| policy_loss | 0.139 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.0267 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 39 |\n",
+ "| ep_rew_mean | -9.08 |\n",
+ "| time/ | |\n",
+ "| fps | 294 |\n",
+ "| iterations | 1000 |\n",
+ "| time_elapsed | 68 |\n",
+ "| total_timesteps | 20000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.3 |\n",
+ "| explained_variance | 0.6885923 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 999 |\n",
+ "| policy_loss | 0.508 |\n",
+ "| std | 1.02 |\n",
+ "| value_loss | 0.108 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 37 |\n",
+ "| ep_rew_mean | -8.26 |\n",
+ "| time/ | |\n",
+ "| fps | 295 |\n",
+ "| iterations | 1100 |\n",
+ "| time_elapsed | 74 |\n",
+ "| total_timesteps | 22000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.32 |\n",
+ "| explained_variance | 0.97540057 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1099 |\n",
+ "| policy_loss | -0.747 |\n",
+ "| std | 1.02 |\n",
+ "| value_loss | 0.0566 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 38.1 |\n",
+ "| ep_rew_mean | -8.9 |\n",
+ "| time/ | |\n",
+ "| fps | 299 |\n",
+ "| iterations | 1200 |\n",
+ "| time_elapsed | 80 |\n",
+ "| total_timesteps | 24000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.32 |\n",
+ "| explained_variance | 0.9357179 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1199 |\n",
+ "| policy_loss | -0.0341 |\n",
+ "| std | 1.02 |\n",
+ "| value_loss | 0.0184 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 40.9 |\n",
+ "| ep_rew_mean | -10.1 |\n",
+ "| time/ | |\n",
+ "| fps | 303 |\n",
+ "| iterations | 1300 |\n",
+ "| time_elapsed | 85 |\n",
+ "| total_timesteps | 26000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.33 |\n",
+ "| explained_variance | 0.9974439 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1299 |\n",
+ "| policy_loss | -0.35 |\n",
+ "| std | 1.03 |\n",
+ "| value_loss | 0.0159 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 39.5 |\n",
+ "| ep_rew_mean | -9.84 |\n",
+ "| time/ | |\n",
+ "| fps | 306 |\n",
+ "| iterations | 1400 |\n",
+ "| time_elapsed | 91 |\n",
+ "| total_timesteps | 28000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.33 |\n",
+ "| explained_variance | 0.99562174 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1399 |\n",
+ "| policy_loss | -0.294 |\n",
+ "| std | 1.03 |\n",
+ "| value_loss | 0.0101 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 37.8 |\n",
+ "| ep_rew_mean | -8.64 |\n",
+ "| time/ | |\n",
+ "| fps | 298 |\n",
+ "| iterations | 1500 |\n",
+ "| time_elapsed | 100 |\n",
+ "| total_timesteps | 30000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.34 |\n",
+ "| explained_variance | 0.9823143 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1499 |\n",
+ "| policy_loss | 0.104 |\n",
+ "| std | 1.03 |\n",
+ "| value_loss | 0.0163 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 33.4 |\n",
+ "| ep_rew_mean | -6.76 |\n",
+ "| time/ | |\n",
+ "| fps | 302 |\n",
+ "| iterations | 1600 |\n",
+ "| time_elapsed | 105 |\n",
+ "| total_timesteps | 32000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.34 |\n",
+ "| explained_variance | 0.68713135 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1599 |\n",
+ "| policy_loss | -0.544 |\n",
+ "| std | 1.03 |\n",
+ "| value_loss | 0.0491 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 32.2 |\n",
+ "| ep_rew_mean | -5.76 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 1700 |\n",
+ "| time_elapsed | 110 |\n",
+ "| total_timesteps | 34000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.33 |\n",
+ "| explained_variance | 0.8737136 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1699 |\n",
+ "| policy_loss | -0.729 |\n",
+ "| std | 1.02 |\n",
+ "| value_loss | 0.0799 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 22.7 |\n",
+ "| ep_rew_mean | -3.62 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 1800 |\n",
+ "| time_elapsed | 116 |\n",
+ "| total_timesteps | 36000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.33 |\n",
+ "| explained_variance | 0.9746877 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1799 |\n",
+ "| policy_loss | -0.063 |\n",
+ "| std | 1.03 |\n",
+ "| value_loss | 0.00838 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 20.6 |\n",
+ "| ep_rew_mean | -2.93 |\n",
+ "| time/ | |\n",
+ "| fps | 309 |\n",
+ "| iterations | 1900 |\n",
+ "| time_elapsed | 122 |\n",
+ "| total_timesteps | 38000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.32 |\n",
+ "| explained_variance | 0.87348247 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1899 |\n",
+ "| policy_loss | -0.449 |\n",
+ "| std | 1.02 |\n",
+ "| value_loss | 0.0365 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 18.6 |\n",
+ "| ep_rew_mean | -2.43 |\n",
+ "| time/ | |\n",
+ "| fps | 301 |\n",
+ "| iterations | 2000 |\n",
+ "| time_elapsed | 132 |\n",
+ "| total_timesteps | 40000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.3 |\n",
+ "| explained_variance | 0.9903151 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1999 |\n",
+ "| policy_loss | -0.0678 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.00914 |\n",
+ "-------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 14.8 |\n",
+ "| ep_rew_mean | -1.72 |\n",
+ "| time/ | |\n",
+ "| fps | 302 |\n",
+ "| iterations | 2100 |\n",
+ "| time_elapsed | 139 |\n",
+ "| total_timesteps | 42000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.28 |\n",
+ "| explained_variance | -1.61925 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2099 |\n",
+ "| policy_loss | 1.93 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.395 |\n",
+ "------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 8.37 |\n",
+ "| ep_rew_mean | -0.83 |\n",
+ "| time/ | |\n",
+ "| fps | 302 |\n",
+ "| iterations | 2200 |\n",
+ "| time_elapsed | 145 |\n",
+ "| total_timesteps | 44000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.25 |\n",
+ "| explained_variance | 0.24723881 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2199 |\n",
+ "| policy_loss | -0.415 |\n",
+ "| std | 0.998 |\n",
+ "| value_loss | 0.0231 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 6.63 |\n",
+ "| ep_rew_mean | -0.601 |\n",
+ "| time/ | |\n",
+ "| fps | 303 |\n",
+ "| iterations | 2300 |\n",
+ "| time_elapsed | 151 |\n",
+ "| total_timesteps | 46000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.18 |\n",
+ "| explained_variance | 0.5324587 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2299 |\n",
+ "| policy_loss | 0.143 |\n",
+ "| std | 0.977 |\n",
+ "| value_loss | 0.00811 |\n",
+ "-------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 4.89 |\n",
+ "| ep_rew_mean | -0.413 |\n",
+ "| time/ | |\n",
+ "| fps | 304 |\n",
+ "| iterations | 2400 |\n",
+ "| time_elapsed | 157 |\n",
+ "| total_timesteps | 48000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.14 |\n",
+ "| explained_variance | -0.40441775 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2399 |\n",
+ "| policy_loss | 0.153 |\n",
+ "| std | 0.962 |\n",
+ "| value_loss | 0.00267 |\n",
+ "---------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.79 |\n",
+ "| ep_rew_mean | -0.294 |\n",
+ "| time/ | |\n",
+ "| fps | 297 |\n",
+ "| iterations | 2500 |\n",
+ "| time_elapsed | 168 |\n",
+ "| total_timesteps | 50000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.11 |\n",
+ "| explained_variance | -0.5201168 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2499 |\n",
+ "| policy_loss | 0.119 |\n",
+ "| std | 0.955 |\n",
+ "| value_loss | 0.0067 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.5 |\n",
+ "| ep_rew_mean | -0.273 |\n",
+ "| time/ | |\n",
+ "| fps | 297 |\n",
+ "| iterations | 2600 |\n",
+ "| time_elapsed | 175 |\n",
+ "| total_timesteps | 52000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.06 |\n",
+ "| explained_variance | 0.44135898 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2599 |\n",
+ "| policy_loss | -0.151 |\n",
+ "| std | 0.938 |\n",
+ "| value_loss | 0.00198 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.36 |\n",
+ "| ep_rew_mean | -0.265 |\n",
+ "| time/ | |\n",
+ "| fps | 297 |\n",
+ "| iterations | 2700 |\n",
+ "| time_elapsed | 181 |\n",
+ "| total_timesteps | 54000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.04 |\n",
+ "| explained_variance | 0.22025794 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2699 |\n",
+ "| policy_loss | 0.0714 |\n",
+ "| std | 0.931 |\n",
+ "| value_loss | 0.00233 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.41 |\n",
+ "| ep_rew_mean | -0.266 |\n",
+ "| time/ | |\n",
+ "| fps | 296 |\n",
+ "| iterations | 2800 |\n",
+ "| time_elapsed | 188 |\n",
+ "| total_timesteps | 56000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.99 |\n",
+ "| explained_variance | 0.51552105 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2799 |\n",
+ "| policy_loss | -0.101 |\n",
+ "| std | 0.917 |\n",
+ "| value_loss | 0.0015 |\n",
+ "--------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.39 |\n",
+ "| ep_rew_mean | -0.275 |\n",
+ "| time/ | |\n",
+ "| fps | 296 |\n",
+ "| iterations | 2900 |\n",
+ "| time_elapsed | 195 |\n",
+ "| total_timesteps | 58000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.97 |\n",
+ "| explained_variance | 0.455292 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2899 |\n",
+ "| policy_loss | -0.0549 |\n",
+ "| std | 0.908 |\n",
+ "| value_loss | 0.0009 |\n",
+ "------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.07 |\n",
+ "| ep_rew_mean | -0.243 |\n",
+ "| time/ | |\n",
+ "| fps | 292 |\n",
+ "| iterations | 3000 |\n",
+ "| time_elapsed | 205 |\n",
+ "| total_timesteps | 60000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.94 |\n",
+ "| explained_variance | 0.7100135 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2999 |\n",
+ "| policy_loss | 0.0327 |\n",
+ "| std | 0.9 |\n",
+ "| value_loss | 0.000537 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.12 |\n",
+ "| ep_rew_mean | -0.249 |\n",
+ "| time/ | |\n",
+ "| fps | 293 |\n",
+ "| iterations | 3100 |\n",
+ "| time_elapsed | 210 |\n",
+ "| total_timesteps | 62000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.9 |\n",
+ "| explained_variance | 0.5587412 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 3099 |\n",
+ "| policy_loss | -0.0118 |\n",
+ "| std | 0.889 |\n",
+ "| value_loss | 0.000493 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.08 |\n",
+ "| ep_rew_mean | -0.247 |\n",
+ "| time/ | |\n",
+ "| fps | 295 |\n",
+ "| iterations | 3200 |\n",
+ "| time_elapsed | 216 |\n",
+ "| total_timesteps | 64000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.87 |\n",
+ "| explained_variance | 0.12363589 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 3199 |\n",
+ "| policy_loss | 0.0663 |\n",
+ "| std | 0.878 |\n",
+ "| value_loss | 0.0016 |\n",
+ "--------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.44 |\n",
+ "| ep_rew_mean | -0.273 |\n",
+ "| time/ | |\n",
+ "| fps | 297 |\n",
+ "| iterations | 3300 |\n",
+ "| time_elapsed | 222 |\n",
+ "| total_timesteps | 66000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.84 |\n",
+ "| explained_variance | -0.74497736 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 3299 |\n",
+ "| policy_loss | 0.111 |\n",
+ "| std | 0.873 |\n",
+ "| value_loss | 0.00291 |\n",
+ "---------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.25 |\n",
+ "| ep_rew_mean | -0.265 |\n",
+ "| time/ | |\n",
+ "| fps | 298 |\n",
+ "| iterations | 3400 |\n",
+ "| time_elapsed | 228 |\n",
+ "| total_timesteps | 68000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.8 |\n",
+ "| explained_variance | -0.30396366 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 3399 |\n",
+ "| policy_loss | 0.216 |\n",
+ "| std | 0.86 |\n",
+ "| value_loss | 0.00472 |\n",
+ "---------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.13 |\n",
+ "| ep_rew_mean | -0.247 |\n",
+ "| time/ | |\n",
+ "| fps | 298 |\n",
+ "| iterations | 3500 |\n",
+ "| time_elapsed | 234 |\n",
+ "| total_timesteps | 70000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.78 |\n",
+ "| explained_variance | 0.67658997 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 3499 |\n",
+ "| policy_loss | -0.0138 |\n",
+ "| std | 0.854 |\n",
+ "| value_loss | 0.00042 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.33 |\n",
+ "| ep_rew_mean | -0.263 |\n",
+ "| time/ | |\n",
+ "| fps | 294 |\n",
+ "| iterations | 3600 |\n",
+ "| time_elapsed | 244 |\n",
+ "| total_timesteps | 72000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.74 |\n",
+ "| explained_variance | -0.9163362 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 3599 |\n",
+ "| policy_loss | 0.134 |\n",
+ "| std | 0.842 |\n",
+ "| value_loss | 0.00447 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.85 |\n",
+ "| ep_rew_mean | -0.232 |\n",
+ "| time/ | |\n",
+ "| fps | 296 |\n",
+ "| iterations | 3700 |\n",
+ "| time_elapsed | 249 |\n",
+ "| total_timesteps | 74000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.7 |\n",
+ "| explained_variance | -0.2427007 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 3699 |\n",
+ "| policy_loss | -0.0458 |\n",
+ "| std | 0.832 |\n",
+ "| value_loss | 0.000685 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.78 |\n",
+ "| ep_rew_mean | -0.215 |\n",
+ "| time/ | |\n",
+ "| fps | 297 |\n",
+ "| iterations | 3800 |\n",
+ "| time_elapsed | 255 |\n",
+ "| total_timesteps | 76000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.69 |\n",
+ "| explained_variance | 0.70643455 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 3799 |\n",
+ "| policy_loss | 0.087 |\n",
+ "| std | 0.827 |\n",
+ "| value_loss | 0.00111 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.92 |\n",
+ "| ep_rew_mean | -0.237 |\n",
+ "| time/ | |\n",
+ "| fps | 298 |\n",
+ "| iterations | 3900 |\n",
+ "| time_elapsed | 260 |\n",
+ "| total_timesteps | 78000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.64 |\n",
+ "| explained_variance | 0.3595901 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 3899 |\n",
+ "| policy_loss | -0.207 |\n",
+ "| std | 0.815 |\n",
+ "| value_loss | 0.00451 |\n",
+ "-------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.01 |\n",
+ "| ep_rew_mean | -0.232 |\n",
+ "| time/ | |\n",
+ "| fps | 300 |\n",
+ "| iterations | 4000 |\n",
+ "| time_elapsed | 266 |\n",
+ "| total_timesteps | 80000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.62 |\n",
+ "| explained_variance | -0.26341498 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 3999 |\n",
+ "| policy_loss | 0.0463 |\n",
+ "| std | 0.807 |\n",
+ "| value_loss | 0.00229 |\n",
+ "---------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.86 |\n",
+ "| ep_rew_mean | -0.218 |\n",
+ "| time/ | |\n",
+ "| fps | 297 |\n",
+ "| iterations | 4100 |\n",
+ "| time_elapsed | 275 |\n",
+ "| total_timesteps | 82000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.6 |\n",
+ "| explained_variance | 0.66822636 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 4099 |\n",
+ "| policy_loss | -0.00514 |\n",
+ "| std | 0.804 |\n",
+ "| value_loss | 0.000157 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.03 |\n",
+ "| ep_rew_mean | -0.23 |\n",
+ "| time/ | |\n",
+ "| fps | 297 |\n",
+ "| iterations | 4200 |\n",
+ "| time_elapsed | 282 |\n",
+ "| total_timesteps | 84000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.56 |\n",
+ "| explained_variance | 0.62520474 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 4199 |\n",
+ "| policy_loss | 0.0369 |\n",
+ "| std | 0.793 |\n",
+ "| value_loss | 0.00071 |\n",
+ "--------------------------------------\n",
+ "... [output truncated: the same rollout/time/train metric table repeats every 100 iterations (2000 timesteps) through iteration 10600 / 212000 total timesteps at ~297-300 fps] ...\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.75 |\n",
+ "| ep_rew_mean | -0.214 |\n",
+ "| time/ | |\n",
+ "| fps | 297 |\n",
+ "| iterations | 10700 |\n",
+ "| time_elapsed | 718 |\n",
+ "| total_timesteps | 214000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.55 |\n",
+ "| explained_variance | 0.4303016 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 10699 |\n",
+ "| policy_loss | -0.0253 |\n",
+ "| std | 0.566 |\n",
+ "| value_loss | 0.00045 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.68 |\n",
+ "| ep_rew_mean | -0.209 |\n",
+ "| time/ | |\n",
+ "| fps | 298 |\n",
+ "| iterations | 10800 |\n",
+ "| time_elapsed | 724 |\n",
+ "| total_timesteps | 216000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.53 |\n",
+ "| explained_variance | 0.94789743 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 10799 |\n",
+ "| policy_loss | 0.00158 |\n",
+ "| std | 0.564 |\n",
+ "| value_loss | 0.000105 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.91 |\n",
+ "| ep_rew_mean | -0.22 |\n",
+ "| time/ | |\n",
+ "| fps | 298 |\n",
+ "| iterations | 10900 |\n",
+ "| time_elapsed | 730 |\n",
+ "| total_timesteps | 218000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.52 |\n",
+ "| explained_variance | 0.89148146 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 10899 |\n",
+ "| policy_loss | -0.00433 |\n",
+ "| std | 0.561 |\n",
+ "| value_loss | 0.000233 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.88 |\n",
+ "| ep_rew_mean | -0.224 |\n",
+ "| time/ | |\n",
+ "| fps | 298 |\n",
+ "| iterations | 11000 |\n",
+ "| time_elapsed | 736 |\n",
+ "| total_timesteps | 220000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.52 |\n",
+ "| explained_variance | 0.6723757 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 10999 |\n",
+ "| policy_loss | 0.0337 |\n",
+ "| std | 0.561 |\n",
+ "| value_loss | 0.000452 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.82 |\n",
+ "| ep_rew_mean | -0.221 |\n",
+ "| time/ | |\n",
+ "| fps | 297 |\n",
+ "| iterations | 11100 |\n",
+ "| time_elapsed | 745 |\n",
+ "| total_timesteps | 222000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.51 |\n",
+ "| explained_variance | 0.9861437 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11099 |\n",
+ "| policy_loss | 0.00968 |\n",
+ "| std | 0.56 |\n",
+ "| value_loss | 6.46e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.81 |\n",
+ "| ep_rew_mean | -0.218 |\n",
+ "| time/ | |\n",
+ "| fps | 298 |\n",
+ "| iterations | 11200 |\n",
+ "| time_elapsed | 751 |\n",
+ "| total_timesteps | 224000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.52 |\n",
+ "| explained_variance | 0.9299464 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11199 |\n",
+ "| policy_loss | -0.00854 |\n",
+ "| std | 0.56 |\n",
+ "| value_loss | 8.38e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.96 |\n",
+ "| ep_rew_mean | -0.236 |\n",
+ "| time/ | |\n",
+ "| fps | 298 |\n",
+ "| iterations | 11300 |\n",
+ "| time_elapsed | 757 |\n",
+ "| total_timesteps | 226000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.5 |\n",
+ "| explained_variance | 0.8100773 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11299 |\n",
+ "| policy_loss | 0.0223 |\n",
+ "| std | 0.558 |\n",
+ "| value_loss | 0.000133 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.88 |\n",
+ "| ep_rew_mean | -0.223 |\n",
+ "| time/ | |\n",
+ "| fps | 298 |\n",
+ "| iterations | 11400 |\n",
+ "| time_elapsed | 762 |\n",
+ "| total_timesteps | 228000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.48 |\n",
+ "| explained_variance | 0.9284025 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11399 |\n",
+ "| policy_loss | 0.0199 |\n",
+ "| std | 0.553 |\n",
+ "| value_loss | 0.000181 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.71 |\n",
+ "| ep_rew_mean | -0.204 |\n",
+ "| time/ | |\n",
+ "| fps | 299 |\n",
+ "| iterations | 11500 |\n",
+ "| time_elapsed | 768 |\n",
+ "| total_timesteps | 230000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.46 |\n",
+ "| explained_variance | 0.91747606 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11499 |\n",
+ "| policy_loss | 0.00503 |\n",
+ "| std | 0.55 |\n",
+ "| value_loss | 9.29e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.61 |\n",
+ "| ep_rew_mean | -0.21 |\n",
+ "| time/ | |\n",
+ "| fps | 298 |\n",
+ "| iterations | 11600 |\n",
+ "| time_elapsed | 777 |\n",
+ "| total_timesteps | 232000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.44 |\n",
+ "| explained_variance | 0.94396347 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11599 |\n",
+ "| policy_loss | 0.0116 |\n",
+ "| std | 0.546 |\n",
+ "| value_loss | 0.000102 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.74 |\n",
+ "| ep_rew_mean | -0.223 |\n",
+ "| time/ | |\n",
+ "| fps | 298 |\n",
+ "| iterations | 11700 |\n",
+ "| time_elapsed | 783 |\n",
+ "| total_timesteps | 234000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.41 |\n",
+ "| explained_variance | 0.90888155 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11699 |\n",
+ "| policy_loss | -0.00239 |\n",
+ "| std | 0.542 |\n",
+ "| value_loss | 0.00012 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.79 |\n",
+ "| ep_rew_mean | -0.223 |\n",
+ "| time/ | |\n",
+ "| fps | 299 |\n",
+ "| iterations | 11800 |\n",
+ "| time_elapsed | 789 |\n",
+ "| total_timesteps | 236000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.41 |\n",
+ "| explained_variance | 0.8832614 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11799 |\n",
+ "| policy_loss | 0.00491 |\n",
+ "| std | 0.542 |\n",
+ "| value_loss | 0.000124 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.86 |\n",
+ "| ep_rew_mean | -0.228 |\n",
+ "| time/ | |\n",
+ "| fps | 299 |\n",
+ "| iterations | 11900 |\n",
+ "| time_elapsed | 795 |\n",
+ "| total_timesteps | 238000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.41 |\n",
+ "| explained_variance | 0.74971235 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11899 |\n",
+ "| policy_loss | -0.0369 |\n",
+ "| std | 0.542 |\n",
+ "| value_loss | 0.000599 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.76 |\n",
+ "| ep_rew_mean | -0.213 |\n",
+ "| time/ | |\n",
+ "| fps | 299 |\n",
+ "| iterations | 12000 |\n",
+ "| time_elapsed | 800 |\n",
+ "| total_timesteps | 240000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.41 |\n",
+ "| explained_variance | 0.9652594 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11999 |\n",
+ "| policy_loss | 0.0374 |\n",
+ "| std | 0.542 |\n",
+ "| value_loss | 0.00025 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.83 |\n",
+ "| ep_rew_mean | -0.234 |\n",
+ "| time/ | |\n",
+ "| fps | 300 |\n",
+ "| iterations | 12100 |\n",
+ "| time_elapsed | 806 |\n",
+ "| total_timesteps | 242000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.4 |\n",
+ "| explained_variance | 0.7710167 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12099 |\n",
+ "| policy_loss | 0.0269 |\n",
+ "| std | 0.54 |\n",
+ "| value_loss | 0.000402 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.77 |\n",
+ "| ep_rew_mean | -0.206 |\n",
+ "| time/ | |\n",
+ "| fps | 299 |\n",
+ "| iterations | 12200 |\n",
+ "| time_elapsed | 815 |\n",
+ "| total_timesteps | 244000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.39 |\n",
+ "| explained_variance | 0.91825235 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12199 |\n",
+ "| policy_loss | 0.0442 |\n",
+ "| std | 0.539 |\n",
+ "| value_loss | 0.000402 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.92 |\n",
+ "| ep_rew_mean | -0.229 |\n",
+ "| time/ | |\n",
+ "| fps | 299 |\n",
+ "| iterations | 12300 |\n",
+ "| time_elapsed | 821 |\n",
+ "| total_timesteps | 246000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.39 |\n",
+ "| explained_variance | 0.9738185 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12299 |\n",
+ "| policy_loss | 0.0144 |\n",
+ "| std | 0.538 |\n",
+ "| value_loss | 9.16e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.79 |\n",
+ "| ep_rew_mean | -0.22 |\n",
+ "| time/ | |\n",
+ "| fps | 299 |\n",
+ "| iterations | 12400 |\n",
+ "| time_elapsed | 826 |\n",
+ "| total_timesteps | 248000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.4 |\n",
+ "| explained_variance | 0.89304084 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12399 |\n",
+ "| policy_loss | 0.0163 |\n",
+ "| std | 0.539 |\n",
+ "| value_loss | 0.000238 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.75 |\n",
+ "| ep_rew_mean | -0.213 |\n",
+ "| time/ | |\n",
+ "| fps | 300 |\n",
+ "| iterations | 12500 |\n",
+ "| time_elapsed | 832 |\n",
+ "| total_timesteps | 250000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.4 |\n",
+ "| explained_variance | 0.9213255 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12499 |\n",
+ "| policy_loss | -0.00162 |\n",
+ "| std | 0.538 |\n",
+ "| value_loss | 7.67e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.68 |\n",
+ "| ep_rew_mean | -0.203 |\n",
+ "| time/ | |\n",
+ "| fps | 300 |\n",
+ "| iterations | 12600 |\n",
+ "| time_elapsed | 837 |\n",
+ "| total_timesteps | 252000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.37 |\n",
+ "| explained_variance | 0.8798297 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12599 |\n",
+ "| policy_loss | -0.0144 |\n",
+ "| std | 0.534 |\n",
+ "| value_loss | 0.000193 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.83 |\n",
+ "| ep_rew_mean | -0.231 |\n",
+ "| time/ | |\n",
+ "| fps | 301 |\n",
+ "| iterations | 12700 |\n",
+ "| time_elapsed | 843 |\n",
+ "| total_timesteps | 254000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.35 |\n",
+ "| explained_variance | 0.9373393 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12699 |\n",
+ "| policy_loss | -0.0111 |\n",
+ "| std | 0.53 |\n",
+ "| value_loss | 9.91e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.78 |\n",
+ "| ep_rew_mean | -0.219 |\n",
+ "| time/ | |\n",
+ "| fps | 300 |\n",
+ "| iterations | 12800 |\n",
+ "| time_elapsed | 853 |\n",
+ "| total_timesteps | 256000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.33 |\n",
+ "| explained_variance | 0.9766309 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12799 |\n",
+ "| policy_loss | -0.0042 |\n",
+ "| std | 0.527 |\n",
+ "| value_loss | 7.05e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.67 |\n",
+ "| ep_rew_mean | -0.212 |\n",
+ "| time/ | |\n",
+ "| fps | 300 |\n",
+ "| iterations | 12900 |\n",
+ "| time_elapsed | 858 |\n",
+ "| total_timesteps | 258000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.34 |\n",
+ "| explained_variance | 0.9444415 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12899 |\n",
+ "| policy_loss | 0.0116 |\n",
+ "| std | 0.528 |\n",
+ "| value_loss | 0.000126 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.77 |\n",
+ "| ep_rew_mean | -0.219 |\n",
+ "| time/ | |\n",
+ "| fps | 300 |\n",
+ "| iterations | 13000 |\n",
+ "| time_elapsed | 864 |\n",
+ "| total_timesteps | 260000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.32 |\n",
+ "| explained_variance | 0.8382063 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12999 |\n",
+ "| policy_loss | -0.0395 |\n",
+ "| std | 0.524 |\n",
+ "| value_loss | 0.000643 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.83 |\n",
+ "| ep_rew_mean | -0.219 |\n",
+ "| time/ | |\n",
+ "| fps | 300 |\n",
+ "| iterations | 13100 |\n",
+ "| time_elapsed | 870 |\n",
+ "| total_timesteps | 262000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.32 |\n",
+ "| explained_variance | 0.9722576 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13099 |\n",
+ "| policy_loss | 0.0103 |\n",
+ "| std | 0.525 |\n",
+ "| value_loss | 7.43e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.8 |\n",
+ "| ep_rew_mean | -0.216 |\n",
+ "| time/ | |\n",
+ "| fps | 301 |\n",
+ "| iterations | 13200 |\n",
+ "| time_elapsed | 875 |\n",
+ "| total_timesteps | 264000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.3 |\n",
+ "| explained_variance | 0.9009595 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13199 |\n",
+ "| policy_loss | -0.00122 |\n",
+ "| std | 0.522 |\n",
+ "| value_loss | 9.14e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.74 |\n",
+ "| ep_rew_mean | -0.21 |\n",
+ "| time/ | |\n",
+ "| fps | 300 |\n",
+ "| iterations | 13300 |\n",
+ "| time_elapsed | 885 |\n",
+ "| total_timesteps | 266000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.3 |\n",
+ "| explained_variance | 0.95411175 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13299 |\n",
+ "| policy_loss | -0.00844 |\n",
+ "| std | 0.522 |\n",
+ "| value_loss | 8.08e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.87 |\n",
+ "| ep_rew_mean | -0.235 |\n",
+ "| time/ | |\n",
+ "| fps | 300 |\n",
+ "| iterations | 13400 |\n",
+ "| time_elapsed | 890 |\n",
+ "| total_timesteps | 268000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.3 |\n",
+ "| explained_variance | 0.9836456 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13399 |\n",
+ "| policy_loss | 0.0253 |\n",
+ "| std | 0.522 |\n",
+ "| value_loss | 0.000132 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.68 |\n",
+ "| ep_rew_mean | -0.216 |\n",
+ "| time/ | |\n",
+ "| fps | 301 |\n",
+ "| iterations | 13500 |\n",
+ "| time_elapsed | 896 |\n",
+ "| total_timesteps | 270000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.3 |\n",
+ "| explained_variance | 0.9624939 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13499 |\n",
+ "| policy_loss | -0.000755 |\n",
+ "| std | 0.522 |\n",
+ "| value_loss | 7.38e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.67 |\n",
+ "| ep_rew_mean | -0.202 |\n",
+ "| time/ | |\n",
+ "| fps | 301 |\n",
+ "| iterations | 13600 |\n",
+ "| time_elapsed | 901 |\n",
+ "| total_timesteps | 272000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.27 |\n",
+ "| explained_variance | 0.0788098 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13599 |\n",
+ "| policy_loss | -0.0734 |\n",
+ "| std | 0.516 |\n",
+ "| value_loss | 0.00111 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.91 |\n",
+ "| ep_rew_mean | -0.24 |\n",
+ "| time/ | |\n",
+ "| fps | 302 |\n",
+ "| iterations | 13700 |\n",
+ "| time_elapsed | 907 |\n",
+ "| total_timesteps | 274000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.25 |\n",
+ "| explained_variance | 0.94273585 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13699 |\n",
+ "| policy_loss | 0.00141 |\n",
+ "| std | 0.513 |\n",
+ "| value_loss | 0.000132 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.77 |\n",
+ "| ep_rew_mean | -0.216 |\n",
+ "| time/ | |\n",
+ "| fps | 302 |\n",
+ "| iterations | 13800 |\n",
+ "| time_elapsed | 912 |\n",
+ "| total_timesteps | 276000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.25 |\n",
+ "| explained_variance | 0.8948201 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13799 |\n",
+ "| policy_loss | 0.0349 |\n",
+ "| std | 0.514 |\n",
+ "| value_loss | 0.000385 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.81 |\n",
+ "| ep_rew_mean | -0.217 |\n",
+ "| time/ | |\n",
+ "| fps | 301 |\n",
+ "| iterations | 13900 |\n",
+ "| time_elapsed | 921 |\n",
+ "| total_timesteps | 278000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.25 |\n",
+ "| explained_variance | 0.69086885 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13899 |\n",
+ "| policy_loss | -0.014 |\n",
+ "| std | 0.513 |\n",
+ "| value_loss | 0.000472 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.86 |\n",
+ "| ep_rew_mean | -0.226 |\n",
+ "| time/ | |\n",
+ "| fps | 301 |\n",
+ "| iterations | 14000 |\n",
+ "| time_elapsed | 927 |\n",
+ "| total_timesteps | 280000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.24 |\n",
+ "| explained_variance | 0.92939675 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13999 |\n",
+ "| policy_loss | 0.00778 |\n",
+ "| std | 0.511 |\n",
+ "| value_loss | 0.000158 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.91 |\n",
+ "| ep_rew_mean | -0.23 |\n",
+ "| time/ | |\n",
+ "| fps | 302 |\n",
+ "| iterations | 14100 |\n",
+ "| time_elapsed | 933 |\n",
+ "| total_timesteps | 282000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.23 |\n",
+ "| explained_variance | 0.97067803 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14099 |\n",
+ "| policy_loss | -0.00249 |\n",
+ "| std | 0.51 |\n",
+ "| value_loss | 5.35e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.78 |\n",
+ "| ep_rew_mean | -0.216 |\n",
+ "| time/ | |\n",
+ "| fps | 302 |\n",
+ "| iterations | 14200 |\n",
+ "| time_elapsed | 938 |\n",
+ "| total_timesteps | 284000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.23 |\n",
+ "| explained_variance | 0.9633729 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14199 |\n",
+ "| policy_loss | 0.000154 |\n",
+ "| std | 0.51 |\n",
+ "| value_loss | 5.12e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.87 |\n",
+ "| ep_rew_mean | -0.231 |\n",
+ "| time/ | |\n",
+ "| fps | 302 |\n",
+ "| iterations | 14300 |\n",
+ "| time_elapsed | 944 |\n",
+ "| total_timesteps | 286000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.23 |\n",
+ "| explained_variance | 0.97925717 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14299 |\n",
+ "| policy_loss | -0.0138 |\n",
+ "| std | 0.51 |\n",
+ "| value_loss | 8.23e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.97 |\n",
+ "| ep_rew_mean | -0.241 |\n",
+ "| time/ | |\n",
+ "| fps | 302 |\n",
+ "| iterations | 14400 |\n",
+ "| time_elapsed | 950 |\n",
+ "| total_timesteps | 288000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.23 |\n",
+ "| explained_variance | 0.8687136 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14399 |\n",
+ "| policy_loss | -0.0248 |\n",
+ "| std | 0.511 |\n",
+ "| value_loss | 0.000259 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.79 |\n",
+ "| ep_rew_mean | -0.223 |\n",
+ "| time/ | |\n",
+ "| fps | 302 |\n",
+ "| iterations | 14500 |\n",
+ "| time_elapsed | 959 |\n",
+ "| total_timesteps | 290000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.21 |\n",
+ "| explained_variance | 0.98206425 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14499 |\n",
+ "| policy_loss | -0.0198 |\n",
+ "| std | 0.508 |\n",
+ "| value_loss | 0.000166 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.76 |\n",
+ "| ep_rew_mean | -0.211 |\n",
+ "| time/ | |\n",
+ "| fps | 302 |\n",
+ "| iterations | 14600 |\n",
+ "| time_elapsed | 965 |\n",
+ "| total_timesteps | 292000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.19 |\n",
+ "| explained_variance | 0.98284197 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14599 |\n",
+ "| policy_loss | 0.0095 |\n",
+ "| std | 0.505 |\n",
+ "| value_loss | 4.55e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.85 |\n",
+ "| ep_rew_mean | -0.231 |\n",
+ "| time/ | |\n",
+ "| fps | 302 |\n",
+ "| iterations | 14700 |\n",
+ "| time_elapsed | 970 |\n",
+ "| total_timesteps | 294000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.16 |\n",
+ "| explained_variance | 0.7622324 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14699 |\n",
+ "| policy_loss | -0.0373 |\n",
+ "| std | 0.499 |\n",
+ "| value_loss | 0.000854 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.65 |\n",
+ "| ep_rew_mean | -0.206 |\n",
+ "| time/ | |\n",
+ "| fps | 303 |\n",
+ "| iterations | 14800 |\n",
+ "| time_elapsed | 976 |\n",
+ "| total_timesteps | 296000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.15 |\n",
+ "| explained_variance | 0.94090515 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14799 |\n",
+ "| policy_loss | -0.0101 |\n",
+ "| std | 0.497 |\n",
+ "| value_loss | 0.000149 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.99 |\n",
+ "| ep_rew_mean | -0.242 |\n",
+ "| time/ | |\n",
+ "| fps | 303 |\n",
+ "| iterations | 14900 |\n",
+ "| time_elapsed | 982 |\n",
+ "| total_timesteps | 298000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.15 |\n",
+ "| explained_variance | 0.94472414 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14899 |\n",
+ "| policy_loss | 0.0115 |\n",
+ "| std | 0.498 |\n",
+ "| value_loss | 0.000171 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.04 |\n",
+ "| ep_rew_mean | -0.237 |\n",
+ "| time/ | |\n",
+ "| fps | 303 |\n",
+ "| iterations | 15000 |\n",
+ "| time_elapsed | 988 |\n",
+ "| total_timesteps | 300000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.16 |\n",
+ "| explained_variance | 0.93526465 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14999 |\n",
+ "| policy_loss | 0.0374 |\n",
+ "| std | 0.499 |\n",
+ "| value_loss | 0.000519 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.6 |\n",
+ "| ep_rew_mean | -0.21 |\n",
+ "| time/ | |\n",
+ "| fps | 302 |\n",
+ "| iterations | 15100 |\n",
+ "| time_elapsed | 997 |\n",
+ "| total_timesteps | 302000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.16 |\n",
+ "| explained_variance | 0.9759287 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 15099 |\n",
+ "| policy_loss | 0.0122 |\n",
+ "| std | 0.499 |\n",
+ "| value_loss | 7.45e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.62 |\n",
+ "| ep_rew_mean | -0.195 |\n",
+ "| time/ | |\n",
+ "| fps | 303 |\n",
+ "| iterations | 15200 |\n",
+ "| time_elapsed | 1003 |\n",
+ "| total_timesteps | 304000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.14 |\n",
+ "| explained_variance | 0.96417016 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 15199 |\n",
+ "| policy_loss | 0.0111 |\n",
+ "| std | 0.497 |\n",
+ "| value_loss | 7.27e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.84 |\n",
+ "| ep_rew_mean | -0.225 |\n",
+ "| time/ | |\n",
+ "| fps | 303 |\n",
+ "| iterations | 15300 |\n",
+ "| time_elapsed | 1009 |\n",
+ "| total_timesteps | 306000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.15 |\n",
+ "| explained_variance | 0.96453744 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 15299 |\n",
+ "| policy_loss | 0.011 |\n",
+ "| std | 0.499 |\n",
+ "| value_loss | 0.00012 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.81 |\n",
+ "| ep_rew_mean | -0.219 |\n",
+ "| time/ | |\n",
+ "| fps | 303 |\n",
+ "| iterations | 15400 |\n",
+ "| time_elapsed | 1014 |\n",
+ "| total_timesteps | 308000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.13 |\n",
+ "| explained_variance | 0.6340892 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 15399 |\n",
+ "| policy_loss | -0.0265 |\n",
+ "| std | 0.496 |\n",
+ "| value_loss | 0.000917 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.89 |\n",
+ "| ep_rew_mean | -0.23 |\n",
+ "| time/ | |\n",
+ "| fps | 303 |\n",
+ "| iterations | 15500 |\n",
+ "| time_elapsed | 1020 |\n",
+ "| total_timesteps | 310000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.12 |\n",
+ "| explained_variance | 0.9757865 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 15499 |\n",
+ "| policy_loss | 0.00557 |\n",
+ "| std | 0.493 |\n",
+ "| value_loss | 5.41e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.84 |\n",
+ "| ep_rew_mean | -0.234 |\n",
+ "| time/ | |\n",
+ "| fps | 303 |\n",
+ "| iterations | 15600 |\n",
+ "| time_elapsed | 1029 |\n",
+ "| total_timesteps | 312000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.1 |\n",
+ "| explained_variance | 0.9816367 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 15599 |\n",
+ "| policy_loss | -0.00155 |\n",
+ "| std | 0.49 |\n",
+ "| value_loss | 4.53e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.87 |\n",
+ "| ep_rew_mean | -0.228 |\n",
+ "| time/ | |\n",
+ "| fps | 303 |\n",
+ "| iterations | 15700 |\n",
+ "| time_elapsed | 1034 |\n",
+ "| total_timesteps | 314000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.07 |\n",
+ "| explained_variance | 0.59498584 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 15699 |\n",
+ "| policy_loss | 0.00891 |\n",
+ "| std | 0.484 |\n",
+ "| value_loss | 0.0011 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.74 |\n",
+ "| ep_rew_mean | -0.213 |\n",
+ "| time/ | |\n",
+ "| fps | 303 |\n",
+ "| iterations | 15800 |\n",
+ "| time_elapsed | 1039 |\n",
+ "| total_timesteps | 316000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2.04 |\n",
+ "| explained_variance | 0.9804266 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 15799 |\n",
+ "| policy_loss | -0.0119 |\n",
+ "| std | 0.48 |\n",
+ "| value_loss | 6.58e-05 |\n",
+ "-------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.69 |\n",
+ "| ep_rew_mean | -0.209 |\n",
+ "| time/ | |\n",
+ "| fps | 304 |\n",
+ "| iterations | 15900 |\n",
+ "| time_elapsed | 1045 |\n",
+ "| total_timesteps | 318000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -2 |\n",
+ "| explained_variance | 0.974797 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 15899 |\n",
+ "| policy_loss | -0.0222 |\n",
+ "| std | 0.475 |\n",
+ "| value_loss | 0.000213 |\n",
+ "------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.8 |\n",
+ "| ep_rew_mean | -0.227 |\n",
+ "| time/ | |\n",
+ "| fps | 304 |\n",
+ "| iterations | 16000 |\n",
+ "| time_elapsed | 1050 |\n",
+ "| total_timesteps | 320000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.98 |\n",
+ "| explained_variance | 0.90541655 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 15999 |\n",
+ "| policy_loss | 0.0202 |\n",
+ "| std | 0.47 |\n",
+ "| value_loss | 0.000274 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.86 |\n",
+ "| ep_rew_mean | -0.232 |\n",
+ "| time/ | |\n",
+ "| fps | 304 |\n",
+ "| iterations | 16100 |\n",
+ "| time_elapsed | 1056 |\n",
+ "| total_timesteps | 322000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.98 |\n",
+ "| explained_variance | 0.9099645 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 16099 |\n",
+ "| policy_loss | 0.00779 |\n",
+ "| std | 0.471 |\n",
+ "| value_loss | 0.000138 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.77 |\n",
+ "| ep_rew_mean | -0.209 |\n",
+ "| time/ | |\n",
+ "| fps | 304 |\n",
+ "| iterations | 16200 |\n",
+ "| time_elapsed | 1065 |\n",
+ "| total_timesteps | 324000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.98 |\n",
+ "| explained_variance | 0.90357727 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 16199 |\n",
+ "| policy_loss | -0.0167 |\n",
+ "| std | 0.471 |\n",
+ "| value_loss | 0.000208 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.93 |\n",
+ "| ep_rew_mean | -0.232 |\n",
+ "| time/ | |\n",
+ "| fps | 304 |\n",
+ "| iterations | 16300 |\n",
+ "| time_elapsed | 1071 |\n",
+ "| total_timesteps | 326000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.97 |\n",
+ "| explained_variance | 0.87062764 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 16299 |\n",
+ "| policy_loss | -0.00104 |\n",
+ "| std | 0.469 |\n",
+ "| value_loss | 0.000104 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.78 |\n",
+ "| ep_rew_mean | -0.221 |\n",
+ "| time/ | |\n",
+ "| fps | 304 |\n",
+ "| iterations | 16400 |\n",
+ "| time_elapsed | 1076 |\n",
+ "| total_timesteps | 328000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.95 |\n",
+ "| explained_variance | 0.96559095 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 16399 |\n",
+ "| policy_loss | 0.00231 |\n",
+ "| std | 0.467 |\n",
+ "| value_loss | 5.08e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.81 |\n",
+ "| ep_rew_mean | -0.22 |\n",
+ "| time/ | |\n",
+ "| fps | 304 |\n",
+ "| iterations | 16500 |\n",
+ "| time_elapsed | 1082 |\n",
+ "| total_timesteps | 330000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.97 |\n",
+ "| explained_variance | 0.9584811 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 16499 |\n",
+ "| policy_loss | 0.00624 |\n",
+ "| std | 0.469 |\n",
+ "| value_loss | 0.000136 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.71 |\n",
+ "| ep_rew_mean | -0.208 |\n",
+ "| time/ | |\n",
+ "| fps | 305 |\n",
+ "| iterations | 16600 |\n",
+ "| time_elapsed | 1087 |\n",
+ "| total_timesteps | 332000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.95 |\n",
+ "| explained_variance | 0.9770625 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 16599 |\n",
+ "| policy_loss | 0.00544 |\n",
+ "| std | 0.467 |\n",
+ "| value_loss | 6.65e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.68 |\n",
+ "| ep_rew_mean | -0.21 |\n",
+ "| time/ | |\n",
+ "| fps | 305 |\n",
+ "| iterations | 16700 |\n",
+ "| time_elapsed | 1093 |\n",
+ "| total_timesteps | 334000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.95 |\n",
+ "| explained_variance | 0.63326836 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 16699 |\n",
+ "| policy_loss | -0.0177 |\n",
+ "| std | 0.467 |\n",
+ "| value_loss | 0.00115 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.81 |\n",
+ "| ep_rew_mean | -0.221 |\n",
+ "| time/ | |\n",
+ "| fps | 304 |\n",
+ "| iterations | 16800 |\n",
+ "| time_elapsed | 1102 |\n",
+ "| total_timesteps | 336000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.93 |\n",
+ "| explained_variance | 0.98614395 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 16799 |\n",
+ "| policy_loss | 0.00795 |\n",
+ "| std | 0.463 |\n",
+ "| value_loss | 4.68e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.86 |\n",
+ "| ep_rew_mean | -0.231 |\n",
+ "| time/ | |\n",
+ "| fps | 305 |\n",
+ "| iterations | 16900 |\n",
+ "| time_elapsed | 1108 |\n",
+ "| total_timesteps | 338000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.89 |\n",
+ "| explained_variance | 0.9440542 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 16899 |\n",
+ "| policy_loss | -0.0238 |\n",
+ "| std | 0.458 |\n",
+ "| value_loss | 0.000362 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.8 |\n",
+ "| ep_rew_mean | -0.219 |\n",
+ "| time/ | |\n",
+ "| fps | 305 |\n",
+ "| iterations | 17000 |\n",
+ "| time_elapsed | 1113 |\n",
+ "| total_timesteps | 340000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.88 |\n",
+ "| explained_variance | 0.9288571 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 16999 |\n",
+ "| policy_loss | 0.0118 |\n",
+ "| std | 0.456 |\n",
+ "| value_loss | 0.00026 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.63 |\n",
+ "| ep_rew_mean | -0.208 |\n",
+ "| time/ | |\n",
+ "| fps | 305 |\n",
+ "| iterations | 17100 |\n",
+ "| time_elapsed | 1119 |\n",
+ "| total_timesteps | 342000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.88 |\n",
+ "| explained_variance | 0.9744407 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 17099 |\n",
+ "| policy_loss | -0.0129 |\n",
+ "| std | 0.455 |\n",
+ "| value_loss | 7.61e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.85 |\n",
+ "| ep_rew_mean | -0.234 |\n",
+ "| time/ | |\n",
+ "| fps | 305 |\n",
+ "| iterations | 17200 |\n",
+ "| time_elapsed | 1125 |\n",
+ "| total_timesteps | 344000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.87 |\n",
+ "| explained_variance | 0.9596539 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 17199 |\n",
+ "| policy_loss | -0.0125 |\n",
+ "| std | 0.455 |\n",
+ "| value_loss | 0.000129 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.75 |\n",
+ "| ep_rew_mean | -0.216 |\n",
+ "| time/ | |\n",
+ "| fps | 306 |\n",
+ "| iterations | 17300 |\n",
+ "| time_elapsed | 1130 |\n",
+ "| total_timesteps | 346000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.86 |\n",
+ "| explained_variance | 0.97371745 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 17299 |\n",
+ "| policy_loss | 0.0135 |\n",
+ "| std | 0.453 |\n",
+ "| value_loss | 0.000125 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.71 |\n",
+ "| ep_rew_mean | -0.211 |\n",
+ "| time/ | |\n",
+ "| fps | 305 |\n",
+ "| iterations | 17400 |\n",
+ "| time_elapsed | 1139 |\n",
+ "| total_timesteps | 348000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.84 |\n",
+ "| explained_variance | 0.97407156 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 17399 |\n",
+ "| policy_loss | -0.000319 |\n",
+ "| std | 0.451 |\n",
+ "| value_loss | 7.55e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.84 |\n",
+ "| ep_rew_mean | -0.227 |\n",
+ "| time/ | |\n",
+ "| fps | 305 |\n",
+ "| iterations | 17500 |\n",
+ "| time_elapsed | 1145 |\n",
+ "| total_timesteps | 350000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.83 |\n",
+ "| explained_variance | 0.9800369 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 17499 |\n",
+ "| policy_loss | -0.00272 |\n",
+ "| std | 0.449 |\n",
+ "| value_loss | 7.61e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.57 |\n",
+ "| ep_rew_mean | -0.192 |\n",
+ "| time/ | |\n",
+ "| fps | 305 |\n",
+ "| iterations | 17600 |\n",
+ "| time_elapsed | 1151 |\n",
+ "| total_timesteps | 352000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.83 |\n",
+ "| explained_variance | 0.9593339 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 17599 |\n",
+ "| policy_loss | -0.0111 |\n",
+ "| std | 0.449 |\n",
+ "| value_loss | 9e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.67 |\n",
+ "| ep_rew_mean | -0.214 |\n",
+ "| time/ | |\n",
+ "| fps | 306 |\n",
+ "| iterations | 17700 |\n",
+ "| time_elapsed | 1156 |\n",
+ "| total_timesteps | 354000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.81 |\n",
+ "| explained_variance | 0.9596555 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 17699 |\n",
+ "| policy_loss | -0.00942 |\n",
+ "| std | 0.447 |\n",
+ "| value_loss | 6.95e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.74 |\n",
+ "| ep_rew_mean | -0.217 |\n",
+ "| time/ | |\n",
+ "| fps | 306 |\n",
+ "| iterations | 17800 |\n",
+ "| time_elapsed | 1162 |\n",
+ "| total_timesteps | 356000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.81 |\n",
+ "| explained_variance | 0.98978764 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 17799 |\n",
+ "| policy_loss | -0.00202 |\n",
+ "| std | 0.446 |\n",
+ "| value_loss | 3.37e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.71 |\n",
+ "| ep_rew_mean | -0.209 |\n",
+ "| time/ | |\n",
+ "| fps | 305 |\n",
+ "| iterations | 17900 |\n",
+ "| time_elapsed | 1171 |\n",
+ "| total_timesteps | 358000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.81 |\n",
+ "| explained_variance | 0.7599305 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 17899 |\n",
+ "| policy_loss | -0.0287 |\n",
+ "| std | 0.446 |\n",
+ "| value_loss | 0.000693 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.84 |\n",
+ "| ep_rew_mean | -0.22 |\n",
+ "| time/ | |\n",
+ "| fps | 305 |\n",
+ "| iterations | 18000 |\n",
+ "| time_elapsed | 1177 |\n",
+ "| total_timesteps | 360000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.8 |\n",
+ "| explained_variance | 0.98177564 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 17999 |\n",
+ "| policy_loss | 0.0091 |\n",
+ "| std | 0.446 |\n",
+ "| value_loss | 0.00011 |\n",
+ "--------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.6 |\n",
+ "| ep_rew_mean | -0.202 |\n",
+ "| time/ | |\n",
+ "| fps | 306 |\n",
+ "| iterations | 18100 |\n",
+ "| time_elapsed | 1182 |\n",
+ "| total_timesteps | 362000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.79 |\n",
+ "| explained_variance | 0.934681 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 18099 |\n",
+ "| policy_loss | -0.0094 |\n",
+ "| std | 0.444 |\n",
+ "| value_loss | 0.000113 |\n",
+ "------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.68 |\n",
+ "| ep_rew_mean | -0.206 |\n",
+ "| time/ | |\n",
+ "| fps | 306 |\n",
+ "| iterations | 18200 |\n",
+ "| time_elapsed | 1188 |\n",
+ "| total_timesteps | 364000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.79 |\n",
+ "| explained_variance | 0.95457464 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 18199 |\n",
+ "| policy_loss | -0.00085 |\n",
+ "| std | 0.444 |\n",
+ "| value_loss | 6.5e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.72 |\n",
+ "| ep_rew_mean | -0.205 |\n",
+ "| time/ | |\n",
+ "| fps | 306 |\n",
+ "| iterations | 18300 |\n",
+ "| time_elapsed | 1194 |\n",
+ "| total_timesteps | 366000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.77 |\n",
+ "| explained_variance | 0.95278776 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 18299 |\n",
+ "| policy_loss | -0.00984 |\n",
+ "| std | 0.44 |\n",
+ "| value_loss | 8.43e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.76 |\n",
+ "| ep_rew_mean | -0.211 |\n",
+ "| time/ | |\n",
+ "| fps | 306 |\n",
+ "| iterations | 18400 |\n",
+ "| time_elapsed | 1199 |\n",
+ "| total_timesteps | 368000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.76 |\n",
+ "| explained_variance | 0.94221777 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 18399 |\n",
+ "| policy_loss | 0.000339 |\n",
+ "| std | 0.439 |\n",
+ "| value_loss | 0.000151 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.78 |\n",
+ "| ep_rew_mean | -0.214 |\n",
+ "| time/ | |\n",
+ "| fps | 306 |\n",
+ "| iterations | 18500 |\n",
+ "| time_elapsed | 1208 |\n",
+ "| total_timesteps | 370000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.75 |\n",
+ "| explained_variance | 0.80928534 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 18499 |\n",
+ "| policy_loss | 0.00918 |\n",
+ "| std | 0.438 |\n",
+ "| value_loss | 0.000224 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.79 |\n",
+ "| ep_rew_mean | -0.226 |\n",
+ "| time/ | |\n",
+ "| fps | 306 |\n",
+ "| iterations | 18600 |\n",
+ "| time_elapsed | 1214 |\n",
+ "| total_timesteps | 372000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.74 |\n",
+ "| explained_variance | 0.9688626 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 18599 |\n",
+ "| policy_loss | -8.22e-06 |\n",
+ "| std | 0.437 |\n",
+ "| value_loss | 0.000102 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.82 |\n",
+ "| ep_rew_mean | -0.226 |\n",
+ "| time/ | |\n",
+ "| fps | 306 |\n",
+ "| iterations | 18700 |\n",
+ "| time_elapsed | 1219 |\n",
+ "| total_timesteps | 374000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.72 |\n",
+ "| explained_variance | 0.9825928 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 18699 |\n",
+ "| policy_loss | -0.00274 |\n",
+ "| std | 0.434 |\n",
+ "| value_loss | 5.74e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.76 |\n",
+ "| ep_rew_mean | -0.212 |\n",
+ "| time/ | |\n",
+ "| fps | 306 |\n",
+ "| iterations | 18800 |\n",
+ "| time_elapsed | 1225 |\n",
+ "| total_timesteps | 376000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.7 |\n",
+ "| explained_variance | 0.9257292 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 18799 |\n",
+ "| policy_loss | 0.0254 |\n",
+ "| std | 0.431 |\n",
+ "| value_loss | 0.000312 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.83 |\n",
+ "| ep_rew_mean | -0.217 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 18900 |\n",
+ "| time_elapsed | 1230 |\n",
+ "| total_timesteps | 378000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.67 |\n",
+ "| explained_variance | 0.62272656 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 18899 |\n",
+ "| policy_loss | 0.0324 |\n",
+ "| std | 0.428 |\n",
+ "| value_loss | 0.00136 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.82 |\n",
+ "| ep_rew_mean | -0.222 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 19000 |\n",
+ "| time_elapsed | 1236 |\n",
+ "| total_timesteps | 380000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.67 |\n",
+ "| explained_variance | 0.8762253 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 18999 |\n",
+ "| policy_loss | 0.0126 |\n",
+ "| std | 0.427 |\n",
+ "| value_loss | 0.000196 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.73 |\n",
+ "| ep_rew_mean | -0.209 |\n",
+ "| time/ | |\n",
+ "| fps | 306 |\n",
+ "| iterations | 19100 |\n",
+ "| time_elapsed | 1245 |\n",
+ "| total_timesteps | 382000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.67 |\n",
+ "| explained_variance | 0.9610008 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 19099 |\n",
+ "| policy_loss | -0.00729 |\n",
+ "| std | 0.428 |\n",
+ "| value_loss | 9.97e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.74 |\n",
+ "| ep_rew_mean | -0.213 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 19200 |\n",
+ "| time_elapsed | 1250 |\n",
+ "| total_timesteps | 384000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.65 |\n",
+ "| explained_variance | 0.9789909 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 19199 |\n",
+ "| policy_loss | 0.00905 |\n",
+ "| std | 0.425 |\n",
+ "| value_loss | 8.39e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.76 |\n",
+ "| ep_rew_mean | -0.213 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 19300 |\n",
+ "| time_elapsed | 1256 |\n",
+ "| total_timesteps | 386000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.65 |\n",
+ "| explained_variance | 0.9824363 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 19299 |\n",
+ "| policy_loss | 0.0189 |\n",
+ "| std | 0.425 |\n",
+ "| value_loss | 0.000169 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.76 |\n",
+ "| ep_rew_mean | -0.212 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 19400 |\n",
+ "| time_elapsed | 1262 |\n",
+ "| total_timesteps | 388000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.64 |\n",
+ "| explained_variance | 0.86109364 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 19399 |\n",
+ "| policy_loss | -0.0201 |\n",
+ "| std | 0.424 |\n",
+ "| value_loss | 0.000426 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.9 |\n",
+ "| ep_rew_mean | -0.23 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 19500 |\n",
+ "| time_elapsed | 1267 |\n",
+ "| total_timesteps | 390000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.63 |\n",
+ "| explained_variance | 0.70620143 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 19499 |\n",
+ "| policy_loss | 0.0118 |\n",
+ "| std | 0.423 |\n",
+ "| value_loss | 0.000389 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.85 |\n",
+ "| ep_rew_mean | -0.223 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 19600 |\n",
+ "| time_elapsed | 1273 |\n",
+ "| total_timesteps | 392000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.61 |\n",
+ "| explained_variance | 0.39207745 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 19599 |\n",
+ "| policy_loss | 0.0517 |\n",
+ "| std | 0.42 |\n",
+ "| value_loss | 0.00316 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.85 |\n",
+ "| ep_rew_mean | -0.233 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 19700 |\n",
+ "| time_elapsed | 1282 |\n",
+ "| total_timesteps | 394000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.6 |\n",
+ "| explained_variance | 0.96949595 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 19699 |\n",
+ "| policy_loss | 0.0161 |\n",
+ "| std | 0.419 |\n",
+ "| value_loss | 0.000128 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.79 |\n",
+ "| ep_rew_mean | -0.213 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 19800 |\n",
+ "| time_elapsed | 1288 |\n",
+ "| total_timesteps | 396000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.59 |\n",
+ "| explained_variance | 0.9210366 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 19799 |\n",
+ "| policy_loss | -0.0121 |\n",
+ "| std | 0.419 |\n",
+ "| value_loss | 0.000303 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.83 |\n",
+ "| ep_rew_mean | -0.227 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 19900 |\n",
+ "| time_elapsed | 1294 |\n",
+ "| total_timesteps | 398000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.59 |\n",
+ "| explained_variance | 0.91853833 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 19899 |\n",
+ "| policy_loss | -0.0153 |\n",
+ "| std | 0.42 |\n",
+ "| value_loss | 0.000214 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.64 |\n",
+ "| ep_rew_mean | -0.204 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 20000 |\n",
+ "| time_elapsed | 1300 |\n",
+ "| total_timesteps | 400000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.58 |\n",
+ "| explained_variance | 0.8020137 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 19999 |\n",
+ "| policy_loss | -0.0282 |\n",
+ "| std | 0.418 |\n",
+ "| value_loss | 0.000849 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.79 |\n",
+ "| ep_rew_mean | -0.219 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 20100 |\n",
+ "| time_elapsed | 1306 |\n",
+ "| total_timesteps | 402000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.56 |\n",
+ "| explained_variance | 0.95373213 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 20099 |\n",
+ "| policy_loss | 0.00944 |\n",
+ "| std | 0.416 |\n",
+ "| value_loss | 0.000117 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.76 |\n",
+ "| ep_rew_mean | -0.219 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 20200 |\n",
+ "| time_elapsed | 1315 |\n",
+ "| total_timesteps | 404000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.55 |\n",
+ "| explained_variance | 0.9797992 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 20199 |\n",
+ "| policy_loss | -0.0116 |\n",
+ "| std | 0.414 |\n",
+ "| value_loss | 0.000106 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.83 |\n",
+ "| ep_rew_mean | -0.216 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 20300 |\n",
+ "| time_elapsed | 1320 |\n",
+ "| total_timesteps | 406000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.54 |\n",
+ "| explained_variance | 0.9802046 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 20299 |\n",
+ "| policy_loss | -0.04 |\n",
+ "| std | 0.412 |\n",
+ "| value_loss | 0.000222 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.51 |\n",
+ "| ep_rew_mean | -0.185 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 20400 |\n",
+ "| time_elapsed | 1326 |\n",
+ "| total_timesteps | 408000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.53 |\n",
+ "| explained_variance | 0.97018176 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 20399 |\n",
+ "| policy_loss | -0.000202 |\n",
+ "| std | 0.412 |\n",
+ "| value_loss | 4.38e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.88 |\n",
+ "| ep_rew_mean | -0.235 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 20500 |\n",
+ "| time_elapsed | 1332 |\n",
+ "| total_timesteps | 410000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.5 |\n",
+ "| explained_variance | 0.957062 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 20499 |\n",
+ "| policy_loss | -0.000156 |\n",
+ "| std | 0.407 |\n",
+ "| value_loss | 8.57e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.78 |\n",
+ "| ep_rew_mean | -0.22 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 20600 |\n",
+ "| time_elapsed | 1338 |\n",
+ "| total_timesteps | 412000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.49 |\n",
+ "| explained_variance | 0.94187564 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 20599 |\n",
+ "| policy_loss | 0.00251 |\n",
+ "| std | 0.405 |\n",
+ "| value_loss | 8.7e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.9 |\n",
+ "| ep_rew_mean | -0.228 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 20700 |\n",
+ "| time_elapsed | 1345 |\n",
+ "| total_timesteps | 414000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.49 |\n",
+ "| explained_variance | 0.88869613 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 20699 |\n",
+ "| policy_loss | 0.00114 |\n",
+ "| std | 0.407 |\n",
+ "| value_loss | 0.000132 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.86 |\n",
+ "| ep_rew_mean | -0.233 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 20800 |\n",
+ "| time_elapsed | 1354 |\n",
+ "| total_timesteps | 416000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.48 |\n",
+ "| explained_variance | 0.92743963 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 20799 |\n",
+ "| policy_loss | 0.00954 |\n",
+ "| std | 0.406 |\n",
+ "| value_loss | 0.000317 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.93 |\n",
+ "| ep_rew_mean | -0.227 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 20900 |\n",
+ "| time_elapsed | 1360 |\n",
+ "| total_timesteps | 418000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.46 |\n",
+ "| explained_variance | 0.90706563 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 20899 |\n",
+ "| policy_loss | 0.0194 |\n",
+ "| std | 0.403 |\n",
+ "| value_loss | 0.000414 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.02 |\n",
+ "| ep_rew_mean | -0.237 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 21000 |\n",
+ "| time_elapsed | 1364 |\n",
+ "| total_timesteps | 420000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.44 |\n",
+ "| explained_variance | -3.516026 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 20999 |\n",
+ "| policy_loss | 0.0564 |\n",
+ "| std | 0.401 |\n",
+ "| value_loss | 0.0189 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.61 |\n",
+ "| ep_rew_mean | -0.204 |\n",
+ "| time/ | |\n",
+ "| fps | 308 |\n",
+ "| iterations | 21100 |\n",
+ "| time_elapsed | 1370 |\n",
+ "| total_timesteps | 422000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.42 |\n",
+ "| explained_variance | 0.9308317 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21099 |\n",
+ "| policy_loss | -0.00902 |\n",
+ "| std | 0.398 |\n",
+ "| value_loss | 0.000172 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.72 |\n",
+ "| ep_rew_mean | -0.217 |\n",
+ "| time/ | |\n",
+ "| fps | 308 |\n",
+ "| iterations | 21200 |\n",
+ "| time_elapsed | 1376 |\n",
+ "| total_timesteps | 424000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.4 |\n",
+ "| explained_variance | 0.9731085 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21199 |\n",
+ "| policy_loss | 0.00145 |\n",
+ "| std | 0.396 |\n",
+ "| value_loss | 6.53e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.75 |\n",
+ "| ep_rew_mean | -0.209 |\n",
+ "| time/ | |\n",
+ "| fps | 308 |\n",
+ "| iterations | 21300 |\n",
+ "| time_elapsed | 1381 |\n",
+ "| total_timesteps | 426000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.4 |\n",
+ "| explained_variance | 0.32687587 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21299 |\n",
+ "| policy_loss | -0.0387 |\n",
+ "| std | 0.396 |\n",
+ "| value_loss | 0.0013 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.73 |\n",
+ "| ep_rew_mean | -0.215 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 21400 |\n",
+ "| time_elapsed | 1391 |\n",
+ "| total_timesteps | 428000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.39 |\n",
+ "| explained_variance | 0.9532345 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21399 |\n",
+ "| policy_loss | -0.00503 |\n",
+ "| std | 0.395 |\n",
+ "| value_loss | 9.31e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.81 |\n",
+ "| ep_rew_mean | -0.209 |\n",
+ "| time/ | |\n",
+ "| fps | 307 |\n",
+ "| iterations | 21500 |\n",
+ "| time_elapsed | 1396 |\n",
+ "| total_timesteps | 430000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.4 |\n",
+ "| explained_variance | 0.87219936 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21499 |\n",
+ "| policy_loss | -0.00404 |\n",
+ "| std | 0.396 |\n",
+ "| value_loss | 0.000149 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.71 |\n",
+ "| ep_rew_mean | -0.212 |\n",
+ "| time/ | |\n",
+ "| fps | 320 |\n",
+ "| iterations | 25000 |\n",
+ "| time_elapsed | 1557 |\n",
+ "| total_timesteps | 500000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -1.06 |\n",
+ "| explained_variance | 0.97420067 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 24999 |\n",
+ "| policy_loss | 0.00337 |\n",
+ "| std | 0.358 |\n",
+ "| value_loss | 7.87e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.7 |\n",
+ "| ep_rew_mean | -0.21 |\n",
+ "| time/ | |\n",
+ "| fps | 323 |\n",
+ "| iterations | 27800 |\n",
+ "| time_elapsed | 1720 |\n",
+ "| total_timesteps | 556000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.797 |\n",
+ "| explained_variance | 0.97323596 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 27799 |\n",
+ "| policy_loss | 0.00182 |\n",
+ "| std | 0.328 |\n",
+ "| value_loss | 0.000173 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.81 |\n",
+ "| ep_rew_mean | -0.213 |\n",
+ "| time/ | |\n",
+ "| fps | 323 |\n",
+ "| iterations | 27900 |\n",
+ "| time_elapsed | 1725 |\n",
+ "| total_timesteps | 558000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.774 |\n",
+ "| explained_variance | 0.22033525 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 27899 |\n",
+ "| policy_loss | 0.00906 |\n",
+ "| std | 0.325 |\n",
+ "| value_loss | 0.00211 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.7 |\n",
+ "| ep_rew_mean | -0.214 |\n",
+ "| time/ | |\n",
+ "| fps | 323 |\n",
+ "| iterations | 28000 |\n",
+ "| time_elapsed | 1730 |\n",
+ "| total_timesteps | 560000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.764 |\n",
+ "| explained_variance | 0.9667121 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 27999 |\n",
+ "| policy_loss | -0.00939 |\n",
+ "| std | 0.324 |\n",
+ "| value_loss | 0.000226 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.71 |\n",
+ "| ep_rew_mean | -0.206 |\n",
+ "| time/ | |\n",
+ "| fps | 323 |\n",
+ "| iterations | 28100 |\n",
+ "| time_elapsed | 1735 |\n",
+ "| total_timesteps | 562000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.76 |\n",
+ "| explained_variance | 0.9180963 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 28099 |\n",
+ "| policy_loss | 0.000922 |\n",
+ "| std | 0.324 |\n",
+ "| value_loss | 0.00013 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.73 |\n",
+ "| ep_rew_mean | -0.21 |\n",
+ "| time/ | |\n",
+ "| fps | 324 |\n",
+ "| iterations | 28200 |\n",
+ "| time_elapsed | 1740 |\n",
+ "| total_timesteps | 564000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.758 |\n",
+ "| explained_variance | 0.58245325 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 28199 |\n",
+ "| policy_loss | -0.0188 |\n",
+ "| std | 0.324 |\n",
+ "| value_loss | 0.00212 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.72 |\n",
+ "| ep_rew_mean | -0.212 |\n",
+ "| time/ | |\n",
+ "| fps | 323 |\n",
+ "| iterations | 28300 |\n",
+ "| time_elapsed | 1748 |\n",
+ "| total_timesteps | 566000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.747 |\n",
+ "| explained_variance | 0.98212785 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 28299 |\n",
+ "| policy_loss | -0.00883 |\n",
+ "| std | 0.322 |\n",
+ "| value_loss | 0.000125 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.68 |\n",
+ "| ep_rew_mean | -0.208 |\n",
+ "| time/ | |\n",
+ "| fps | 323 |\n",
+ "| iterations | 28400 |\n",
+ "| time_elapsed | 1753 |\n",
+ "| total_timesteps | 568000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.77 |\n",
+ "| explained_variance | 0.9311479 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 28399 |\n",
+ "| policy_loss | -0.00106 |\n",
+ "| std | 0.325 |\n",
+ "| value_loss | 8.91e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.63 |\n",
+ "| ep_rew_mean | -0.215 |\n",
+ "| time/ | |\n",
+ "| fps | 324 |\n",
+ "| iterations | 28500 |\n",
+ "| time_elapsed | 1757 |\n",
+ "| total_timesteps | 570000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.724 |\n",
+ "| explained_variance | 0.99005914 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 28499 |\n",
+ "| policy_loss | 0.000466 |\n",
+ "| std | 0.32 |\n",
+ "| value_loss | 2.75e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.65 |\n",
+ "| ep_rew_mean | -0.194 |\n",
+ "| time/ | |\n",
+ "| fps | 324 |\n",
+ "| iterations | 28600 |\n",
+ "| time_elapsed | 1762 |\n",
+ "| total_timesteps | 572000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.688 |\n",
+ "| explained_variance | 0.97127336 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 28599 |\n",
+ "| policy_loss | -0.00282 |\n",
+ "| std | 0.316 |\n",
+ "| value_loss | 0.000111 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.88 |\n",
+ "| ep_rew_mean | -0.229 |\n",
+ "| time/ | |\n",
+ "| fps | 324 |\n",
+ "| iterations | 28700 |\n",
+ "| time_elapsed | 1767 |\n",
+ "| total_timesteps | 574000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.673 |\n",
+ "| explained_variance | 0.97094136 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 28699 |\n",
+ "| policy_loss | -0.00301 |\n",
+ "| std | 0.315 |\n",
+ "| value_loss | 0.00014 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.76 |\n",
+ "| ep_rew_mean | -0.214 |\n",
+ "| time/ | |\n",
+ "| fps | 325 |\n",
+ "| iterations | 28800 |\n",
+ "| time_elapsed | 1772 |\n",
+ "| total_timesteps | 576000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.658 |\n",
+ "| explained_variance | 0.9512483 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 28799 |\n",
+ "| policy_loss | -0.00559 |\n",
+ "| std | 0.313 |\n",
+ "| value_loss | 0.000173 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.7 |\n",
+ "| ep_rew_mean | -0.212 |\n",
+ "| time/ | |\n",
+ "| fps | 324 |\n",
+ "| iterations | 28900 |\n",
+ "| time_elapsed | 1780 |\n",
+ "| total_timesteps | 578000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.65 |\n",
+ "| explained_variance | 0.97945994 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 28899 |\n",
+ "| policy_loss | -0.00472 |\n",
+ "| std | 0.312 |\n",
+ "| value_loss | 0.000126 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.81 |\n",
+ "| ep_rew_mean | -0.225 |\n",
+ "| time/ | |\n",
+ "| fps | 324 |\n",
+ "| iterations | 29000 |\n",
+ "| time_elapsed | 1785 |\n",
+ "| total_timesteps | 580000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.631 |\n",
+ "| explained_variance | 0.9418761 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 28999 |\n",
+ "| policy_loss | -2.69e-05 |\n",
+ "| std | 0.311 |\n",
+ "| value_loss | 0.000187 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.77 |\n",
+ "| ep_rew_mean | -0.214 |\n",
+ "| time/ | |\n",
+ "| fps | 325 |\n",
+ "| iterations | 29100 |\n",
+ "| time_elapsed | 1790 |\n",
+ "| total_timesteps | 582000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.62 |\n",
+ "| explained_variance | 0.9561486 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 29099 |\n",
+ "| policy_loss | 0.0037 |\n",
+ "| std | 0.309 |\n",
+ "| value_loss | 0.0002 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.71 |\n",
+ "| ep_rew_mean | -0.211 |\n",
+ "| time/ | |\n",
+ "| fps | 325 |\n",
+ "| iterations | 29200 |\n",
+ "| time_elapsed | 1795 |\n",
+ "| total_timesteps | 584000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.618 |\n",
+ "| explained_variance | 0.9853928 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 29199 |\n",
+ "| policy_loss | -0.00218 |\n",
+ "| std | 0.309 |\n",
+ "| value_loss | 5.64e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.7 |\n",
+ "| ep_rew_mean | -0.199 |\n",
+ "| time/ | |\n",
+ "| fps | 325 |\n",
+ "| iterations | 29300 |\n",
+ "| time_elapsed | 1800 |\n",
+ "| total_timesteps | 586000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.611 |\n",
+ "| explained_variance | 0.97477955 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 29299 |\n",
+ "| policy_loss | 0.00242 |\n",
+ "| std | 0.309 |\n",
+ "| value_loss | 0.000135 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.73 |\n",
+ "| ep_rew_mean | -0.208 |\n",
+ "| time/ | |\n",
+ "| fps | 325 |\n",
+ "| iterations | 29400 |\n",
+ "| time_elapsed | 1804 |\n",
+ "| total_timesteps | 588000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.613 |\n",
+ "| explained_variance | 0.6755737 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 29399 |\n",
+ "| policy_loss | -0.0265 |\n",
+ "| std | 0.309 |\n",
+ "| value_loss | 0.00202 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.65 |\n",
+ "| ep_rew_mean | -0.203 |\n",
+ "| time/ | |\n",
+ "| fps | 326 |\n",
+ "| iterations | 29500 |\n",
+ "| time_elapsed | 1809 |\n",
+ "| total_timesteps | 590000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.596 |\n",
+ "| explained_variance | 0.9880968 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 29499 |\n",
+ "| policy_loss | 0.00439 |\n",
+ "| std | 0.307 |\n",
+ "| value_loss | 5.99e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.67 |\n",
+ "| ep_rew_mean | -0.203 |\n",
+ "| time/ | |\n",
+ "| fps | 325 |\n",
+ "| iterations | 29600 |\n",
+ "| time_elapsed | 1818 |\n",
+ "| total_timesteps | 592000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.589 |\n",
+ "| explained_variance | 0.93901527 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 29599 |\n",
+ "| policy_loss | 0.00187 |\n",
+ "| std | 0.306 |\n",
+ "| value_loss | 0.000172 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.4 |\n",
+ "| ep_rew_mean | -0.275 |\n",
+ "| time/ | |\n",
+ "| fps | 325 |\n",
+ "| iterations | 29700 |\n",
+ "| time_elapsed | 1823 |\n",
+ "| total_timesteps | 594000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.584 |\n",
+ "| explained_variance | -1.5040169 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 29699 |\n",
+ "| policy_loss | -0.0121 |\n",
+ "| std | 0.306 |\n",
+ "| value_loss | 0.00608 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.08 |\n",
+ "| ep_rew_mean | -0.234 |\n",
+ "| time/ | |\n",
+ "| fps | 325 |\n",
+ "| iterations | 29800 |\n",
+ "| time_elapsed | 1828 |\n",
+ "| total_timesteps | 596000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.602 |\n",
+ "| explained_variance | 0.35909313 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 29799 |\n",
+ "| policy_loss | -0.00559 |\n",
+ "| std | 0.308 |\n",
+ "| value_loss | 0.00806 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.95 |\n",
+ "| ep_rew_mean | -0.23 |\n",
+ "| time/ | |\n",
+ "| fps | 326 |\n",
+ "| iterations | 29900 |\n",
+ "| time_elapsed | 1833 |\n",
+ "| total_timesteps | 598000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.617 |\n",
+ "| explained_variance | -10.621065 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 29899 |\n",
+ "| policy_loss | -0.0245 |\n",
+ "| std | 0.309 |\n",
+ "| value_loss | 0.0672 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.85 |\n",
+ "| ep_rew_mean | -0.222 |\n",
+ "| time/ | |\n",
+ "| fps | 326 |\n",
+ "| iterations | 30000 |\n",
+ "| time_elapsed | 1838 |\n",
+ "| total_timesteps | 600000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.585 |\n",
+ "| explained_variance | 0.41773236 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 29999 |\n",
+ "| policy_loss | -0.0285 |\n",
+ "| std | 0.305 |\n",
+ "| value_loss | 0.00465 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.77 |\n",
+ "| ep_rew_mean | -0.216 |\n",
+ "| time/ | |\n",
+ "| fps | 326 |\n",
+ "| iterations | 30100 |\n",
+ "| time_elapsed | 1843 |\n",
+ "| total_timesteps | 602000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.596 |\n",
+ "| explained_variance | 0.9414502 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 30099 |\n",
+ "| policy_loss | 0.00102 |\n",
+ "| std | 0.307 |\n",
+ "| value_loss | 0.000178 |\n",
+ "-------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.88 |\n",
+ "| ep_rew_mean | -0.224 |\n",
+ "| time/ | |\n",
+ "| fps | 326 |\n",
+ "| iterations | 30200 |\n",
+ "| time_elapsed | 1852 |\n",
+ "| total_timesteps | 604000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.586 |\n",
+ "| explained_variance | 0.598702 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 30199 |\n",
+ "| policy_loss | -0.0509 |\n",
+ "| std | 0.306 |\n",
+ "| value_loss | 0.00411 |\n",
+ "------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.94 |\n",
+ "| ep_rew_mean | -0.233 |\n",
+ "| time/ | |\n",
+ "| fps | 326 |\n",
+ "| iterations | 30300 |\n",
+ "| time_elapsed | 1857 |\n",
+ "| total_timesteps | 606000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.568 |\n",
+ "| explained_variance | 0.9546901 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 30299 |\n",
+ "| policy_loss | 0.00376 |\n",
+ "| std | 0.304 |\n",
+ "| value_loss | 0.000196 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.68 |\n",
+ "| ep_rew_mean | -0.211 |\n",
+ "| time/ | |\n",
+ "| fps | 326 |\n",
+ "| iterations | 30400 |\n",
+ "| time_elapsed | 1861 |\n",
+ "| total_timesteps | 608000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.556 |\n",
+ "| explained_variance | 0.9634257 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 30399 |\n",
+ "| policy_loss | -0.00117 |\n",
+ "| std | 0.304 |\n",
+ "| value_loss | 0.000128 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.59 |\n",
+ "| ep_rew_mean | -0.192 |\n",
+ "| time/ | |\n",
+ "| fps | 326 |\n",
+ "| iterations | 30500 |\n",
+ "| time_elapsed | 1866 |\n",
+ "| total_timesteps | 610000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.559 |\n",
+ "| explained_variance | 0.9729128 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 30499 |\n",
+ "| policy_loss | -0.0041 |\n",
+ "| std | 0.304 |\n",
+ "| value_loss | 8.32e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.77 |\n",
+ "| ep_rew_mean | -0.216 |\n",
+ "| time/ | |\n",
+ "| fps | 326 |\n",
+ "| iterations | 30600 |\n",
+ "| time_elapsed | 1871 |\n",
+ "| total_timesteps | 612000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.542 |\n",
+ "| explained_variance | 0.98225373 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 30599 |\n",
+ "| policy_loss | -0.00189 |\n",
+ "| std | 0.303 |\n",
+ "| value_loss | 9.61e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.8 |\n",
+ "| ep_rew_mean | -0.218 |\n",
+ "| time/ | |\n",
+ "| fps | 327 |\n",
+ "| iterations | 30700 |\n",
+ "| time_elapsed | 1876 |\n",
+ "| total_timesteps | 614000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.539 |\n",
+ "| explained_variance | 0.92065257 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 30699 |\n",
+ "| policy_loss | -0.00712 |\n",
+ "| std | 0.302 |\n",
+ "| value_loss | 0.000221 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.68 |\n",
+ "| ep_rew_mean | -0.204 |\n",
+ "| time/ | |\n",
+ "| fps | 327 |\n",
+ "| iterations | 30800 |\n",
+ "| time_elapsed | 1881 |\n",
+ "| total_timesteps | 616000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.539 |\n",
+ "| explained_variance | 0.9718508 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 30799 |\n",
+ "| policy_loss | -0.00467 |\n",
+ "| std | 0.303 |\n",
+ "| value_loss | 9.48e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.67 |\n",
+ "| ep_rew_mean | -0.206 |\n",
+ "| time/ | |\n",
+ "| fps | 327 |\n",
+ "| iterations | 30900 |\n",
+ "| time_elapsed | 1889 |\n",
+ "| total_timesteps | 618000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.526 |\n",
+ "| explained_variance | 0.97893435 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 30899 |\n",
+ "| policy_loss | 0.00621 |\n",
+ "| std | 0.302 |\n",
+ "| value_loss | 0.000222 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.82 |\n",
+ "| ep_rew_mean | -0.226 |\n",
+ "| time/ | |\n",
+ "| fps | 327 |\n",
+ "| iterations | 31000 |\n",
+ "| time_elapsed | 1893 |\n",
+ "| total_timesteps | 620000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.512 |\n",
+ "| explained_variance | 0.95006245 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 30999 |\n",
+ "| policy_loss | 0.00233 |\n",
+ "| std | 0.301 |\n",
+ "| value_loss | 0.000139 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.84 |\n",
+ "| ep_rew_mean | -0.224 |\n",
+ "| time/ | |\n",
+ "| fps | 327 |\n",
+ "| iterations | 31100 |\n",
+ "| time_elapsed | 1898 |\n",
+ "| total_timesteps | 622000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.512 |\n",
+ "| explained_variance | 0.93657434 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 31099 |\n",
+ "| policy_loss | -0.000242 |\n",
+ "| std | 0.301 |\n",
+ "| value_loss | 0.000198 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.65 |\n",
+ "| ep_rew_mean | -0.206 |\n",
+ "| time/ | |\n",
+ "| fps | 327 |\n",
+ "| iterations | 31200 |\n",
+ "| time_elapsed | 1905 |\n",
+ "| total_timesteps | 624000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.466 |\n",
+ "| explained_variance | 0.9877893 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 31199 |\n",
+ "| policy_loss | 0.00217 |\n",
+ "| std | 0.297 |\n",
+ "| value_loss | 4.42e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.73 |\n",
+ "| ep_rew_mean | -0.216 |\n",
+ "| time/ | |\n",
+ "| fps | 327 |\n",
+ "| iterations | 31300 |\n",
+ "| time_elapsed | 1911 |\n",
+ "| total_timesteps | 626000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.437 |\n",
+ "| explained_variance | 0.98232895 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 31299 |\n",
+ "| policy_loss | -0.00259 |\n",
+ "| std | 0.294 |\n",
+ "| value_loss | 6.34e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.75 |\n",
+ "| ep_rew_mean | -0.214 |\n",
+ "| time/ | |\n",
+ "| fps | 327 |\n",
+ "| iterations | 31400 |\n",
+ "| time_elapsed | 1916 |\n",
+ "| total_timesteps | 628000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.447 |\n",
+ "| explained_variance | 0.97575915 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 31399 |\n",
+ "| policy_loss | 0.000886 |\n",
+ "| std | 0.295 |\n",
+ "| value_loss | 8.11e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.71 |\n",
+ "| ep_rew_mean | -0.208 |\n",
+ "| time/ | |\n",
+ "| fps | 327 |\n",
+ "| iterations | 31500 |\n",
+ "| time_elapsed | 1925 |\n",
+ "| total_timesteps | 630000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.438 |\n",
+ "| explained_variance | 0.97809935 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 31499 |\n",
+ "| policy_loss | 0.00624 |\n",
+ "| std | 0.294 |\n",
+ "| value_loss | 7.33e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.73 |\n",
+ "| ep_rew_mean | -0.209 |\n",
+ "| time/ | |\n",
+ "| fps | 327 |\n",
+ "| iterations | 31600 |\n",
+ "| time_elapsed | 1930 |\n",
+ "| total_timesteps | 632000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.411 |\n",
+ "| explained_variance | 0.95562655 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 31599 |\n",
+ "| policy_loss | -0.00718 |\n",
+ "| std | 0.291 |\n",
+ "| value_loss | 0.00027 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.75 |\n",
+ "| ep_rew_mean | -0.218 |\n",
+ "| time/ | |\n",
+ "| fps | 327 |\n",
+ "| iterations | 31700 |\n",
+ "| time_elapsed | 1934 |\n",
+ "| total_timesteps | 634000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.402 |\n",
+ "| explained_variance | 0.98256177 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 31699 |\n",
+ "| policy_loss | -0.000849 |\n",
+ "| std | 0.29 |\n",
+ "| value_loss | 5.61e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.71 |\n",
+ "| ep_rew_mean | -0.213 |\n",
+ "| time/ | |\n",
+ "| fps | 328 |\n",
+ "| iterations | 31800 |\n",
+ "| time_elapsed | 1938 |\n",
+ "| total_timesteps | 636000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.398 |\n",
+ "| explained_variance | 0.49550873 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 31799 |\n",
+ "| policy_loss | -0.0132 |\n",
+ "| std | 0.289 |\n",
+ "| value_loss | 0.00197 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.71 |\n",
+ "| ep_rew_mean | -0.207 |\n",
+ "| time/ | |\n",
+ "| fps | 328 |\n",
+ "| iterations | 31900 |\n",
+ "| time_elapsed | 1942 |\n",
+ "| total_timesteps | 638000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.37 |\n",
+ "| explained_variance | 0.9492866 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 31899 |\n",
+ "| policy_loss | 0.00263 |\n",
+ "| std | 0.287 |\n",
+ "| value_loss | 0.000154 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.85 |\n",
+ "| ep_rew_mean | -0.228 |\n",
+ "| time/ | |\n",
+ "| fps | 328 |\n",
+ "| iterations | 32000 |\n",
+ "| time_elapsed | 1946 |\n",
+ "| total_timesteps | 640000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.369 |\n",
+ "| explained_variance | 0.94642025 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 31999 |\n",
+ "| policy_loss | 0.00381 |\n",
+ "| std | 0.287 |\n",
+ "| value_loss | 0.000234 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.82 |\n",
+ "| ep_rew_mean | -0.225 |\n",
+ "| time/ | |\n",
+ "| fps | 328 |\n",
+ "| iterations | 32100 |\n",
+ "| time_elapsed | 1951 |\n",
+ "| total_timesteps | 642000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.361 |\n",
+ "| explained_variance | 0.98188424 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32099 |\n",
+ "| policy_loss | 0.00164 |\n",
+ "| std | 0.286 |\n",
+ "| value_loss | 6.05e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.85 |\n",
+ "| ep_rew_mean | -0.224 |\n",
+ "| time/ | |\n",
+ "| fps | 329 |\n",
+ "| iterations | 32200 |\n",
+ "| time_elapsed | 1955 |\n",
+ "| total_timesteps | 644000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.343 |\n",
+ "| explained_variance | 0.9386951 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32199 |\n",
+ "| policy_loss | -0.00134 |\n",
+ "| std | 0.285 |\n",
+ "| value_loss | 0.00101 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.7 |\n",
+ "| ep_rew_mean | -0.207 |\n",
+ "| time/ | |\n",
+ "| fps | 328 |\n",
+ "| iterations | 32300 |\n",
+ "| time_elapsed | 1963 |\n",
+ "| total_timesteps | 646000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.338 |\n",
+ "| explained_variance | 0.9384046 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32299 |\n",
+ "| policy_loss | 0.00246 |\n",
+ "| std | 0.285 |\n",
+ "| value_loss | 0.000127 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.5 |\n",
+ "| ep_rew_mean | -0.19 |\n",
+ "| time/ | |\n",
+ "| fps | 329 |\n",
+ "| iterations | 32400 |\n",
+ "| time_elapsed | 1968 |\n",
+ "| total_timesteps | 648000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.325 |\n",
+ "| explained_variance | 0.9613883 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32399 |\n",
+ "| policy_loss | 0.00174 |\n",
+ "| std | 0.284 |\n",
+ "| value_loss | 0.00013 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.78 |\n",
+ "| ep_rew_mean | -0.219 |\n",
+ "| time/ | |\n",
+ "| fps | 329 |\n",
+ "| iterations | 32500 |\n",
+ "| time_elapsed | 1972 |\n",
+ "| total_timesteps | 650000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.317 |\n",
+ "| explained_variance | 0.9388133 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32499 |\n",
+ "| policy_loss | -0.00116 |\n",
+ "| std | 0.283 |\n",
+ "| value_loss | 0.000238 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.82 |\n",
+ "| ep_rew_mean | -0.218 |\n",
+ "| time/ | |\n",
+ "| fps | 329 |\n",
+ "| iterations | 32600 |\n",
+ "| time_elapsed | 1976 |\n",
+ "| total_timesteps | 652000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.301 |\n",
+ "| explained_variance | 0.97595453 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32599 |\n",
+ "| policy_loss | 0.0013 |\n",
+ "| std | 0.281 |\n",
+ "| value_loss | 7.5e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.79 |\n",
+ "| ep_rew_mean | -0.218 |\n",
+ "| time/ | |\n",
+ "| fps | 329 |\n",
+ "| iterations | 32700 |\n",
+ "| time_elapsed | 1981 |\n",
+ "| total_timesteps | 654000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.303 |\n",
+ "| explained_variance | 0.9642559 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32699 |\n",
+ "| policy_loss | -0.00392 |\n",
+ "| std | 0.281 |\n",
+ "| value_loss | 0.000116 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.65 |\n",
+ "| ep_rew_mean | -0.205 |\n",
+ "| time/ | |\n",
+ "| fps | 330 |\n",
+ "| iterations | 32800 |\n",
+ "| time_elapsed | 1986 |\n",
+ "| total_timesteps | 656000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.308 |\n",
+ "| explained_variance | 0.96840066 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32799 |\n",
+ "| policy_loss | -0.00165 |\n",
+ "| std | 0.281 |\n",
+ "| value_loss | 0.000215 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.76 |\n",
+ "| ep_rew_mean | -0.209 |\n",
+ "| time/ | |\n",
+ "| fps | 330 |\n",
+ "| iterations | 32900 |\n",
+ "| time_elapsed | 1990 |\n",
+ "| total_timesteps | 658000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -0.28 |\n",
+ "| explained_variance | 0.8290812 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32899 |\n",
+ "| policy_loss | -0.00931 |\n",
+ "| std | 0.279 |\n",
+ "| value_loss | 0.000542 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.93 |\n",
+ "| ep_rew_mean | -0.228 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 39300 |\n",
+ "| time_elapsed | 2321 |\n",
+ "| total_timesteps | 786000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.386 |\n",
+ "| explained_variance | 0.90208817 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39299 |\n",
+ "| policy_loss | -0.00257 |\n",
+ "| std | 0.226 |\n",
+ "| value_loss | 0.000814 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.7 |\n",
+ "| ep_rew_mean | -0.205 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 39400 |\n",
+ "| time_elapsed | 2328 |\n",
+ "| total_timesteps | 788000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.402 |\n",
+ "| explained_variance | 0.9760422 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39399 |\n",
+ "| policy_loss | 0.00142 |\n",
+ "| std | 0.225 |\n",
+ "| value_loss | 0.000173 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.6 |\n",
+ "| ep_rew_mean | -0.2 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 39500 |\n",
+ "| time_elapsed | 2334 |\n",
+ "| total_timesteps | 790000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.415 |\n",
+ "| explained_variance | 0.97800255 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39499 |\n",
+ "| policy_loss | -0.00893 |\n",
+ "| std | 0.224 |\n",
+ "| value_loss | 0.000148 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.62 |\n",
+ "| ep_rew_mean | -0.197 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 39600 |\n",
+ "| time_elapsed | 2339 |\n",
+ "| total_timesteps | 792000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.409 |\n",
+ "| explained_variance | 0.5413128 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39599 |\n",
+ "| policy_loss | 0.00744 |\n",
+ "| std | 0.224 |\n",
+ "| value_loss | 0.00194 |\n",
+ "-------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.69 |\n",
+ "| ep_rew_mean | -0.214 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 39700 |\n",
+ "| time_elapsed | 2344 |\n",
+ "| total_timesteps | 794000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.399 |\n",
+ "| explained_variance | 0.984776 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39699 |\n",
+ "| policy_loss | -0.00425 |\n",
+ "| std | 0.225 |\n",
+ "| value_loss | 7.3e-05 |\n",
+ "------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.79 |\n",
+ "| ep_rew_mean | -0.224 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 39800 |\n",
+ "| time_elapsed | 2349 |\n",
+ "| total_timesteps | 796000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.399 |\n",
+ "| explained_variance | 0.96791893 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39799 |\n",
+ "| policy_loss | 0.00499 |\n",
+ "| std | 0.225 |\n",
+ "| value_loss | 0.000191 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.74 |\n",
+ "| ep_rew_mean | -0.218 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 39900 |\n",
+ "| time_elapsed | 2358 |\n",
+ "| total_timesteps | 798000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.383 |\n",
+ "| explained_variance | -2.428947 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39899 |\n",
+ "| policy_loss | -0.0136 |\n",
+ "| std | 0.226 |\n",
+ "| value_loss | 0.0102 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.65 |\n",
+ "| ep_rew_mean | -0.206 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 40000 |\n",
+ "| time_elapsed | 2363 |\n",
+ "| total_timesteps | 800000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.369 |\n",
+ "| explained_variance | 0.98433495 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39999 |\n",
+ "| policy_loss | -0.00361 |\n",
+ "| std | 0.228 |\n",
+ "| value_loss | 9.41e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.7 |\n",
+ "| ep_rew_mean | -0.209 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 40100 |\n",
+ "| time_elapsed | 2368 |\n",
+ "| total_timesteps | 802000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.365 |\n",
+ "| explained_variance | 0.9824995 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40099 |\n",
+ "| policy_loss | 0.000342 |\n",
+ "| std | 0.228 |\n",
+ "| value_loss | 5.56e-05 |\n",
+ "-------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.55 |\n",
+ "| ep_rew_mean | -0.193 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 40200 |\n",
+ "| time_elapsed | 2373 |\n",
+ "| total_timesteps | 804000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.402 |\n",
+ "| explained_variance | 0.985323 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40199 |\n",
+ "| policy_loss | 0.00106 |\n",
+ "| std | 0.226 |\n",
+ "| value_loss | 5.57e-05 |\n",
+ "------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 7.44 |\n",
+ "| ep_rew_mean | -0.644 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 40300 |\n",
+ "| time_elapsed | 2378 |\n",
+ "| total_timesteps | 806000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.396 |\n",
+ "| explained_variance | 0.8747574 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40299 |\n",
+ "| policy_loss | -0.0966 |\n",
+ "| std | 0.225 |\n",
+ "| value_loss | 0.127 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.99 |\n",
+ "| ep_rew_mean | -0.314 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 40400 |\n",
+ "| time_elapsed | 2384 |\n",
+ "| total_timesteps | 808000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.4 |\n",
+ "| explained_variance | -3.9502196 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40399 |\n",
+ "| policy_loss | -0.000899 |\n",
+ "| std | 0.224 |\n",
+ "| value_loss | 0.0228 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.68 |\n",
+ "| ep_rew_mean | -0.293 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 40500 |\n",
+ "| time_elapsed | 2392 |\n",
+ "| total_timesteps | 810000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.379 |\n",
+ "| explained_variance | 0.84740096 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40499 |\n",
+ "| policy_loss | -0.000565 |\n",
+ "| std | 0.226 |\n",
+ "| value_loss | 0.0476 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.7 |\n",
+ "| ep_rew_mean | -0.212 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 40600 |\n",
+ "| time_elapsed | 2398 |\n",
+ "| total_timesteps | 812000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.395 |\n",
+ "| explained_variance | 0.80924624 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40599 |\n",
+ "| policy_loss | -0.00864 |\n",
+ "| std | 0.225 |\n",
+ "| value_loss | 0.00108 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 3.04 |\n",
+ "| ep_rew_mean | -0.236 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 40700 |\n",
+ "| time_elapsed | 2402 |\n",
+ "| total_timesteps | 814000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.417 |\n",
+ "| explained_variance | 0.73390794 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40699 |\n",
+ "| policy_loss | 0.0243 |\n",
+ "| std | 0.224 |\n",
+ "| value_loss | 0.0164 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.61 |\n",
+ "| ep_rew_mean | -0.205 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 40800 |\n",
+ "| time_elapsed | 2407 |\n",
+ "| total_timesteps | 816000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.406 |\n",
+ "| explained_variance | 0.97868705 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40799 |\n",
+ "| policy_loss | -0.00146 |\n",
+ "| std | 0.225 |\n",
+ "| value_loss | 0.000133 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.71 |\n",
+ "| ep_rew_mean | -0.213 |\n",
+ "| time/ | |\n",
+ "| fps | 339 |\n",
+ "| iterations | 40900 |\n",
+ "| time_elapsed | 2412 |\n",
+ "| total_timesteps | 818000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.429 |\n",
+ "| explained_variance | 0.8363371 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40899 |\n",
+ "| policy_loss | -0.00689 |\n",
+ "| std | 0.223 |\n",
+ "| value_loss | 0.000464 |\n",
+ "-------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.59 |\n",
+ "| ep_rew_mean | -0.198 |\n",
+ "| time/ | |\n",
+ "| fps | 339 |\n",
+ "| iterations | 41000 |\n",
+ "| time_elapsed | 2417 |\n",
+ "| total_timesteps | 820000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.445 |\n",
+ "| explained_variance | 0.977923 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40999 |\n",
+ "| policy_loss | -0.00173 |\n",
+ "| std | 0.222 |\n",
+ "| value_loss | 0.000178 |\n",
+ "------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.73 |\n",
+ "| ep_rew_mean | -0.208 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 41100 |\n",
+ "| time_elapsed | 2426 |\n",
+ "| total_timesteps | 822000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.458 |\n",
+ "| explained_variance | 0.63355607 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41099 |\n",
+ "| policy_loss | 0.0177 |\n",
+ "| std | 0.22 |\n",
+ "| value_loss | 0.00102 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.72 |\n",
+ "| ep_rew_mean | -0.214 |\n",
+ "| time/ | |\n",
+ "| fps | 338 |\n",
+ "| iterations | 41200 |\n",
+ "| time_elapsed | 2431 |\n",
+ "| total_timesteps | 824000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.453 |\n",
+ "| explained_variance | 0.9759229 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41199 |\n",
+ "| policy_loss | -0.0158 |\n",
+ "| std | 0.22 |\n",
+ "| value_loss | 0.000228 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.68 |\n",
+ "| ep_rew_mean | -0.211 |\n",
+ "| time/ | |\n",
+ "| fps | 339 |\n",
+ "| iterations | 41300 |\n",
+ "| time_elapsed | 2436 |\n",
+ "| total_timesteps | 826000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.445 |\n",
+ "| explained_variance | 0.99040455 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41299 |\n",
+ "| policy_loss | 0.00678 |\n",
+ "| std | 0.221 |\n",
+ "| value_loss | 7.98e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.62 |\n",
+ "| ep_rew_mean | -0.207 |\n",
+ "| time/ | |\n",
+ "| fps | 339 |\n",
+ "| iterations | 41400 |\n",
+ "| time_elapsed | 2441 |\n",
+ "| total_timesteps | 828000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.458 |\n",
+ "| explained_variance | 0.9926231 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41399 |\n",
+ "| policy_loss | 0.00175 |\n",
+ "| std | 0.22 |\n",
+ "| value_loss | 2.89e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.58 |\n",
+ "| ep_rew_mean | -0.196 |\n",
+ "| time/ | |\n",
+ "| fps | 339 |\n",
+ "| iterations | 41500 |\n",
+ "| time_elapsed | 2447 |\n",
+ "| total_timesteps | 830000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.47 |\n",
+ "| explained_variance | 0.97897565 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41499 |\n",
+ "| policy_loss | 0.0038 |\n",
+ "| std | 0.219 |\n",
+ "| value_loss | 9.45e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.61 |\n",
+ "| ep_rew_mean | -0.203 |\n",
+ "| time/ | |\n",
+ "| fps | 339 |\n",
+ "| iterations | 41600 |\n",
+ "| time_elapsed | 2452 |\n",
+ "| total_timesteps | 832000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.446 |\n",
+ "| explained_variance | 0.9452324 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41599 |\n",
+ "| policy_loss | -0.011 |\n",
+ "| std | 0.22 |\n",
+ "| value_loss | 0.000302 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.62 |\n",
+ "| ep_rew_mean | -0.202 |\n",
+ "| time/ | |\n",
+ "| fps | 339 |\n",
+ "| iterations | 41700 |\n",
+ "| time_elapsed | 2457 |\n",
+ "| total_timesteps | 834000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.471 |\n",
+ "| explained_variance | 0.9743598 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41699 |\n",
+ "| policy_loss | 0.00613 |\n",
+ "| std | 0.218 |\n",
+ "| value_loss | 0.000198 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.78 |\n",
+ "| ep_rew_mean | -0.212 |\n",
+ "| time/ | |\n",
+ "| fps | 339 |\n",
+ "| iterations | 41800 |\n",
+ "| time_elapsed | 2465 |\n",
+ "| total_timesteps | 836000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.465 |\n",
+ "| explained_variance | 0.6682483 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41799 |\n",
+ "| policy_loss | -0.0067 |\n",
+ "| std | 0.219 |\n",
+ "| value_loss | 0.00284 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.72 |\n",
+ "| ep_rew_mean | -0.211 |\n",
+ "| time/ | |\n",
+ "| fps | 339 |\n",
+ "| iterations | 41900 |\n",
+ "| time_elapsed | 2469 |\n",
+ "| total_timesteps | 838000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.488 |\n",
+ "| explained_variance | 0.9824863 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41899 |\n",
+ "| policy_loss | -0.00377 |\n",
+ "| std | 0.217 |\n",
+ "| value_loss | 9.89e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.63 |\n",
+ "| ep_rew_mean | -0.205 |\n",
+ "| time/ | |\n",
+ "| fps | 339 |\n",
+ "| iterations | 42000 |\n",
+ "| time_elapsed | 2473 |\n",
+ "| total_timesteps | 840000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.508 |\n",
+ "| explained_variance | 0.97226715 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41999 |\n",
+ "| policy_loss | -0.0114 |\n",
+ "| std | 0.216 |\n",
+ "| value_loss | 0.000727 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.77 |\n",
+ "| ep_rew_mean | -0.218 |\n",
+ "| time/ | |\n",
+ "| fps | 339 |\n",
+ "| iterations | 42100 |\n",
+ "| time_elapsed | 2477 |\n",
+ "| total_timesteps | 842000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.504 |\n",
+ "| explained_variance | 0.98028255 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42099 |\n",
+ "| policy_loss | 0.00354 |\n",
+ "| std | 0.217 |\n",
+ "| value_loss | 0.000129 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.57 |\n",
+ "| ep_rew_mean | -0.199 |\n",
+ "| time/ | |\n",
+ "| fps | 339 |\n",
+ "| iterations | 42200 |\n",
+ "| time_elapsed | 2482 |\n",
+ "| total_timesteps | 844000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.489 |\n",
+ "| explained_variance | 0.96648335 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42199 |\n",
+ "| policy_loss | -0.00583 |\n",
+ "| std | 0.218 |\n",
+ "| value_loss | 0.00013 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.68 |\n",
+ "| ep_rew_mean | -0.198 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 42300 |\n",
+ "| time_elapsed | 2487 |\n",
+ "| total_timesteps | 846000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.503 |\n",
+ "| explained_variance | 0.9749611 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42299 |\n",
+ "| policy_loss | 0.00564 |\n",
+ "| std | 0.218 |\n",
+ "| value_loss | 0.000109 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.75 |\n",
+ "| ep_rew_mean | -0.218 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 42400 |\n",
+ "| time_elapsed | 2491 |\n",
+ "| total_timesteps | 848000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.524 |\n",
+ "| explained_variance | 0.99248254 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42399 |\n",
+ "| policy_loss | -0.000632 |\n",
+ "| std | 0.216 |\n",
+ "| value_loss | 4.3e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.61 |\n",
+ "| ep_rew_mean | -0.199 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 42500 |\n",
+ "| time_elapsed | 2499 |\n",
+ "| total_timesteps | 850000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.546 |\n",
+ "| explained_variance | 0.98732716 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42499 |\n",
+ "| policy_loss | 0.000381 |\n",
+ "| std | 0.214 |\n",
+ "| value_loss | 4.16e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.67 |\n",
+ "| ep_rew_mean | -0.205 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 42600 |\n",
+ "| time_elapsed | 2504 |\n",
+ "| total_timesteps | 852000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.548 |\n",
+ "| explained_variance | 0.9690981 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42599 |\n",
+ "| policy_loss | -0.0125 |\n",
+ "| std | 0.213 |\n",
+ "| value_loss | 0.000401 |\n",
+ "-------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.58 |\n",
+ "| ep_rew_mean | -0.194 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 42700 |\n",
+ "| time_elapsed | 2510 |\n",
+ "| total_timesteps | 854000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.548 |\n",
+ "| explained_variance | 0.948852 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42699 |\n",
+ "| policy_loss | 0.00354 |\n",
+ "| std | 0.214 |\n",
+ "| value_loss | 9.49e-05 |\n",
+ "------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.72 |\n",
+ "| ep_rew_mean | -0.21 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 42800 |\n",
+ "| time_elapsed | 2514 |\n",
+ "| total_timesteps | 856000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.541 |\n",
+ "| explained_variance | 0.9658161 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42799 |\n",
+ "| policy_loss | -3.41e-05 |\n",
+ "| std | 0.214 |\n",
+ "| value_loss | 0.000158 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.7 |\n",
+ "| ep_rew_mean | -0.209 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 42900 |\n",
+ "| time_elapsed | 2519 |\n",
+ "| total_timesteps | 858000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.531 |\n",
+ "| explained_variance | 0.9794916 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42899 |\n",
+ "| policy_loss | 0.00583 |\n",
+ "| std | 0.214 |\n",
+ "| value_loss | 9.42e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.47 |\n",
+ "| ep_rew_mean | -0.187 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 43000 |\n",
+ "| time_elapsed | 2524 |\n",
+ "| total_timesteps | 860000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.527 |\n",
+ "| explained_variance | 0.98890656 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42999 |\n",
+ "| policy_loss | -0.00859 |\n",
+ "| std | 0.214 |\n",
+ "| value_loss | 0.000106 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.7 |\n",
+ "| ep_rew_mean | -0.204 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 43100 |\n",
+ "| time_elapsed | 2528 |\n",
+ "| total_timesteps | 862000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.519 |\n",
+ "| explained_variance | 0.97553414 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43099 |\n",
+ "| policy_loss | -0.00251 |\n",
+ "| std | 0.215 |\n",
+ "| value_loss | 6.28e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.69 |\n",
+ "| ep_rew_mean | -0.218 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 43200 |\n",
+ "| time_elapsed | 2537 |\n",
+ "| total_timesteps | 864000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.553 |\n",
+ "| explained_variance | 0.9948936 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43199 |\n",
+ "| policy_loss | 0.000498 |\n",
+ "| std | 0.212 |\n",
+ "| value_loss | 3.8e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.7 |\n",
+ "| ep_rew_mean | -0.21 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 43300 |\n",
+ "| time_elapsed | 2542 |\n",
+ "| total_timesteps | 866000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.561 |\n",
+ "| explained_variance | 0.9760311 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43299 |\n",
+ "| policy_loss | -0.00212 |\n",
+ "| std | 0.212 |\n",
+ "| value_loss | 0.000129 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.85 |\n",
+ "| ep_rew_mean | -0.231 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 43400 |\n",
+ "| time_elapsed | 2547 |\n",
+ "| total_timesteps | 868000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.558 |\n",
+ "| explained_variance | 0.9611102 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43399 |\n",
+ "| policy_loss | -0.000699 |\n",
+ "| std | 0.212 |\n",
+ "| value_loss | 0.000191 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.67 |\n",
+ "| ep_rew_mean | -0.2 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 43500 |\n",
+ "| time_elapsed | 2551 |\n",
+ "| total_timesteps | 870000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.573 |\n",
+ "| explained_variance | 0.98930174 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43499 |\n",
+ "| policy_loss | -0.0037 |\n",
+ "| std | 0.211 |\n",
+ "| value_loss | 4.31e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.62 |\n",
+ "| ep_rew_mean | -0.201 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 43600 |\n",
+ "| time_elapsed | 2556 |\n",
+ "| total_timesteps | 872000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.586 |\n",
+ "| explained_variance | 0.98348564 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43599 |\n",
+ "| policy_loss | 0.00287 |\n",
+ "| std | 0.21 |\n",
+ "| value_loss | 0.000106 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.72 |\n",
+ "| ep_rew_mean | -0.199 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 43700 |\n",
+ "| time_elapsed | 2562 |\n",
+ "| total_timesteps | 874000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.616 |\n",
+ "| explained_variance | 0.69001275 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43699 |\n",
+ "| policy_loss | 0.0157 |\n",
+ "| std | 0.208 |\n",
+ "| value_loss | 0.00192 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.74 |\n",
+ "| ep_rew_mean | -0.208 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 43800 |\n",
+ "| time_elapsed | 2571 |\n",
+ "| total_timesteps | 876000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.615 |\n",
+ "| explained_variance | 0.97150284 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43799 |\n",
+ "| policy_loss | 0.000559 |\n",
+ "| std | 0.208 |\n",
+ "| value_loss | 0.000119 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.85 |\n",
+ "| ep_rew_mean | -0.218 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 43900 |\n",
+ "| time_elapsed | 2576 |\n",
+ "| total_timesteps | 878000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.633 |\n",
+ "| explained_variance | 0.98416793 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43899 |\n",
+ "| policy_loss | 0.00093 |\n",
+ "| std | 0.206 |\n",
+ "| value_loss | 0.000116 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.64 |\n",
+ "| ep_rew_mean | -0.206 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 44000 |\n",
+ "| time_elapsed | 2582 |\n",
+ "| total_timesteps | 880000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.618 |\n",
+ "| explained_variance | 0.9784064 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43999 |\n",
+ "| policy_loss | 0.00364 |\n",
+ "| std | 0.208 |\n",
+ "| value_loss | 0.000147 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.74 |\n",
+ "| ep_rew_mean | -0.211 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 44100 |\n",
+ "| time_elapsed | 2586 |\n",
+ "| total_timesteps | 882000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.635 |\n",
+ "| explained_variance | 0.9662014 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44099 |\n",
+ "| policy_loss | -0.00547 |\n",
+ "| std | 0.207 |\n",
+ "| value_loss | 0.000243 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.89 |\n",
+ "| ep_rew_mean | -0.241 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 44200 |\n",
+ "| time_elapsed | 2591 |\n",
+ "| total_timesteps | 884000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.656 |\n",
+ "| explained_variance | 0.7693075 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44199 |\n",
+ "| policy_loss | 0.0143 |\n",
+ "| std | 0.206 |\n",
+ "| value_loss | 0.00191 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.75 |\n",
+ "| ep_rew_mean | -0.206 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 44300 |\n",
+ "| time_elapsed | 2595 |\n",
+ "| total_timesteps | 886000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.646 |\n",
+ "| explained_variance | 0.9649852 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44299 |\n",
+ "| policy_loss | -0.00818 |\n",
+ "| std | 0.206 |\n",
+ "| value_loss | 0.000203 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.73 |\n",
+ "| ep_rew_mean | -0.213 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 44400 |\n",
+ "| time_elapsed | 2600 |\n",
+ "| total_timesteps | 888000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.677 |\n",
+ "| explained_variance | 0.9866615 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44399 |\n",
+ "| policy_loss | -0.00452 |\n",
+ "| std | 0.204 |\n",
+ "| value_loss | 0.000119 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.75 |\n",
+ "| ep_rew_mean | -0.219 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 44500 |\n",
+ "| time_elapsed | 2608 |\n",
+ "| total_timesteps | 890000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.688 |\n",
+ "| explained_variance | 0.98133665 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44499 |\n",
+ "| policy_loss | 0.00382 |\n",
+ "| std | 0.204 |\n",
+ "| value_loss | 0.000157 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.6 |\n",
+ "| ep_rew_mean | -0.189 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 44600 |\n",
+ "| time_elapsed | 2612 |\n",
+ "| total_timesteps | 892000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.697 |\n",
+ "| explained_variance | 0.9878949 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44599 |\n",
+ "| policy_loss | -0.000211 |\n",
+ "| std | 0.203 |\n",
+ "| value_loss | 6.87e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.7 |\n",
+ "| ep_rew_mean | -0.21 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 44700 |\n",
+ "| time_elapsed | 2617 |\n",
+ "| total_timesteps | 894000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.71 |\n",
+ "| explained_variance | 0.9808317 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44699 |\n",
+ "| policy_loss | -0.000497 |\n",
+ "| std | 0.202 |\n",
+ "| value_loss | 0.000117 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.69 |\n",
+ "| ep_rew_mean | -0.207 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 44800 |\n",
+ "| time_elapsed | 2622 |\n",
+ "| total_timesteps | 896000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.737 |\n",
+ "| explained_variance | 0.9543187 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44799 |\n",
+ "| policy_loss | -0.0002 |\n",
+ "| std | 0.201 |\n",
+ "| value_loss | 0.00014 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.67 |\n",
+ "| ep_rew_mean | -0.203 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 44900 |\n",
+ "| time_elapsed | 2626 |\n",
+ "| total_timesteps | 898000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.736 |\n",
+ "| explained_variance | 0.9573474 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44899 |\n",
+ "| policy_loss | -0.016 |\n",
+ "| std | 0.2 |\n",
+ "| value_loss | 0.000362 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.47 |\n",
+ "| ep_rew_mean | -0.181 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 45000 |\n",
+ "| time_elapsed | 2631 |\n",
+ "| total_timesteps | 900000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.706 |\n",
+ "| explained_variance | 0.9849114 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44999 |\n",
+ "| policy_loss | 0.00167 |\n",
+ "| std | 0.203 |\n",
+ "| value_loss | 0.000118 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.68 |\n",
+ "| ep_rew_mean | -0.207 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 45100 |\n",
+ "| time_elapsed | 2636 |\n",
+ "| total_timesteps | 902000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.696 |\n",
+ "| explained_variance | 0.80178624 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45099 |\n",
+ "| policy_loss | 0.00337 |\n",
+ "| std | 0.203 |\n",
+ "| value_loss | 0.00101 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.64 |\n",
+ "| ep_rew_mean | -0.208 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 45200 |\n",
+ "| time_elapsed | 2644 |\n",
+ "| total_timesteps | 904000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.717 |\n",
+ "| explained_variance | 0.9752399 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45199 |\n",
+ "| policy_loss | -0.00655 |\n",
+ "| std | 0.203 |\n",
+ "| value_loss | 0.00014 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.68 |\n",
+ "| ep_rew_mean | -0.204 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 45300 |\n",
+ "| time_elapsed | 2649 |\n",
+ "| total_timesteps | 906000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.709 |\n",
+ "| explained_variance | 0.98351824 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45299 |\n",
+ "| policy_loss | -0.00476 |\n",
+ "| std | 0.203 |\n",
+ "| value_loss | 8.73e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.63 |\n",
+ "| ep_rew_mean | -0.197 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 45400 |\n",
+ "| time_elapsed | 2653 |\n",
+ "| total_timesteps | 908000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.71 |\n",
+ "| explained_variance | 0.99506396 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45399 |\n",
+ "| policy_loss | -0.00952 |\n",
+ "| std | 0.203 |\n",
+ "| value_loss | 0.000135 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.69 |\n",
+ "| ep_rew_mean | -0.208 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 45500 |\n",
+ "| time_elapsed | 2658 |\n",
+ "| total_timesteps | 910000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.711 |\n",
+ "| explained_variance | 0.9815719 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45499 |\n",
+ "| policy_loss | 0.00707 |\n",
+ "| std | 0.203 |\n",
+ "| value_loss | 0.000139 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.45 |\n",
+ "| ep_rew_mean | -0.171 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 45600 |\n",
+ "| time_elapsed | 2665 |\n",
+ "| total_timesteps | 912000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.718 |\n",
+ "| explained_variance | 0.93521357 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45599 |\n",
+ "| policy_loss | 0.016 |\n",
+ "| std | 0.203 |\n",
+ "| value_loss | 0.00045 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.93 |\n",
+ "| ep_rew_mean | -0.235 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 45700 |\n",
+ "| time_elapsed | 2676 |\n",
+ "| total_timesteps | 914000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.724 |\n",
+ "| explained_variance | 0.9420419 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45699 |\n",
+ "| policy_loss | -0.00632 |\n",
+ "| std | 0.202 |\n",
+ "| value_loss | 0.000384 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.83 |\n",
+ "| ep_rew_mean | -0.22 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 45800 |\n",
+ "| time_elapsed | 2682 |\n",
+ "| total_timesteps | 916000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.739 |\n",
+ "| explained_variance | 0.95885766 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45799 |\n",
+ "| policy_loss | 0.00537 |\n",
+ "| std | 0.202 |\n",
+ "| value_loss | 0.000219 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.62 |\n",
+ "| ep_rew_mean | -0.2 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 45900 |\n",
+ "| time_elapsed | 2689 |\n",
+ "| total_timesteps | 918000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.743 |\n",
+ "| explained_variance | 0.94796073 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45899 |\n",
+ "| policy_loss | -6.43e-05 |\n",
+ "| std | 0.202 |\n",
+ "| value_loss | 0.000202 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.59 |\n",
+ "| ep_rew_mean | -0.197 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 46000 |\n",
+ "| time_elapsed | 2695 |\n",
+ "| total_timesteps | 920000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.734 |\n",
+ "| explained_variance | 0.9870241 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45999 |\n",
+ "| policy_loss | 0.00297 |\n",
+ "| std | 0.202 |\n",
+ "| value_loss | 6.71e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.74 |\n",
+ "| ep_rew_mean | -0.214 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 46100 |\n",
+ "| time_elapsed | 2701 |\n",
+ "| total_timesteps | 922000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.735 |\n",
+ "| explained_variance | 0.97447634 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46099 |\n",
+ "| policy_loss | -0.00677 |\n",
+ "| std | 0.202 |\n",
+ "| value_loss | 0.000132 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.75 |\n",
+ "| ep_rew_mean | -0.22 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 46200 |\n",
+ "| time_elapsed | 2707 |\n",
+ "| total_timesteps | 924000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.748 |\n",
+ "| explained_variance | 0.9668383 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46199 |\n",
+ "| policy_loss | -0.00958 |\n",
+ "| std | 0.201 |\n",
+ "| value_loss | 0.00026 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.83 |\n",
+ "| ep_rew_mean | -0.22 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 46300 |\n",
+ "| time_elapsed | 2717 |\n",
+ "| total_timesteps | 926000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.753 |\n",
+ "| explained_variance | 0.9713844 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46299 |\n",
+ "| policy_loss | -0.00175 |\n",
+ "| std | 0.202 |\n",
+ "| value_loss | 0.000267 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.73 |\n",
+ "| ep_rew_mean | -0.216 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 46400 |\n",
+ "| time_elapsed | 2724 |\n",
+ "| total_timesteps | 928000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.744 |\n",
+ "| explained_variance | 0.9690941 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46399 |\n",
+ "| policy_loss | 0.00198 |\n",
+ "| std | 0.202 |\n",
+ "| value_loss | 9.71e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.71 |\n",
+ "| ep_rew_mean | -0.211 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 46500 |\n",
+ "| time_elapsed | 2730 |\n",
+ "| total_timesteps | 930000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.752 |\n",
+ "| explained_variance | 0.98169756 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46499 |\n",
+ "| policy_loss | -0.00182 |\n",
+ "| std | 0.201 |\n",
+ "| value_loss | 7.57e-05 |\n",
+ "--------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.73 |\n",
+ "| ep_rew_mean | -0.216 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 46600 |\n",
+ "| time_elapsed | 2736 |\n",
+ "| total_timesteps | 932000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.768 |\n",
+ "| explained_variance | 0.958521 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46599 |\n",
+ "| policy_loss | 0.00796 |\n",
+ "| std | 0.2 |\n",
+ "| value_loss | 0.000321 |\n",
+ "------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.54 |\n",
+ "| ep_rew_mean | -0.194 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 46700 |\n",
+ "| time_elapsed | 2743 |\n",
+ "| total_timesteps | 934000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.771 |\n",
+ "| explained_variance | 0.9603 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46699 |\n",
+ "| policy_loss | 0.00811 |\n",
+ "| std | 0.2 |\n",
+ "| value_loss | 0.000171 |\n",
+ "------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.6 |\n",
+ "| ep_rew_mean | -0.198 |\n",
+ "| time/ | |\n",
+ "| fps | 339 |\n",
+ "| iterations | 46800 |\n",
+ "| time_elapsed | 2753 |\n",
+ "| total_timesteps | 936000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.768 |\n",
+ "| explained_variance | 0.9908145 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46799 |\n",
+ "| policy_loss | 0.00219 |\n",
+ "| std | 0.199 |\n",
+ "| value_loss | 7.84e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.66 |\n",
+ "| ep_rew_mean | -0.198 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 46900 |\n",
+ "| time_elapsed | 2757 |\n",
+ "| total_timesteps | 938000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.781 |\n",
+ "| explained_variance | 0.9639614 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46899 |\n",
+ "| policy_loss | 0.0111 |\n",
+ "| std | 0.199 |\n",
+ "| value_loss | 0.000552 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.69 |\n",
+ "| ep_rew_mean | -0.213 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 47000 |\n",
+ "| time_elapsed | 2762 |\n",
+ "| total_timesteps | 940000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.787 |\n",
+ "| explained_variance | 0.97391367 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46999 |\n",
+ "| policy_loss | 0.00372 |\n",
+ "| std | 0.199 |\n",
+ "| value_loss | 0.000137 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.81 |\n",
+ "| ep_rew_mean | -0.22 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 47100 |\n",
+ "| time_elapsed | 2766 |\n",
+ "| total_timesteps | 942000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.788 |\n",
+ "| explained_variance | 0.97501403 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47099 |\n",
+ "| policy_loss | -0.0168 |\n",
+ "| std | 0.198 |\n",
+ "| value_loss | 0.000357 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.59 |\n",
+ "| ep_rew_mean | -0.2 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 47200 |\n",
+ "| time_elapsed | 2771 |\n",
+ "| total_timesteps | 944000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.786 |\n",
+ "| explained_variance | 0.7917006 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47199 |\n",
+ "| policy_loss | 0.0273 |\n",
+ "| std | 0.199 |\n",
+ "| value_loss | 0.00183 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.75 |\n",
+ "| ep_rew_mean | -0.217 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 47300 |\n",
+ "| time_elapsed | 2775 |\n",
+ "| total_timesteps | 946000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.784 |\n",
+ "| explained_variance | 0.9474554 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47299 |\n",
+ "| policy_loss | 0.0125 |\n",
+ "| std | 0.199 |\n",
+ "| value_loss | 0.000405 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.74 |\n",
+ "| ep_rew_mean | -0.219 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 47400 |\n",
+ "| time_elapsed | 2779 |\n",
+ "| total_timesteps | 948000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.786 |\n",
+ "| explained_variance | 0.98800665 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47399 |\n",
+ "| policy_loss | 0.00237 |\n",
+ "| std | 0.198 |\n",
+ "| value_loss | 8.46e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.76 |\n",
+ "| ep_rew_mean | -0.211 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 47500 |\n",
+ "| time_elapsed | 2787 |\n",
+ "| total_timesteps | 950000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.781 |\n",
+ "| explained_variance | 0.9724678 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47499 |\n",
+ "| policy_loss | 0.0024 |\n",
+ "| std | 0.199 |\n",
+ "| value_loss | 0.000164 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.77 |\n",
+ "| ep_rew_mean | -0.216 |\n",
+ "| time/ | |\n",
+ "| fps | 340 |\n",
+ "| iterations | 47600 |\n",
+ "| time_elapsed | 2792 |\n",
+ "| total_timesteps | 952000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.785 |\n",
+ "| explained_variance | 0.99027014 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47599 |\n",
+ "| policy_loss | -0.00857 |\n",
+ "| std | 0.198 |\n",
+ "| value_loss | 0.000118 |\n",
+ "--------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.8 |\n",
+ "| ep_rew_mean | -0.223 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 47700 |\n",
+ "| time_elapsed | 2796 |\n",
+ "| total_timesteps | 954000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.789 |\n",
+ "| explained_variance | 0.990915 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47699 |\n",
+ "| policy_loss | -0.00501 |\n",
+ "| std | 0.198 |\n",
+ "| value_loss | 6.49e-05 |\n",
+ "------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.69 |\n",
+ "| ep_rew_mean | -0.209 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 47800 |\n",
+ "| time_elapsed | 2800 |\n",
+ "| total_timesteps | 956000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.792 |\n",
+ "| explained_variance | 0.9743882 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47799 |\n",
+ "| policy_loss | -0.00831 |\n",
+ "| std | 0.198 |\n",
+ "| value_loss | 0.00032 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.68 |\n",
+ "| ep_rew_mean | -0.203 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 47900 |\n",
+ "| time_elapsed | 2806 |\n",
+ "| total_timesteps | 958000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.814 |\n",
+ "| explained_variance | 0.9837645 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47899 |\n",
+ "| policy_loss | 0.00121 |\n",
+ "| std | 0.196 |\n",
+ "| value_loss | 7.55e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.67 |\n",
+ "| ep_rew_mean | -0.203 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 48000 |\n",
+ "| time_elapsed | 2810 |\n",
+ "| total_timesteps | 960000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.816 |\n",
+ "| explained_variance | 0.9931013 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47999 |\n",
+ "| policy_loss | -0.00325 |\n",
+ "| std | 0.196 |\n",
+ "| value_loss | 5.91e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.58 |\n",
+ "| ep_rew_mean | -0.197 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 48100 |\n",
+ "| time_elapsed | 2815 |\n",
+ "| total_timesteps | 962000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.838 |\n",
+ "| explained_variance | 0.97392714 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48099 |\n",
+ "| policy_loss | -0.00442 |\n",
+ "| std | 0.195 |\n",
+ "| value_loss | 7.03e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.66 |\n",
+ "| ep_rew_mean | -0.205 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 48200 |\n",
+ "| time_elapsed | 2823 |\n",
+ "| total_timesteps | 964000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.853 |\n",
+ "| explained_variance | 0.9561405 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48199 |\n",
+ "| policy_loss | 0.0005 |\n",
+ "| std | 0.194 |\n",
+ "| value_loss | 0.000178 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.63 |\n",
+ "| ep_rew_mean | -0.195 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 48300 |\n",
+ "| time_elapsed | 2827 |\n",
+ "| total_timesteps | 966000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.871 |\n",
+ "| explained_variance | 0.98603976 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48299 |\n",
+ "| policy_loss | 0.0108 |\n",
+ "| std | 0.193 |\n",
+ "| value_loss | 0.00013 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.66 |\n",
+ "| ep_rew_mean | -0.206 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 48400 |\n",
+ "| time_elapsed | 2832 |\n",
+ "| total_timesteps | 968000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.876 |\n",
+ "| explained_variance | 0.8844548 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48399 |\n",
+ "| policy_loss | -0.00342 |\n",
+ "| std | 0.192 |\n",
+ "| value_loss | 0.000601 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.61 |\n",
+ "| ep_rew_mean | -0.203 |\n",
+ "| time/ | |\n",
+ "| fps | 341 |\n",
+ "| iterations | 48500 |\n",
+ "| time_elapsed | 2837 |\n",
+ "| total_timesteps | 970000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.892 |\n",
+ "| explained_variance | 0.97415155 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48499 |\n",
+ "| policy_loss | 0.00535 |\n",
+ "| std | 0.191 |\n",
+ "| value_loss | 0.00015 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.75 |\n",
+ "| ep_rew_mean | -0.215 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 48600 |\n",
+ "| time_elapsed | 2841 |\n",
+ "| total_timesteps | 972000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.903 |\n",
+ "| explained_variance | 0.9617835 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48599 |\n",
+ "| policy_loss | 0.0132 |\n",
+ "| std | 0.191 |\n",
+ "| value_loss | 0.000383 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.72 |\n",
+ "| ep_rew_mean | -0.203 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 48700 |\n",
+ "| time_elapsed | 2846 |\n",
+ "| total_timesteps | 974000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.89 |\n",
+ "| explained_variance | 0.9830824 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48699 |\n",
+ "| policy_loss | 0.00573 |\n",
+ "| std | 0.191 |\n",
+ "| value_loss | 0.000103 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.73 |\n",
+ "| ep_rew_mean | -0.214 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 48800 |\n",
+ "| time_elapsed | 2850 |\n",
+ "| total_timesteps | 976000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.887 |\n",
+ "| explained_variance | 0.9614461 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48799 |\n",
+ "| policy_loss | -0.0139 |\n",
+ "| std | 0.192 |\n",
+ "| value_loss | 0.000635 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.76 |\n",
+ "| ep_rew_mean | -0.21 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 48900 |\n",
+ "| time_elapsed | 2858 |\n",
+ "| total_timesteps | 978000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.88 |\n",
+ "| explained_variance | 0.9425929 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48899 |\n",
+ "| policy_loss | 4.7e-05 |\n",
+ "| std | 0.192 |\n",
+ "| value_loss | 0.000399 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.62 |\n",
+ "| ep_rew_mean | -0.208 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 49000 |\n",
+ "| time_elapsed | 2862 |\n",
+ "| total_timesteps | 980000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.873 |\n",
+ "| explained_variance | 0.9772742 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48999 |\n",
+ "| policy_loss | 0.014 |\n",
+ "| std | 0.193 |\n",
+ "| value_loss | 0.000161 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.5 |\n",
+ "| ep_rew_mean | -0.185 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 49100 |\n",
+ "| time_elapsed | 2868 |\n",
+ "| total_timesteps | 982000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.878 |\n",
+ "| explained_variance | 0.97900677 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49099 |\n",
+ "| policy_loss | -0.00167 |\n",
+ "| std | 0.191 |\n",
+ "| value_loss | 0.000136 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.77 |\n",
+ "| ep_rew_mean | -0.216 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 49200 |\n",
+ "| time_elapsed | 2872 |\n",
+ "| total_timesteps | 984000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.883 |\n",
+ "| explained_variance | 0.96298355 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49199 |\n",
+ "| policy_loss | -0.000752 |\n",
+ "| std | 0.191 |\n",
+ "| value_loss | 9.23e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.79 |\n",
+ "| ep_rew_mean | -0.222 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 49300 |\n",
+ "| time_elapsed | 2876 |\n",
+ "| total_timesteps | 986000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.911 |\n",
+ "| explained_variance | 0.98365396 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49299 |\n",
+ "| policy_loss | -0.00456 |\n",
+ "| std | 0.189 |\n",
+ "| value_loss | 0.000266 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.68 |\n",
+ "| ep_rew_mean | -0.212 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 49400 |\n",
+ "| time_elapsed | 2881 |\n",
+ "| total_timesteps | 988000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.931 |\n",
+ "| explained_variance | 0.95916724 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49399 |\n",
+ "| policy_loss | -0.00675 |\n",
+ "| std | 0.187 |\n",
+ "| value_loss | 0.000311 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.73 |\n",
+ "| ep_rew_mean | -0.219 |\n",
+ "| time/ | |\n",
+ "| fps | 343 |\n",
+ "| iterations | 49500 |\n",
+ "| time_elapsed | 2885 |\n",
+ "| total_timesteps | 990000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.918 |\n",
+ "| explained_variance | 0.99461377 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49499 |\n",
+ "| policy_loss | 0.0052 |\n",
+ "| std | 0.188 |\n",
+ "| value_loss | 5.61e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.67 |\n",
+ "| ep_rew_mean | -0.203 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 49600 |\n",
+ "| time_elapsed | 2893 |\n",
+ "| total_timesteps | 992000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.942 |\n",
+ "| explained_variance | 0.9908923 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49599 |\n",
+ "| policy_loss | 0.00297 |\n",
+ "| std | 0.187 |\n",
+ "| value_loss | 9.09e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.51 |\n",
+ "| ep_rew_mean | -0.186 |\n",
+ "| time/ | |\n",
+ "| fps | 342 |\n",
+ "| iterations | 49700 |\n",
+ "| time_elapsed | 2898 |\n",
+ "| total_timesteps | 994000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.961 |\n",
+ "| explained_variance | 0.97241545 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49699 |\n",
+ "| policy_loss | -0.00635 |\n",
+ "| std | 0.186 |\n",
+ "| value_loss | 0.000121 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.71 |\n",
+ "| ep_rew_mean | -0.21 |\n",
+ "| time/ | |\n",
+ "| fps | 343 |\n",
+ "| iterations | 49800 |\n",
+ "| time_elapsed | 2902 |\n",
+ "| total_timesteps | 996000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.958 |\n",
+ "| explained_variance | 0.99666107 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49799 |\n",
+ "| policy_loss | -0.00314 |\n",
+ "| std | 0.186 |\n",
+ "| value_loss | 4e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.66 |\n",
+ "| ep_rew_mean | -0.2 |\n",
+ "| time/ | |\n",
+ "| fps | 343 |\n",
+ "| iterations | 49900 |\n",
+ "| time_elapsed | 2906 |\n",
+ "| total_timesteps | 998000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.943 |\n",
+ "| explained_variance | 0.9459344 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49899 |\n",
+ "| policy_loss | -0.00787 |\n",
+ "| std | 0.187 |\n",
+ "| value_loss | 0.000279 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 2.86 |\n",
+ "| ep_rew_mean | -0.22 |\n",
+ "| time/ | |\n",
+ "| fps | 343 |\n",
+ "| iterations | 50000 |\n",
+ "| time_elapsed | 2911 |\n",
+ "| total_timesteps | 1000000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | 0.965 |\n",
+ "| explained_variance | 0.93244386 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49999 |\n",
+ "| policy_loss | -0.0262 |\n",
+ "| std | 0.186 |\n",
+ "| value_loss | 0.00066 |\n",
+ "--------------------------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 27,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "model.learn(1_000_000)"
+ ]
},
{
"cell_type": "code",
+ "execution_count": 28,
+ "metadata": {
+ "id": "MfYtjj19cKFr"
+ },
+ "outputs": [],
"source": [
"# Save the model and VecNormalize statistics when saving the agent\n",
"model.save(\"a2c-PandaReachDense-v3\")\n",
"env.save(\"vec_normalize.pkl\")"
- ],
- "metadata": {
- "id": "MfYtjj19cKFr"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "01M9GCd32Ig-"
+ },
"source": [
"### Evaluate the agent 📈\n",
"- Now that's our agent is trained, we need to **check its performance**.\n",
"- Stable-Baselines3 provides a method to do that: `evaluate_policy`"
- ],
- "metadata": {
- "id": "01M9GCd32Ig-"
- }
+ ]
},
{
"cell_type": "code",
+ "execution_count": 29,
+ "metadata": {
+ "id": "liirTVoDkHq3"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "argv[0]=--background_color_red=0.8745098114013672\n",
+ "argv[1]=--background_color_green=0.21176470816135406\n",
+ "argv[2]=--background_color_blue=0.1764705926179886\n",
+ "Mean reward = -0.26 +/- 0.09\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit6/venv-u6/lib/python3.10/site-packages/stable_baselines3/common/evaluation.py:67: UserWarning: Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.\n",
+ " warnings.warn(\n"
+ ]
+ }
+ ],
"source": [
"from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize\n",
"\n",
@@ -570,27 +9670,25 @@
"mean_reward, std_reward = evaluate_policy(model, eval_env)\n",
"\n",
"print(f\"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}\")"
- ],
- "metadata": {
- "id": "liirTVoDkHq3"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "44L9LVQaavR8"
+ },
"source": [
"### Publish your trained model on the Hub 🔥\n",
"Now that we saw we got good results after the training, we can publish our trained model on the Hub with one line of code.\n",
"\n",
"📚 The libraries documentation 👉 https://github.com/huggingface/huggingface_sb3/tree/main#hugging-face--x-stable-baselines3-v20\n"
- ],
- "metadata": {
- "id": "44L9LVQaavR8"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "MkMk99m8bgaQ"
+ },
"source": [
"By using `package_to_hub`, as we already mentioned in the previous units, **you evaluate, record a replay, generate a model card of your agent, and push it to the hub**.\n",
"\n",
@@ -599,10 +9697,7 @@
"- You can **visualize your agent playing** 👀\n",
"- You can **share with the community an agent that others can use** 💾\n",
"- You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉 https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard\n"
- ],
- "metadata": {
- "id": "MkMk99m8bgaQ"
- }
+ ]
},
{
"cell_type": "markdown",
@@ -655,15 +9750,350 @@
},
{
"cell_type": "markdown",
- "source": [
- "For this environment, **running this cell can take approximately 10min**"
- ],
"metadata": {
"id": "juxItTNf1W74"
- }
+ },
+ "source": [
+ "For this environment, **running this cell can take approximately 10min**"
+ ]
},
{
"cell_type": "code",
+ "execution_count": 31,
+ "metadata": {
+ "id": "V1N8r8QVwcCE"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[38;5;4mℹ This function will save, evaluate, generate a video of your agent,\n",
+ "create a model card and push everything to the hub. It might take up to 1min.\n",
+ "This is a work in progress: if you encounter a bug, please open an issue.\u001b[0m\n",
+ "Saving video to /tmp/tmpdcvmxwip/-step-0-to-step-1000.mp4\n",
+ "MoviePy - Building video /tmp/tmpdcvmxwip/-step-0-to-step-1000.mp4.\n",
+ "MoviePy - Writing video /tmp/tmpdcvmxwip/-step-0-to-step-1000.mp4\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ " \r"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "MoviePy - Done !\n",
+ "MoviePy - video ready /tmp/tmpdcvmxwip/-step-0-to-step-1000.mp4\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "ffmpeg version 6.1.1-3ubuntu5 Copyright (c) 2000-2023 the FFmpeg developers\n",
+ " built with gcc 13 (Ubuntu 13.2.0-23ubuntu3)\n",
+ " configuration: --prefix=/usr --extra-version=3ubuntu5 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --disable-omx --enable-gnutls --enable-libaom --enable-libass --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-openal --enable-opencl --enable-opengl --disable-sndio --enable-libvpl --disable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-ladspa --enable-libbluray --enable-libjack --enable-libpulse --enable-librabbitmq --enable-librist --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libx264 --enable-libzmq --enable-libzvbi --enable-lv2 --enable-sdl2 --enable-libplacebo --enable-librav1e --enable-pocketsphinx --enable-librsvg --enable-libjxl --enable-shared\n",
+ " libavutil 58. 29.100 / 58. 29.100\n",
+ " libavcodec 60. 31.102 / 60. 31.102\n",
+ " libavformat 60. 16.100 / 60. 16.100\n",
+ " libavdevice 60. 3.100 / 60. 3.100\n",
+ " libavfilter 9. 12.100 / 9. 12.100\n",
+ " libswscale 7. 5.100 / 7. 5.100\n",
+ " libswresample 4. 12.100 / 4. 12.100\n",
+ " libpostproc 57. 3.100 / 57. 3.100\n",
+ "Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/tmpdcvmxwip/-step-0-to-step-1000.mp4':\n",
+ " Metadata:\n",
+ " major_brand : isom\n",
+ " minor_version : 512\n",
+ " compatible_brands: isomiso2avc1mp41\n",
+ " encoder : Lavf61.1.100\n",
+ " Duration: 00:00:40.00, start: 0.000000, bitrate: 118 kb/s\n",
+ " Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 720x480, 116 kb/s, 25 fps, 25 tbr, 12800 tbn (default)\n",
+ " Metadata:\n",
+ " handler_name : VideoHandler\n",
+ " vendor_id : [0][0][0][0]\n",
+ " encoder : Lavc61.3.100 libx264\n",
+ "Stream mapping:\n",
+ " Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))\n",
+ "Press [q] to stop, [?] for help\n",
+ "[libx264 @ 0x55777995a9c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2\n",
+ "[libx264 @ 0x55777995a9c0] profile High, level 3.0, 4:2:0, 8-bit\n",
+ "[libx264 @ 0x55777995a9c0] 264 - core 164 r3108 31e19f9 - H.264/MPEG-4 AVC codec - Copyleft 2003-2023 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=15 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00\n",
+ "Output #0, mp4, to '/tmp/tmpo6y5pqyw/replay.mp4':\n",
+ " Metadata:\n",
+ " major_brand : isom\n",
+ " minor_version : 512\n",
+ " compatible_brands: isomiso2avc1mp41\n",
+ " encoder : Lavf60.16.100\n",
+ " Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(progressive), 720x480, q=2-31, 25 fps, 12800 tbn (default)\n",
+ " Metadata:\n",
+ " handler_name : VideoHandler\n",
+ " vendor_id : [0][0][0][0]\n",
+ " encoder : Lavc60.31.102 libx264\n",
+ " Side data:\n",
+ " cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A\n",
+ "[out#0/mp4 @ 0x5577798d6080] video:551kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.198598%\n",
+ "frame= 1000 fps=870 q=-1.0 Lsize= 564kB time=00:00:39.88 bitrate= 115.8kbits/s speed=34.7x \n",
+ "[libx264 @ 0x55777995a9c0] frame I:4 Avg QP:14.60 size: 7429\n",
+ "[libx264 @ 0x55777995a9c0] frame P:297 Avg QP:23.56 size: 727\n",
+ "[libx264 @ 0x55777995a9c0] frame B:699 Avg QP:23.15 size: 455\n",
+ "[libx264 @ 0x55777995a9c0] consecutive B-frames: 1.9% 9.2% 16.5% 72.4%\n",
+ "[libx264 @ 0x55777995a9c0] mb I I16..4: 24.4% 58.9% 16.6%\n",
+ "[libx264 @ 0x55777995a9c0] mb P I16..4: 0.1% 0.5% 0.3% P16..4: 2.5% 1.1% 0.7% 0.0% 0.0% skip:94.7%\n",
+ "[libx264 @ 0x55777995a9c0] mb B I16..4: 0.1% 0.1% 0.2% B16..8: 3.2% 1.1% 0.4% direct: 0.1% skip:94.9% L0:55.3% L1:43.6% BI: 1.1%\n",
+ "[libx264 @ 0x55777995a9c0] 8x8 transform intra:47.8% inter:8.5%\n",
+ "[libx264 @ 0x55777995a9c0] coded y,uvDC,uvAC intra: 18.0% 5.8% 4.5% inter: 0.7% 0.0% 0.0%\n",
+ "[libx264 @ 0x55777995a9c0] i16 v,h,dc,p: 66% 15% 18% 0%\n",
+ "[libx264 @ 0x55777995a9c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 38% 11% 49% 0% 0% 0% 0% 0% 0%\n",
+ "[libx264 @ 0x55777995a9c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 23% 17% 34% 3% 5% 5% 6% 3% 5%\n",
+ "[libx264 @ 0x55777995a9c0] i8c dc,h,v,p: 94% 3% 3% 0%\n",
+ "[libx264 @ 0x55777995a9c0] Weighted P-Frames: Y:0.0% UV:0.0%\n",
+ "[libx264 @ 0x55777995a9c0] ref P L0: 44.0% 2.7% 36.4% 16.9%\n",
+ "[libx264 @ 0x55777995a9c0] ref B L0: 67.2% 24.2% 8.7%\n",
+ "[libx264 @ 0x55777995a9c0] ref B L1: 94.0% 6.0%\n",
+ "[libx264 @ 0x55777995a9c0] kb/s:112.80\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[38;5;4mℹ Pushing repo turbo-maikol/a2c-PandaReachDense-v3 to the Hugging Face\n",
+ "Hub\u001b[0m\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Processing Files (0 / 0) : | | 0.00B / 0.00B \n",
+ "Processing Files (6 / 6) : 100%|██████████| 789kB / 789kB, 394kB/s \n",
+ "New Data Upload : 100%|██████████| 788kB / 788kB, 394kB/s \n",
+ " ...ReachDense-v3/pytorch_variables.pth: 100%|██████████| 1.26kB / 1.26kB \n",
+ " ...aReachDense-v3/policy.optimizer.pth: 100%|██████████| 48.9kB / 48.9kB \n",
+ " ...w/a2c-PandaReachDense-v3/policy.pth: 100%|██████████| 46.8kB / 46.8kB \n",
+ " ...o6y5pqyw/a2c-PandaReachDense-v3.zip: 100%|██████████| 113kB / 113kB \n",
+ " /tmp/tmpo6y5pqyw/replay.mp4 : 100%|██████████| 577kB / 577kB \n",
+ " /tmp/tmpo6y5pqyw/vec_normalize.pkl : 100%|██████████| 2.61kB / 2.61kB \n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[38;5;4mℹ Your model is pushed to the Hub. You can view your model here:\n",
+ "https://huggingface.co/turbo-maikol/a2c-PandaReachDense-v3/tree/main/\u001b[0m\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "CommitInfo(commit_url='https://huggingface.co/turbo-maikol/a2c-PandaReachDense-v3/commit/ca3b9e054bb58644bb45ae278b3f9887e1f7081d', commit_message='Initial commit', commit_description='', oid='ca3b9e054bb58644bb45ae278b3f9887e1f7081d', pr_url=None, repo_url=RepoUrl('https://huggingface.co/turbo-maikol/a2c-PandaReachDense-v3', endpoint='https://huggingface.co', repo_type='model', repo_id='turbo-maikol/a2c-PandaReachDense-v3'), pr_revision=None, pr_num=None)"
+ ]
+ },
+ "execution_count": 31,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"from huggingface_sb3 import package_to_hub\n",
"\n",
@@ -673,18 +10103,16 @@
" model_architecture=\"A2C\",\n",
" env_id=env_id,\n",
" eval_env=eval_env,\n",
- " repo_id=f\"ThomasSimonini/a2c-{env_id}\", # Change the username\n",
+ " repo_id=f\"turbo-maikol/a2c-{env_id}\", # Change the username\n",
" commit_message=\"Initial commit\",\n",
")"
- ],
- "metadata": {
- "id": "V1N8r8QVwcCE"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "G3xy3Nf3c2O1"
+ },
"source": [
"## Some additional challenges 🏆\n",
"The best way to learn **is to try things on your own**! Why not try `PandaPickAndPlace-v3`?\n",
@@ -705,22 +10133,9436 @@
"6. Save the model and VecNormalize statistics when saving the agent\n",
"7. Evaluate your agent\n",
"8. Publish your trained model on the Hub 🔥 with `package_to_hub`\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "argv[0]=--background_color_red=0.8745098114013672\n",
+ "argv[1]=--background_color_green=0.21176470816135406\n",
+ "argv[2]=--background_color_blue=0.1764705926179886\n",
+ "argv[0]=--background_color_red=0.8745098114013672\n",
+ "argv[1]=--background_color_green=0.21176470816135406\n",
+ "argv[2]=--background_color_blue=0.1764705926179886\n",
+ "argv[0]=--background_color_red=0.8745098114013672\n",
+ "argv[1]=--background_color_green=0.21176470816135406\n",
+ "argv[2]=--background_color_blue=0.1764705926179886\n",
+ "Using cuda device\n",
+ "argv[0]=--background_color_red=0.8745098114013672\n",
+ "argv[1]=--background_color_green=0.21176470816135406\n",
+ "argv[2]=--background_color_blue=0.1764705926179886\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 247 |\n",
+ "| iterations | 100 |\n",
+ "| time_elapsed | 8 |\n",
+ "| total_timesteps | 2000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.68 |\n",
+ "| explained_variance | 0.92173636 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 99 |\n",
+ "| policy_loss | -0.453 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.0769 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.8 |\n",
+ "| ep_rew_mean | -48.8 |\n",
+ "| time/ | |\n",
+ "| fps | 269 |\n",
+ "| iterations | 200 |\n",
+ "| time_elapsed | 14 |\n",
+ "| total_timesteps | 4000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.67 |\n",
+ "| explained_variance | 0.9866529 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 199 |\n",
+ "| policy_loss | -1.13 |\n",
+ "| std | 0.999 |\n",
+ "| value_loss | 0.0935 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 50 |\n",
+ "| ep_rew_mean | -50 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 300 |\n",
+ "| time_elapsed | 21 |\n",
+ "| total_timesteps | 6000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.66 |\n",
+ "| explained_variance | 0.91406125 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 299 |\n",
+ "| policy_loss | -1.29 |\n",
+ "| std | 0.997 |\n",
+ "| value_loss | 0.106 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 50 |\n",
+ "| ep_rew_mean | -50 |\n",
+ "| time/ | |\n",
+ "| fps | 254 |\n",
+ "| iterations | 400 |\n",
+ "| time_elapsed | 31 |\n",
+ "| total_timesteps | 8000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.68 |\n",
+ "| explained_variance | 0.97533536 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 399 |\n",
+ "| policy_loss | 0.149 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.0134 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 50 |\n",
+ "| ep_rew_mean | -50 |\n",
+ "| time/ | |\n",
+ "| fps | 264 |\n",
+ "| iterations | 500 |\n",
+ "| time_elapsed | 37 |\n",
+ "| total_timesteps | 10000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.71 |\n",
+ "| explained_variance | 0.97877157 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 499 |\n",
+ "| policy_loss | 0.671 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.0334 |\n",
+ "--------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 50 |\n",
+ "| ep_rew_mean | -50 |\n",
+ "| time/ | |\n",
+ "| fps | 270 |\n",
+ "| iterations | 600 |\n",
+ "| time_elapsed | 44 |\n",
+ "| total_timesteps | 12000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.72 |\n",
+ "| explained_variance | 0.941841 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 599 |\n",
+ "| policy_loss | -0.656 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.0444 |\n",
+ "------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 50 |\n",
+ "| ep_rew_mean | -50 |\n",
+ "| time/ | |\n",
+ "| fps | 277 |\n",
+ "| iterations | 700 |\n",
+ "| time_elapsed | 50 |\n",
+ "| total_timesteps | 14000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.72 |\n",
+ "| explained_variance | 0.7981684 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 699 |\n",
+ "| policy_loss | -0.123 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.0345 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 50 |\n",
+ "| ep_rew_mean | -50 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 800 |\n",
+ "| time_elapsed | 57 |\n",
+ "| total_timesteps | 16000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.71 |\n",
+ "| explained_variance | 0.7265997 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 799 |\n",
+ "| policy_loss | 0.379 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.0249 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 267 |\n",
+ "| iterations | 900 |\n",
+ "| time_elapsed | 67 |\n",
+ "| total_timesteps | 18000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.69 |\n",
+ "| explained_variance | 0.89900863 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 899 |\n",
+ "| policy_loss | -0.36 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.0117 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 269 |\n",
+ "| iterations | 1000 |\n",
+ "| time_elapsed | 74 |\n",
+ "| total_timesteps | 20000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.7 |\n",
+ "| explained_variance | 0.9879093 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 999 |\n",
+ "| policy_loss | -0.238 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.0122 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 271 |\n",
+ "| iterations | 1100 |\n",
+ "| time_elapsed | 81 |\n",
+ "| total_timesteps | 22000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.67 |\n",
+ "| explained_variance | 0.96510875 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1099 |\n",
+ "| policy_loss | -0.0184 |\n",
+ "| std | 0.998 |\n",
+ "| value_loss | 0.0225 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 272 |\n",
+ "| iterations | 1200 |\n",
+ "| time_elapsed | 87 |\n",
+ "| total_timesteps | 24000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.68 |\n",
+ "| explained_variance | 0.98142165 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1199 |\n",
+ "| policy_loss | -0.408 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.0161 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 275 |\n",
+ "| iterations | 1300 |\n",
+ "| time_elapsed | 94 |\n",
+ "| total_timesteps | 26000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.7 |\n",
+ "| explained_variance | 0.8481641 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1299 |\n",
+ "| policy_loss | -0.445 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.00926 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 266 |\n",
+ "| iterations | 1400 |\n",
+ "| time_elapsed | 105 |\n",
+ "| total_timesteps | 28000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.7 |\n",
+ "| explained_variance | 0.24699801 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1399 |\n",
+ "| policy_loss | -0.0865 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.00277 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 266 |\n",
+ "| iterations | 1500 |\n",
+ "| time_elapsed | 112 |\n",
+ "| total_timesteps | 30000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.69 |\n",
+ "| explained_variance | 0.98543787 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1499 |\n",
+ "| policy_loss | -0.122 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.00198 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 268 |\n",
+ "| iterations | 1600 |\n",
+ "| time_elapsed | 119 |\n",
+ "| total_timesteps | 32000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.7 |\n",
+ "| explained_variance | 0.97692937 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1599 |\n",
+ "| policy_loss | 0.102 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.00283 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 269 |\n",
+ "| iterations | 1700 |\n",
+ "| time_elapsed | 126 |\n",
+ "| total_timesteps | 34000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.68 |\n",
+ "| explained_variance | 0.9177654 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1699 |\n",
+ "| policy_loss | -0.247 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.00606 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 264 |\n",
+ "| iterations | 1800 |\n",
+ "| time_elapsed | 135 |\n",
+ "| total_timesteps | 36000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.68 |\n",
+ "| explained_variance | 0.88942945 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1799 |\n",
+ "| policy_loss | -0.0544 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.00655 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 268 |\n",
+ "| iterations | 1900 |\n",
+ "| time_elapsed | 141 |\n",
+ "| total_timesteps | 38000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.68 |\n",
+ "| explained_variance | 0.9895952 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1899 |\n",
+ "| policy_loss | 0.179 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.00177 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 271 |\n",
+ "| iterations | 2000 |\n",
+ "| time_elapsed | 147 |\n",
+ "| total_timesteps | 40000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.69 |\n",
+ "| explained_variance | 0.7657582 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 1999 |\n",
+ "| policy_loss | 0.267 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.00616 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 2100 |\n",
+ "| time_elapsed | 153 |\n",
+ "| total_timesteps | 42000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.67 |\n",
+ "| explained_variance | 0.9649579 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2099 |\n",
+ "| policy_loss | -0.0232 |\n",
+ "| std | 0.998 |\n",
+ "| value_loss | 0.000706 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 275 |\n",
+ "| iterations | 2200 |\n",
+ "| time_elapsed | 159 |\n",
+ "| total_timesteps | 44000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.67 |\n",
+ "| explained_variance | 0.9855432 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2199 |\n",
+ "| policy_loss | -0.388 |\n",
+ "| std | 0.999 |\n",
+ "| value_loss | 0.0113 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 2300 |\n",
+ "| time_elapsed | 166 |\n",
+ "| total_timesteps | 46000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.69 |\n",
+ "| explained_variance | 0.7222178 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2299 |\n",
+ "| policy_loss | -0.0183 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.00072 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 272 |\n",
+ "| iterations | 2400 |\n",
+ "| time_elapsed | 175 |\n",
+ "| total_timesteps | 48000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.68 |\n",
+ "| explained_variance | 0.98888546 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2399 |\n",
+ "| policy_loss | -0.238 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.00958 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 2500 |\n",
+ "| time_elapsed | 182 |\n",
+ "| total_timesteps | 50000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.69 |\n",
+ "| explained_variance | 0.96954125 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2499 |\n",
+ "| policy_loss | -0.0431 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.000864 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 275 |\n",
+ "| iterations | 2600 |\n",
+ "| time_elapsed | 188 |\n",
+ "| total_timesteps | 52000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.69 |\n",
+ "| explained_variance | 0.96610194 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2599 |\n",
+ "| policy_loss | -0.105 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.00381 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 46.6 |\n",
+ "| ep_rew_mean | -46.5 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 2700 |\n",
+ "| time_elapsed | 195 |\n",
+ "| total_timesteps | 54000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.7 |\n",
+ "| explained_variance | 0.9916272 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2699 |\n",
+ "| policy_loss | 0.0748 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.00139 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 2800 |\n",
+ "| time_elapsed | 201 |\n",
+ "| total_timesteps | 56000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.7 |\n",
+ "| explained_variance | 0.96441084 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2799 |\n",
+ "| policy_loss | 0.1 |\n",
+ "| std | 1.01 |\n",
+ "| value_loss | 0.00154 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 2900 |\n",
+ "| time_elapsed | 211 |\n",
+ "| total_timesteps | 58000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.68 |\n",
+ "| explained_variance | 0.9759128 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2899 |\n",
+ "| policy_loss | -0.0517 |\n",
+ "| std | 1 |\n",
+ "| value_loss | 0.00165 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 275 |\n",
+ "| iterations | 3000 |\n",
+ "| time_elapsed | 217 |\n",
+ "| total_timesteps | 60000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.66 |\n",
+ "| explained_variance | 0.9729539 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 2999 |\n",
+ "| policy_loss | 0.032 |\n",
+ "| std | 0.997 |\n",
+ "| value_loss | 0.000864 |\n",
+ "-------------------------------------\n",
+    "[... ~57 similar A2C logging tables elided: iterations 3100-9200, total_timesteps 62,000-184,000; ep_rew_mean stays near -48, ep_len_mean near 48, fps 272-278, learning_rate 0.0007 throughout ...]\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 272 |\n",
+ "| iterations | 9300 |\n",
+ "| time_elapsed | 681 |\n",
+ "| total_timesteps | 186000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.44 |\n",
+ "| explained_variance | 0.6690495 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 9299 |\n",
+ "| policy_loss | -0.00437 |\n",
+ "| std | 0.944 |\n",
+ "| value_loss | 4.59e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 9400 |\n",
+ "| time_elapsed | 688 |\n",
+ "| total_timesteps | 188000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.43 |\n",
+ "| explained_variance | 0.7185387 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 9399 |\n",
+ "| policy_loss | 0.00195 |\n",
+ "| std | 0.941 |\n",
+ "| value_loss | 6.26e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 9500 |\n",
+ "| time_elapsed | 694 |\n",
+ "| total_timesteps | 190000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.43 |\n",
+ "| explained_variance | -0.9873673 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 9499 |\n",
+ "| policy_loss | -0.0856 |\n",
+ "| std | 0.94 |\n",
+ "| value_loss | 0.000481 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.2 |\n",
+ "| ep_rew_mean | -48.2 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 9600 |\n",
+ "| time_elapsed | 701 |\n",
+ "| total_timesteps | 192000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.4 |\n",
+ "| explained_variance | 0.78831863 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 9599 |\n",
+ "| policy_loss | -0.0148 |\n",
+ "| std | 0.934 |\n",
+ "| value_loss | 3.55e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.2 |\n",
+ "| ep_rew_mean | -47.2 |\n",
+ "| time/ | |\n",
+ "| fps | 272 |\n",
+ "| iterations | 9700 |\n",
+ "| time_elapsed | 711 |\n",
+ "| total_timesteps | 194000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.4 |\n",
+ "| explained_variance | 0.95177764 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 9699 |\n",
+ "| policy_loss | 0.0071 |\n",
+ "| std | 0.935 |\n",
+ "| value_loss | 2.33e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 9800 |\n",
+ "| time_elapsed | 717 |\n",
+ "| total_timesteps | 196000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.37 |\n",
+ "| explained_variance | 0.9991041 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 9799 |\n",
+ "| policy_loss | -0.0138 |\n",
+ "| std | 0.928 |\n",
+ "| value_loss | 1.32e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 9900 |\n",
+ "| time_elapsed | 724 |\n",
+ "| total_timesteps | 198000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.39 |\n",
+ "| explained_variance | 0.98675823 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 9899 |\n",
+ "| policy_loss | -0.00244 |\n",
+ "| std | 0.933 |\n",
+ "| value_loss | 2.88e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 10000 |\n",
+ "| time_elapsed | 731 |\n",
+ "| total_timesteps | 200000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.39 |\n",
+ "| explained_variance | 0.8423076 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 9999 |\n",
+ "| policy_loss | 0.0111 |\n",
+ "| std | 0.933 |\n",
+ "| value_loss | 0.000101 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 10100 |\n",
+ "| time_elapsed | 737 |\n",
+ "| total_timesteps | 202000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.38 |\n",
+ "| explained_variance | 0.9848204 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 10099 |\n",
+ "| policy_loss | -0.0111 |\n",
+ "| std | 0.929 |\n",
+ "| value_loss | 8.33e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 272 |\n",
+ "| iterations | 10200 |\n",
+ "| time_elapsed | 748 |\n",
+ "| total_timesteps | 204000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.35 |\n",
+ "| explained_variance | 0.9231719 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 10199 |\n",
+ "| policy_loss | -0.00118 |\n",
+ "| std | 0.923 |\n",
+ "| value_loss | 1.12e-05 |\n",
+ "-------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 10300 |\n",
+ "| time_elapsed | 754 |\n",
+ "| total_timesteps | 206000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.35 |\n",
+ "| explained_variance | 0.056429803 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 10299 |\n",
+ "| policy_loss | 0.0131 |\n",
+ "| std | 0.923 |\n",
+ "| value_loss | 0.000336 |\n",
+ "---------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.6 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 10400 |\n",
+ "| time_elapsed | 761 |\n",
+ "| total_timesteps | 208000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.35 |\n",
+ "| explained_variance | 0.9514538 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 10399 |\n",
+ "| policy_loss | -0.00282 |\n",
+ "| std | 0.923 |\n",
+ "| value_loss | 4.15e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.6 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 10500 |\n",
+ "| time_elapsed | 768 |\n",
+ "| total_timesteps | 210000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.34 |\n",
+ "| explained_variance | 0.94718796 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 10499 |\n",
+ "| policy_loss | -0.00272 |\n",
+ "| std | 0.921 |\n",
+ "| value_loss | 5.42e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 10600 |\n",
+ "| time_elapsed | 774 |\n",
+ "| total_timesteps | 212000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.35 |\n",
+ "| explained_variance | 0.8666384 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 10599 |\n",
+ "| policy_loss | -0.0275 |\n",
+ "| std | 0.923 |\n",
+ "| value_loss | 5.03e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 272 |\n",
+ "| iterations | 10700 |\n",
+ "| time_elapsed | 784 |\n",
+ "| total_timesteps | 214000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.35 |\n",
+ "| explained_variance | -0.4472072 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 10699 |\n",
+ "| policy_loss | -0.00202 |\n",
+ "| std | 0.923 |\n",
+ "| value_loss | 3e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 10800 |\n",
+ "| time_elapsed | 790 |\n",
+ "| total_timesteps | 216000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.36 |\n",
+ "| explained_variance | 0.5958526 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 10799 |\n",
+ "| policy_loss | 0.0985 |\n",
+ "| std | 0.923 |\n",
+ "| value_loss | 0.00219 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 10900 |\n",
+ "| time_elapsed | 797 |\n",
+ "| total_timesteps | 218000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.36 |\n",
+ "| explained_variance | 0.9833134 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 10899 |\n",
+ "| policy_loss | 0.0262 |\n",
+ "| std | 0.924 |\n",
+ "| value_loss | 3.51e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.4 |\n",
+ "| ep_rew_mean | -48.3 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 11000 |\n",
+ "| time_elapsed | 803 |\n",
+ "| total_timesteps | 220000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.37 |\n",
+ "| explained_variance | 0.80022764 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 10999 |\n",
+ "| policy_loss | 0.0101 |\n",
+ "| std | 0.927 |\n",
+ "| value_loss | 1.26e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.4 |\n",
+ "| ep_rew_mean | -48.4 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 11100 |\n",
+ "| time_elapsed | 809 |\n",
+ "| total_timesteps | 222000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.39 |\n",
+ "| explained_variance | 0.86710656 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11099 |\n",
+ "| policy_loss | -0.00484 |\n",
+ "| std | 0.932 |\n",
+ "| value_loss | 5.19e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.6 |\n",
+ "| ep_rew_mean | -48.6 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 11200 |\n",
+ "| time_elapsed | 819 |\n",
+ "| total_timesteps | 224000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.38 |\n",
+ "| explained_variance | 0.9757535 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11199 |\n",
+ "| policy_loss | 0.00393 |\n",
+ "| std | 0.928 |\n",
+ "| value_loss | 4.17e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.9 |\n",
+ "| ep_rew_mean | -47.8 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 11300 |\n",
+ "| time_elapsed | 826 |\n",
+ "| total_timesteps | 226000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.37 |\n",
+ "| explained_variance | 0.9639573 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11299 |\n",
+ "| policy_loss | -0.00267 |\n",
+ "| std | 0.928 |\n",
+ "| value_loss | 2.23e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.7 |\n",
+ "| ep_rew_mean | -47.6 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 11400 |\n",
+ "| time_elapsed | 833 |\n",
+ "| total_timesteps | 228000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.38 |\n",
+ "| explained_variance | -2.9878726 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11399 |\n",
+ "| policy_loss | -0.00966 |\n",
+ "| std | 0.929 |\n",
+ "| value_loss | 1.41e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.7 |\n",
+ "| ep_rew_mean | -48.6 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 11500 |\n",
+ "| time_elapsed | 839 |\n",
+ "| total_timesteps | 230000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.37 |\n",
+ "| explained_variance | 0.61145973 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11499 |\n",
+ "| policy_loss | 0.000605 |\n",
+ "| std | 0.927 |\n",
+ "| value_loss | 1.67e-06 |\n",
+ "--------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.2 |\n",
+ "| ep_rew_mean | -48.2 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 11600 |\n",
+ "| time_elapsed | 846 |\n",
+ "| total_timesteps | 232000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.34 |\n",
+ "| explained_variance | -0.48529482 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11599 |\n",
+ "| policy_loss | -0.00852 |\n",
+ "| std | 0.92 |\n",
+ "| value_loss | 0.000263 |\n",
+ "---------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 11700 |\n",
+ "| time_elapsed | 856 |\n",
+ "| total_timesteps | 234000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.32 |\n",
+ "| explained_variance | 0.5111707 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11699 |\n",
+ "| policy_loss | 0.0447 |\n",
+ "| std | 0.916 |\n",
+ "| value_loss | 0.00013 |\n",
+ "-------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 46.1 |\n",
+ "| ep_rew_mean | -46.1 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 11800 |\n",
+ "| time_elapsed | 863 |\n",
+ "| total_timesteps | 236000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.33 |\n",
+ "| explained_variance | -0.15370154 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11799 |\n",
+ "| policy_loss | -0.0161 |\n",
+ "| std | 0.918 |\n",
+ "| value_loss | 2.98e-05 |\n",
+ "---------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48.1 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 11900 |\n",
+ "| time_elapsed | 869 |\n",
+ "| total_timesteps | 238000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.31 |\n",
+ "| explained_variance | 0.80145425 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11899 |\n",
+ "| policy_loss | -0.00846 |\n",
+ "| std | 0.914 |\n",
+ "| value_loss | 1.04e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48.1 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 12000 |\n",
+ "| time_elapsed | 875 |\n",
+ "| total_timesteps | 240000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.31 |\n",
+ "| explained_variance | 0.7291146 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 11999 |\n",
+ "| policy_loss | 0.0131 |\n",
+ "| std | 0.914 |\n",
+ "| value_loss | 1.66e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48.1 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 12100 |\n",
+ "| time_elapsed | 882 |\n",
+ "| total_timesteps | 242000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.31 |\n",
+ "| explained_variance | 0.9325699 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12099 |\n",
+ "| policy_loss | -0.00958 |\n",
+ "| std | 0.913 |\n",
+ "| value_loss | 9.12e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.6 |\n",
+ "| ep_rew_mean | -47.6 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 12200 |\n",
+ "| time_elapsed | 892 |\n",
+ "| total_timesteps | 244000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.29 |\n",
+ "| explained_variance | 0.9233826 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12199 |\n",
+ "| policy_loss | 0.00747 |\n",
+ "| std | 0.908 |\n",
+ "| value_loss | 5.35e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48.1 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 12300 |\n",
+ "| time_elapsed | 898 |\n",
+ "| total_timesteps | 246000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.28 |\n",
+ "| explained_variance | 0.89778614 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12299 |\n",
+ "| policy_loss | -0.00221 |\n",
+ "| std | 0.906 |\n",
+ "| value_loss | 4.25e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 12400 |\n",
+ "| time_elapsed | 905 |\n",
+ "| total_timesteps | 248000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.26 |\n",
+ "| explained_variance | 0.8887966 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12399 |\n",
+ "| policy_loss | 0.00342 |\n",
+ "| std | 0.902 |\n",
+ "| value_loss | 9.93e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 46.9 |\n",
+ "| ep_rew_mean | -46.9 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 12500 |\n",
+ "| time_elapsed | 911 |\n",
+ "| total_timesteps | 250000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.22 |\n",
+ "| explained_variance | 0.66576827 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12499 |\n",
+ "| policy_loss | -0.0226 |\n",
+ "| std | 0.894 |\n",
+ "| value_loss | 3.2e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 46.9 |\n",
+ "| ep_rew_mean | -46.9 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 12600 |\n",
+ "| time_elapsed | 917 |\n",
+ "| total_timesteps | 252000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.22 |\n",
+ "| explained_variance | 0.7417493 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12599 |\n",
+ "| policy_loss | -0.00908 |\n",
+ "| std | 0.893 |\n",
+ "| value_loss | 0.000403 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 46.9 |\n",
+ "| ep_rew_mean | -46.9 |\n",
+ "| time/ | |\n",
+ "| fps | 273 |\n",
+ "| iterations | 12700 |\n",
+ "| time_elapsed | 927 |\n",
+ "| total_timesteps | 254000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.21 |\n",
+ "| explained_variance | -0.4401511 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12699 |\n",
+ "| policy_loss | -0.00291 |\n",
+ "| std | 0.891 |\n",
+ "| value_loss | 0.000273 |\n",
+ "--------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 12800 |\n",
+ "| time_elapsed | 933 |\n",
+ "| total_timesteps | 256000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.19 |\n",
+ "| explained_variance | 0.049697876 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12799 |\n",
+ "| policy_loss | -0.0232 |\n",
+ "| std | 0.887 |\n",
+ "| value_loss | 0.00016 |\n",
+ "---------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 12900 |\n",
+ "| time_elapsed | 940 |\n",
+ "| total_timesteps | 258000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.19 |\n",
+ "| explained_variance | -1.4899552 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12899 |\n",
+ "| policy_loss | 0.0311 |\n",
+ "| std | 0.887 |\n",
+ "| value_loss | 8.18e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 13000 |\n",
+ "| time_elapsed | 947 |\n",
+ "| total_timesteps | 260000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.17 |\n",
+ "| explained_variance | 0.8485774 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 12999 |\n",
+ "| policy_loss | -0.0228 |\n",
+ "| std | 0.881 |\n",
+ "| value_loss | 0.000253 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 13100 |\n",
+ "| time_elapsed | 953 |\n",
+ "| total_timesteps | 262000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.17 |\n",
+ "| explained_variance | -19.859615 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13099 |\n",
+ "| policy_loss | -0.113 |\n",
+ "| std | 0.882 |\n",
+ "| value_loss | 0.00415 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 13200 |\n",
+ "| time_elapsed | 963 |\n",
+ "| total_timesteps | 264000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.2 |\n",
+ "| explained_variance | 0.9869409 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13199 |\n",
+ "| policy_loss | -0.0141 |\n",
+ "| std | 0.888 |\n",
+ "| value_loss | 1.01e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 50 |\n",
+ "| ep_rew_mean | -50 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 13300 |\n",
+ "| time_elapsed | 969 |\n",
+ "| total_timesteps | 266000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.2 |\n",
+ "| explained_variance | 0.91975826 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13299 |\n",
+ "| policy_loss | 0.00825 |\n",
+ "| std | 0.889 |\n",
+ "| value_loss | 1.69e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.3 |\n",
+ "| ep_rew_mean | -48.3 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 13400 |\n",
+ "| time_elapsed | 976 |\n",
+ "| total_timesteps | 268000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.19 |\n",
+ "| explained_variance | 0.88386124 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13399 |\n",
+ "| policy_loss | -0.0196 |\n",
+ "| std | 0.887 |\n",
+ "| value_loss | 2.02e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.8 |\n",
+ "| ep_rew_mean | -47.8 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 13500 |\n",
+ "| time_elapsed | 982 |\n",
+ "| total_timesteps | 270000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.21 |\n",
+ "| explained_variance | 0.88700855 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13499 |\n",
+ "| policy_loss | -0.0174 |\n",
+ "| std | 0.892 |\n",
+ "| value_loss | 1.85e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.7 |\n",
+ "| ep_rew_mean | -47.6 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 13600 |\n",
+ "| time_elapsed | 989 |\n",
+ "| total_timesteps | 272000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.2 |\n",
+ "| explained_variance | 0.9246665 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13599 |\n",
+ "| policy_loss | -0.0265 |\n",
+ "| std | 0.889 |\n",
+ "| value_loss | 3.59e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.7 |\n",
+ "| ep_rew_mean | -48.6 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 13700 |\n",
+ "| time_elapsed | 999 |\n",
+ "| total_timesteps | 274000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.2 |\n",
+ "| explained_variance | 0.90511894 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13699 |\n",
+ "| policy_loss | -0.0152 |\n",
+ "| std | 0.889 |\n",
+ "| value_loss | 1.5e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 13800 |\n",
+ "| time_elapsed | 1006 |\n",
+ "| total_timesteps | 276000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.21 |\n",
+ "| explained_variance | 0.96453655 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13799 |\n",
+ "| policy_loss | 0.00467 |\n",
+ "| std | 0.89 |\n",
+ "| value_loss | 6.43e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 13900 |\n",
+ "| time_elapsed | 1012 |\n",
+ "| total_timesteps | 278000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.21 |\n",
+ "| explained_variance | 0.9376099 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13899 |\n",
+ "| policy_loss | -0.00328 |\n",
+ "| std | 0.892 |\n",
+ "| value_loss | 1e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 14000 |\n",
+ "| time_elapsed | 1019 |\n",
+ "| total_timesteps | 280000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.2 |\n",
+ "| explained_variance | 0.9059786 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 13999 |\n",
+ "| policy_loss | 0.00602 |\n",
+ "| std | 0.889 |\n",
+ "| value_loss | 7.84e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 14100 |\n",
+ "| time_elapsed | 1025 |\n",
+ "| total_timesteps | 282000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.17 |\n",
+ "| explained_variance | 0.72370136 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14099 |\n",
+ "| policy_loss | -0.016 |\n",
+ "| std | 0.883 |\n",
+ "| value_loss | 2.04e-05 |\n",
+ "--------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 14200 |\n",
+ "| time_elapsed | 1035 |\n",
+ "| total_timesteps | 284000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.18 |\n",
+ "| explained_variance | 0.92715 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14199 |\n",
+ "| policy_loss | 0.00173 |\n",
+ "| std | 0.885 |\n",
+ "| value_loss | 3e-05 |\n",
+ "------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 14300 |\n",
+ "| time_elapsed | 1042 |\n",
+ "| total_timesteps | 286000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.19 |\n",
+ "| explained_variance | 0.86559874 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14299 |\n",
+ "| policy_loss | -0.29 |\n",
+ "| std | 0.885 |\n",
+ "| value_loss | 0.00812 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 14400 |\n",
+ "| time_elapsed | 1048 |\n",
+ "| total_timesteps | 288000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.18 |\n",
+ "| explained_variance | 0.95699525 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14399 |\n",
+ "| policy_loss | -0.0155 |\n",
+ "| std | 0.884 |\n",
+ "| value_loss | 1.59e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 46.7 |\n",
+ "| ep_rew_mean | -46.6 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 14500 |\n",
+ "| time_elapsed | 1055 |\n",
+ "| total_timesteps | 290000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.15 |\n",
+ "| explained_variance | 0.27279484 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14499 |\n",
+ "| policy_loss | -0.0854 |\n",
+ "| std | 0.878 |\n",
+ "| value_loss | 0.000576 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.7 |\n",
+ "| ep_rew_mean | -47.6 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 14600 |\n",
+ "| time_elapsed | 1062 |\n",
+ "| total_timesteps | 292000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -5.15 |\n",
+ "| explained_variance | 0.96234167 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 14599 |\n",
+ "| policy_loss | -0.00485 |\n",
+ "| std | 0.878 |\n",
+ "| value_loss | 5.17e-06 |\n",
+ "--------------------------------------\n",
+ "[... training log truncated: iterations 14700-20800 omitted (total_timesteps 294,000-416,000, fps ~271-275, ep_rew_mean oscillating between -45.6 and -50) ...]\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 20900 |\n",
+ "| time_elapsed | 1522 |\n",
+ "| total_timesteps | 418000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.91 |\n",
+ "| explained_variance | 0.9828419 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 20899 |\n",
+ "| policy_loss | 0.0151 |\n",
+ "| std | 0.827 |\n",
+ "| value_loss | 3.95e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 21000 |\n",
+ "| time_elapsed | 1528 |\n",
+ "| total_timesteps | 420000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.9 |\n",
+ "| explained_variance | 0.9770458 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 20999 |\n",
+ "| policy_loss | -0.00239 |\n",
+ "| std | 0.825 |\n",
+ "| value_loss | 7.19e-07 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 21100 |\n",
+ "| time_elapsed | 1538 |\n",
+ "| total_timesteps | 422000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.9 |\n",
+ "| explained_variance | 0.92313987 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21099 |\n",
+ "| policy_loss | 0.00894 |\n",
+ "| std | 0.826 |\n",
+ "| value_loss | 9.61e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.4 |\n",
+ "| ep_rew_mean | -48.3 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 21200 |\n",
+ "| time_elapsed | 1544 |\n",
+ "| total_timesteps | 424000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.89 |\n",
+ "| explained_variance | 0.32365882 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21199 |\n",
+ "| policy_loss | -0.0463 |\n",
+ "| std | 0.822 |\n",
+ "| value_loss | 9.98e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.2 |\n",
+ "| ep_rew_mean | -48.2 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 21300 |\n",
+ "| time_elapsed | 1551 |\n",
+ "| total_timesteps | 426000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.88 |\n",
+ "| explained_variance | 0.7403059 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21299 |\n",
+ "| policy_loss | 0.00897 |\n",
+ "| std | 0.821 |\n",
+ "| value_loss | 9.72e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.9 |\n",
+ "| ep_rew_mean | -47.8 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 21400 |\n",
+ "| time_elapsed | 1557 |\n",
+ "| total_timesteps | 428000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.87 |\n",
+ "| explained_variance | 0.8968396 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21399 |\n",
+ "| policy_loss | -0.00422 |\n",
+ "| std | 0.819 |\n",
+ "| value_loss | 5.15e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 275 |\n",
+ "| iterations | 21500 |\n",
+ "| time_elapsed | 1563 |\n",
+ "| total_timesteps | 430000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.87 |\n",
+ "| explained_variance | 0.9448255 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21499 |\n",
+ "| policy_loss | -0.00374 |\n",
+ "| std | 0.818 |\n",
+ "| value_loss | 1.92e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 21600 |\n",
+ "| time_elapsed | 1572 |\n",
+ "| total_timesteps | 432000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.87 |\n",
+ "| explained_variance | 0.850035 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21599 |\n",
+ "| policy_loss | -9.31e-05 |\n",
+ "| std | 0.818 |\n",
+ "| value_loss | 6.13e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.6 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 274 |\n",
+ "| iterations | 21700 |\n",
+ "| time_elapsed | 1578 |\n",
+ "| total_timesteps | 434000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.84 |\n",
+ "| explained_variance | 0.48841304 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21699 |\n",
+ "| policy_loss | -0.0312 |\n",
+ "| std | 0.812 |\n",
+ "| value_loss | 6.78e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.9 |\n",
+ "| ep_rew_mean | -48.9 |\n",
+ "| time/ | |\n",
+ "| fps | 275 |\n",
+ "| iterations | 21800 |\n",
+ "| time_elapsed | 1584 |\n",
+ "| total_timesteps | 436000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.84 |\n",
+ "| explained_variance | 0.97507805 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21799 |\n",
+ "| policy_loss | -0.00284 |\n",
+ "| std | 0.812 |\n",
+ "| value_loss | 3.46e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 275 |\n",
+ "| iterations | 21900 |\n",
+ "| time_elapsed | 1589 |\n",
+ "| total_timesteps | 438000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.84 |\n",
+ "| explained_variance | 0.68833864 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21899 |\n",
+ "| policy_loss | 0.0115 |\n",
+ "| std | 0.813 |\n",
+ "| value_loss | 9.05e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 275 |\n",
+ "| iterations | 22000 |\n",
+ "| time_elapsed | 1595 |\n",
+ "| total_timesteps | 440000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.83 |\n",
+ "| explained_variance | 0.98591065 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 21999 |\n",
+ "| policy_loss | 0.00962 |\n",
+ "| std | 0.811 |\n",
+ "| value_loss | 4.56e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.8 |\n",
+ "| ep_rew_mean | -48.7 |\n",
+ "| time/ | |\n",
+ "| fps | 275 |\n",
+ "| iterations | 22100 |\n",
+ "| time_elapsed | 1605 |\n",
+ "| total_timesteps | 442000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.83 |\n",
+ "| explained_variance | 0.82283175 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 22099 |\n",
+ "| policy_loss | 0.0108 |\n",
+ "| std | 0.811 |\n",
+ "| value_loss | 1.54e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.8 |\n",
+ "| ep_rew_mean | -47.7 |\n",
+ "| time/ | |\n",
+ "| fps | 275 |\n",
+ "| iterations | 22200 |\n",
+ "| time_elapsed | 1611 |\n",
+ "| total_timesteps | 444000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.8 |\n",
+ "| explained_variance | 0.59894145 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 22199 |\n",
+ "| policy_loss | 0.0302 |\n",
+ "| std | 0.804 |\n",
+ "| value_loss | 0.000191 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 275 |\n",
+ "| iterations | 22300 |\n",
+ "| time_elapsed | 1617 |\n",
+ "| total_timesteps | 446000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.78 |\n",
+ "| explained_variance | 0.9134196 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 22299 |\n",
+ "| policy_loss | 0.00497 |\n",
+ "| std | 0.802 |\n",
+ "| value_loss | 4.45e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.8 |\n",
+ "| ep_rew_mean | -48.8 |\n",
+ "| time/ | |\n",
+ "| fps | 275 |\n",
+ "| iterations | 22400 |\n",
+ "| time_elapsed | 1623 |\n",
+ "| total_timesteps | 448000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.79 |\n",
+ "| explained_variance | 0.9829938 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 22399 |\n",
+ "| policy_loss | -0.0215 |\n",
+ "| std | 0.803 |\n",
+ "| value_loss | 2.6e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 22500 |\n",
+ "| time_elapsed | 1629 |\n",
+ "| total_timesteps | 450000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.78 |\n",
+ "| explained_variance | 0.9305882 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 22499 |\n",
+ "| policy_loss | -0.000147 |\n",
+ "| std | 0.802 |\n",
+ "| value_loss | 7.5e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.4 |\n",
+ "| ep_rew_mean | -48.3 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 22600 |\n",
+ "| time_elapsed | 1635 |\n",
+ "| total_timesteps | 452000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.77 |\n",
+ "| explained_variance | 0.35571432 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 22599 |\n",
+ "| policy_loss | -0.00716 |\n",
+ "| std | 0.799 |\n",
+ "| value_loss | 2.19e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.6 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 275 |\n",
+ "| iterations | 22700 |\n",
+ "| time_elapsed | 1645 |\n",
+ "| total_timesteps | 454000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.77 |\n",
+ "| explained_variance | 0.9946183 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 22699 |\n",
+ "| policy_loss | 0.00128 |\n",
+ "| std | 0.799 |\n",
+ "| value_loss | 1.1e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -47.9 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 22800 |\n",
+ "| time_elapsed | 1651 |\n",
+ "| total_timesteps | 456000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.76 |\n",
+ "| explained_variance | 0.9850599 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 22799 |\n",
+ "| policy_loss | 0.00015 |\n",
+ "| std | 0.798 |\n",
+ "| value_loss | 3.45e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.4 |\n",
+ "| ep_rew_mean | -49.4 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 22900 |\n",
+ "| time_elapsed | 1657 |\n",
+ "| total_timesteps | 458000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.75 |\n",
+ "| explained_variance | -1.0031595 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 22899 |\n",
+ "| policy_loss | 0.000758 |\n",
+ "| std | 0.796 |\n",
+ "| value_loss | 8.14e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.2 |\n",
+ "| ep_rew_mean | -49.1 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 23000 |\n",
+ "| time_elapsed | 1663 |\n",
+ "| total_timesteps | 460000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.75 |\n",
+ "| explained_variance | 0.1282289 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 22999 |\n",
+ "| policy_loss | -0.0544 |\n",
+ "| std | 0.795 |\n",
+ "| value_loss | 0.000587 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.7 |\n",
+ "| ep_rew_mean | -47.7 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 23100 |\n",
+ "| time_elapsed | 1670 |\n",
+ "| total_timesteps | 462000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.75 |\n",
+ "| explained_variance | 0.84313476 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 23099 |\n",
+ "| policy_loss | -0.0114 |\n",
+ "| std | 0.795 |\n",
+ "| value_loss | 8.02e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 46.5 |\n",
+ "| ep_rew_mean | -46.4 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 23200 |\n",
+ "| time_elapsed | 1679 |\n",
+ "| total_timesteps | 464000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.74 |\n",
+ "| explained_variance | 0.71710217 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 23199 |\n",
+ "| policy_loss | -0.0132 |\n",
+ "| std | 0.793 |\n",
+ "| value_loss | 0.000272 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 45.3 |\n",
+ "| ep_rew_mean | -45.2 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 23300 |\n",
+ "| time_elapsed | 1685 |\n",
+ "| total_timesteps | 466000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.73 |\n",
+ "| explained_variance | 0.9658966 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 23299 |\n",
+ "| policy_loss | -0.00875 |\n",
+ "| std | 0.792 |\n",
+ "| value_loss | 1.17e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 23400 |\n",
+ "| time_elapsed | 1691 |\n",
+ "| total_timesteps | 468000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.72 |\n",
+ "| explained_variance | 0.98442066 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 23399 |\n",
+ "| policy_loss | 0.000132 |\n",
+ "| std | 0.791 |\n",
+ "| value_loss | 1.35e-06 |\n",
+ "--------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.1 |\n",
+ "| ep_rew_mean | -49.1 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 23500 |\n",
+ "| time_elapsed | 1697 |\n",
+ "| total_timesteps | 470000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.71 |\n",
+ "| explained_variance | -0.20414686 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 23499 |\n",
+ "| policy_loss | 0.0214 |\n",
+ "| std | 0.788 |\n",
+ "| value_loss | 4.31e-05 |\n",
+ "---------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 277 |\n",
+ "| iterations | 23600 |\n",
+ "| time_elapsed | 1703 |\n",
+ "| total_timesteps | 472000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.68 |\n",
+ "| explained_variance | 0.8207843 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 23599 |\n",
+ "| policy_loss | -0.0187 |\n",
+ "| std | 0.781 |\n",
+ "| value_loss | 4.95e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.8 |\n",
+ "| ep_rew_mean | -47.7 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 23700 |\n",
+ "| time_elapsed | 1713 |\n",
+ "| total_timesteps | 474000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.67 |\n",
+ "| explained_variance | 0.9646553 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 23699 |\n",
+ "| policy_loss | -0.00441 |\n",
+ "| std | 0.78 |\n",
+ "| value_loss | 4.12e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 46.6 |\n",
+ "| ep_rew_mean | -46.5 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 23800 |\n",
+ "| time_elapsed | 1719 |\n",
+ "| total_timesteps | 476000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.64 |\n",
+ "| explained_variance | 0.29265028 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 23799 |\n",
+ "| policy_loss | -0.0266 |\n",
+ "| std | 0.775 |\n",
+ "| value_loss | 8.52e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.8 |\n",
+ "| ep_rew_mean | -47.8 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 23900 |\n",
+ "| time_elapsed | 1725 |\n",
+ "| total_timesteps | 478000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.63 |\n",
+ "| explained_variance | 0.53161466 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 23899 |\n",
+ "| policy_loss | 0.0194 |\n",
+ "| std | 0.773 |\n",
+ "| value_loss | 0.00012 |\n",
+ "--------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.7 |\n",
+ "| ep_rew_mean | -48.7 |\n",
+ "| time/ | |\n",
+ "| fps | 277 |\n",
+ "| iterations | 24000 |\n",
+ "| time_elapsed | 1731 |\n",
+ "| total_timesteps | 480000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.61 |\n",
+ "| explained_variance | -0.45772743 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 23999 |\n",
+ "| policy_loss | 0.0221 |\n",
+ "| std | 0.769 |\n",
+ "| value_loss | 0.000243 |\n",
+ "---------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -48.9 |\n",
+ "| time/ | |\n",
+ "| fps | 277 |\n",
+ "| iterations | 24100 |\n",
+ "| time_elapsed | 1737 |\n",
+ "| total_timesteps | 482000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.6 |\n",
+ "| explained_variance | 0.984889 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 24099 |\n",
+ "| policy_loss | 0.00232 |\n",
+ "| std | 0.767 |\n",
+ "| value_loss | 2.45e-06 |\n",
+ "------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.4 |\n",
+ "| ep_rew_mean | -49.4 |\n",
+ "| time/ | |\n",
+ "| fps | 276 |\n",
+ "| iterations | 24200 |\n",
+ "| time_elapsed | 1747 |\n",
+ "| total_timesteps | 484000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.6 |\n",
+ "| explained_variance | 0.38117242 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 24199 |\n",
+ "| policy_loss | -0.00147 |\n",
+ "| std | 0.766 |\n",
+ "| value_loss | 2.92e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.4 |\n",
+ "| ep_rew_mean | -49.4 |\n",
+ "| time/ | |\n",
+ "| fps | 277 |\n",
+ "| iterations | 24300 |\n",
+ "| time_elapsed | 1753 |\n",
+ "| total_timesteps | 486000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.57 |\n",
+ "| explained_variance | 0.8432429 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 24299 |\n",
+ "| policy_loss | -0.0173 |\n",
+ "| std | 0.761 |\n",
+ "| value_loss | 7.87e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.3 |\n",
+ "| ep_rew_mean | -49.3 |\n",
+ "| time/ | |\n",
+ "| fps | 277 |\n",
+ "| iterations | 24400 |\n",
+ "| time_elapsed | 1759 |\n",
+ "| total_timesteps | 488000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.55 |\n",
+ "| explained_variance | 0.74071956 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 24399 |\n",
+ "| policy_loss | -0.000541 |\n",
+ "| std | 0.757 |\n",
+ "| value_loss | 5.11e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.4 |\n",
+ "| ep_rew_mean | -48.3 |\n",
+ "| time/ | |\n",
+ "| fps | 277 |\n",
+ "| iterations | 24500 |\n",
+ "| time_elapsed | 1765 |\n",
+ "| total_timesteps | 490000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.53 |\n",
+ "| explained_variance | 0.93212646 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 24499 |\n",
+ "| policy_loss | -0.0138 |\n",
+ "| std | 0.752 |\n",
+ "| value_loss | 1.72e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 277 |\n",
+ "| iterations | 24600 |\n",
+ "| time_elapsed | 1771 |\n",
+ "| total_timesteps | 492000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.49 |\n",
+ "| explained_variance | 0.83804965 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 24599 |\n",
+ "| policy_loss | -0.0364 |\n",
+ "| std | 0.746 |\n",
+ "| value_loss | 7.15e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.2 |\n",
+ "| ep_rew_mean | -48.1 |\n",
+ "| time/ | |\n",
+ "| fps | 277 |\n",
+ "| iterations | 24700 |\n",
+ "| time_elapsed | 1777 |\n",
+ "| total_timesteps | 494000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.5 |\n",
+ "| explained_variance | 0.99318516 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 24699 |\n",
+ "| policy_loss | -0.00357 |\n",
+ "| std | 0.746 |\n",
+ "| value_loss | 3.81e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 277 |\n",
+ "| iterations | 24800 |\n",
+ "| time_elapsed | 1787 |\n",
+ "| total_timesteps | 496000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.49 |\n",
+ "| explained_variance | 0.9355085 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 24799 |\n",
+ "| policy_loss | -0.000438 |\n",
+ "| std | 0.745 |\n",
+ "| value_loss | 1.37e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 50 |\n",
+ "| ep_rew_mean | -50 |\n",
+ "| time/ | |\n",
+ "| fps | 277 |\n",
+ "| iterations | 24900 |\n",
+ "| time_elapsed | 1793 |\n",
+ "| total_timesteps | 498000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.48 |\n",
+ "| explained_variance | 0.9021735 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 24899 |\n",
+ "| policy_loss | -0.000913 |\n",
+ "| std | 0.744 |\n",
+ "| value_loss | 2.17e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 50 |\n",
+ "| ep_rew_mean | -50 |\n",
+ "| time/ | |\n",
+ "| fps | 277 |\n",
+ "| iterations | 25000 |\n",
+ "| time_elapsed | 1799 |\n",
+ "| total_timesteps | 500000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.47 |\n",
+ "| explained_variance | 0.9590338 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 24999 |\n",
+ "| policy_loss | -0.000665 |\n",
+ "| std | 0.742 |\n",
+ "| value_loss | 8.6e-07 |\n",
+ "-------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 50 |\n",
+ "| ep_rew_mean | -50 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 25100 |\n",
+ "| time_elapsed | 1805 |\n",
+ "| total_timesteps | 502000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.46 |\n",
+ "| explained_variance | -0.09868252 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 25099 |\n",
+ "| policy_loss | 0.00935 |\n",
+ "| std | 0.74 |\n",
+ "| value_loss | 3.3e-05 |\n",
+ "---------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.8 |\n",
+ "| ep_rew_mean | -49.7 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 25200 |\n",
+ "| time_elapsed | 1811 |\n",
+ "| total_timesteps | 504000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.45 |\n",
+ "| explained_variance | 0.9393065 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 25199 |\n",
+ "| policy_loss | -0.00298 |\n",
+ "| std | 0.739 |\n",
+ "| value_loss | 2.64e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.4 |\n",
+ "| time/ | |\n",
+ "| fps | 277 |\n",
+ "| iterations | 25300 |\n",
+ "| time_elapsed | 1820 |\n",
+ "| total_timesteps | 506000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.46 |\n",
+ "| explained_variance | 0.9661807 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 25299 |\n",
+ "| policy_loss | -0.00921 |\n",
+ "| std | 0.739 |\n",
+ "| value_loss | 5.25e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.2 |\n",
+ "| ep_rew_mean | -47.1 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 25400 |\n",
+ "| time_elapsed | 1826 |\n",
+ "| total_timesteps | 508000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.45 |\n",
+ "| explained_variance | 0.98033226 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 25399 |\n",
+ "| policy_loss | -0.0115 |\n",
+ "| std | 0.738 |\n",
+ "| value_loss | 1.33e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.9 |\n",
+ "| ep_rew_mean | -47.9 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 25500 |\n",
+ "| time_elapsed | 1832 |\n",
+ "| total_timesteps | 510000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.45 |\n",
+ "| explained_variance | 0.98172903 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 25499 |\n",
+ "| policy_loss | -0.00525 |\n",
+ "| std | 0.737 |\n",
+ "| value_loss | 4.09e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 25600 |\n",
+ "| time_elapsed | 1838 |\n",
+ "| total_timesteps | 512000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.44 |\n",
+ "| explained_variance | 0.9630763 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 25599 |\n",
+ "| policy_loss | 0.00446 |\n",
+ "| std | 0.737 |\n",
+ "| value_loss | 3.84e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 25700 |\n",
+ "| time_elapsed | 1844 |\n",
+ "| total_timesteps | 514000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.42 |\n",
+ "| explained_variance | 0.74551255 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 25699 |\n",
+ "| policy_loss | -0.00379 |\n",
+ "| std | 0.733 |\n",
+ "| value_loss | 4.06e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 25800 |\n",
+ "| time_elapsed | 1851 |\n",
+ "| total_timesteps | 516000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.42 |\n",
+ "| explained_variance | 0.88611174 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 25799 |\n",
+ "| policy_loss | -0.00438 |\n",
+ "| std | 0.733 |\n",
+ "| value_loss | 4.34e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 46.2 |\n",
+ "| ep_rew_mean | -46.2 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 25900 |\n",
+ "| time_elapsed | 1860 |\n",
+ "| total_timesteps | 518000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.4 |\n",
+ "| explained_variance | 0.97400296 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 25899 |\n",
+ "| policy_loss | -0.0016 |\n",
+ "| std | 0.73 |\n",
+ "| value_loss | 3.13e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.4 |\n",
+ "| ep_rew_mean | -47.4 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 26000 |\n",
+ "| time_elapsed | 1866 |\n",
+ "| total_timesteps | 520000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.43 |\n",
+ "| explained_variance | 0.9903519 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 25999 |\n",
+ "| policy_loss | -0.00792 |\n",
+ "| std | 0.734 |\n",
+ "| value_loss | 4.45e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 26100 |\n",
+ "| time_elapsed | 1872 |\n",
+ "| total_timesteps | 522000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.41 |\n",
+ "| explained_variance | 0.96013033 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 26099 |\n",
+ "| policy_loss | -0.00215 |\n",
+ "| std | 0.733 |\n",
+ "| value_loss | 2.28e-06 |\n",
+ "--------------------------------------\n",
+ "... [training log truncated: iterations 26200-32400 omitted; metrics remained flat, with ep_rew_mean between roughly -47 and -49, fps ~ 280, entropy_loss drifting from -4.42 to -3.98, and std decreasing from 0.733 to 0.657] ...\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 32500 |\n",
+ "| time_elapsed | 2311 |\n",
+ "| total_timesteps | 650000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.98 |\n",
+ "| explained_variance | 0.9863495 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32499 |\n",
+ "| policy_loss | 0.000939 |\n",
+ "| std | 0.657 |\n",
+ "| value_loss | 4.68e-07 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 32600 |\n",
+ "| time_elapsed | 2321 |\n",
+ "| total_timesteps | 652000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.98 |\n",
+ "| explained_variance | 0.6925158 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32599 |\n",
+ "| policy_loss | 0.00332 |\n",
+ "| std | 0.657 |\n",
+ "| value_loss | 1.72e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 32700 |\n",
+ "| time_elapsed | 2327 |\n",
+ "| total_timesteps | 654000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4 |\n",
+ "| explained_variance | 0.9290561 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32699 |\n",
+ "| policy_loss | 0.00131 |\n",
+ "| std | 0.66 |\n",
+ "| value_loss | 1.3e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 32800 |\n",
+ "| time_elapsed | 2334 |\n",
+ "| total_timesteps | 656000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.01 |\n",
+ "| explained_variance | 0.96650255 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32799 |\n",
+ "| policy_loss | 0.000317 |\n",
+ "| std | 0.661 |\n",
+ "| value_loss | 2.75e-07 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 32900 |\n",
+ "| time_elapsed | 2340 |\n",
+ "| total_timesteps | 658000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.99 |\n",
+ "| explained_variance | 0.92857796 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32899 |\n",
+ "| policy_loss | -0.000708 |\n",
+ "| std | 0.658 |\n",
+ "| value_loss | 9.73e-07 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 33000 |\n",
+ "| time_elapsed | 2347 |\n",
+ "| total_timesteps | 660000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.99 |\n",
+ "| explained_variance | 0.81251585 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 32999 |\n",
+ "| policy_loss | -0.000237 |\n",
+ "| std | 0.659 |\n",
+ "| value_loss | 2.52e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.6 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 33100 |\n",
+ "| time_elapsed | 2357 |\n",
+ "| total_timesteps | 662000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4 |\n",
+ "| explained_variance | 0.8767034 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 33099 |\n",
+ "| policy_loss | -0.00235 |\n",
+ "| std | 0.661 |\n",
+ "| value_loss | 2.81e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 33200 |\n",
+ "| time_elapsed | 2363 |\n",
+ "| total_timesteps | 664000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.01 |\n",
+ "| explained_variance | 0.9719703 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 33199 |\n",
+ "| policy_loss | 0.000635 |\n",
+ "| std | 0.661 |\n",
+ "| value_loss | 2.4e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.7 |\n",
+ "| ep_rew_mean | -47.6 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 33300 |\n",
+ "| time_elapsed | 2370 |\n",
+ "| total_timesteps | 666000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.02 |\n",
+ "| explained_variance | 0.9966027 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 33299 |\n",
+ "| policy_loss | 0.00241 |\n",
+ "| std | 0.664 |\n",
+ "| value_loss | 6.73e-07 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.2 |\n",
+ "| ep_rew_mean | -48.1 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 33400 |\n",
+ "| time_elapsed | 2377 |\n",
+ "| total_timesteps | 668000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.03 |\n",
+ "| explained_variance | 0.9685441 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 33399 |\n",
+ "| policy_loss | -0.000859 |\n",
+ "| std | 0.665 |\n",
+ "| value_loss | 9.94e-07 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.7 |\n",
+ "| ep_rew_mean | -48.6 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 33500 |\n",
+ "| time_elapsed | 2384 |\n",
+ "| total_timesteps | 670000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.02 |\n",
+ "| explained_variance | 0.46576625 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 33499 |\n",
+ "| policy_loss | -0.00206 |\n",
+ "| std | 0.664 |\n",
+ "| value_loss | 5.21e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 33600 |\n",
+ "| time_elapsed | 2394 |\n",
+ "| total_timesteps | 672000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.02 |\n",
+ "| explained_variance | 0.79574704 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 33599 |\n",
+ "| policy_loss | 0.00271 |\n",
+ "| std | 0.663 |\n",
+ "| value_loss | 1.14e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 33700 |\n",
+ "| time_elapsed | 2401 |\n",
+ "| total_timesteps | 674000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.03 |\n",
+ "| explained_variance | 0.9673645 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 33699 |\n",
+ "| policy_loss | -0.00301 |\n",
+ "| std | 0.665 |\n",
+ "| value_loss | 1.64e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 33800 |\n",
+ "| time_elapsed | 2408 |\n",
+ "| total_timesteps | 676000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.03 |\n",
+ "| explained_variance | 0.9559499 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 33799 |\n",
+ "| policy_loss | 0.00404 |\n",
+ "| std | 0.665 |\n",
+ "| value_loss | 2.61e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 33900 |\n",
+ "| time_elapsed | 2415 |\n",
+ "| total_timesteps | 678000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.03 |\n",
+ "| explained_variance | 0.8689276 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 33899 |\n",
+ "| policy_loss | -0.00229 |\n",
+ "| std | 0.665 |\n",
+ "| value_loss | 3.25e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.8 |\n",
+ "| ep_rew_mean | -48.7 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 34000 |\n",
+ "| time_elapsed | 2421 |\n",
+ "| total_timesteps | 680000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.04 |\n",
+ "| explained_variance | 0.92665327 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 33999 |\n",
+ "| policy_loss | -8.24e-05 |\n",
+ "| std | 0.666 |\n",
+ "| value_loss | 7.25e-07 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.8 |\n",
+ "| ep_rew_mean | -48.7 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 34100 |\n",
+ "| time_elapsed | 2432 |\n",
+ "| total_timesteps | 682000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.04 |\n",
+ "| explained_variance | 0.9745406 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 34099 |\n",
+ "| policy_loss | -0.00209 |\n",
+ "| std | 0.666 |\n",
+ "| value_loss | 1.28e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 34200 |\n",
+ "| time_elapsed | 2438 |\n",
+ "| total_timesteps | 684000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.02 |\n",
+ "| explained_variance | 0.8974001 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 34199 |\n",
+ "| policy_loss | -0.00629 |\n",
+ "| std | 0.662 |\n",
+ "| value_loss | 6.24e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 34300 |\n",
+ "| time_elapsed | 2445 |\n",
+ "| total_timesteps | 686000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -4.02 |\n",
+ "| explained_variance | 0.9367453 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 34299 |\n",
+ "| policy_loss | -0.000219 |\n",
+ "| std | 0.663 |\n",
+ "| value_loss | 3.84e-07 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 34400 |\n",
+ "| time_elapsed | 2452 |\n",
+ "| total_timesteps | 688000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.99 |\n",
+ "| explained_variance | 0.9830403 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 34399 |\n",
+ "| policy_loss | -0.000379 |\n",
+ "| std | 0.658 |\n",
+ "| value_loss | 8.19e-07 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 34500 |\n",
+ "| time_elapsed | 2458 |\n",
+ "| total_timesteps | 690000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.96 |\n",
+ "| explained_variance | 0.7310099 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 34499 |\n",
+ "| policy_loss | -0.013 |\n",
+ "| std | 0.652 |\n",
+ "| value_loss | 2.52e-05 |\n",
+ "-------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.3 |\n",
+ "| ep_rew_mean | -48.2 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 34600 |\n",
+ "| time_elapsed | 2468 |\n",
+ "| total_timesteps | 692000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.94 |\n",
+ "| explained_variance | 0.975126 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 34599 |\n",
+ "| policy_loss | 0.00412 |\n",
+ "| std | 0.65 |\n",
+ "| value_loss | 3.43e-06 |\n",
+ "------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.9 |\n",
+ "| ep_rew_mean | -48.9 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 34700 |\n",
+ "| time_elapsed | 2475 |\n",
+ "| total_timesteps | 694000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.95 |\n",
+ "| explained_variance | -0.6563065 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 34699 |\n",
+ "| policy_loss | 0.00318 |\n",
+ "| std | 0.651 |\n",
+ "| value_loss | 5.93e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.4 |\n",
+ "| ep_rew_mean | -48.4 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 34800 |\n",
+ "| time_elapsed | 2481 |\n",
+ "| total_timesteps | 696000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.92 |\n",
+ "| explained_variance | 0.97628003 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 34799 |\n",
+ "| policy_loss | 0.000984 |\n",
+ "| std | 0.647 |\n",
+ "| value_loss | 1.03e-06 |\n",
+ "--------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 34900 |\n",
+ "| time_elapsed | 2488 |\n",
+ "| total_timesteps | 698000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.92 |\n",
+ "| explained_variance | 0.857736 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 34899 |\n",
+ "| policy_loss | 0.000848 |\n",
+ "| std | 0.646 |\n",
+ "| value_loss | 8.62e-07 |\n",
+ "------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 35000 |\n",
+ "| time_elapsed | 2494 |\n",
+ "| total_timesteps | 700000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.9 |\n",
+ "| explained_variance | 0.9739769 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 34999 |\n",
+ "| policy_loss | 8.95e-05 |\n",
+ "| std | 0.643 |\n",
+ "| value_loss | 6.12e-07 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 35100 |\n",
+ "| time_elapsed | 2505 |\n",
+ "| total_timesteps | 702000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.91 |\n",
+ "| explained_variance | 0.9625768 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 35099 |\n",
+ "| policy_loss | 0.000541 |\n",
+ "| std | 0.645 |\n",
+ "| value_loss | 2.43e-07 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 35200 |\n",
+ "| time_elapsed | 2511 |\n",
+ "| total_timesteps | 704000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.9 |\n",
+ "| explained_variance | 0.76877356 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 35199 |\n",
+ "| policy_loss | -0.00575 |\n",
+ "| std | 0.642 |\n",
+ "| value_loss | 1.08e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 35300 |\n",
+ "| time_elapsed | 2517 |\n",
+ "| total_timesteps | 706000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.89 |\n",
+ "| explained_variance | 0.84682435 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 35299 |\n",
+ "| policy_loss | -0.0021 |\n",
+ "| std | 0.641 |\n",
+ "| value_loss | 3.52e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 35400 |\n",
+ "| time_elapsed | 2524 |\n",
+ "| total_timesteps | 708000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.89 |\n",
+ "| explained_variance | 0.7140837 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 35399 |\n",
+ "| policy_loss | -0.00864 |\n",
+ "| std | 0.642 |\n",
+ "| value_loss | 1.03e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 35500 |\n",
+ "| time_elapsed | 2530 |\n",
+ "| total_timesteps | 710000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.9 |\n",
+ "| explained_variance | 0.9013965 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 35499 |\n",
+ "| policy_loss | -0.00133 |\n",
+ "| std | 0.644 |\n",
+ "| value_loss | 5.98e-07 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 35600 |\n",
+ "| time_elapsed | 2541 |\n",
+ "| total_timesteps | 712000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.91 |\n",
+ "| explained_variance | 0.91648865 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 35599 |\n",
+ "| policy_loss | -0.00166 |\n",
+ "| std | 0.644 |\n",
+ "| value_loss | 8.45e-07 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 35700 |\n",
+ "| time_elapsed | 2547 |\n",
+ "| total_timesteps | 714000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.89 |\n",
+ "| explained_variance | 0.78630555 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 35699 |\n",
+ "| policy_loss | -0.0023 |\n",
+ "| std | 0.642 |\n",
+ "| value_loss | 3.13e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 35800 |\n",
+ "| time_elapsed | 2554 |\n",
+ "| total_timesteps | 716000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.89 |\n",
+ "| explained_variance | 0.98644364 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 35799 |\n",
+ "| policy_loss | -0.00116 |\n",
+ "| std | 0.642 |\n",
+ "| value_loss | 5.94e-07 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 35900 |\n",
+ "| time_elapsed | 2561 |\n",
+ "| total_timesteps | 718000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.88 |\n",
+ "| explained_variance | 0.9824021 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 35899 |\n",
+ "| policy_loss | -0.00393 |\n",
+ "| std | 0.639 |\n",
+ "| value_loss | 4.01e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 36000 |\n",
+ "| time_elapsed | 2572 |\n",
+ "| total_timesteps | 720000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.87 |\n",
+ "| explained_variance | 0.9410251 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 35999 |\n",
+ "| policy_loss | -0.000505 |\n",
+ "| std | 0.638 |\n",
+ "| value_loss | 2.79e-07 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 36100 |\n",
+ "| time_elapsed | 2578 |\n",
+ "| total_timesteps | 722000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.87 |\n",
+ "| explained_variance | 0.9754824 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 36099 |\n",
+ "| policy_loss | 0.000673 |\n",
+ "| std | 0.638 |\n",
+ "| value_loss | 3.42e-07 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 36200 |\n",
+ "| time_elapsed | 2585 |\n",
+ "| total_timesteps | 724000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.86 |\n",
+ "| explained_variance | 0.84805125 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 36199 |\n",
+ "| policy_loss | -0.00034 |\n",
+ "| std | 0.636 |\n",
+ "| value_loss | 2.15e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.3 |\n",
+ "| ep_rew_mean | -49.3 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 36300 |\n",
+ "| time_elapsed | 2592 |\n",
+ "| total_timesteps | 726000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.86 |\n",
+ "| explained_variance | 0.98801094 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 36299 |\n",
+ "| policy_loss | -0.00244 |\n",
+ "| std | 0.637 |\n",
+ "| value_loss | 7.71e-07 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.8 |\n",
+ "| ep_rew_mean | -48.8 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 36400 |\n",
+ "| time_elapsed | 2598 |\n",
+ "| total_timesteps | 728000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.86 |\n",
+ "| explained_variance | 0.64739573 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 36399 |\n",
+ "| policy_loss | -0.00118 |\n",
+ "| std | 0.636 |\n",
+ "| value_loss | 4.25e-07 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.8 |\n",
+ "| ep_rew_mean | -47.8 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 36500 |\n",
+ "| time_elapsed | 2608 |\n",
+ "| total_timesteps | 730000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.84 |\n",
+ "| explained_variance | 0.9897441 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 36499 |\n",
+ "| policy_loss | 0.00103 |\n",
+ "| std | 0.633 |\n",
+ "| value_loss | 8.12e-07 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.9 |\n",
+ "| ep_rew_mean | -47.8 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 36600 |\n",
+ "| time_elapsed | 2615 |\n",
+ "| total_timesteps | 732000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.84 |\n",
+ "| explained_variance | 0.98654985 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 36599 |\n",
+ "| policy_loss | -0.00401 |\n",
+ "| std | 0.634 |\n",
+ "| value_loss | 2.09e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.9 |\n",
+ "| ep_rew_mean | -47.8 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 36700 |\n",
+ "| time_elapsed | 2622 |\n",
+ "| total_timesteps | 734000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.84 |\n",
+ "| explained_variance | 0.98241895 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 36699 |\n",
+ "| policy_loss | -0.00083 |\n",
+ "| std | 0.633 |\n",
+ "| value_loss | 1.19e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.4 |\n",
+ "| ep_rew_mean | -48.3 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 36800 |\n",
+ "| time_elapsed | 2629 |\n",
+ "| total_timesteps | 736000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.83 |\n",
+ "| explained_variance | 0.8045112 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 36799 |\n",
+ "| policy_loss | 0.00678 |\n",
+ "| std | 0.632 |\n",
+ "| value_loss | 8.17e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 36900 |\n",
+ "| time_elapsed | 2636 |\n",
+ "| total_timesteps | 738000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.83 |\n",
+ "| explained_variance | 0.6432221 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 36899 |\n",
+ "| policy_loss | -0.00117 |\n",
+ "| std | 0.632 |\n",
+ "| value_loss | 2.46e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 37000 |\n",
+ "| time_elapsed | 2646 |\n",
+ "| total_timesteps | 740000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.81 |\n",
+ "| explained_variance | 0.89308023 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 36999 |\n",
+ "| policy_loss | -0.00348 |\n",
+ "| std | 0.629 |\n",
+ "| value_loss | 2.19e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.5 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 37100 |\n",
+ "| time_elapsed | 2653 |\n",
+ "| total_timesteps | 742000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.81 |\n",
+ "| explained_variance | 0.97850627 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 37099 |\n",
+ "| policy_loss | -0.000425 |\n",
+ "| std | 0.629 |\n",
+ "| value_loss | 4.43e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.3 |\n",
+ "| ep_rew_mean | -47.3 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 37200 |\n",
+ "| time_elapsed | 2660 |\n",
+ "| total_timesteps | 744000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.78 |\n",
+ "| explained_variance | 0.9655469 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 37199 |\n",
+ "| policy_loss | 7.71e-05 |\n",
+ "| std | 0.624 |\n",
+ "| value_loss | 1.54e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 46.9 |\n",
+ "| ep_rew_mean | -46.8 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 37300 |\n",
+ "| time_elapsed | 2666 |\n",
+ "| total_timesteps | 746000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.78 |\n",
+ "| explained_variance | 0.92692417 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 37299 |\n",
+ "| policy_loss | -0.00408 |\n",
+ "| std | 0.624 |\n",
+ "| value_loss | 5.12e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.6 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 37400 |\n",
+ "| time_elapsed | 2673 |\n",
+ "| total_timesteps | 748000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.78 |\n",
+ "| explained_variance | 0.85534066 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 37399 |\n",
+ "| policy_loss | -0.00534 |\n",
+ "| std | 0.624 |\n",
+ "| value_loss | 6.73e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 37500 |\n",
+ "| time_elapsed | 2684 |\n",
+ "| total_timesteps | 750000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.77 |\n",
+ "| explained_variance | 0.91903675 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 37499 |\n",
+ "| policy_loss | -0.00187 |\n",
+ "| std | 0.623 |\n",
+ "| value_loss | 2.31e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 37600 |\n",
+ "| time_elapsed | 2690 |\n",
+ "| total_timesteps | 752000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.76 |\n",
+ "| explained_variance | 0.9927211 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 37599 |\n",
+ "| policy_loss | 0.00225 |\n",
+ "| std | 0.62 |\n",
+ "| value_loss | 1.23e-06 |\n",
+ "-------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 37700 |\n",
+ "| time_elapsed | 2697 |\n",
+ "| total_timesteps | 754000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.76 |\n",
+ "| explained_variance | 0.961677 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 37699 |\n",
+ "| policy_loss | 0.00138 |\n",
+ "| std | 0.621 |\n",
+ "| value_loss | 1.04e-06 |\n",
+ "------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.4 |\n",
+ "| ep_rew_mean | -49.4 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 37800 |\n",
+ "| time_elapsed | 2703 |\n",
+ "| total_timesteps | 756000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.76 |\n",
+ "| explained_variance | 0.8840703 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 37799 |\n",
+ "| policy_loss | -0.00133 |\n",
+ "| std | 0.621 |\n",
+ "| value_loss | 5.31e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.4 |\n",
+ "| ep_rew_mean | -49.4 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 37900 |\n",
+ "| time_elapsed | 2710 |\n",
+ "| total_timesteps | 758000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.76 |\n",
+ "| explained_variance | 0.9751732 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 37899 |\n",
+ "| policy_loss | 0.0017 |\n",
+ "| std | 0.621 |\n",
+ "| value_loss | 8.42e-07 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.9 |\n",
+ "| ep_rew_mean | -48.9 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 38000 |\n",
+ "| time_elapsed | 2720 |\n",
+ "| total_timesteps | 760000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.76 |\n",
+ "| explained_variance | 0.90713525 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 37999 |\n",
+ "| policy_loss | -0.000299 |\n",
+ "| std | 0.62 |\n",
+ "| value_loss | 2.18e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 38100 |\n",
+ "| time_elapsed | 2727 |\n",
+ "| total_timesteps | 762000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.75 |\n",
+ "| explained_variance | 0.97773933 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 38099 |\n",
+ "| policy_loss | 0.00071 |\n",
+ "| std | 0.62 |\n",
+ "| value_loss | 7.3e-07 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 46.8 |\n",
+ "| ep_rew_mean | -46.8 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 38200 |\n",
+ "| time_elapsed | 2733 |\n",
+ "| total_timesteps | 764000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.74 |\n",
+ "| explained_variance | 0.85500395 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 38199 |\n",
+ "| policy_loss | -0.0188 |\n",
+ "| std | 0.616 |\n",
+ "| value_loss | 0.000115 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.3 |\n",
+ "| ep_rew_mean | -47.3 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 38300 |\n",
+ "| time_elapsed | 2740 |\n",
+ "| total_timesteps | 766000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.73 |\n",
+ "| explained_variance | 0.9707148 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 38299 |\n",
+ "| policy_loss | -0.00259 |\n",
+ "| std | 0.616 |\n",
+ "| value_loss | 7.38e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 38400 |\n",
+ "| time_elapsed | 2751 |\n",
+ "| total_timesteps | 768000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.73 |\n",
+ "| explained_variance | 0.9092056 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 38399 |\n",
+ "| policy_loss | -0.00333 |\n",
+ "| std | 0.615 |\n",
+ "| value_loss | 4.96e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 38500 |\n",
+ "| time_elapsed | 2757 |\n",
+ "| total_timesteps | 770000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.73 |\n",
+ "| explained_variance | 0.98466456 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 38499 |\n",
+ "| policy_loss | 0.0108 |\n",
+ "| std | 0.616 |\n",
+ "| value_loss | 1.65e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 38600 |\n",
+ "| time_elapsed | 2764 |\n",
+ "| total_timesteps | 772000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.73 |\n",
+ "| explained_variance | 0.4182393 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 38599 |\n",
+ "| policy_loss | 0.0304 |\n",
+ "| std | 0.615 |\n",
+ "| value_loss | 0.000218 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.1 |\n",
+ "| ep_rew_mean | -49.1 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 38700 |\n",
+ "| time_elapsed | 2771 |\n",
+ "| total_timesteps | 774000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.72 |\n",
+ "| explained_variance | 0.86738527 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 38699 |\n",
+ "| policy_loss | -0.00272 |\n",
+ "| std | 0.615 |\n",
+ "| value_loss | 3.13e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.7 |\n",
+ "| ep_rew_mean | -47.6 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 38800 |\n",
+ "| time_elapsed | 2777 |\n",
+ "| total_timesteps | 776000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.69 |\n",
+ "| explained_variance | 0.92607296 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 38799 |\n",
+ "| policy_loss | -0.00675 |\n",
+ "| std | 0.61 |\n",
+ "| value_loss | 1.55e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.2 |\n",
+ "| ep_rew_mean | -47.2 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 38900 |\n",
+ "| time_elapsed | 2788 |\n",
+ "| total_timesteps | 778000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.68 |\n",
+ "| explained_variance | 0.3575865 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 38899 |\n",
+ "| policy_loss | 0.105 |\n",
+ "| std | 0.61 |\n",
+ "| value_loss | 0.00223 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.3 |\n",
+ "| ep_rew_mean | -47.2 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 39000 |\n",
+ "| time_elapsed | 2795 |\n",
+ "| total_timesteps | 780000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.69 |\n",
+ "| explained_variance | 0.9598239 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 38999 |\n",
+ "| policy_loss | -0.00267 |\n",
+ "| std | 0.61 |\n",
+ "| value_loss | 5.62e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.6 |\n",
+ "| ep_rew_mean | -48.6 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 39100 |\n",
+ "| time_elapsed | 2802 |\n",
+ "| total_timesteps | 782000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.69 |\n",
+ "| explained_variance | 0.8572078 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39099 |\n",
+ "| policy_loss | 0.00538 |\n",
+ "| std | 0.61 |\n",
+ "| value_loss | 5.86e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 39200 |\n",
+ "| time_elapsed | 2809 |\n",
+ "| total_timesteps | 784000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.68 |\n",
+ "| explained_variance | 0.86322427 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39199 |\n",
+ "| policy_loss | -0.000962 |\n",
+ "| std | 0.61 |\n",
+ "| value_loss | 1.02e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 39300 |\n",
+ "| time_elapsed | 2815 |\n",
+ "| total_timesteps | 786000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.67 |\n",
+ "| explained_variance | 0.98482674 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39299 |\n",
+ "| policy_loss | -0.00257 |\n",
+ "| std | 0.607 |\n",
+ "| value_loss | 2.04e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.2 |\n",
+ "| ep_rew_mean | -47.1 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 39400 |\n",
+ "| time_elapsed | 2826 |\n",
+ "| total_timesteps | 788000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.64 |\n",
+ "| explained_variance | 0.98814607 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39399 |\n",
+ "| policy_loss | -0.0132 |\n",
+ "| std | 0.603 |\n",
+ "| value_loss | 1.93e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 45.7 |\n",
+ "| ep_rew_mean | -45.6 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 39500 |\n",
+ "| time_elapsed | 2832 |\n",
+ "| total_timesteps | 790000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.63 |\n",
+ "| explained_variance | 0.75104976 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39499 |\n",
+ "| policy_loss | -0.0149 |\n",
+ "| std | 0.601 |\n",
+ "| value_loss | 4.16e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.2 |\n",
+ "| ep_rew_mean | -47.1 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 39600 |\n",
+ "| time_elapsed | 2839 |\n",
+ "| total_timesteps | 792000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.61 |\n",
+ "| explained_variance | 0.9826381 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39599 |\n",
+ "| policy_loss | -0.00721 |\n",
+ "| std | 0.599 |\n",
+ "| value_loss | 8.65e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 46.9 |\n",
+ "| ep_rew_mean | -46.9 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 39700 |\n",
+ "| time_elapsed | 2846 |\n",
+ "| total_timesteps | 794000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.59 |\n",
+ "| explained_variance | 0.91662145 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39699 |\n",
+ "| policy_loss | 0.00448 |\n",
+ "| std | 0.596 |\n",
+ "| value_loss | 5.06e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48.1 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 39800 |\n",
+ "| time_elapsed | 2852 |\n",
+ "| total_timesteps | 796000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.59 |\n",
+ "| explained_variance | 0.97679144 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39799 |\n",
+ "| policy_loss | 0.00058 |\n",
+ "| std | 0.595 |\n",
+ "| value_loss | 1.8e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48.1 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 39900 |\n",
+ "| time_elapsed | 2863 |\n",
+ "| total_timesteps | 798000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.58 |\n",
+ "| explained_variance | 0.99432683 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39899 |\n",
+ "| policy_loss | -0.00551 |\n",
+ "| std | 0.593 |\n",
+ "| value_loss | 2.67e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.3 |\n",
+ "| ep_rew_mean | -48.2 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 40000 |\n",
+ "| time_elapsed | 2870 |\n",
+ "| total_timesteps | 800000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.58 |\n",
+ "| explained_variance | 0.98825186 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 39999 |\n",
+ "| policy_loss | -0.002 |\n",
+ "| std | 0.594 |\n",
+ "| value_loss | 1.8e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.4 |\n",
+ "| ep_rew_mean | -48.3 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 40100 |\n",
+ "| time_elapsed | 2876 |\n",
+ "| total_timesteps | 802000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.57 |\n",
+ "| explained_variance | 0.06861681 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40099 |\n",
+ "| policy_loss | 0.0233 |\n",
+ "| std | 0.592 |\n",
+ "| value_loss | 8.39e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.3 |\n",
+ "| ep_rew_mean | -48.3 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 40200 |\n",
+ "| time_elapsed | 2883 |\n",
+ "| total_timesteps | 804000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.57 |\n",
+ "| explained_variance | 0.9904497 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40199 |\n",
+ "| policy_loss | -0.00605 |\n",
+ "| std | 0.592 |\n",
+ "| value_loss | 8.11e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.1 |\n",
+ "| ep_rew_mean | -49.1 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 40300 |\n",
+ "| time_elapsed | 2889 |\n",
+ "| total_timesteps | 806000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.55 |\n",
+ "| explained_variance | 0.95933515 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40299 |\n",
+ "| policy_loss | 0.00178 |\n",
+ "| std | 0.591 |\n",
+ "| value_loss | 6.86e-07 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 40400 |\n",
+ "| time_elapsed | 2899 |\n",
+ "| total_timesteps | 808000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.56 |\n",
+ "| explained_variance | 0.97932297 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40399 |\n",
+ "| policy_loss | 0.0017 |\n",
+ "| std | 0.591 |\n",
+ "| value_loss | 1.05e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.6 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 40500 |\n",
+ "| time_elapsed | 2905 |\n",
+ "| total_timesteps | 810000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.57 |\n",
+ "| explained_variance | 0.9931614 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40499 |\n",
+ "| policy_loss | 9.6e-05 |\n",
+ "| std | 0.593 |\n",
+ "| value_loss | 2.44e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.6 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 40600 |\n",
+ "| time_elapsed | 2911 |\n",
+ "| total_timesteps | 812000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.58 |\n",
+ "| explained_variance | 0.72751546 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40599 |\n",
+ "| policy_loss | 0.00225 |\n",
+ "| std | 0.595 |\n",
+ "| value_loss | 4.93e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.6 |\n",
+ "| ep_rew_mean | -47.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 40700 |\n",
+ "| time_elapsed | 2917 |\n",
+ "| total_timesteps | 814000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.58 |\n",
+ "| explained_variance | 0.9609206 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40699 |\n",
+ "| policy_loss | 0.00484 |\n",
+ "| std | 0.595 |\n",
+ "| value_loss | 1.85e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 40800 |\n",
+ "| time_elapsed | 2923 |\n",
+ "| total_timesteps | 816000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.59 |\n",
+ "| explained_variance | 0.9776916 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40799 |\n",
+ "| policy_loss | -0.00179 |\n",
+ "| std | 0.596 |\n",
+ "| value_loss | 6.04e-07 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 278 |\n",
+ "| iterations | 40900 |\n",
+ "| time_elapsed | 2932 |\n",
+ "| total_timesteps | 818000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.59 |\n",
+ "| explained_variance | 0.95068985 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40899 |\n",
+ "| policy_loss | 0.00321 |\n",
+ "| std | 0.596 |\n",
+ "| value_loss | 4.92e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 41000 |\n",
+ "| time_elapsed | 2938 |\n",
+ "| total_timesteps | 820000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.59 |\n",
+ "| explained_variance | 0.94147617 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 40999 |\n",
+ "| policy_loss | 0.00213 |\n",
+ "| std | 0.596 |\n",
+ "| value_loss | 3.03e-06 |\n",
+ "--------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 50 |\n",
+ "| ep_rew_mean | -50 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 41100 |\n",
+ "| time_elapsed | 2944 |\n",
+ "| total_timesteps | 822000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.57 |\n",
+ "| explained_variance | -0.12963355 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41099 |\n",
+ "| policy_loss | 0.0255 |\n",
+ "| std | 0.593 |\n",
+ "| value_loss | 0.000291 |\n",
+ "---------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 50 |\n",
+ "| ep_rew_mean | -50 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 41200 |\n",
+ "| time_elapsed | 2950 |\n",
+ "| total_timesteps | 824000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.57 |\n",
+ "| explained_variance | 0.5466497 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41199 |\n",
+ "| policy_loss | -0.000548 |\n",
+ "| std | 0.593 |\n",
+ "| value_loss | 1.48e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 41300 |\n",
+ "| time_elapsed | 2956 |\n",
+ "| total_timesteps | 826000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.57 |\n",
+ "| explained_variance | 0.9527324 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41299 |\n",
+ "| policy_loss | 0.000149 |\n",
+ "| std | 0.593 |\n",
+ "| value_loss | 1.14e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 41400 |\n",
+ "| time_elapsed | 2966 |\n",
+ "| total_timesteps | 828000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.53 |\n",
+ "| explained_variance | 0.8947157 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41399 |\n",
+ "| policy_loss | -0.000977 |\n",
+ "| std | 0.588 |\n",
+ "| value_loss | 3.34e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 41500 |\n",
+ "| time_elapsed | 2972 |\n",
+ "| total_timesteps | 830000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.52 |\n",
+ "| explained_variance | 0.9557045 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41499 |\n",
+ "| policy_loss | -0.00129 |\n",
+ "| std | 0.586 |\n",
+ "| value_loss | 2.02e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.8 |\n",
+ "| ep_rew_mean | -48.8 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 41600 |\n",
+ "| time_elapsed | 2978 |\n",
+ "| total_timesteps | 832000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.51 |\n",
+ "| explained_variance | 0.86566734 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41599 |\n",
+ "| policy_loss | -0.00558 |\n",
+ "| std | 0.585 |\n",
+ "| value_loss | 8.27e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.8 |\n",
+ "| ep_rew_mean | -47.8 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 41700 |\n",
+ "| time_elapsed | 2984 |\n",
+ "| total_timesteps | 834000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.5 |\n",
+ "| explained_variance | 0.9876392 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41699 |\n",
+ "| policy_loss | -0.00139 |\n",
+ "| std | 0.583 |\n",
+ "| value_loss | 1.51e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.8 |\n",
+ "| ep_rew_mean | -47.8 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 41800 |\n",
+ "| time_elapsed | 2990 |\n",
+ "| total_timesteps | 836000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.5 |\n",
+ "| explained_variance | 0.94977826 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41799 |\n",
+ "| policy_loss | -0.00302 |\n",
+ "| std | 0.583 |\n",
+ "| value_loss | 4.13e-06 |\n",
+ "--------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 41900 |\n",
+ "| time_elapsed | 2996 |\n",
+ "| total_timesteps | 838000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.5 |\n",
+ "| explained_variance | -0.17895353 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41899 |\n",
+ "| policy_loss | -0.000211 |\n",
+ "| std | 0.583 |\n",
+ "| value_loss | 8.77e-06 |\n",
+ "---------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 42000 |\n",
+ "| time_elapsed | 3005 |\n",
+ "| total_timesteps | 840000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.5 |\n",
+ "| explained_variance | -16.5786 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 41999 |\n",
+ "| policy_loss | -0.00635 |\n",
+ "| std | 0.582 |\n",
+ "| value_loss | 1.32e-05 |\n",
+ "------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 42100 |\n",
+ "| time_elapsed | 3011 |\n",
+ "| total_timesteps | 842000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.51 |\n",
+ "| explained_variance | 0.9696886 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42099 |\n",
+ "| policy_loss | -0.000619 |\n",
+ "| std | 0.585 |\n",
+ "| value_loss | 9.09e-07 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.9 |\n",
+ "| ep_rew_mean | -47.8 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 42200 |\n",
+ "| time_elapsed | 3017 |\n",
+ "| total_timesteps | 844000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.5 |\n",
+ "| explained_variance | 0.9301016 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42199 |\n",
+ "| policy_loss | -0.00234 |\n",
+ "| std | 0.583 |\n",
+ "| value_loss | 2.62e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.7 |\n",
+ "| ep_rew_mean | -47.7 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 42300 |\n",
+ "| time_elapsed | 3023 |\n",
+ "| total_timesteps | 846000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.5 |\n",
+ "| explained_variance | 0.5389772 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42299 |\n",
+ "| policy_loss | 0.00977 |\n",
+ "| std | 0.583 |\n",
+ "| value_loss | 2.91e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.4 |\n",
+ "| ep_rew_mean | -49.4 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 42400 |\n",
+ "| time_elapsed | 3029 |\n",
+ "| total_timesteps | 848000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.5 |\n",
+ "| explained_variance | 0.97215235 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42399 |\n",
+ "| policy_loss | 0.000407 |\n",
+ "| std | 0.583 |\n",
+ "| value_loss | 4.69e-07 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -47.9 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 42500 |\n",
+ "| time_elapsed | 3039 |\n",
+ "| total_timesteps | 850000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.49 |\n",
+ "| explained_variance | 0.98842454 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42499 |\n",
+ "| policy_loss | -0.00272 |\n",
+ "| std | 0.581 |\n",
+ "| value_loss | 3.38e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 46.7 |\n",
+ "| ep_rew_mean | -46.6 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 42600 |\n",
+ "| time_elapsed | 3045 |\n",
+ "| total_timesteps | 852000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.46 |\n",
+ "| explained_variance | 0.96813923 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42599 |\n",
+ "| policy_loss | -0.000448 |\n",
+ "| std | 0.577 |\n",
+ "| value_loss | 1.41e-05 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47.1 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 42700 |\n",
+ "| time_elapsed | 3051 |\n",
+ "| total_timesteps | 854000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.46 |\n",
+ "| explained_variance | 0.98248726 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42699 |\n",
+ "| policy_loss | -0.000134 |\n",
+ "| std | 0.577 |\n",
+ "| value_loss | 4.33e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 42800 |\n",
+ "| time_elapsed | 3058 |\n",
+ "| total_timesteps | 856000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.48 |\n",
+ "| explained_variance | 0.39102143 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42799 |\n",
+ "| policy_loss | 0.013 |\n",
+ "| std | 0.579 |\n",
+ "| value_loss | 0.000239 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 42900 |\n",
+ "| time_elapsed | 3063 |\n",
+ "| total_timesteps | 858000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.48 |\n",
+ "| explained_variance | 0.9779079 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42899 |\n",
+ "| policy_loss | 0.000379 |\n",
+ "| std | 0.58 |\n",
+ "| value_loss | 1.16e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 43000 |\n",
+ "| time_elapsed | 3073 |\n",
+ "| total_timesteps | 860000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.44 |\n",
+ "| explained_variance | 0.61227715 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 42999 |\n",
+ "| policy_loss | 0.0209 |\n",
+ "| std | 0.575 |\n",
+ "| value_loss | 7.89e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 43100 |\n",
+ "| time_elapsed | 3079 |\n",
+ "| total_timesteps | 862000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.42 |\n",
+ "| explained_variance | 0.9107274 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43099 |\n",
+ "| policy_loss | 0.00381 |\n",
+ "| std | 0.571 |\n",
+ "| value_loss | 1.34e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 43200 |\n",
+ "| time_elapsed | 3085 |\n",
+ "| total_timesteps | 864000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.42 |\n",
+ "| explained_variance | 0.8499373 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43199 |\n",
+ "| policy_loss | 8.45e-05 |\n",
+ "| std | 0.572 |\n",
+ "| value_loss | 4.13e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 43300 |\n",
+ "| time_elapsed | 3091 |\n",
+ "| total_timesteps | 866000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.41 |\n",
+ "| explained_variance | 0.9820314 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43299 |\n",
+ "| policy_loss | -0.000137 |\n",
+ "| std | 0.57 |\n",
+ "| value_loss | 1.03e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 43400 |\n",
+ "| time_elapsed | 3098 |\n",
+ "| total_timesteps | 868000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.42 |\n",
+ "| explained_variance | 0.9914655 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43399 |\n",
+ "| policy_loss | 0.00113 |\n",
+ "| std | 0.571 |\n",
+ "| value_loss | 1.02e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 43500 |\n",
+ "| time_elapsed | 3104 |\n",
+ "| total_timesteps | 870000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.39 |\n",
+ "| explained_variance | 0.9956513 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43499 |\n",
+ "| policy_loss | -0.00204 |\n",
+ "| std | 0.568 |\n",
+ "| value_loss | 1.7e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 279 |\n",
+ "| iterations | 43600 |\n",
+ "| time_elapsed | 3115 |\n",
+ "| total_timesteps | 872000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.41 |\n",
+ "| explained_variance | 0.3942523 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43599 |\n",
+ "| policy_loss | 0.00775 |\n",
+ "| std | 0.57 |\n",
+ "| value_loss | 1.4e-05 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 43700 |\n",
+ "| time_elapsed | 3121 |\n",
+ "| total_timesteps | 874000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.42 |\n",
+ "| explained_variance | 0.9855038 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43699 |\n",
+ "| policy_loss | 0.00744 |\n",
+ "| std | 0.571 |\n",
+ "| value_loss | 6.56e-06 |\n",
+ "-------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.2 |\n",
+ "| ep_rew_mean | -47.2 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 43800 |\n",
+ "| time_elapsed | 3127 |\n",
+ "| total_timesteps | 876000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.41 |\n",
+ "| explained_variance | 0.008309901 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43799 |\n",
+ "| policy_loss | 1.97 |\n",
+ "| std | 0.572 |\n",
+ "| value_loss | 3.85 |\n",
+ "---------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.7 |\n",
+ "| ep_rew_mean | -47.7 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 43900 |\n",
+ "| time_elapsed | 3133 |\n",
+ "| total_timesteps | 878000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.41 |\n",
+ "| explained_variance | 0.77352124 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43899 |\n",
+ "| policy_loss | -0.00231 |\n",
+ "| std | 0.571 |\n",
+ "| value_loss | 7.97e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.7 |\n",
+ "| ep_rew_mean | -48.7 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 44000 |\n",
+ "| time_elapsed | 3139 |\n",
+ "| total_timesteps | 880000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.42 |\n",
+ "| explained_variance | 0.27089834 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 43999 |\n",
+ "| policy_loss | 0.0011 |\n",
+ "| std | 0.572 |\n",
+ "| value_loss | 3.99e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 44100 |\n",
+ "| time_elapsed | 3148 |\n",
+ "| total_timesteps | 882000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.41 |\n",
+ "| explained_variance | 0.8952299 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44099 |\n",
+ "| policy_loss | -0.000954 |\n",
+ "| std | 0.571 |\n",
+ "| value_loss | 1.77e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 44200 |\n",
+ "| time_elapsed | 3154 |\n",
+ "| total_timesteps | 884000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.39 |\n",
+ "| explained_variance | 0.99033046 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44199 |\n",
+ "| policy_loss | 0.00178 |\n",
+ "| std | 0.568 |\n",
+ "| value_loss | 1.12e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 44300 |\n",
+ "| time_elapsed | 3160 |\n",
+ "| total_timesteps | 886000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.39 |\n",
+ "| explained_variance | 0.8583008 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44299 |\n",
+ "| policy_loss | -0.0018 |\n",
+ "| std | 0.568 |\n",
+ "| value_loss | 2.16e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.3 |\n",
+ "| ep_rew_mean | -48.3 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 44400 |\n",
+ "| time_elapsed | 3167 |\n",
+ "| total_timesteps | 888000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.37 |\n",
+ "| explained_variance | 0.99091053 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44399 |\n",
+ "| policy_loss | 0.00132 |\n",
+ "| std | 0.564 |\n",
+ "| value_loss | 3.64e-06 |\n",
+ "--------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.9 |\n",
+ "| ep_rew_mean | -47.8 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 44500 |\n",
+ "| time_elapsed | 3173 |\n",
+ "| total_timesteps | 890000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.37 |\n",
+ "| explained_variance | 0.002827227 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44499 |\n",
+ "| policy_loss | 1.55 |\n",
+ "| std | 0.565 |\n",
+ "| value_loss | 3.89 |\n",
+ "---------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47.1 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 44600 |\n",
+ "| time_elapsed | 3182 |\n",
+ "| total_timesteps | 892000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.37 |\n",
+ "| explained_variance | 0.06923187 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44599 |\n",
+ "| policy_loss | -0.0339 |\n",
+ "| std | 0.565 |\n",
+ "| value_loss | 0.000171 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.3 |\n",
+ "| ep_rew_mean | -47.3 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 44700 |\n",
+ "| time_elapsed | 3188 |\n",
+ "| total_timesteps | 894000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.37 |\n",
+ "| explained_variance | -21.586582 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44699 |\n",
+ "| policy_loss | -0.0592 |\n",
+ "| std | 0.567 |\n",
+ "| value_loss | 0.00248 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.8 |\n",
+ "| ep_rew_mean | -47.8 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 44800 |\n",
+ "| time_elapsed | 3194 |\n",
+ "| total_timesteps | 896000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.39 |\n",
+ "| explained_variance | 0.9866412 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44799 |\n",
+ "| policy_loss | 0.000868 |\n",
+ "| std | 0.568 |\n",
+ "| value_loss | 1.84e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 44900 |\n",
+ "| time_elapsed | 3200 |\n",
+ "| total_timesteps | 898000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.39 |\n",
+ "| explained_variance | 0.9051938 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44899 |\n",
+ "| policy_loss | 0.00217 |\n",
+ "| std | 0.567 |\n",
+ "| value_loss | 7.65e-07 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 50 |\n",
+ "| ep_rew_mean | -50 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 45000 |\n",
+ "| time_elapsed | 3206 |\n",
+ "| total_timesteps | 900000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.39 |\n",
+ "| explained_variance | 0.6483333 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 44999 |\n",
+ "| policy_loss | 0.000492 |\n",
+ "| std | 0.568 |\n",
+ "| value_loss | 8.77e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.6 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 45100 |\n",
+ "| time_elapsed | 3216 |\n",
+ "| total_timesteps | 902000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.36 |\n",
+ "| explained_variance | 0.9926092 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45099 |\n",
+ "| policy_loss | 0.000675 |\n",
+ "| std | 0.563 |\n",
+ "| value_loss | 6.19e-07 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 45200 |\n",
+ "| time_elapsed | 3223 |\n",
+ "| total_timesteps | 904000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.36 |\n",
+ "| explained_variance | 0.99412566 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45199 |\n",
+ "| policy_loss | 0.00288 |\n",
+ "| std | 0.564 |\n",
+ "| value_loss | 1.07e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.2 |\n",
+ "| ep_rew_mean | -48.1 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 45300 |\n",
+ "| time_elapsed | 3229 |\n",
+ "| total_timesteps | 906000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.35 |\n",
+ "| explained_variance | 0.98885244 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45299 |\n",
+ "| policy_loss | 0.00442 |\n",
+ "| std | 0.562 |\n",
+ "| value_loss | 2.03e-06 |\n",
+ "--------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48.1 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 45400 |\n",
+ "| time_elapsed | 3235 |\n",
+ "| total_timesteps | 908000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.33 |\n",
+ "| explained_variance | 0.94177 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45399 |\n",
+ "| policy_loss | -0.00149 |\n",
+ "| std | 0.559 |\n",
+ "| value_loss | 4.01e-06 |\n",
+ "------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 45500 |\n",
+ "| time_elapsed | 3241 |\n",
+ "| total_timesteps | 910000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.33 |\n",
+ "| explained_variance | 0.9495938 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45499 |\n",
+ "| policy_loss | -0.00533 |\n",
+ "| std | 0.558 |\n",
+ "| value_loss | 1.06e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 45600 |\n",
+ "| time_elapsed | 3248 |\n",
+ "| total_timesteps | 912000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.33 |\n",
+ "| explained_variance | 0.91553783 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45599 |\n",
+ "| policy_loss | -0.000339 |\n",
+ "| std | 0.558 |\n",
+ "| value_loss | 3.84e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 45700 |\n",
+ "| time_elapsed | 3257 |\n",
+ "| total_timesteps | 914000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.32 |\n",
+ "| explained_variance | 0.39803714 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45699 |\n",
+ "| policy_loss | 0.0227 |\n",
+ "| std | 0.558 |\n",
+ "| value_loss | 0.000166 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 45800 |\n",
+ "| time_elapsed | 3264 |\n",
+ "| total_timesteps | 916000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.33 |\n",
+ "| explained_variance | 0.84038657 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45799 |\n",
+ "| policy_loss | -0.00389 |\n",
+ "| std | 0.558 |\n",
+ "| value_loss | 1.08e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 45900 |\n",
+ "| time_elapsed | 3270 |\n",
+ "| total_timesteps | 918000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.33 |\n",
+ "| explained_variance | 0.9584281 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45899 |\n",
+ "| policy_loss | -0.00227 |\n",
+ "| std | 0.559 |\n",
+ "| value_loss | 1.63e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 46000 |\n",
+ "| time_elapsed | 3276 |\n",
+ "| total_timesteps | 920000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.33 |\n",
+ "| explained_variance | 0.83016056 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 45999 |\n",
+ "| policy_loss | 0.00417 |\n",
+ "| std | 0.559 |\n",
+ "| value_loss | 3.25e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 46100 |\n",
+ "| time_elapsed | 3283 |\n",
+ "| total_timesteps | 922000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.32 |\n",
+ "| explained_variance | 0.6802498 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46099 |\n",
+ "| policy_loss | 0.00223 |\n",
+ "| std | 0.558 |\n",
+ "| value_loss | 3.24e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 46200 |\n",
+ "| time_elapsed | 3292 |\n",
+ "| total_timesteps | 924000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.32 |\n",
+ "| explained_variance | 0.99392444 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46199 |\n",
+ "| policy_loss | 0.0017 |\n",
+ "| std | 0.557 |\n",
+ "| value_loss | 9.97e-07 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.2 |\n",
+ "| ep_rew_mean | -48.1 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 46300 |\n",
+ "| time_elapsed | 3299 |\n",
+ "| total_timesteps | 926000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.32 |\n",
+ "| explained_variance | 0.11546546 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46299 |\n",
+ "| policy_loss | -0.01 |\n",
+ "| std | 0.558 |\n",
+ "| value_loss | 8.48e-05 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.7 |\n",
+ "| ep_rew_mean | -48.6 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 46400 |\n",
+ "| time_elapsed | 3305 |\n",
+ "| total_timesteps | 928000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.35 |\n",
+ "| explained_variance | 0.9007289 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46399 |\n",
+ "| policy_loss | 0.000824 |\n",
+ "| std | 0.561 |\n",
+ "| value_loss | 4.22e-07 |\n",
+ "-------------------------------------\n",
+ "---------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.2 |\n",
+ "| ep_rew_mean | -49.1 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 46500 |\n",
+ "| time_elapsed | 3311 |\n",
+ "| total_timesteps | 930000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.35 |\n",
+ "| explained_variance | -0.48946536 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46499 |\n",
+ "| policy_loss | 0.000401 |\n",
+ "| std | 0.561 |\n",
+ "| value_loss | 1.02e-06 |\n",
+ "---------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 46600 |\n",
+ "| time_elapsed | 3318 |\n",
+ "| total_timesteps | 932000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.35 |\n",
+ "| explained_variance | 0.9060283 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46599 |\n",
+ "| policy_loss | 0.000685 |\n",
+ "| std | 0.561 |\n",
+ "| value_loss | 9.11e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 50 |\n",
+ "| ep_rew_mean | -50 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 46700 |\n",
+ "| time_elapsed | 3327 |\n",
+ "| total_timesteps | 934000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.33 |\n",
+ "| explained_variance | 0.6672769 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46699 |\n",
+ "| policy_loss | -0.00052 |\n",
+ "| std | 0.559 |\n",
+ "| value_loss | 1.51e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 46800 |\n",
+ "| time_elapsed | 3334 |\n",
+ "| total_timesteps | 936000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.31 |\n",
+ "| explained_variance | 0.7833716 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46799 |\n",
+ "| policy_loss | 0.000618 |\n",
+ "| std | 0.556 |\n",
+ "| value_loss | 9.54e-07 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 46900 |\n",
+ "| time_elapsed | 3340 |\n",
+ "| total_timesteps | 938000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.29 |\n",
+ "| explained_variance | 0.8197125 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46899 |\n",
+ "| policy_loss | -0.00295 |\n",
+ "| std | 0.554 |\n",
+ "| value_loss | 2.3e-05 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 47000 |\n",
+ "| time_elapsed | 3347 |\n",
+ "| total_timesteps | 940000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.29 |\n",
+ "| explained_variance | 0.98894083 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 46999 |\n",
+ "| policy_loss | -8.07e-05 |\n",
+ "| std | 0.553 |\n",
+ "| value_loss | 1.47e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 47100 |\n",
+ "| time_elapsed | 3353 |\n",
+ "| total_timesteps | 942000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.29 |\n",
+ "| explained_variance | 0.9561706 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47099 |\n",
+ "| policy_loss | -0.00176 |\n",
+ "| std | 0.553 |\n",
+ "| value_loss | 3.15e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.7 |\n",
+ "| ep_rew_mean | -49.7 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 47200 |\n",
+ "| time_elapsed | 3362 |\n",
+ "| total_timesteps | 944000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.3 |\n",
+ "| explained_variance | 0.97445196 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47199 |\n",
+ "| policy_loss | -0.00119 |\n",
+ "| std | 0.555 |\n",
+ "| value_loss | 2.77e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.3 |\n",
+ "| ep_rew_mean | -47.2 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 47300 |\n",
+ "| time_elapsed | 3369 |\n",
+ "| total_timesteps | 946000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.27 |\n",
+ "| explained_variance | 0.75822085 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47299 |\n",
+ "| policy_loss | 0.00987 |\n",
+ "| std | 0.551 |\n",
+ "| value_loss | 0.000115 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 46.6 |\n",
+ "| ep_rew_mean | -46.6 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 47400 |\n",
+ "| time_elapsed | 3375 |\n",
+ "| total_timesteps | 948000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.26 |\n",
+ "| explained_variance | 0.9148364 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47399 |\n",
+ "| policy_loss | 0.00772 |\n",
+ "| std | 0.549 |\n",
+ "| value_loss | 8.81e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 47500 |\n",
+ "| time_elapsed | 3381 |\n",
+ "| total_timesteps | 950000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.27 |\n",
+ "| explained_variance | 0.96930254 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47499 |\n",
+ "| policy_loss | -9.47e-06 |\n",
+ "| std | 0.551 |\n",
+ "| value_loss | 1.26e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 47600 |\n",
+ "| time_elapsed | 3387 |\n",
+ "| total_timesteps | 952000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.27 |\n",
+ "| explained_variance | 0.7950085 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47599 |\n",
+ "| policy_loss | -0.00454 |\n",
+ "| std | 0.55 |\n",
+ "| value_loss | 8.08e-06 |\n",
+ "-------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 47700 |\n",
+ "| time_elapsed | 3397 |\n",
+ "| total_timesteps | 954000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.27 |\n",
+ "| explained_variance | 0.781541 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47699 |\n",
+ "| policy_loss | 0.000465 |\n",
+ "| std | 0.551 |\n",
+ "| value_loss | 2.91e-06 |\n",
+ "------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 47800 |\n",
+ "| time_elapsed | 3403 |\n",
+ "| total_timesteps | 956000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.26 |\n",
+ "| explained_variance | 0.9763136 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47799 |\n",
+ "| policy_loss | 0.000196 |\n",
+ "| std | 0.55 |\n",
+ "| value_loss | 1.38e-06 |\n",
+ "-------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 47900 |\n",
+ "| time_elapsed | 3410 |\n",
+ "| total_timesteps | 958000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.28 |\n",
+ "| explained_variance | 0.961331 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47899 |\n",
+ "| policy_loss | -0.00369 |\n",
+ "| std | 0.552 |\n",
+ "| value_loss | 3.65e-06 |\n",
+ "------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 48000 |\n",
+ "| time_elapsed | 3416 |\n",
+ "| total_timesteps | 960000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.26 |\n",
+ "| explained_variance | 0.94155157 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 47999 |\n",
+ "| policy_loss | -0.00508 |\n",
+ "| std | 0.55 |\n",
+ "| value_loss | 6.79e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 48100 |\n",
+ "| time_elapsed | 3422 |\n",
+ "| total_timesteps | 962000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.27 |\n",
+ "| explained_variance | 0.95676875 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48099 |\n",
+ "| policy_loss | 0.000654 |\n",
+ "| std | 0.551 |\n",
+ "| value_loss | 1.65e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 48200 |\n",
+ "| time_elapsed | 3431 |\n",
+ "| total_timesteps | 964000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.26 |\n",
+ "| explained_variance | 0.94901574 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48199 |\n",
+ "| policy_loss | 3.81e-06 |\n",
+ "| std | 0.55 |\n",
+ "| value_loss | 3.41e-07 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.1 |\n",
+ "| ep_rew_mean | -47 |\n",
+ "| time/ | |\n",
+ "| fps | 280 |\n",
+ "| iterations | 48300 |\n",
+ "| time_elapsed | 3437 |\n",
+ "| total_timesteps | 966000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.26 |\n",
+ "| explained_variance | 0.9897378 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48299 |\n",
+ "| policy_loss | 0.00123 |\n",
+ "| std | 0.55 |\n",
+ "| value_loss | 2.62e-07 |\n",
+ "-------------------------------------\n",
+ "------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 48400 |\n",
+ "| time_elapsed | 3443 |\n",
+ "| total_timesteps | 968000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.27 |\n",
+ "| explained_variance | 0.967348 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48399 |\n",
+ "| policy_loss | 0.00125 |\n",
+ "| std | 0.552 |\n",
+ "| value_loss | 4.61e-07 |\n",
+ "------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 48500 |\n",
+ "| time_elapsed | 3450 |\n",
+ "| total_timesteps | 970000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.25 |\n",
+ "| explained_variance | 0.9456974 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48499 |\n",
+ "| policy_loss | -0.00673 |\n",
+ "| std | 0.548 |\n",
+ "| value_loss | 5.76e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 48600 |\n",
+ "| time_elapsed | 3456 |\n",
+ "| total_timesteps | 972000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.24 |\n",
+ "| explained_variance | 0.81393284 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48599 |\n",
+ "| policy_loss | 0.0018 |\n",
+ "| std | 0.548 |\n",
+ "| value_loss | 4.43e-07 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 47.9 |\n",
+ "| ep_rew_mean | -47.9 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 48700 |\n",
+ "| time_elapsed | 3462 |\n",
+ "| total_timesteps | 974000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.23 |\n",
+ "| explained_variance | 0.97745365 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48699 |\n",
+ "| policy_loss | 0.00484 |\n",
+ "| std | 0.547 |\n",
+ "| value_loss | 3.19e-06 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.4 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 48800 |\n",
+ "| time_elapsed | 3472 |\n",
+ "| total_timesteps | 976000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.23 |\n",
+ "| explained_variance | 0.9833449 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48799 |\n",
+ "| policy_loss | 0.00261 |\n",
+ "| std | 0.546 |\n",
+ "| value_loss | 3.5e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -48.9 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 48900 |\n",
+ "| time_elapsed | 3478 |\n",
+ "| total_timesteps | 978000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.23 |\n",
+ "| explained_variance | 0.96366274 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48899 |\n",
+ "| policy_loss | 0.000591 |\n",
+ "| std | 0.546 |\n",
+ "| value_loss | 4.17e-07 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.6 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 49000 |\n",
+ "| time_elapsed | 3483 |\n",
+ "| total_timesteps | 980000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.22 |\n",
+ "| explained_variance | 0.9828003 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 48999 |\n",
+ "| policy_loss | -0.00217 |\n",
+ "| std | 0.545 |\n",
+ "| value_loss | 1.52e-06 |\n",
+ "-------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 49100 |\n",
+ "| time_elapsed | 3489 |\n",
+ "| total_timesteps | 982000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.23 |\n",
+ "| explained_variance | 0.9969808 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49099 |\n",
+ "| policy_loss | 7.52e-05 |\n",
+ "| std | 0.546 |\n",
+ "| value_loss | 1.92e-07 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 49200 |\n",
+ "| time_elapsed | 3495 |\n",
+ "| total_timesteps | 984000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.22 |\n",
+ "| explained_variance | 0.98948133 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49199 |\n",
+ "| policy_loss | -0.00204 |\n",
+ "| std | 0.545 |\n",
+ "| value_loss | 8.27e-07 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48.1 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 49300 |\n",
+ "| time_elapsed | 3505 |\n",
+ "| total_timesteps | 986000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.21 |\n",
+ "| explained_variance | 0.97018635 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49299 |\n",
+ "| policy_loss | -0.000292 |\n",
+ "| std | 0.544 |\n",
+ "| value_loss | 4.47e-07 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.1 |\n",
+ "| ep_rew_mean | -48.1 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 49400 |\n",
+ "| time_elapsed | 3511 |\n",
+ "| total_timesteps | 988000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.21 |\n",
+ "| explained_variance | 0.9661637 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49399 |\n",
+ "| policy_loss | -0.00206 |\n",
+ "| std | 0.544 |\n",
+ "| value_loss | 9.4e-06 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 49500 |\n",
+ "| time_elapsed | 3518 |\n",
+ "| total_timesteps | 990000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.2 |\n",
+ "| explained_variance | 0.82379425 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49499 |\n",
+ "| policy_loss | 0.00211 |\n",
+ "| std | 0.543 |\n",
+ "| value_loss | 2.47e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49 |\n",
+ "| ep_rew_mean | -49 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 49600 |\n",
+ "| time_elapsed | 3524 |\n",
+ "| total_timesteps | 992000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.19 |\n",
+ "| explained_variance | 0.99219644 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49599 |\n",
+ "| policy_loss | -0.00165 |\n",
+ "| std | 0.542 |\n",
+ "| value_loss | 4.1e-07 |\n",
+ "--------------------------------------\n",
+ "-------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 49.5 |\n",
+ "| ep_rew_mean | -49.5 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 49700 |\n",
+ "| time_elapsed | 3530 |\n",
+ "| total_timesteps | 994000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.2 |\n",
+ "| explained_variance | 0.9896941 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49699 |\n",
+ "| policy_loss | 0.000546 |\n",
+ "| std | 0.543 |\n",
+ "| value_loss | 1.62e-07 |\n",
+ "-------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48.5 |\n",
+ "| ep_rew_mean | -48.5 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 49800 |\n",
+ "| time_elapsed | 3540 |\n",
+ "| total_timesteps | 996000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.18 |\n",
+ "| explained_variance | 0.99164146 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49799 |\n",
+ "| policy_loss | 0.000225 |\n",
+ "| std | 0.54 |\n",
+ "| value_loss | 4.13e-07 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 49900 |\n",
+ "| time_elapsed | 3545 |\n",
+ "| total_timesteps | 998000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.18 |\n",
+ "| explained_variance | 0.92336273 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49899 |\n",
+ "| policy_loss | -0.00245 |\n",
+ "| std | 0.54 |\n",
+ "| value_loss | 8.85e-06 |\n",
+ "--------------------------------------\n",
+ "--------------------------------------\n",
+ "| rollout/ | |\n",
+ "| ep_len_mean | 48 |\n",
+ "| ep_rew_mean | -48 |\n",
+ "| time/ | |\n",
+ "| fps | 281 |\n",
+ "| iterations | 50000 |\n",
+ "| time_elapsed | 3551 |\n",
+ "| total_timesteps | 1000000 |\n",
+ "| train/ | |\n",
+ "| entropy_loss | -3.18 |\n",
+ "| explained_variance | 0.95652837 |\n",
+ "| learning_rate | 0.0007 |\n",
+ "| n_updates | 49999 |\n",
+ "| policy_loss | -0.00401 |\n",
+ "| std | 0.54 |\n",
+ "| value_loss | 3.22e-06 |\n",
+ "--------------------------------------\n",
+ "argv[0]=--background_color_red=0.8745098114013672\n",
+ "argv[1]=--background_color_green=0.21176470816135406\n",
+ "argv[2]=--background_color_blue=0.1764705926179886\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit6/venv-u6/lib/python3.10/site-packages/stable_baselines3/common/evaluation.py:67: UserWarning: Evaluation environment is not wrapped with a ``Monitor`` wrapper. This may result in reporting modified episode lengths and rewards, if other wrappers happen to modify these. Consider wrapping environment first with ``Monitor`` wrapper.\n",
+ " warnings.warn(\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Mean reward = -45.00 +/- 15.00\n",
+ "\u001b[38;5;4mℹ This function will save, evaluate, generate a video of your agent,\n",
+ "create a model card and push everything to the hub. It might take up to 1min.\n",
+ "This is a work in progress: if you encounter a bug, please open an issue.\u001b[0m\n",
+ "Saving video to /tmp/tmppn3lzgfu/-step-0-to-step-1000.mp4\n",
+ "MoviePy - Building video /tmp/tmppn3lzgfu/-step-0-to-step-1000.mp4.\n",
+ "MoviePy - Writing video /tmp/tmppn3lzgfu/-step-0-to-step-1000.mp4\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ " \r"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "MoviePy - Done !\n",
+ "MoviePy - video ready /tmp/tmppn3lzgfu/-step-0-to-step-1000.mp4\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "ffmpeg version 6.1.1-3ubuntu5 Copyright (c) 2000-2023 the FFmpeg developers\n",
+ " built with gcc 13 (Ubuntu 13.2.0-23ubuntu3)\n",
+ " configuration: --prefix=/usr --extra-version=3ubuntu5 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --disable-omx --enable-gnutls --enable-libaom --enable-libass --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-openal --enable-opencl --enable-opengl --disable-sndio --enable-libvpl --disable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-ladspa --enable-libbluray --enable-libjack --enable-libpulse --enable-librabbitmq --enable-librist --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libx264 --enable-libzmq --enable-libzvbi --enable-lv2 --enable-sdl2 --enable-libplacebo --enable-librav1e --enable-pocketsphinx --enable-librsvg --enable-libjxl --enable-shared\n",
+ " libavutil 58. 29.100 / 58. 29.100\n",
+ " libavcodec 60. 31.102 / 60. 31.102\n",
+ " libavformat 60. 16.100 / 60. 16.100\n",
+ " libavdevice 60. 3.100 / 60. 3.100\n",
+ " libavfilter 9. 12.100 / 9. 12.100\n",
+ " libswscale 7. 5.100 / 7. 5.100\n",
+ " libswresample 4. 12.100 / 4. 12.100\n",
+ " libpostproc 57. 3.100 / 57. 3.100\n",
+ "Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/tmppn3lzgfu/-step-0-to-step-1000.mp4':\n",
+ " Metadata:\n",
+ " major_brand : isom\n",
+ " minor_version : 512\n",
+ " compatible_brands: isomiso2avc1mp41\n",
+ " encoder : Lavf61.1.100\n",
+ " Duration: 00:00:40.00, start: 0.000000, bitrate: 190 kb/s\n",
+ " Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 720x480, 187 kb/s, 25 fps, 25 tbr, 12800 tbn (default)\n",
+ " Metadata:\n",
+ " handler_name : VideoHandler\n",
+ " vendor_id : [0][0][0][0]\n",
+ " encoder : Lavc61.3.100 libx264\n",
+ "Stream mapping:\n",
+ " Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))\n",
+ "Press [q] to stop, [?] for help\n",
+ "[libx264 @ 0x5615de034a80] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2\n",
+ "[libx264 @ 0x5615de034a80] profile High, level 3.0, 4:2:0, 8-bit\n",
+ "[libx264 @ 0x5615de034a80] 264 - core 164 r3108 31e19f9 - H.264/MPEG-4 AVC codec - Copyleft 2003-2023 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=15 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00\n",
+ "Output #0, mp4, to '/tmp/tmp2wmkgvgp/replay.mp4':\n",
+ " Metadata:\n",
+ " major_brand : isom\n",
+ " minor_version : 512\n",
+ " compatible_brands: isomiso2avc1mp41\n",
+ " encoder : Lavf60.16.100\n",
+ " Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(progressive), 720x480, q=2-31, 25 fps, 12800 tbn (default)\n",
+ " Metadata:\n",
+ " handler_name : VideoHandler\n",
+ " vendor_id : [0][0][0][0]\n",
+ " encoder : Lavc60.31.102 libx264\n",
+ " Side data:\n",
+ " cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A\n",
+ "[out#0/mp4 @ 0x5615ddfb0140] video:896kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.371167%\n",
+ "frame= 1000 fps=740 q=-1.0 Lsize= 908kB time=00:00:39.88 bitrate= 186.5kbits/s speed=29.5x \n",
+ "[libx264 @ 0x5615de034a80] frame I:4 Avg QP:17.50 size: 7558\n",
+ "[libx264 @ 0x5615de034a80] frame P:287 Avg QP:25.06 size: 1464\n",
+ "[libx264 @ 0x5615de034a80] frame B:709 Avg QP:25.16 size: 657\n",
+ "[libx264 @ 0x5615de034a80] consecutive B-frames: 2.6% 5.0% 10.8% 81.6%\n",
+ "[libx264 @ 0x5615de034a80] mb I I16..4: 3.1% 79.9% 17.0%\n",
+ "[libx264 @ 0x5615de034a80] mb P I16..4: 0.2% 1.6% 2.0% P16..4: 2.4% 1.4% 0.7% 0.0% 0.0% skip:91.7%\n",
+ "[libx264 @ 0x5615de034a80] mb B I16..4: 0.1% 0.2% 0.3% B16..8: 3.9% 1.3% 0.5% direct: 0.2% skip:93.5% L0:55.0% L1:42.9% BI: 2.2%\n",
+ "[libx264 @ 0x5615de034a80] 8x8 transform intra:46.3% inter:10.9%\n",
+ "[libx264 @ 0x5615de034a80] coded y,uvDC,uvAC intra: 32.2% 3.7% 0.9% inter: 0.9% 0.0% 0.0%\n",
+ "[libx264 @ 0x5615de034a80] i16 v,h,dc,p: 54% 24% 18% 4%\n",
+ "[libx264 @ 0x5615de034a80] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 41% 12% 44% 1% 1% 0% 1% 0% 1%\n",
+ "[libx264 @ 0x5615de034a80] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 25% 19% 28% 4% 5% 5% 7% 3% 5%\n",
+ "[libx264 @ 0x5615de034a80] i8c dc,h,v,p: 93% 3% 4% 0%\n",
+ "[libx264 @ 0x5615de034a80] Weighted P-Frames: Y:0.0% UV:0.0%\n",
+ "[libx264 @ 0x5615de034a80] ref P L0: 48.4% 5.0% 27.9% 18.8%\n",
+ "[libx264 @ 0x5615de034a80] ref B L0: 78.2% 14.5% 7.3%\n",
+ "[libx264 @ 0x5615de034a80] ref B L1: 96.4% 3.6%\n",
+ "[libx264 @ 0x5615de034a80] kb/s:183.28\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[38;5;4mℹ Pushing repo turbo-maikol/a2c-PandaPickAndPlace-v3 to the Hugging\n",
+ "Face Hub\u001b[0m\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Processing Files (0 / 0) : | | 0.00B / 0.00B \n",
+ "Processing Files (1 / 1) : 0%| | 1.26kB / 1.17MB, ???B/s \n",
+ "Processing Files (1 / 6) : 47%|████▋ | 545kB / 1.17MB, 680kB/s \n",
+ "Processing Files (1 / 6) : 93%|█████████▎| 1.09MB / 1.17MB, 1.09MB/s \n",
+ "Processing Files (6 / 6) : 100%|██████████| 1.17MB / 1.17MB, 837kB/s \n",
+ "Processing Files (6 / 6) : 100%|██████████| 1.17MB / 1.17MB, 586kB/s \n",
+ "New Data Upload : 100%|██████████| 1.17MB / 1.17MB, 586kB/s \n",
+ " ...ckAndPlace-v3/pytorch_variables.pth: 100%|██████████| 1.26kB / 1.26kB \n",
+ " ...ickAndPlace-v3/policy.optimizer.pth: 100%|██████████| 55.8kB / 55.8kB \n",
+ " ...a2c-PandaPickAndPlace-v3/policy.pth: 100%|██████████| 53.7kB / 53.7kB \n",
+ " ...mkgvgp/a2c-PandaPickAndPlace-v3.zip: 100%|██████████| 129kB / 129kB \n",
+ " /tmp/tmp2wmkgvgp/replay.mp4 : 100%|██████████| 930kB / 930kB \n",
+ " /tmp/tmp2wmkgvgp/vec_normalize.pkl : 100%|██████████| 2.95kB / 2.95kB \n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[38;5;4mℹ Your model is pushed to the Hub. You can view your model here:\n",
+ "https://huggingface.co/turbo-maikol/a2c-PandaPickAndPlace-v3/tree/main/\u001b[0m\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "CommitInfo(commit_url='https://huggingface.co/turbo-maikol/a2c-PandaPickAndPlace-v3/commit/457722bba273248332eadc56aa52d5aad99a7844', commit_message='Initial commit', commit_description='', oid='457722bba273248332eadc56aa52d5aad99a7844', pr_url=None, repo_url=RepoUrl('https://huggingface.co/turbo-maikol/a2c-PandaPickAndPlace-v3', endpoint='https://huggingface.co', repo_type='model', repo_id='turbo-maikol/a2c-PandaPickAndPlace-v3'), pr_revision=None, pr_num=None)"
+ ]
+ },
+ "execution_count": 32,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
],
- "metadata": {
- "id": "G3xy3Nf3c2O1"
- }
+ "source": [
+ "# 1 2 3\n",
+ "env_id_new = \"PandaPickAndPlace-v3\"\n",
+ "env_new = make_vec_env(env_id_new, n_envs=4)\n",
+ "env_new = VecNormalize(env_new, norm_obs=True, norm_reward=True, clip_obs=10)\n",
+ "# 4\n",
+ "model_new = A2C(\"MultiInputPolicy\", env_new, verbose=1) # Create the A2C model and try to find the best parameters\n",
+ "# 5\n",
+ "model_new.learn(1_000_000)\n",
+ "# 6\n",
+ "model_name_new = f\"new-{env_id_new}\"\n",
+ "model_new.save(model_name_new)\n",
+ "env_new.save(\"vec_normalize_new.pkl\")\n",
+ "\n",
+ "\n",
+ "# 7\n",
+ "from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize\n",
+ "# Load the saved statistics\n",
+ "eval_env_new = DummyVecEnv([lambda: gym.make(f\"{env_id_new}\")])\n",
+ "eval_env_new = VecNormalize.load(\"vec_normalize_new.pkl\", eval_env_new)\n",
+ "# We need to override the render_mode\n",
+ "eval_env_new.render_mode = \"rgb_array\"\n",
+ "# do not update them at test time\n",
+ "eval_env_new.training = False\n",
+ "# reward normalization is not needed at test time\n",
+ "eval_env_new.norm_reward = False\n",
+ "# Load the agent\n",
+ "model = A2C.load(model_name_new)\n",
+ "\n",
+ "mean_reward, std_reward = evaluate_policy(model, eval_env_new)\n",
+ "\n",
+ "print(f\"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}\")\n",
+ "\n",
+ "\n",
+ "# 8\n",
+ "package_to_hub(\n",
+ " model=model,\n",
+ " model_name=f\"a2c-{env_id_new}\",\n",
+ " model_architecture=\"A2C\",\n",
+ " env_id=env_id_new,\n",
+ " eval_env=eval_env_new,\n",
+ " repo_id=f\"turbo-maikol/a2c-{env_id_new}\", # Change the username\n",
+ " commit_message=\"Initial commit\",\n",
+ ")"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "### Solution (optional)"
- ],
"metadata": {
"id": "sKGbFXZq9ikN"
- }
+ },
+ "source": [
+ "### Solution (optional)"
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "J-cC-Feg9iMm"
+ },
+ "outputs": [],
"source": [
"# 1 - 2\n",
"env_id = \"PandaPickAndPlace-v3\"\n",
@@ -735,15 +19577,15 @@
" verbose=1)\n",
"# 5\n",
"model.learn(1_000_000)"
- ],
- "metadata": {
- "id": "J-cC-Feg9iMm"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "-UnlKLmpg80p"
+ },
+ "outputs": [],
"source": [
"# 6\n",
"model_name = \"a2c-PandaPickAndPlace-v3\";\n",
@@ -779,22 +19621,48 @@
" repo_id=f\"ThomasSimonini/a2c-{env_id}\", # TODO: Change the username\n",
" commit_message=\"Initial commit\",\n",
")"
- ],
- "metadata": {
- "id": "-UnlKLmpg80p"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "usatLaZ8dM4P"
+ },
"source": [
"See you on Unit 7! 🔥\n",
"## Keep learning, stay awesome 🤗"
+ ]
+ }
+ ],
+ "metadata": {
+ "accelerator": "GPU",
+ "colab": {
+ "collapsed_sections": [
+ "tF42HvI7-gs5"
],
- "metadata": {
- "id": "usatLaZ8dM4P"
- }
+ "include_colab_link": true,
+ "private_outputs": true,
+ "provenance": []
+ },
+ "gpuClass": "standard",
+ "kernelspec": {
+ "display_name": "venv-u6",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.18"
}
- ]
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
}
diff --git a/notebooks/unit8/unit8_part1.ipynb b/notebooks/unit8/unit8_part1.ipynb
index 653385b..3586798 100644
--- a/notebooks/unit8/unit8_part1.ipynb
+++ b/notebooks/unit8/unit8_part1.ipynb
@@ -3,8 +3,8 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "view-in-github",
- "colab_type": "text"
+ "colab_type": "text",
+ "id": "view-in-github"
},
"source": [
"
"
@@ -60,6 +60,9 @@
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "T6lIPYFghhYL"
+ },
"source": [
"## Objectives of this notebook 🏆\n",
"\n",
@@ -69,13 +72,13 @@
"- Be able to **push your trained agent and the code to the Hub** with a nice video replay and an evaluation score 🔥.\n",
"\n",
"\n"
- ],
- "metadata": {
- "id": "T6lIPYFghhYL"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "Wp-rD6Fuhq31"
+ },
"source": [
"## This notebook is from the Deep Reinforcement Learning Course\n",
"
\n",
@@ -90,82 +93,79 @@
"\n",
"\n",
"The best way to keep in touch is to join our discord server to exchange with the community and with us 👉🏻 https://discord.gg/ydHrjt3WP5"
- ],
- "metadata": {
- "id": "Wp-rD6Fuhq31"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "rasqqGQlhujA"
+ },
"source": [
"## Prerequisites 🏗️\n",
"Before diving into the notebook, you need to:\n",
"\n",
"🔲 📚 Study [PPO by reading Unit 8](https://huggingface.co/deep-rl-course/unit8/introduction) 🤗 "
- ],
- "metadata": {
- "id": "rasqqGQlhujA"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "PUFfMGOih3CW"
+ },
"source": [
"To validate this hands-on for the [certification process](https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process), you need to push one model, we don't ask for a minimal result but we **advise you to try different hyperparameters settings to get better results**.\n",
"\n",
"If you don't find your model, **go to the bottom of the page and click on the refresh button**\n",
"\n",
"For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process"
- ],
- "metadata": {
- "id": "PUFfMGOih3CW"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "PU4FVzaoM6fC"
+ },
"source": [
"## Set the GPU 💪\n",
"- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`\n",
"\n",
"
"
- ],
- "metadata": {
- "id": "PU4FVzaoM6fC"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "KV0NyFdQM9ZG"
+ },
"source": [
"- `Hardware Accelerator > GPU`\n",
"\n",
"
"
- ],
- "metadata": {
- "id": "KV0NyFdQM9ZG"
- }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "bTpYcVZVMzUI"
+ },
"source": [
"## Create a virtual display 🔽\n",
"\n",
"During the notebook, we'll need to generate a replay video. To do so, with colab, **we need to have a virtual screen to be able to render the environment** (and thus record the frames). \n",
"\n",
"Hence the following cell will install the librairies and create and run a virtual screen 🖥"
- ],
- "metadata": {
- "id": "bTpYcVZVMzUI"
- }
+ ]
},
{
"cell_type": "code",
- "source": [
- "!pip install setuptools==65.5.0"
- ],
+ "execution_count": null,
"metadata": {
"id": "Fd731S8-NuJA"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "!pip install setuptools==65.5.0"
+ ]
},
{
"cell_type": "code",
@@ -186,18 +186,18 @@
},
{
"cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ww5PQH1gNLI4"
+ },
+ "outputs": [],
"source": [
"# Virtual display\n",
"from pyvirtualdisplay import Display\n",
"\n",
"virtual_display = Display(visible=0, size=(1400, 900))\n",
"virtual_display.start()"
- ],
- "metadata": {
- "id": "ww5PQH1gNLI4"
- },
- "execution_count": null,
- "outputs": []
+ ]
},
{
"cell_type": "markdown",
@@ -211,17 +211,14 @@
},
{
"cell_type": "code",
- "source": [
- "!pip install gym==0.22\n",
- "!pip install imageio-ffmpeg\n",
- "!pip install huggingface_hub\n",
- "!pip install gym[box2d]==0.22"
- ],
+ "execution_count": null,
"metadata": {
"id": "9xZQFTPcsKUK"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "!pip install gym==0.22 imageio-ffmpeg huggingface_hub gym[box2d]==0.22"
+ ]
},
{
"cell_type": "markdown",
@@ -266,7 +263,17 @@
},
"outputs": [],
"source": [
- "### Your code here:"
+ "### Your code here:\n",
+ "# from ppo import ...  # import your PPO implementation here"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "# Executed cells to upload my model to Hugging Face"
]
},
{
@@ -307,7 +314,10 @@
"import imageio\n",
"\n",
"from wasabi import Printer\n",
- "msg = Printer()"
+ "msg = Printer()\n",
+ "\n",
+ "%load_ext autoreload\n",
+ "%autoreload 2"
]
},
{
@@ -319,18 +329,6 @@
"- Add new argument in `parse_args()` function to define the repo-id where we want to push the model."
]
},
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "iHQiqQEFn0QH"
- },
- "outputs": [],
- "source": [
- "# Adding HuggingFace argument\n",
- "parser.add_argument(\"--repo-id\", type=str, default=\"ThomasSimonini/ppo-CartPole-v1\", help=\"id of the model repository from the Hugging Face Hub {username/repo_name}\")"
- ]
- },
{
"cell_type": "markdown",
"metadata": {
@@ -452,17 +450,17 @@
" \"\"\"\n",
" episode_rewards = []\n",
" for episode in range(n_eval_episodes):\n",
- " state = env.reset()\n",
+ " state, _ = env.reset()\n",
" step = 0\n",
" done = False\n",
" total_rewards_ep = 0\n",
" \n",
" while done is False:\n",
" state = torch.Tensor(state).to(device)\n",
- " action, _, _, _ = policy.get_action_and_value(state)\n",
- " new_state, reward, done, info = env.step(action.cpu().numpy())\n",
+ " action, _, _, _ = policy.get_action_value(state)\n",
+ " new_state, reward, term, trunc, info = env.step(action.cpu().numpy())\n",
" total_rewards_ep += reward \n",
- " if done:\n",
+ " if trunc or term:\n",
" break\n",
" state = new_state\n",
" episode_rewards.append(total_rewards_ep)\n",
@@ -474,16 +472,16 @@
"\n",
"def record_video(env, policy, out_directory, fps=30):\n",
" images = [] \n",
- " done = False\n",
- " state = env.reset()\n",
- " img = env.render(mode='rgb_array')\n",
+ " trunc, term = False, False\n",
+ " state, _ = env.reset()\n",
+ " img = env.render()\n",
" images.append(img)\n",
- " while not done:\n",
+ " while not (trunc or term):\n",
" state = torch.Tensor(state).to(device)\n",
" # Take the action (index) that have the maximum expected future reward given that state\n",
- " action, _, _, _ = policy.get_action_and_value(state)\n",
- " state, reward, done, info = env.step(action.cpu().numpy()) # We directly put next_state = state for recording logic\n",
- " img = env.render(mode='rgb_array')\n",
+ " action, _, _, _ = policy.get_action_value(state)\n",
+ " state, reward, term, trunc, info = env.step(action.cpu().numpy()) # We directly put next_state = state for recording logic\n",
+ " img = env.render()\n",
" images.append(img)\n",
" imageio.mimsave(out_directory, [np.array(img) for i, img in enumerate(images)], fps=fps)\n",
"\n",
@@ -603,6 +601,36 @@
"- Finally, we call this function at the end of the PPO training"
]
},
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "args_repo_id = \"turbo-maikol/rl-course-unit8-ppo-LunarLander-v2\"\n",
+ "args_env_id = \"LunarLander-v3\"\n",
+ "run_name = \"LunarLander-HF\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from src.utils.model_utils import load_agent\n",
+ "from src.config import Configuration\n",
+ "\n",
+ "CONFIG = Configuration(\n",
+ " MODELS=\"../../rl-module/models\",\n",
+ " exp_name=\"lunar-lander-hf-V2\",\n",
+ " env_id = args_env_id\n",
+ ")\n",
+ "agent = load_agent(CONFIG)\n",
+ "\n",
+ "device = CONFIG.device"
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
@@ -611,17 +639,26 @@
},
"outputs": [],
"source": [
+ "import gymnasium as gym\n",
+ "import torch\n",
"# Create the evaluation environment\n",
- "eval_env = gym.make(args.env_id)\n",
+ "eval_env = gym.make(args_env_id, render_mode=\"rgb_array\")\n",
"\n",
- "package_to_hub(repo_id = args.repo_id,\n",
+ "package_to_hub(repo_id = args_repo_id,\n",
" model = agent, # The model we want to save\n",
- " hyperparameters = args,\n",
- " eval_env = gym.make(args.env_id),\n",
+ " hyperparameters = CONFIG,\n",
+ " eval_env = eval_env,\n",
" logs= f\"runs/{run_name}\",\n",
" )"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "----"
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {
@@ -647,7 +684,7 @@
"import time\n",
"from distutils.util import strtobool\n",
"\n",
- "import gym\n",
+ "import gymnasium as gym\n",
"import numpy as np\n",
"import torch\n",
"import torch.nn as nn\n",
@@ -840,7 +877,7 @@
" \n",
" while done is False:\n",
" state = torch.Tensor(state).to(device)\n",
- " action, _, _, _ = policy.get_action_and_value(state)\n",
+ " action, _, _, _ = policy.get_action_value(state)\n",
" new_state, reward, done, info = env.step(action.cpu().numpy())\n",
" total_rewards_ep += reward \n",
" if done:\n",
@@ -862,7 +899,7 @@
" while not done:\n",
" state = torch.Tensor(state).to(device)\n",
" # Take the action (index) that have the maximum expected future reward given that state\n",
- " action, _, _, _ = policy.get_action_and_value(state)\n",
+ " action, _, _, _ = policy.get_action_value(state)\n",
" state, reward, done, info = env.step(action.cpu().numpy()) # We directly put next_state = state for recording logic\n",
" img = env.render(mode='rgb_array')\n",
" images.append(img)\n",
@@ -1013,7 +1050,7 @@
" def get_value(self, x):\n",
" return self.critic(x)\n",
"\n",
- " def get_action_and_value(self, x, action=None):\n",
+ " def get_action_value(self, x, action=None):\n",
" logits = self.actor(x)\n",
" probs = Categorical(logits=logits)\n",
" if action is None:\n",
@@ -1023,7 +1060,7 @@
"\n",
"if __name__ == \"__main__\":\n",
" args = parse_args()\n",
- " run_name = f\"{args.env_id}__{args.exp_name}__{args.seed}__{int(time.time())}\"\n",
+ " run_name = f\"{args_env_id}__{args.exp_name}__{args.seed}__{int(time.time())}\"\n",
" if args.track:\n",
" import wandb\n",
"\n",
@@ -1052,7 +1089,7 @@
"\n",
" # env setup\n",
" envs = gym.vector.SyncVectorEnv(\n",
- " [make_env(args.env_id, args.seed + i, i, args.capture_video, run_name) for i in range(args.num_envs)]\n",
+ " [make_env(args_env_id, args.seed + i, i, args.capture_video, run_name) for i in range(args.num_envs)]\n",
" )\n",
" assert isinstance(envs.single_action_space, gym.spaces.Discrete), \"only discrete action space is supported\"\n",
"\n",
@@ -1088,7 +1125,7 @@
"\n",
" # ALGO LOGIC: action logic\n",
" with torch.no_grad():\n",
- " action, logprob, _, value = agent.get_action_and_value(next_obs)\n",
+ " action, logprob, _, value = agent.get_action_value(next_obs)\n",
" values[step] = value.flatten()\n",
" actions[step] = action\n",
" logprobs[step] = logprob\n",
@@ -1150,7 +1187,7 @@
" end = start + args.minibatch_size\n",
" mb_inds = b_inds[start:end]\n",
"\n",
- " _, newlogprob, entropy, newvalue = agent.get_action_and_value(b_obs[mb_inds], b_actions.long()[mb_inds])\n",
+ " _, newlogprob, entropy, newvalue = agent.get_action_value(b_obs[mb_inds], b_actions.long()[mb_inds])\n",
" logratio = newlogprob - b_logprobs[mb_inds]\n",
" ratio = logratio.exp()\n",
"\n",
@@ -1216,12 +1253,12 @@
" writer.close()\n",
"\n",
" # Create the evaluation environment\n",
- " eval_env = gym.make(args.env_id)\n",
+ " eval_env = gym.make(args_env_id)\n",
"\n",
- " package_to_hub(repo_id = args.repo_id,\n",
+ " package_to_hub(repo_id = args_repo_id,\n",
" model = agent, # The model we want to save\n",
" hyperparameters = args,\n",
- " eval_env = gym.make(args.env_id),\n",
+ " eval_env = gym.make(args_env_id),\n",
" logs= f\"runs/{run_name}\",\n",
" )\n",
" "
@@ -1290,21 +1327,21 @@
},
{
"cell_type": "markdown",
- "source": [
- "
"
- ],
"metadata": {
"id": "Sq0My0LOjPYR"
- }
+ },
+ "source": [
+ "
"
+ ]
},
{
"cell_type": "markdown",
- "source": [
- "
"
- ],
"metadata": {
"id": "A8C-Q5ZyjUe3"
- }
+ },
+ "source": [
+ "
"
+ ]
},
{
"cell_type": "markdown",
@@ -1319,14 +1356,14 @@
},
{
"cell_type": "code",
- "source": [
- "!python ppo.py --env-id=\"LunarLander-v2\" --repo-id=\"YOUR_REPO_ID\" --total-timesteps=50000"
- ],
+ "execution_count": null,
"metadata": {
"id": "KXLih6mKseBs"
},
- "execution_count": null,
- "outputs": []
+ "outputs": [],
+ "source": [
+ "!python ppo.py --env-id=\"LunarLander-v2\" --repo-id=\"YOUR_REPO_ID\" --total-timesteps=50000"
+ ]
},
{
"cell_type": "markdown",
@@ -1350,22 +1387,32 @@
}
],
"metadata": {
+ "accelerator": "GPU",
"colab": {
- "private_outputs": true,
- "provenance": [],
"history_visible": true,
- "include_colab_link": true
+ "include_colab_link": true,
+ "private_outputs": true,
+ "provenance": []
},
"gpuClass": "standard",
"kernelspec": {
- "display_name": "Python 3",
+ "display_name": "venv",
+ "language": "python",
"name": "python3"
},
"language_info": {
- "name": "python"
- },
- "accelerator": "GPU"
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.18"
+ }
},
"nbformat": 4,
"nbformat_minor": 0
-}
\ No newline at end of file
+}
diff --git a/notebooks/unit8/unit8_part2.ipynb b/notebooks/unit8/unit8_part2.ipynb
index 7c38b10..59eb35b 100644
--- a/notebooks/unit8/unit8_part2.ipynb
+++ b/notebooks/unit8/unit8_part2.ipynb
@@ -3,8 +3,8 @@
{
"cell_type": "markdown",
"metadata": {
- "id": "view-in-github",
- "colab_type": "text"
+ "colab_type": "text",
+ "id": "view-in-github"
},
"source": [
"
"
@@ -244,21 +244,9 @@
"source": [
"# install python libraries\n",
"# thanks toinsson\n",
- "!pip install faster-fifo==1.4.2\n",
- "!pip install vizdoom"
+ "!pip install faster-fifo==1.4.2 vizdoom sample-factory==2.1.1"
]
},
- {
- "cell_type": "code",
- "source": [
- "!pip install sample-factory==2.1.1"
- ],
- "metadata": {
- "id": "alxUt7Au-O8e"
- },
- "execution_count": null,
- "outputs": []
- },
{
"cell_type": "markdown",
"metadata": {
@@ -270,7 +258,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 1,
"metadata": {
"id": "bCgZbeiavcDU"
},
@@ -358,11 +346,210 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 3,
"metadata": {
"id": "y_TeicMvyKHP"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\u001b[33m[2025-08-29 19:52:59,093][32845] Environment doom_basic already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,095][32845] Environment doom_two_colors_easy already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,096][32845] Environment doom_two_colors_hard already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,098][32845] Environment doom_dm already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,098][32845] Environment doom_dwango5 already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,099][32845] Environment doom_my_way_home_flat_actions already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,100][32845] Environment doom_defend_the_center_flat_actions already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,100][32845] Environment doom_my_way_home already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,101][32845] Environment doom_deadly_corridor already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,102][32845] Environment doom_defend_the_center already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,103][32845] Environment doom_defend_the_line already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,104][32845] Environment doom_health_gathering already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,104][32845] Environment doom_health_gathering_supreme already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,105][32845] Environment doom_battle already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,106][32845] Environment doom_battle2 already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,106][32845] Environment doom_duel_bots already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,107][32845] Environment doom_deathmatch_bots already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,107][32845] Environment doom_duel already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,108][32845] Environment doom_deathmatch_full already registered, overwriting...\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,109][32845] Environment doom_benchmark already registered, overwriting...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:52:59,109][32845] register_encoder_factory: \u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:52:59,191][32845] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:52:59,209][32845] Experiment dir /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment already exists!\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:52:59,224][32845] Resuming existing experiment from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:52:59,225][32845] Weights and Biases integration disabled\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:52:59,235][32845] Environment var CUDA_VISIBLE_DEVICES is 0\n",
+ "\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:53:01,426][43033] Doom resolution: 160x120, resize resolution: (128, 72)\u001b[0m\n",
+ "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/gymnasium/core.py:311: UserWarning: \u001b[33mWARN: env.num_agents to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.num_agents` for environment variables or `env.get_wrapper_attr('num_agents')` that will search the reminding wrappers.\u001b[0m\n",
+ " logger.warn(\n",
+ "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/gymnasium/core.py:311: UserWarning: \u001b[33mWARN: env.is_multiagent to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.is_multiagent` for environment variables or `env.get_wrapper_attr('is_multiagent')` that will search the reminding wrappers.\u001b[0m\n",
+ " logger.warn(\n",
+ "\u001b[36m[2025-08-29 19:53:01,428][43033] Env info: EnvInfo(obs_space=Dict('obs': Box(0, 255, (3, 72, 128), uint8)), action_space=Discrete(5), num_agents=1, gpu_actions=False, gpu_observations=True, action_splits=None, all_discrete=None, frameskip=4, reward_shaping_scheme=None, env_info_protocol_version=1)\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:53:01,760][32845] Starting experiment with the following configuration:\n",
+ "help=False\n",
+ "algo=APPO\n",
+ "env=doom_health_gathering_supreme\n",
+ "experiment=default_experiment\n",
+ "train_dir=/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir\n",
+ "restart_behavior=resume\n",
+ "device=gpu\n",
+ "seed=None\n",
+ "num_policies=1\n",
+ "async_rl=True\n",
+ "serial_mode=False\n",
+ "batched_sampling=False\n",
+ "num_batches_to_accumulate=2\n",
+ "worker_num_splits=2\n",
+ "policy_workers_per_policy=1\n",
+ "max_policy_lag=1000\n",
+ "num_workers=10\n",
+ "num_envs_per_worker=8\n",
+ "batch_size=16384\n",
+ "num_batches_per_epoch=1\n",
+ "num_epochs=1\n",
+ "rollout=64\n",
+ "recurrence=32\n",
+ "shuffle_minibatches=False\n",
+ "gamma=0.99\n",
+ "reward_scale=1.0\n",
+ "reward_clip=1000.0\n",
+ "value_bootstrap=False\n",
+ "normalize_returns=True\n",
+ "exploration_loss_coeff=0.001\n",
+ "value_loss_coeff=0.5\n",
+ "kl_loss_coeff=0.0\n",
+ "exploration_loss=symmetric_kl\n",
+ "gae_lambda=0.95\n",
+ "ppo_clip_ratio=0.2\n",
+ "ppo_clip_value=0.2\n",
+ "with_vtrace=False\n",
+ "vtrace_rho=1.0\n",
+ "vtrace_c=1.0\n",
+ "optimizer=adam\n",
+ "adam_eps=1e-06\n",
+ "adam_beta1=0.9\n",
+ "adam_beta2=0.999\n",
+ "max_grad_norm=4.0\n",
+ "learning_rate=0.0002\n",
+ "lr_schedule=constant\n",
+ "lr_schedule_kl_threshold=0.008\n",
+ "lr_adaptive_min=1e-06\n",
+ "lr_adaptive_max=0.01\n",
+ "obs_subtract_mean=0.0\n",
+ "obs_scale=255.0\n",
+ "normalize_input=True\n",
+ "normalize_input_keys=None\n",
+ "decorrelate_experience_max_seconds=0\n",
+ "decorrelate_envs_on_one_worker=True\n",
+ "actor_worker_gpus=[]\n",
+ "set_workers_cpu_affinity=True\n",
+ "force_envs_single_thread=False\n",
+ "default_niceness=0\n",
+ "log_to_file=True\n",
+ "experiment_summaries_interval=10\n",
+ "flush_summaries_interval=30\n",
+ "stats_avg=100\n",
+ "summaries_use_frameskip=True\n",
+ "heartbeat_interval=20\n",
+ "heartbeat_reporting_interval=600\n",
+ "train_for_env_steps=30000000\n",
+ "train_for_seconds=10000000000\n",
+ "save_every_sec=120\n",
+ "keep_checkpoints=2\n",
+ "load_checkpoint_kind=latest\n",
+ "save_milestones_sec=-1\n",
+ "save_best_every_sec=5\n",
+ "save_best_metric=reward\n",
+ "save_best_after=100000\n",
+ "benchmark=False\n",
+ "encoder_mlp_layers=[512, 512]\n",
+ "encoder_conv_architecture=convnet_simple\n",
+ "encoder_conv_mlp_layers=[512]\n",
+ "use_rnn=True\n",
+ "rnn_size=512\n",
+ "rnn_type=gru\n",
+ "rnn_num_layers=1\n",
+ "decoder_mlp_layers=[]\n",
+ "nonlinearity=elu\n",
+ "policy_initialization=orthogonal\n",
+ "policy_init_gain=1.0\n",
+ "actor_critic_share_weights=True\n",
+ "adaptive_stddev=True\n",
+ "continuous_tanh_scale=0.0\n",
+ "initial_stddev=1.0\n",
+ "use_env_info_cache=False\n",
+ "env_gpu_actions=False\n",
+ "env_gpu_observations=True\n",
+ "env_frameskip=4\n",
+ "env_framestack=1\n",
+ "pixel_format=CHW\n",
+ "use_record_episode_statistics=False\n",
+ "with_wandb=False\n",
+ "wandb_user=None\n",
+ "wandb_project=sample_factory\n",
+ "wandb_group=None\n",
+ "wandb_job_type=SF\n",
+ "wandb_tags=[]\n",
+ "with_pbt=False\n",
+ "pbt_mix_policies_in_one_env=True\n",
+ "pbt_period_env_steps=5000000\n",
+ "pbt_start_mutation=20000000\n",
+ "pbt_replace_fraction=0.3\n",
+ "pbt_mutation_rate=0.15\n",
+ "pbt_replace_reward_gap=0.1\n",
+ "pbt_replace_reward_gap_absolute=1e-06\n",
+ "pbt_optimize_gamma=False\n",
+ "pbt_target_objective=true_objective\n",
+ "pbt_perturb_min=1.1\n",
+ "pbt_perturb_max=1.5\n",
+ "num_agents=-1\n",
+ "num_humans=0\n",
+ "num_bots=-1\n",
+ "start_bot_difficulty=None\n",
+ "timelimit=None\n",
+ "res_w=128\n",
+ "res_h=72\n",
+ "wide_aspect_ratio=False\n",
+ "eval_env_frameskip=1\n",
+ "fps=35\n",
+ "command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000\n",
+ "cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}\n",
+ "git_hash=f8ed470f837e96d11b86d84cc03d9d0be1dc0042\n",
+ "git_repo_name=git@github.com:huggingface/deep-rl-class.git\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:53:01,762][32845] Saving configuration to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:53:01,831][32845] Rollout worker 0 uses device cpu\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:53:01,832][32845] Rollout worker 1 uses device cpu\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:53:01,832][32845] Rollout worker 2 uses device cpu\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:53:01,833][32845] Rollout worker 3 uses device cpu\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:53:01,833][32845] Rollout worker 4 uses device cpu\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:53:01,834][32845] Rollout worker 5 uses device cpu\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:53:01,836][32845] Rollout worker 6 uses device cpu\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:53:01,836][32845] Rollout worker 7 uses device cpu\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:53:01,837][32845] Rollout worker 8 uses device cpu\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:53:01,837][32845] Rollout worker 9 uses device cpu\u001b[0m\n"
+ ]
+ },
+ {
+ "ename": "KeyboardInterrupt",
+ "evalue": "",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[31m---------------------------------------------------------------------------\u001b[39m",
+ "\u001b[31mKeyboardInterrupt\u001b[39m Traceback (most recent call last)",
+ "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[3]\u001b[39m\u001b[32m, line 31\u001b[39m\n\u001b[32m 6\u001b[39m env = \u001b[33m\"\u001b[39m\u001b[33mdoom_health_gathering_supreme\u001b[39m\u001b[33m\"\u001b[39m\n\u001b[32m 7\u001b[39m cfg = parse_vizdoom_cfg(argv=[\n\u001b[32m 8\u001b[39m \u001b[33mf\u001b[39m\u001b[33m\"\u001b[39m\u001b[33m--env=\u001b[39m\u001b[38;5;132;01m{\u001b[39;00menv\u001b[38;5;132;01m}\u001b[39;00m\u001b[33m\"\u001b[39m,\n\u001b[32m 9\u001b[39m \n\u001b[32m (...)\u001b[39m\u001b[32m 28\u001b[39m \n\u001b[32m 29\u001b[39m ])\n\u001b[32m---> \u001b[39m\u001b[32m31\u001b[39m status = \u001b[43mrun_rl\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcfg\u001b[49m\u001b[43m)\u001b[49m\n",
+ "\u001b[36mFile \u001b[39m\u001b[32m~/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/train.py:37\u001b[39m, in \u001b[36mrun_rl\u001b[39m\u001b[34m(cfg)\u001b[39m\n\u001b[32m 32\u001b[39m cfg, runner = make_runner(cfg)\n\u001b[32m 34\u001b[39m \u001b[38;5;66;03m# here we can register additional message or summary handlers\u001b[39;00m\n\u001b[32m 35\u001b[39m \u001b[38;5;66;03m# see sf_examples/dmlab/train_dmlab.py for example\u001b[39;00m\n\u001b[32m---> \u001b[39m\u001b[32m37\u001b[39m status = \u001b[43mrunner\u001b[49m\u001b[43m.\u001b[49m\u001b[43minit\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 38\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m status == ExperimentStatus.SUCCESS:\n\u001b[32m 39\u001b[39m status = runner.run()\n",
+ "\u001b[36mFile \u001b[39m\u001b[32m~/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/runners/runner_parallel.py:21\u001b[39m, in \u001b[36mParallelRunner.init\u001b[39m\u001b[34m(self)\u001b[39m\n\u001b[32m 20\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34minit\u001b[39m(\u001b[38;5;28mself\u001b[39m) -> StatusCode:\n\u001b[32m---> \u001b[39m\u001b[32m21\u001b[39m status = \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m.\u001b[49m\u001b[43minit\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 22\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m status != ExperimentStatus.SUCCESS:\n\u001b[32m 23\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m status\n",
+ "\u001b[36mFile \u001b[39m\u001b[32m~/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/runners/runner.py:557\u001b[39m, in \u001b[36mRunner.init\u001b[39m\u001b[34m(self)\u001b[39m\n\u001b[32m 554\u001b[39m \u001b[38;5;28mself\u001b[39m._save_cfg()\n\u001b[32m 555\u001b[39m save_git_diff(experiment_dir(\u001b[38;5;28mself\u001b[39m.cfg))\n\u001b[32m--> \u001b[39m\u001b[32m557\u001b[39m \u001b[38;5;28mself\u001b[39m.buffer_mgr = \u001b[43mBufferMgr\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mcfg\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43menv_info\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 559\u001b[39m \u001b[38;5;28mself\u001b[39m._observers_call(AlgoObserver.on_init, \u001b[38;5;28mself\u001b[39m)\n\u001b[32m 561\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m ExperimentStatus.SUCCESS\n",
+ "\u001b[36mFile \u001b[39m\u001b[32m~/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/utils/shared_buffers.py:215\u001b[39m, in \u001b[36mBufferMgr.__init__\u001b[39m\u001b[34m(self, cfg, env_info)\u001b[39m\n\u001b[32m 208\u001b[39m num_buffers = \u001b[38;5;28mmax\u001b[39m(\n\u001b[32m 209\u001b[39m num_buffers,\n\u001b[32m 210\u001b[39m \u001b[38;5;28mself\u001b[39m.max_batches_to_accumulate * \u001b[38;5;28mself\u001b[39m.trajectories_per_training_iteration * cfg.num_policies,\n\u001b[32m 211\u001b[39m )\n\u001b[32m 213\u001b[39m \u001b[38;5;28mself\u001b[39m.traj_buffer_queues[device] = get_queue(cfg.serial_mode)\n\u001b[32m--> \u001b[39m\u001b[32m215\u001b[39m \u001b[38;5;28mself\u001b[39m.traj_tensors_torch[device] = \u001b[43malloc_trajectory_tensors\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m 216\u001b[39m \u001b[43m \u001b[49m\u001b[43menv_info\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 217\u001b[39m \u001b[43m \u001b[49m\u001b[43mnum_buffers\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 218\u001b[39m \u001b[43m \u001b[49m\u001b[43mcfg\u001b[49m\u001b[43m.\u001b[49m\u001b[43mrollout\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 219\u001b[39m \u001b[43m \u001b[49m\u001b[43mrnn_size\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 220\u001b[39m \u001b[43m \u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 221\u001b[39m \u001b[43m \u001b[49m\u001b[43mshare\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m 222\u001b[39m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 223\u001b[39m \u001b[38;5;28mself\u001b[39m.policy_output_tensors_torch[device], output_names, output_sizes = alloc_policy_output_tensors(\n\u001b[32m 224\u001b[39m cfg, env_info, rnn_size, device, share\n\u001b[32m 225\u001b[39m )\n\u001b[32m 226\u001b[39m \u001b[38;5;28mself\u001b[39m.output_names, \u001b[38;5;28mself\u001b[39m.output_sizes = output_names, output_sizes\n",
+ "\u001b[36mFile \u001b[39m\u001b[32m~/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/utils/shared_buffers.py:91\u001b[39m, in \u001b[36malloc_trajectory_tensors\u001b[39m\u001b[34m(env_info, num_traj, rollout, rnn_size, device, share)\u001b[39m\n\u001b[32m 89\u001b[39m \u001b[38;5;66;03m# we need to allocate an extra rollout step here to calculate the value estimates for the last step\u001b[39;00m\n\u001b[32m 90\u001b[39m \u001b[38;5;28;01mfor\u001b[39;00m space_name, space \u001b[38;5;129;01min\u001b[39;00m obs_space.spaces.items():\n\u001b[32m---> \u001b[39m\u001b[32m91\u001b[39m tensors[\u001b[33m\"\u001b[39m\u001b[33mobs\u001b[39m\u001b[33m\"\u001b[39m][space_name] = \u001b[43minit_tensor\u001b[49m\u001b[43m(\u001b[49m\u001b[43m[\u001b[49m\u001b[43mnum_traj\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mrollout\u001b[49m\u001b[43m \u001b[49m\u001b[43m+\u001b[49m\u001b[43m \u001b[49m\u001b[32;43m1\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mspace\u001b[49m\u001b[43m.\u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mspace\u001b[49m\u001b[43m.\u001b[49m\u001b[43mshape\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdevice\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mshare\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 92\u001b[39m tensors[\u001b[33m\"\u001b[39m\u001b[33mrnn_states\u001b[39m\u001b[33m\"\u001b[39m] = init_tensor([num_traj, rollout + \u001b[32m1\u001b[39m], torch.float32, [rnn_size], device, share)\n\u001b[32m 94\u001b[39m num_actions, num_action_distribution_parameters = action_info(env_info)\n",
+ "\u001b[36mFile \u001b[39m\u001b[32m~/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/utils/shared_buffers.py:43\u001b[39m, in \u001b[36minit_tensor\u001b[39m\u001b[34m(leading_dimensions, tensor_type, tensor_shape, device, share)\u001b[39m\n\u001b[32m 40\u001b[39m tensor_shape = [x \u001b[38;5;28;01mfor\u001b[39;00m x \u001b[38;5;129;01min\u001b[39;00m tensor_shape \u001b[38;5;28;01mif\u001b[39;00m x]\n\u001b[32m 42\u001b[39m final_shape = leading_dimensions + \u001b[38;5;28mlist\u001b[39m(tensor_shape)\n\u001b[32m---> \u001b[39m\u001b[32m43\u001b[39m t = \u001b[43mtorch\u001b[49m\u001b[43m.\u001b[49m\u001b[43mzeros\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfinal_shape\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m=\u001b[49m\u001b[43mtensor_type\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m 45\u001b[39m \u001b[38;5;66;03m# fill with magic values to make it easy to spot if we ever use unintialized data\u001b[39;00m\n\u001b[32m 46\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m t.is_floating_point():\n",
+ "\u001b[31mKeyboardInterrupt\u001b[39m: "
+ ]
+ }
+ ],
"source": [
"## Start the training, this should take around 15 minutes\n",
"register_vizdoom_components()\n",
@@ -370,7 +557,29 @@
"# The scenario we train on today is health gathering\n",
"# other scenarios include \"doom_basic\", \"doom_two_colors_easy\", \"doom_dm\", \"doom_dwango5\", \"doom_my_way_home\", \"doom_deadly_corridor\", \"doom_defend_the_center\", \"doom_defend_the_line\"\n",
"env = \"doom_health_gathering_supreme\"\n",
- "cfg = parse_vizdoom_cfg(argv=[f\"--env={env}\", \"--num_workers=8\", \"--num_envs_per_worker=4\", \"--train_for_env_steps=4000000\"])\n",
+ "cfg = parse_vizdoom_cfg(argv=[\n",
+ " f\"--env={env}\",\n",
+ "\n",
+ " # Parallelism / speed\n",
+ " \"--num_workers=10\", # more CPU workers if you have cores\n",
+ " \"--num_envs_per_worker=8\", # more envs per worker (GPU permitting)\n",
+ "\n",
+ " # Training length\n",
+ " \"--train_for_env_steps=30000000\", # 20M steps → better convergence\n",
+ "\n",
+ " # Rollouts\n",
+ " \"--rollout=64\", # longer rollouts = better advantage estimates\n",
+ "\n",
+ " # PPO / optimizer\n",
+ " \"--batch_size=16384\", # bigger batch for more stable updates\n",
+ " \"--learning_rate=0.0002\", # slightly higher than doom default\n",
+ " \"--ppo_clip_ratio=0.2\", # more conservative clipping\n",
+ "\n",
+ " # Model / memory\n",
+ " \"--recurrence=32\", # add LSTM memory (important for Doom)\n",
+ " \"--use_rnn=True\",\n",
+ "\n",
+ "])\n",
"\n",
"status = run_rl(cfg)"
]
@@ -386,11 +595,184 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "import numpy \n",
+ "torch.serialization.add_safe_globals([\n",
+ " numpy.core.multiarray.scalar,\n",
+ " numpy.dtype,\n",
+ " numpy.dtypes.Float64DType\n",
+ "])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
"metadata": {
"id": "MGSA4Kg5_i0j"
},
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\u001b[33m[2025-08-29 19:09:28,003][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,004][15827] Overriding arg 'num_workers' with value 1 passed from command line\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,004][15827] Adding new argument 'no_render'=True that is not in the saved config file!\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,005][15827] Adding new argument 'save_video'=True that is not in the saved config file!\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,006][15827] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,006][15827] Adding new argument 'video_name'=None that is not in the saved config file!\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,007][15827] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,007][15827] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,008][15827] Adding new argument 'push_to_hub'=False that is not in the saved config file!\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,008][15827] Adding new argument 'hf_repository'=None that is not in the saved config file!\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,009][15827] Adding new argument 'policy_index'=0 that is not in the saved config file!\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,010][15827] Adding new argument 'eval_deterministic'=False that is not in the saved config file!\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,011][15827] Adding new argument 'train_script'=None that is not in the saved config file!\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,011][15827] Adding new argument 'enjoy_script'=None that is not in the saved config file!\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,012][15827] Using frameskip 1 and render_action_repeat=4 for evaluation\u001b[0m\n",
+ "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/gymnasium/core.py:311: UserWarning: \u001b[33mWARN: env.num_agents to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.num_agents` for environment variables or `env.get_wrapper_attr('num_agents')` that will search the reminding wrappers.\u001b[0m\n",
+ " logger.warn(\n",
+ "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/gymnasium/core.py:311: UserWarning: \u001b[33mWARN: env.is_multiagent to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do `env.unwrapped.is_multiagent` for environment variables or `env.get_wrapper_attr('is_multiagent')` that will search the reminding wrappers.\u001b[0m\n",
+ " logger.warn(\n",
+ "\u001b[36m[2025-08-29 19:09:28,068][15827] RunningMeanStd input shape: (3, 72, 128)\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,070][15827] RunningMeanStd input shape: (1,)\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,078][15827] ConvEncoder: input_channels=3\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,110][15827] Conv encoder output size: 512\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,111][15827] Policy head output size: 512\u001b[0m\n",
+ "\u001b[33m[2025-08-29 19:09:28,147][15827] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...\u001b[0m\n",
+ "[W][05112.308343] pw.conf | [ conf.c: 1031 try_load_conf()] can't load config client-rt.conf: No such file or directory\n",
+ "[E][05112.308453] pw.conf | [ conf.c: 1060 pw_conf_load_conf_for_context()] can't load config client-rt.conf: No such file or directory\n",
+ "[ALSOFT] (EE) Failed to create PipeWire event context (errno: 2)\n",
+ "\u001b[36m[2025-08-29 19:09:28,678][15827] Num frames 100...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:28,901][15827] Num frames 200...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:29,082][15827] Num frames 300...\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:29,271][15827] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:29,273][15827] Avg episode reward: 3.840, avg true_objective: 3.840\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:29,303][15827] Num frames 400...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:29,488][15827] Num frames 500...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:29,669][15827] Num frames 600...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:29,866][15827] Num frames 700...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:30,068][15827] Num frames 800...\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:30,279][15827] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:30,281][15827] Avg episode reward: 5.320, avg true_objective: 4.320\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:30,350][15827] Num frames 900...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:30,519][15827] Num frames 1000...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:30,721][15827] Num frames 1100...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:30,932][15827] Num frames 1200...\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:31,091][15827] Avg episode rewards: #0: 4.827, true rewards: #0: 4.160\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:31,093][15827] Avg episode reward: 4.827, avg true_objective: 4.160\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:31,207][15827] Num frames 1300...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:31,409][15827] Num frames 1400...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:31,677][15827] Num frames 1500...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:31,882][15827] Num frames 1600...\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:32,000][15827] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:32,002][15827] Avg episode reward: 4.580, avg true_objective: 4.080\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:32,154][15827] Num frames 1700...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:32,364][15827] Num frames 1800...\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:32,596][15827] Avg episode rewards: #0: 4.176, true rewards: #0: 3.776\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:32,597][15827] Avg episode reward: 4.176, avg true_objective: 3.776\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:32,628][15827] Num frames 1900...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:32,853][15827] Num frames 2000...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:33,054][15827] Num frames 2100...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:33,264][15827] Num frames 2200...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:33,433][15827] Num frames 2300...\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:33,602][15827] Avg episode rewards: #0: 4.393, true rewards: #0: 3.893\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:33,603][15827] Avg episode reward: 4.393, avg true_objective: 3.893\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:33,741][15827] Num frames 2400...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:33,951][15827] Num frames 2500...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:34,199][15827] Num frames 2600...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:34,376][15827] Num frames 2700...\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:34,540][15827] Avg episode rewards: #0: 4.549, true rewards: #0: 3.977\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:34,541][15827] Avg episode reward: 4.549, avg true_objective: 3.977\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:34,566][15827] Num frames 2800...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:34,788][15827] Num frames 2900...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:34,990][15827] Num frames 3000...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:35,103][15827] Num frames 3100...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:35,292][15827] Num frames 3200...\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:35,394][15827] Avg episode rewards: #0: 4.665, true rewards: #0: 4.040\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:35,396][15827] Avg episode reward: 4.665, avg true_objective: 4.040\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:35,502][15827] Num frames 3300...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:35,645][15827] Num frames 3400...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:35,752][15827] Num frames 3500...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:35,878][15827] Num frames 3600...\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:35,951][15827] Avg episode rewards: #0: 4.573, true rewards: #0: 4.018\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:35,952][15827] Avg episode reward: 4.573, avg true_objective: 4.018\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:36,061][15827] Num frames 3700...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:36,168][15827] Num frames 3800...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:36,298][15827] Num frames 3900...\u001b[0m\n",
+ "\u001b[36m[2025-08-29 19:09:36,417][15827] Num frames 4000...\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:36,468][15827] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000\u001b[0m\n",
+ "\u001b[37m\u001b[1m[2025-08-29 19:09:36,469][15827] Avg episode reward: 4.500, avg true_objective: 4.000\u001b[0m\n",
+ "ffmpeg version 6.1.1-3ubuntu5 Copyright (c) 2000-2023 the FFmpeg developers\n",
+ " built with gcc 13 (Ubuntu 13.2.0-23ubuntu3)\n",
+ " configuration: --prefix=/usr --extra-version=3ubuntu5 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --disable-omx --enable-gnutls --enable-libaom --enable-libass --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libharfbuzz --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-openal --enable-opencl --enable-opengl --disable-sndio --enable-libvpl --disable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-ladspa --enable-libbluray --enable-libjack --enable-libpulse --enable-librabbitmq --enable-librist --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libx264 --enable-libzmq --enable-libzvbi --enable-lv2 --enable-sdl2 --enable-libplacebo --enable-librav1e --enable-pocketsphinx --enable-librsvg --enable-libjxl --enable-shared\n",
+ " libavutil 58. 29.100 / 58. 29.100\n",
+ " libavcodec 60. 31.102 / 60. 31.102\n",
+ " libavformat 60. 16.100 / 60. 16.100\n",
+ " libavdevice 60. 3.100 / 60. 3.100\n",
+ " libavfilter 9. 12.100 / 9. 12.100\n",
+ " libswscale 7. 5.100 / 7. 5.100\n",
+ " libswresample 4. 12.100 / 4. 12.100\n",
+ " libpostproc 57. 3.100 / 57. 3.100\n",
+ "Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/sf2_mique/replay.mp4':\n",
+ " Metadata:\n",
+ " major_brand : isom\n",
+ " minor_version : 512\n",
+ " compatible_brands: isomiso2mp41\n",
+ " encoder : Lavf59.27.100\n",
+ " Duration: 00:01:54.57, start: 0.000000, bitrate: 1373 kb/s\n",
+ " Stream #0:0[0x1](und): Video: mpeg4 (Simple Profile) (mp4v / 0x7634706D), yuv420p, 240x180 [SAR 1:1 DAR 4:3], 1372 kb/s, 35 fps, 35 tbr, 17920 tbn (default)\n",
+ " Metadata:\n",
+ " handler_name : VideoHandler\n",
+ " vendor_id : [0][0][0][0]\n",
+ "Stream mapping:\n",
+ " Stream #0:0 -> #0:0 (mpeg4 (native) -> h264 (libx264))\n",
+ "Press [q] to stop, [?] for help\n",
+ "[libx264 @ 0x55ba4d6002c0] using SAR=1/1\n",
+ "[libx264 @ 0x55ba4d6002c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2\n",
+ "[libx264 @ 0x55ba4d6002c0] profile High, level 1.3, 4:2:0, 8-bit\n",
+ "[libx264 @ 0x55ba4d6002c0] 264 - core 164 r3108 31e19f9 - H.264/MPEG-4 AVC codec - Copyleft 2003-2023 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00\n",
+ "Output #0, mp4, to '/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4':\n",
+ " Metadata:\n",
+ " major_brand : isom\n",
+ " minor_version : 512\n",
+ " compatible_brands: isomiso2mp41\n",
+ " encoder : Lavf60.16.100\n",
+ " Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(progressive), 240x180 [SAR 1:1 DAR 4:3], q=2-31, 35 fps, 17920 tbn (default)\n",
+ " Metadata:\n",
+ " handler_name : VideoHandler\n",
+ " vendor_id : [0][0][0][0]\n",
+ " encoder : Lavc60.31.102 libx264\n",
+ " Side data:\n",
+ " cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A\n",
+ "[out#0/mp4 @ 0x55ba4d5e8500] video:5518kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.778290%\n",
+ "frame= 4010 fps=1218 q=-1.0 Lsize= 5561kB time=00:01:54.48 bitrate= 397.9kbits/s speed=34.8x \n",
+ "[libx264 @ 0x55ba4d6002c0] frame I:27 Avg QP:22.74 size: 5791\n",
+ "[libx264 @ 0x55ba4d6002c0] frame P:1674 Avg QP:25.95 size: 1916\n",
+ "[libx264 @ 0x55ba4d6002c0] frame B:2309 Avg QP:28.24 size: 990\n",
+ "[libx264 @ 0x55ba4d6002c0] consecutive B-frames: 19.7% 7.2% 9.8% 63.2%\n",
+ "[libx264 @ 0x55ba4d6002c0] mb I I16..4: 13.1% 76.4% 10.4%\n",
+ "[libx264 @ 0x55ba4d6002c0] mb P I16..4: 2.7% 9.7% 3.2% P16..4: 41.7% 24.5% 10.6% 0.0% 0.0% skip: 7.7%\n",
+ "[libx264 @ 0x55ba4d6002c0] mb B I16..4: 0.2% 1.8% 1.2% B16..8: 45.8% 14.1% 3.3% direct: 6.6% skip:27.1% L0:51.0% L1:38.8% BI:10.2%\n",
+ "[libx264 @ 0x55ba4d6002c0] 8x8 transform intra:62.0% inter:65.3%\n",
+ "[libx264 @ 0x55ba4d6002c0] coded y,uvDC,uvAC intra: 60.9% 71.5% 40.1% inter: 35.6% 12.3% 2.4%\n",
+ "[libx264 @ 0x55ba4d6002c0] i16 v,h,dc,p: 62% 5% 32% 1%\n",
+ "[libx264 @ 0x55ba4d6002c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 27% 9% 34% 4% 4% 4% 6% 5% 7%\n",
+ "[libx264 @ 0x55ba4d6002c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 66% 5% 10% 3% 4% 3% 4% 2% 3%\n",
+ "[libx264 @ 0x55ba4d6002c0] i8c dc,h,v,p: 56% 18% 23% 2%\n",
+ "[libx264 @ 0x55ba4d6002c0] Weighted P-Frames: Y:9.7% UV:0.7%\n",
+ "[libx264 @ 0x55ba4d6002c0] ref P L0: 61.8% 15.0% 13.9% 8.1% 1.2%\n",
+ "[libx264 @ 0x55ba4d6002c0] ref B L0: 85.0% 11.8% 3.2%\n",
+ "[libx264 @ 0x55ba4d6002c0] ref B L1: 95.5% 4.5%\n",
+ "[libx264 @ 0x55ba4d6002c0] kb/s:394.53\n",
+ "\u001b[36m[2025-08-29 19:09:41,440][15827] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!\u001b[0m\n"
+ ]
+ }
+ ],
"source": [
"from sample_factory.enjoy import enjoy\n",
"cfg = parse_vizdoom_cfg(argv=[f\"--env={env}\", \"--num_workers=1\", \"--save_video\", \"--no_render\", \"--max_num_episodes=10\"], evaluation=True)\n",
@@ -417,7 +799,7 @@
"from base64 import b64encode\n",
"from IPython.display import HTML\n",
"\n",
- "mp4 = open('/content/train_dir/default_experiment/replay.mp4','rb').read()\n",
+ "mp4 = open('train_dir/default_experiment/replay.mp4','rb').read()\n",
"data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
"HTML(\"\"\"\n",
"